US20080222092A1 - Automatically determining optimization frequencies of queries with parameter markers - Google Patents

Automatically determining optimization frequencies of queries with parameter markers Download PDF

Info

Publication number
US20080222092A1
US20080222092A1 US12/125,146 US12514608A US2008222092A1 US 20080222092 A1 US20080222092 A1 US 20080222092A1 US 12514608 A US12514608 A US 12514608A US 2008222092 A1 US2008222092 A1 US 2008222092A1
Authority
US
United States
Prior art keywords
query
execution
bind
bind value
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/125,146
Other versions
US7987178B2 (en
Inventor
Fabian Hueske
Volker Gerhard Markl
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to US12/125,146 priority Critical patent/US7987178B2/en
Publication of US20080222092A1 publication Critical patent/US20080222092A1/en
Application granted granted Critical
Publication of US7987178B2 publication Critical patent/US7987178B2/en
Expired - Fee Related legal-status Critical Current
Adjusted expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/217Database tuning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation

Definitions

  • the present invention relates to a method and system for automatically determining optimization frequencies of queries with parameter markers.
  • a programmer or a database administrator (DBA) managing a database system manually selects a reoptimization mode for queries having parameter markers.
  • This selection of an optimal reoptimization mode depends not only on the query itself, but also on the bind values of the query's parameter marker(s). As these bind values can be unknown or change over time, an attempt to select an optimal reoptimization mode is a non-trivial procedure that may result in selecting a sub-optimal reoptimization mode.
  • This burdensome manual process of selecting a reoptimization mode decreases the productivity of the programmer or DBA and increases the total cost of ownership of the database system. Further, a selection of a sub-optimal reoptimization mode slows down a program or the entire database system.
  • the present invention provides a computer-implemented method of automatically determining an optimization frequency of a query having one or more parameter markers, the method comprising:
  • each query execution plan associated with one or more bind value sets of a plurality of bind values sets;
  • each pair of execution costs including a first execution cost and a second execution cost, the first execution cost being a cost of executing the query with a bind value set of the plurality of bind value sets via a first query execution plan of the plurality of query execution plans and the second execution cost being a cost of optimally executing the query with the bind value set via a second query execution plan of the plurality of query execution plans;
  • optimization frequency is optimizing the query once as a result of a first determination by the determining that no difference of the plurality of differences exceeds the predefined threshold value
  • optimization frequency is reoptimizing the query each time the query is executed as a result of a second determination by the determining that at least one difference of the plurality of differences exceeds the predefined threshold value.
  • the present invention provides a computer-implemented method of determining an optimization frequency of a query having one or more parameter markers, the method comprising:
  • each bind value set including one or more bind values and associated with one or more parameter markers of a query
  • each measurement set selected from the group consisting of one or more selectivity measurements and one or more cardinality measurements;
  • each query execution plan capable of optimally executing the query with one or more bind value sets of the plurality of bind value sets
  • each distance d 1 is a maximum distance between any pair of measurement sets associated with query execution plans P i and Q i of the plurality of query execution plans
  • the query execution plan P i is an optimal query execution plan associated with the bind value set (p 1 , . . . , pn) i
  • the query execution plan Q i is an optimal query execution plan associated with the bind value set (p 1 , . . . , pn) i , and wherein the i ⁇ 1;
  • the D 1 i is a difference between the cost C 1 i and an optimal execution cost OC 1 i of the first set of execution costs and the D 2 i is a difference between the cost C 2 i and an optimal execution cost OC 2 i of the first set of execution costs
  • the OC 1 i is a cost of optimally executing the query via the query execution plan Q i with bind value set (q 1 , . . . , qn) i
  • the OC 2 i is a cost of optimally executing the query via the query execution plan P i with bind value set (p 1 , . . . , pn) i ;
  • an optimization frequency wherein the optimization frequency is selected from the group consisting of optimizing the query once and reoptimizing the query each time the query is executed;
  • optimization frequency is the optimizing the query once as a result of a first determination, via the determining the one or more pairs of differences, that no difference of the one or more pairs of differences exceeds a predefined threshold value
  • optimization frequency is the reoptimizing the query each time the query is executed as a result of a second determination, via the determining the one or more pairs of differences, that at least one difference of the one or more pairs of differences exceeds the predefined threshold value.
  • the present invention provides a technique for using selectivity or cardinality information to automatically determine the reoptimization mode of a query having parameter markers, thereby improving the productivity of DBAs and reducing the total cost of ownership of databases.
  • FIG. 1 is a block diagram of a system for automatically determining optimization frequencies of queries with parameter markers, in accordance with embodiments of the present invention.
  • FIG. 2 is a flow diagram of a process of automatically determining optimization frequencies of queries with parameter markers in the system of FIG. 1 , in accordance with embodiments of the present invention.
  • FIG. 3A is a flow diagram of a process of collecting bind values of parameter markers in the process of FIG. 2 , in accordance with embodiments of the present invention.
  • FIG. 3B is an example of a binary dump file converted into a human readable format from which query information and bind values are extracted in the process of FIG. 3A , in accordance with embodiments of the present invention.
  • FIG. 3C are examples of code for creating database tables that store the query information and bind values in the process of FIG. 3A , in accordance with embodiments of the present invention.
  • FIG. 3D is an entity-relationship model of the database tables that store the query information and bind values in the process of FIG. 3A , in accordance with embodiments of the present invention.
  • FIG. 4A is a process of analyzing a query execution plan space within the process of FIG. 2 , in accordance with embodiments of the present invention.
  • FIG. 4B is an example of a table from which query information and bind value sets are requested and obtained in the process of FIG. 4A , in accordance with embodiments of the present invention.
  • FIG. 4C is an example of a query in which parameter markers are replaced by bind values, where the query's text is obtained from the table in FIG. 4B , in accordance with embodiments of the present invention.
  • FIG. 4D is an example of an Explain query generated from the query in FIG. 4C , in accordance with embodiments of the present invention.
  • FIG. 4E is an exemplary plan space diagram illustrating a query execution plan space being analyzed by the process of FIG. 4A , in accordance with embodiments of the present invention.
  • FIG. 4F is an exemplary plan space diagram illustrating a sorting out of bind value sets for determining a cost difference in the process of FIG. 4A , in accordance with embodiments of the present invention.
  • FIG. 4G is an exemplary plan space diagram illustrating a determination of pairs of bind value sets having a maximum distance in selectivity for determining a cost difference in the process of FIG. 4A , in accordance with embodiments of the present invention.
  • FIG. 5 is a computing system that includes components of the system of FIG. 1 and implements the processes of FIGS. 2 , 3 A and 4 A, in accordance with embodiments of the present invention.
  • the present invention provides a technique for using selectivity information to automatically determine an optimization frequency (a.k.a. reoptimization mode) of a query having one or more parameter markers.
  • the selectivity information is obtained for multiple instances of parameter marker values either through query feedback, the system catalog, or by drawing random samples.
  • the present invention determines whether a query execution plan is sensitive to the selectivity of the parameter marker values.
  • the technique disclosed herein creates a graph of the selectivity space and associates each parameter marker bind value set in this space with a query execution plan. By taking the largest distances in the selectivity space, the technique disclosed herein determines whether one query execution plan is sufficient to cover the selectivity space, or whether multiple query execution plans are required. Further, the technique provides a recommendation to either optimize the query once or to reoptimize the query for every set of parameter marker bind values.
  • Parameter marker a placeholder for a variable within a query. To provide parameter markers with values, variables are bound to the parameter markers. A bind value replaces a parameter marker at query execution time. A bind value of a parameter marker is known only at query execution time.
  • Plan space diagram a diagram that shows the location of all different query execution plans for a query having two parameter markers in the space defined by the selectivities of the bind values of the parameter markers.
  • Frequency diagram a bar chart showing the frequency distribution of the selectivities of the bind values of a parameter marker (e.g., one axis of the chart plots selectivity and the other axis plots frequency in percent).
  • Selectivity A ratio or percentage of data sets that are sorted out by a predicate of a database query.
  • a predicate is a condition in a WHERE clause applied to a database table and a selectivity indicates the percentage of rows of the table that satisfy the condition.
  • a default selectivity assumes a uniform data distribution in an affected database column and is defined as:
  • Reoptimization mode Defines when and how often a database optimizer is called to select a query execution plan for a query with one or more parameter markers. Reoptimization modes are also referred to as optimization frequencies and include ReOpt None, ReOpt Once and ReOpt Always.
  • ReOpt None A reoptimization mode in which queries are optimized once before their first execution during a query preparation process. Since parameter marker bind values are unknown at optimization time and known only at query execution time, the optimizer cannot estimate their selectivities using statistical information. Instead, the optimizer determines a default selectivity for each parameter marker predicate. Using the default selectivities, the optimizer selects a query execution plan that is cached and used for all executions of the query. ReOpt None results in an optimal query execution plan only if the data is uniformly distributed in all affected database columns. DB2® uses ReOpt None as the default reoptimization mode.
  • ReOpt Once: A reoptimization mode in which a query is optimized only once, before the query's first execution and using the query's first bind value set to estimate the selectivities of the parameter marker predicates.
  • the optimizer chooses a query execution plan based on execution cost estimations for the estimated selectivities.
  • the resulting query execution plan is cached and used for all executions of the query.
  • ReOpt Once provides a savings in optimization costs, but causes high execution costs if the cached query execution plan is suboptimal for a set of bind values.
  • ReOpt Once is efficient only if the selectivities of the parameter marker predicates for all subsequent bind value sets do not differ significantly from the aforementioned selectivities determined for the first bind value set.
  • ReOpt always: A reoptimization mode in which a query is reoptimized before every execution of the query, each time using a current bind value set (i.e., one or more bind values associated with the current query execution) to estimate current selectivities of the parameter marker predicates. For each reoptimization, the optimizer selects the query execution plan that is optimal for the current bind value set based on the optimizer's knowledge of the data distribution according to available database statistics. ReOpt Always is expensive in terms of optimization costs.
  • a current bind value set i.e., one or more bind values associated with the current query execution
  • FIG. 1 is a block diagram of a system for automatically determining optimization frequencies of queries with parameter markers, in accordance with embodiments of the present invention.
  • System 100 includes an operational database 102 , a binary dump file 104 , and extract & transform software tool 106 and parameter marker (PM) tables 108 .
  • Operational database 102 dumps query execution information in binary dump file 104 .
  • the query execution information is associated with a query that has one or more parameter markers.
  • Extract & transform tool 106 extracts query text, parameter marker values and query execution time information from binary dump file 104 .
  • the extracted data is stored in tables 108 .
  • the query text, parameter marker values and query execution time information are collected from a source other than binary dump file 104 and are stored in a structure (e.g., plain files) other than PM tables 108 .
  • System 100 also includes a software-based query execution plan space analyzer 110 , database explain tables 112 , optional frequency and plan space diagrams 114 and reoptimization advice 116 .
  • a query execution plan space analyzer is also referred to simply as a plan space analyzer.
  • Plan space analyzer 110 combines parameter marker data from tables 108 with explain information from explain tables 112 , optionally generates frequency and/or plan space diagrams, and determines a recommendation 116 for a reoptimization mode (i.e., an optimization frequency).
  • the recommended reoptimization mode is stored in a computer-usable or computer-readable medium (not shown), transmitted to a user (not shown) of system 100 or displayed onscreen or in a hard copy format. The process for determining a recommended reoptimization mode is described below relative to FIGS. 2 , 3 A and 4 A.
  • FIG. 2 is a flow diagram of a process of automatically determining optimization frequencies of queries with parameter markers in the system of FIG. 1 , in accordance with embodiments of the present invention.
  • the optimization frequency determination process of FIG. 2 begins at step 200 .
  • a software tool e.g., extract & transform tool 106 in FIG. 1
  • the bind values are written into binary dump file 104 (see FIG. 1 ) (e.g., a RDSMon dump file provided by a DB2® feedback warehouse architecture) and the bind values are subsequently extracted from the binary dump file in step 202 by extract & transform tool 106 (see FIG. 1 ).
  • a software tool that replaces extract & transform tool 106 in FIG. 1 uses single column statistics (e.g., information about frequent values and quantiles of a table's column) from system catalog tables to collect the bind values in step 202 .
  • single column database statistics e.g., information about frequent values and quantiles of a table's column
  • Using single column database statistics in step 202 advantageously allows bind values to be collected according to their selectivities, thereby permitting the bind values to be used to analyze the whole query execution plan space, as described below relative to FIG. 4A .
  • a query execution plan space is also referred to simply as a plan space.
  • database statistics in step 202 makes it possible to collect bind values for queries that never executed.
  • step 202 using single column statistics in step 202 requires up-to-date statistics and refreshing such statistics is expensive. Further, collecting bind values from single column statistics assumes independence of the columns to which the parameter markers refer. To request values from single column statistics, it is necessary to identify the tables and columns to which the parameter marker predicates of a query refer. Therefore, a complex task of parsing of the query must be performed.
  • a software tool that replaces extract & transform tool 106 in FIG. 1 randomly selects bind value sets in step 202 .
  • the bind value sets are selected from the tables and columns to which the parameter marker predicates of a query refer. Random selection of bind value sets in step 202 advantageously allows bind values for any query to be selected irrespective of whether the query ever executed.
  • the number of collected bind values may be assigned as the values are randomly selected from the whole range of data.
  • using random sampling to collect bind values in step 202 allows whole bind value sets to be collected and not single bind values. Therefore, independence of the columns to which the parameter markers refer does not have to be assumed.
  • the query must be parsed if sampling is used in step 202 .
  • extract & transform tool 106 transforms the data extracted in step 202 into strings to facilitate handling and storage.
  • a validity check is performed on the extracted data. This validity check includes a check that the query includes one or more parameter markers. If the validity check determines that a query includes at least one parameter marker, then the process of FIG. 2 continues with step 204 . If the validity check determines that a query does not include any parameter markers, then the remaining steps of the process of FIG. 2 are skipped for that query.
  • step 204 extract & transform tool 106 loads (i.e., stores) the bind values collected in step 202 and related query execution information into PM tables 108 (see FIG. 1 ).
  • the PM tables are not part of system 100 (see FIG. 1 ) and step 204 stores the query execution information and the collected bind values in another data structure (e.g., plain files).
  • the related query execution information loaded in step 204 includes the text of the query, the query execution timestamp, and the duration of the query execution.
  • the collection and storage of bind values and related query execution information in steps 202 and 204 are discussed in more detail below relative to FIG. 3A .
  • plan space analyzer 110 (see FIG. 1 ) combines parameter marker bind value data collected and stored in steps 202 and 204 with database explain information retrieved from explain tables 112 (see FIG. 1 ). The details of step 206 are discussed below relative to FIG. 4A .
  • plan space analyzer 110 (see FIG. 1 ) optionally creates diagrams 114 (see FIG. 1 ) such as frequency and/or plan space diagrams.
  • plan space analyzer 110 (see FIG. 1 ) generates a recommendation for an optimal reoptimization mode for the query.
  • the optimal reoptimization mode recommended in step 210 is either ReOpt Once or ReOpt Always. The details of selecting between ReOpt Once and ReOpt Always in step 210 are described below relative to FIG. 4A .
  • the optimization frequency determination process of FIG. 2 ends at step 212 .
  • FIG. 3A is a flow diagram of a process of collecting and storing bind values of parameter markers in the process of FIG. 2 , in accordance with embodiments of the present invention.
  • the collection and storage process of FIG. 2 incorporates query feedback information and starts at step 300 .
  • step 302 each time a query is executed, a new record associated with the query execution is appended to binary dump file 104 (see FIG. 1 ).
  • step 302 appends the new record to a RDSMon binary dump file that provides a DB2® feedback warehouse with information about the execution of queries.
  • the feedback warehouse is filled with information about query executions that had a significant discrepancy between the optimizer's estimated costs and the actual query execution costs.
  • the contents of the record appended in step 302 include, for example, (1) a unique ID for the record, (2) query execution timestamp, (3) the amount of time taken to execute the query (a.k.a. query execution time), (4) the number of parameter markers included in the query, (5) the value type, value length and value of each parameter marker included in the query, (6) the text of the query statement, (7) the length of the query text, (8) the optimizer's estimated information and information length, and (9) the runtime counter's information and information length.
  • step 304 extract & transform tool 106 (see FIG. 1 ) extracts information about the executions of queries having one or more parameter markers and the bind values associated with those queries.
  • Step 304 extracts the query execution information and bind values from the records in binary dump file 104 (see FIG. 1 ).
  • the step 304 data is extracted from the aforementioned RDSMon binary dump file.
  • the data extracted in step 304 includes the text of the query statement, the query execution timestamp, the query execution time and all parameter marker bind values.
  • a validity check includes a check to determine whether the query includes at least one parameter marker and may include one or more other checks of the validity of the query execution information and bind values being extracted based on predefined criteria (e.g., check that the query text does not exceed a predetermined limit on the number of characters). If the validity check determines that the query does not include at least one parameter marker, then the remaining steps of FIG. 3A are not performed.
  • extract & transform tool 106 transforms the extracted parameter marker bind values into strings in step 306 .
  • the transformation in step 306 simplifies the handling and storage of the bind values.
  • extract & transform tool 106 stores the following items in a table of PM tables 108 (see FIG. 1 ): the query text, the database on which the query was executed, and a count of how many times the query was executed.
  • extract & transform tool 106 stores the following items in a table of PM tables 108 (see FIG. 1 ): an identifier of each parameter marker within the query, every actual bind value used for each parameter marker within the query, and a count of how many times a bind value was used for a parameter marker.
  • extract & transform tool 106 stores the following items in a table of PM tables 108 (see FIG. 1 ): each combination of bind values (i.e., bind value set) a query has been executed with and a count of how many times each combination occurred.
  • bind values i.e., bind value set
  • extract & transform tool 106 stores the following items in a table of PM tables 108 (see FIG. 1 ): a query execution timestamp for each query execution and an amount of time taken to execute the query for each query execution.
  • steps 308 , 310 , 312 and 314 are described below relative to FIGS. 3C and 3D .
  • the process of FIG. 3A ends at step 316 .
  • FIG. 3B is an example of a binary dump file converted into a human readable format from which query information and bind values are extracted in the process of FIG. 3A , in accordance with embodiments of the present invention.
  • a RDSMon binary dump file 320 includes examples of the query execution information, bind value information, optimizer estimations and runtime counter information described above relative to step 302 of FIG. 3A .
  • FIG. 3C are examples of code for creating database tables that store the query information and bind values in the process of FIG. 3A , in accordance with embodiments of the present invention.
  • Code 340 is an example of Data Definition Language (DDL) statements that generate four relational database tables PMQUERIES, PMVALUES, PMCOMBINATIONS AND PMEXECUTIONS. These relational database tables are examples of PM tables 108 (see FIG. 1 ).
  • the PMQUERIES table includes columns for storing the data items described above relative to step 308 (see FIG. 3A ).
  • the PMVALUES table includes columns for storing the data items described above relative to step 310 (see FIG. 3A ).
  • the PMCOMBINATIONS table includes columns for storing the data items described above relative to step 312 (see FIG. 3A ).
  • the PMEXECUTIONS table includes columns for storing the data items described above relative to step 314 (see FIG. 3A ).
  • Complete descriptions of the four PM tables generated by DDL statements 340 are presented below
  • FIG. 3D is an entity-relationship (ER) model of the relational database tables that store the query information and bind values in the process of FIG. 3A , in accordance with embodiments of the present invention.
  • ER model 360 illustrates the four PM tables created by the statements of FIG. 3C , their relationships and their respective columns. Each of the four PM tables in ER model 360 is described below.
  • PMQUERIES Each row contains information about one query. The columns of PMQUERIES are described in Table 1.
  • PMVALUES Each row contains the bind value of one parameter marker. A row is deleted from PMVALUES if the query the row refers to is deleted from the PMQUERIES table. The columns of PMVALUES are described in Table 2.
  • PMCOMBINATIONS This table combines one or more bind values to a bind value set. For each bind value set, one or more rows are inserted. A row is deleted from PMCOMBINATIONS if the query to which the row refers is deleted from the PMQUERIES table. The columns of PMCOMBINATIONS are described in Table 3.
  • PMEXECUTIONS Each row stores information about one execution of a query. For every execution, a new row is inserted. The primary key is the combination of CID and EXECUTED. A row is deleted from PMEXECUTIONS if the query to which the row refers is deleted from the PMQUERIES table. The columns of PMEXECUTIONS are described in Table 4.
  • step 204 includes the following storage process performed by extract & transform tool 106 (see FIG. 1 ) for each set of query execution data collected in step 202 (see FIG. 2 ).
  • the query text and the database name e.g., provided as a command line parameter
  • the associated counter QCOUNT is incremented by one. Otherwise, a new query identifier QID is received and the data is written to the PMQUERIES table.
  • all parameter marker bind values are inserted into the PMVALUES table according to the insertion of the query text.
  • the information about the combination of the bind values (i.e., the bind value set) is written to the PMCOMBINATIONS table. If the bind value set is already included in the PMCOMBINATIONS table, then the counter CCOUNT is incremented. Otherwise, a new bind value set identifier CID is generated and a row is inserted into the PMCOMBINATIONS table for each bind value of the bind value set. Finally, the query identifier, the bind value set identifier, the timestamp of the query's execution and the amount of time taken to execute the query are inserted into the PMEXECUTIONS table.
  • FIG. 4A is a process of analyzing a query execution plan space within the process of FIG. 2 , in accordance with embodiments of the present invention.
  • the query execution plan space analysis process begins at step 400 .
  • plan space analyzer 110 (see FIG. 1 ) requests and obtains the query text, the name of the database on which the query executed, and parameter marker bind value sets collected in step 202 (see FIG. 2 ).
  • step 402 requests and obtains the query and bind value set information from PM tables 108 (see FIG. 1 ).
  • plan space analyzer 110 Prior to step 404 , plan space analyzer 110 (see FIG. 1 ) employs a database facility to generate Explain information for each query whose query text is obtained in step 402 (hereinafter referred to as the “retrieved queries”). Generating Explain information for a query is also referred to herein as explaining the query (e.g., by composing and executing an EXPLAIN query). Each of the retrieved queries is explained with all of the retrieved query's bind value sets. In order to make each of the retrieved queries explainable, the retrieved query's one or more parameter markers are replaced by the bind values of one of the retrieved query's bind value sets.
  • the plan space analyzer assigns a query identifier (e.g., QUERYNO) to each bind value set and generates a query tag (e.g., QUERYTAG) for each of the retrieved queries.
  • the generated query tag includes the current timestamp and a random number.
  • each retrieved query is explained with its QUERYTAG and its bind value set's QUERYNO.
  • the query identifier and query tag uniquely identifies the Explain information of every retrieved query for each of the query's bind value sets.
  • the plan space analyzer uses a DB2® facility to store Explain information in database tables called Explain tables.
  • the Explain information stored in the Explain tables is generated while optimizing a query. If the Explain facility of DB2® is activated with the command SET CURRENT EXPLAIN MODE YES, the execution plan of a query is stored in the Explain tables when the query is executed.
  • Explain information is generated by composing and executing a Structured Query Language (SQL) statement EXPLAIN, which captures Explain information about the query execution plan for a supplied explainable statement and places the Explain information into Explain tables.
  • An explainable statement is one of the following SQL statements: DELETE, INSERT, SELECT, SELECT INTO, UPDATE, VALUES, or VALUES INTO.
  • the present invention employs the SQL EXPLAIN statement's SET QUERYNO and SET QUERYTAG options to set the aforementioned query identifiers and query tags, respectively, to uniquely identify the Explain information.
  • a sample SQL EXPLAIN statement that explains a query prior to step 404 is shown in FIG. 4D .
  • plan space analyzer 110 uses the query identifier and query tag to obtain Explain information from the aforementioned Explain tables for each bind value set of a retrieved query.
  • the Explain information obtained includes selectivities for all parameter marker predicates for each bind value set of each retrieved query.
  • cardinality measurements a.k.a. absolute measurements
  • the Explain information obtained includes query execution plan information and estimated execution costs for each bind value set of each retrieved query.
  • Each obtained estimated execution cost is an estimate of the cost of using an obtained query execution plan to optimally execute the retrieved query with a particular bind value set.
  • the plan space analyzer assigns unique execution plan IDs to the query execution plans, so that all identical query execution plans have the same execution plan ID.
  • step 406 obtains the execution plan ID instead of all of the query execution plan information.
  • plan space analyzer 110 groups parameter marker bind value sets by query execution plan ID. This grouping of parameter marker bind value sets by query execution plan ID is optionally displayed in a plan space diagram 114 (see FIG. 1 ).
  • plan space analyzer 110 determines one or more pairs of parameter marker bind value sets (p 1 , . . . , pn) i , (q 1 , . . . , qn) i where P i is an optimal query execution plan associated with (p 1 , . . . , pn) i , Q i is an optimal query execution plan associated with (q 1 , . . . , qn) i and i ⁇ 1.
  • the plan space analyzer determines a distance d i between a first selectivity measurement set (i.e., a set of one or more selectivity measurements) associated with (p 1 , . . .
  • step 410 determines the bind value set pairs that are associated with maximum distances between cardinality measurements.
  • plan space analyzer 110 determines one or more pairs of execution costs C 1 i and C 2 i where C 1 i is an estimated cost of executing the query via forcing the use of bind value set (q 1 , . . . , qn) i with query execution plan P i and where C 2 i is an estimated cost of executing the query via forcing the use of bind value set (p 1 , . . . , pn) i with query execution plan Q i .
  • This forcing of bind value sets with sub-optimal query execution plans is referred to herein as cross-forced plans.
  • plan space analyzer 110 compares each of the costs determined in step 412 with the cost of executing the query via a corresponding optimal query execution plan.
  • the plan space analyzer determines one or more pairs of differences D 1 i , D 2 i , where D 1 i is a difference between execution cost C 1 i and an optimal execution cost OC 1 i and D 2 i is a difference between execution cost C 2 i and an optimal execution cost OC 2 i .
  • Optimal execution costs OC 1 i and OC 2 i are included in the execution costs determined in step 406 .
  • OC 1 i is a cost of optimally executing the query via query execution plan Q i with bind value set (q 1 , . . . , qn) i
  • OC 2 i is a cost of optimally executing the query via query execution plan P i with bind value set (p 1 , . . . , pn) i .
  • plan space analyzer 110 recommends ReOpt Once as the optimization frequency for the query being analyzed.
  • plan space analyzer 110 recommends ReOpt Always as the optimization frequency for the query being analyzed.
  • FIG. 4B is an example of a table from which query information and bind value sets are requested and obtained in the process of FIG. 4A , in accordance with embodiments of the present invention.
  • Table 420 includes examples of a query text, a database name and bind value sets requested and obtained by plan space analyzer 110 (see FIG. 1 ) in step 402 of FIG. 4A .
  • Each parameter marker in the query text in table 420 is indicated by a question mark.
  • the query text, database name and bind value sets are obtained from PM tables 108 (see FIG. 1 ).
  • FIG. 4C is an example of a query in which parameter markers are replaced by bind values, where the query's text is obtained from the table in FIG. 4B , in accordance with embodiments of the present invention.
  • step 402 see FIG. 4A
  • the one or more parameter markers of the query being analyzed by the process of FIG. 4A are replaced by the one or more bind values in the bind value set obtained in step 402 (see FIG. 4A ) and the resulting query is explained.
  • Query 430 is an example of a query in which the two bind values of the first listed bind value set in table 420 (see FIG. 4B ) replace their corresponding parameter markers in the query text of table 420 (see FIG. 4B ).
  • the first listed bind value set in FIG. 4B is (1998, 63817) and therefore in query 430 , 1998 and 63817 replace the first question mark and second question mark, respectively, in the query text of FIG. 4B .
  • Explaining query 430 is illustrated in FIG. 4D .
  • FIG. 4D is an example of an Explain query generated from the query in FIG. 4C , in accordance with embodiments of the present invention.
  • Explain query 440 is an example of a query used to explain query 430 (see FIG. 4C ) subsequent to replacing a query's parameter markers with the bind values of a bind value set obtained in step 402 (see FIG. 4A ).
  • an Explain query is composed and executed for each bind value set.
  • the execution of each Explain query results in a query execution plan.
  • the combination of QUERYTAG and QUERYID in Explain query 440 is unique and is used to identify the explain information later in the process of FIG. 4A .
  • an initial criterion in the decision to use ReOpt Once or ReOpt Always is the number of different query execution plans associated with the query whose plan space is being analyzed. If only one query execution plan is determined in step 406 (see FIG. 4A ) for all bind value sets obtained in step 402 (see FIG. 4A ), then ReOpt Once is selected as the recommended optimal optimization frequency and the subsequent analysis steps of FIG. 4A are not required. In this case, ReOpt Always has the same execution cost as using ReOpt Once because ReOpt Always uses the same query execution plan after each reoptimization, but the optimization cost of ReOpt Always is significantly higher than using ReOpt Once.
  • plan space analyzer 110 makes an optimization frequency recommendation based on comparisons of estimated execution costs.
  • plan space analyzer 110 recommends ReOpt Once in step 414 (see FIG. 4A ) if using any of the query execution plans for all bind value sets does not result in execution costs that differ significantly from the execution costs associated with using the optimal query execution plan in each case. Otherwise, the plan space analyzer recommends ReOpt Always as the optimization frequency. That is, ReOpt Always is recommended if using any of the query execution plans for all bind value sets results in execution costs that differ significantly from the execution costs associated with using the optimal query execution plan in each case.
  • plan space analyzer 110 selects suitable bind value sets to be compared.
  • An example of selecting suitable bind value sets and making the cost comparisons is illustrated in FIGS. 4E-4G .
  • the selection of bind value sets for the execution cost comparison is based on the distance of their bind values in selectivity.
  • the optimizer's choice of the optimal execution plan is heavily dependent on the selectivity of local predicates.
  • the selection of the bind value sets for the execution cost comparison is based on distances between cardinality measurements.
  • A is a set of the bind value sets A 1 to A m and B is a set of bind value sets B 1 to B n , as shown in (1) and (2) presented below. All bind value sets of A and B are bind value sets of the query Q.
  • C(V,P) is the estimated execution cost of the query Q with the bind value set V and the execution plan P.
  • D A and D B are the differences in execution costs between using P A and P B , as shown in (5) and (6) presented below:
  • D A and D B are supposed to be maximal if the value sets A i and B j have a maximum distance in selectivity to each other because of the execution plan's dependency on selectivity.
  • the criterion for selecting bind value sets is the distance between bind value sets in selectivity. For each combination of two query execution plans P A and P B , one pair of bind value sets A i and B j is selected. A i and B j are the bind value sets with the maximum distance in selectivity.
  • plan space diagrams show the distribution of execution plans in the selectivity space of a query.
  • the selectivity measurements of the parameter marker predicates are plotted on the axes of the coordinate system of a plan space diagram.
  • cardinality measurements of the parameter marker predicates are plotted on the axes.
  • Each point represents one bind value set.
  • the shape and fill color of a circular or rectangular point on a plan space diagram indicates the query execution plan associated with the bind value set represented by that point.
  • the number of parameter markers in a query whose plan space is being analyzed is equivalent to the number of dimensions in the plan space diagram.
  • FIG. 4E is an exemplary plan space diagram illustrating a query execution plan space being analyzed by the process of FIG. 4A , in accordance with embodiments of the present invention.
  • Plan space diagram 450 illustrates the plan space of a query that has two parameter markers.
  • the bind value sets indicated by the points in plan space diagram 450 are grouped by execution plan ID in step 408 of FIG. 4A .
  • each bind value set is associated with one of three query execution plans (i.e., Plan 1 , Plan 2 and Plan 3 ), which are indicated in diagram 450 by the white-filled circles (i.e., Plan 1 ), black-filled circles (i.e., Plan 2 ) and the black-filled rectangles (i.e., Plan 3 ).
  • the goal of the selection of the bind value sets is it to determine pairs of points on a plan space diagram where the bind value sets represented by each pair of points are associated with different query execution plans Plan I and Plan II and where the points are at a maximum distance from each other, as compared to other pairs of points also associated with Plan I and Plan II. In one embodiment, distances between all points on a plan space diagram are calculated to determine the aforementioned pairs of points.
  • D E is the Euclidean distance between two points P and Q in an n-dimensional space.
  • a distance measure D is sufficient, as shown below in expression (7b).
  • the cost of the computation of the distance D is denoted by C(D).
  • each point in a plan space diagram that is totally surrounded by points associated with the same query execution plan as the point being considered is ignored (a.k.a. sorted out) in the determination of the maximum distances.
  • Points in plan space diagram 450 that are sorted out are indicated by triangular points in a plan space diagram 460 of FIG. 4F .
  • points that indicate bind value sets with minimum or maximum selectivities remain as circular or rectangular points in FIG. 4F , but points representing midrange bind value sets are ignored to simplify the maximum distances determination in step 410 of FIG. 4A .
  • a point P is totally surrounded if in every orthant of an n-dimensional Cartesian coordinate system with its origin in the point P, a point with the same shape and fill color as P exists. After a point is classified as surrounded, the point cannot be used to surround other points. Therefore, in an n-dimensional space, at least 2 n points are not surrounded by other points with the same shape and fill color.
  • worst case i.e., expression (11)
  • no point is sorted out and every point is combined two times with every other point. Comparing the worst case (i.e., expression (11)) to the original costs in expression (8) presented above, the worst case is three times as expensive as the original cost of determining the distances between all points.
  • maximum distances between selectivity measurements in the plan space diagram are determined in step 410 (see FIG. 4A ).
  • the maximum selectivity distances are between bind value sets associated with different query execution plans.
  • the line segments drawn between the larger circular or rectangular points in plan space diagram 470 in FIG. 4G indicate the maximum selectivity distances.
  • the larger circular or rectangular points at the endpoints of a line segment in FIG. 4G represent a pair of bind value sets used in a determination of estimated execution costs in step 412 (see FIG. 4A ).
  • each pair of bind value sets identified in step 410 is also referred to as an identified pair.
  • 4G identify three pairs of bind value sets. That is, a first line segment joins a first pair of bind value sets associated with Plan 1 and Plan 2 , a second line segment joins a second pair of bind value sets associated with Plan 1 and Plan 3 , and a third line segment joins a third pair of bind value sets associated with Plan 2 and Plan 3 .
  • each difference determined in step 414 is between a first execution cost of sub-optimally executing a query with a first query execution plan that is forced to use a bind value set of an identified pair and a second execution cost of optimally executing the query with a second query execution plan using the same bind value set.
  • the first execution plan is the plan for optimally executing the query with the other bind value set in the identified pair.
  • step 412 determines the suboptimal execution cost of forcing the query execution plan to use the other bind value set in the identified pair. This determination of the suboptimal execution costs in step 412 is performed via explaining the query with cross forced plans.
  • step 412 the query is explained in step 412 to determine a first suboptimal execution cost of using query execution plan P with bind value set B and a second suboptimal execution cost of using query execution plan Q with bind value set A.
  • a database hint feature is employed in step 412 (see FIG. 4A ) to force the query to use suboptimal query execution plans and subsequently determine the aforementioned suboptimal execution costs.
  • a database hint is a syntax addition to a SQL query (e.g., embedded within a comment of the SQL query) that overrides the database optimizer and controls the execution plan of the SQL query.
  • plan space analyzer 110 recommends ReOpt Once as the optimization frequency for the query. That is, ReOpt Once is recommended if each suboptimal execution cost and its associated optimal execution cost determined in step 412 (see FIG. 4A ) are substantially similar.
  • the predefined threshold value takes into account the optimization costs associated with ReOpt Always.
  • plan space analyzer 110 recommends ReOpt Always as the optimization frequency for the query. That is, ReOpt Always is recommended if any suboptimal execution cost determined in step 412 of FIG. 4A and its associated optimal execution cost determined in step 206 of FIG. 2 differ substantially.
  • step 412 determines six suboptimal execution costs.
  • step 414 determines six differences between the suboptimal execution costs and their associated optimal execution costs.
  • FIG. 5 is a computing system that includes components of the system of FIG. 1 and implements the processes of FIGS. 2 , 3 A and 4 A, in accordance with embodiments of the present invention.
  • Computing unit 500 is suitable for storing and/or executing program code of a system for automatically determining optimization frequencies of queries with parameter markers 514 , and generally comprises a central processing unit (CPU) 502 , a memory 504 , an input/output (I/O) interface 506 , a bus 508 , I/O devices 510 and a storage unit 512 .
  • Optimization frequency determination system 514 includes, for example, extract & transform tool 106 (see FIG. 1 ) and plan space analyzer 110 (see FIG. 1 ).
  • CPU 502 performs computation and control functions of computing unit 500 .
  • CPU 502 may comprise a single processing unit, or be distributed across one or more processing units in one or more locations (e.g., on a client and server).
  • Local memory elements of memory 504 are employed during actual execution of the program code of optimization frequency determination system 514 .
  • Cache memory elements of memory 504 provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
  • memory 504 may include other systems not shown in FIG. 5 , such as an operating system (e.g., Linux) that runs on CPU 502 and provides control of various components within and/or connected to computing unit 500 .
  • an operating system e.g., Linux
  • Memory 504 may comprise any known type of data storage and/or transmission media, including bulk storage, magnetic media, optical media, random access memory (RAM), read-only memory (ROM), a data cache, a data object, etc.
  • Storage unit 512 is, for example, a magnetic disk drive or an optical disk drive that stores data.
  • memory 504 may reside at a single physical location, comprising one or more types of data storage, or be distributed across a plurality of physical systems in various forms. Further, memory 504 can include data distributed across, for example, a LAN, WAN or storage area network (SAN) (not shown).
  • LAN local area network
  • WAN storage area network
  • I/O interface 506 comprises any system for exchanging information to or from an external source.
  • I/O devices 510 comprise any known type of external device, including a display monitor, keyboard, mouse, printer, speakers, handheld device, printer, facsimile, etc.
  • Bus 508 provides a communication link between each of the components in computing unit 500 , and may comprise any type of transmission link, including electrical, optical, wireless, etc.
  • I/O interface 506 also allows computing unit 500 to store and retrieve information (e.g., program instructions or data) from an auxiliary storage device (e.g., storage unit 512 ).
  • the auxiliary storage device may be a non-volatile storage device (e.g., a CD-ROM drive which receives a CD-ROM disk).
  • Computing unit 500 can store and retrieve information from other auxiliary storage devices (not shown), which can include a direct access storage device (DASD) (e.g., hard disk or floppy diskette), a magneto-optical disk drive, a tape drive, or a wireless communication device.
  • DASD direct access storage device
  • the invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements.
  • the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
  • the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code of optimization frequency determination system 514 for use by or in connection with a computing unit 500 or any instruction execution system to provide and facilitate the capabilities of the present invention.
  • a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, RAM 504 , ROM, a rigid magnetic disk and an optical disk.
  • Current examples of optical disks include compact disk-read-only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and system for automatically determining optimization frequencies of queries having one or more parameter markers. Execution plans for a query are generated and each plan is associated with one or more bind value sets. An optimization frequency is selected based on differences between pairs of execution costs where one execution cost of a pair is a cost of executing the query with a bind value set via a first execution plan and the other execution cost of the pair is a cost of optimally executing the query with the bind value set via a second execution plan. The differences are based on maximum selectivity or cardinality distances associated with the bind value sets. If none of the differences exceeds a predefined value, the query is optimized once. If at least one of the differences exceeds the predefined value, the query is reoptimized each time the query is executed.

Description

  • This application is a continuation application claiming priority to Ser. No. 11/673,142, filed Feb. 9, 2007.
  • FIELD OF THE INVENTION
  • The present invention relates to a method and system for automatically determining optimization frequencies of queries with parameter markers.
  • BACKGROUND OF THE INVENTION
  • Conventionally, a programmer or a database administrator (DBA) managing a database system manually selects a reoptimization mode for queries having parameter markers. This selection of an optimal reoptimization mode depends not only on the query itself, but also on the bind values of the query's parameter marker(s). As these bind values can be unknown or change over time, an attempt to select an optimal reoptimization mode is a non-trivial procedure that may result in selecting a sub-optimal reoptimization mode. This burdensome manual process of selecting a reoptimization mode decreases the productivity of the programmer or DBA and increases the total cost of ownership of the database system. Further, a selection of a sub-optimal reoptimization mode slows down a program or the entire database system. Thus, there exists a need to overcome at least one of the preceding deficiencies and limitations of the related art.
  • SUMMARY OF THE INVENTION
  • In first embodiments, the present invention provides a computer-implemented method of automatically determining an optimization frequency of a query having one or more parameter markers, the method comprising:
  • generating, by a computing system, a plurality of query execution plans for an execution of a query having one or more parameter markers, each query execution plan associated with one or more bind value sets of a plurality of bind values sets;
  • determining that no difference of a plurality of differences between pairs of execution costs exceeds a predefined threshold value or that at least one difference of the plurality of differences exceeds the predefined threshold value, each pair of execution costs including a first execution cost and a second execution cost, the first execution cost being a cost of executing the query with a bind value set of the plurality of bind value sets via a first query execution plan of the plurality of query execution plans and the second execution cost being a cost of optimally executing the query with the bind value set via a second query execution plan of the plurality of query execution plans;
  • automatically selecting an optimization frequency by the computing system; and
  • storing the optimization frequency in a computer-usable medium,
  • wherein the optimization frequency is optimizing the query once as a result of a first determination by the determining that no difference of the plurality of differences exceeds the predefined threshold value, and
  • wherein the optimization frequency is reoptimizing the query each time the query is executed as a result of a second determination by the determining that at least one difference of the plurality of differences exceeds the predefined threshold value.
  • In second embodiments, the present invention provides a computer-implemented method of determining an optimization frequency of a query having one or more parameter markers, the method comprising:
  • obtaining, by a computing system, a plurality of bind value sets, each bind value set including one or more bind values and associated with one or more parameter markers of a query;
  • obtaining, by the computing system, a plurality of measurement sets associated with the bind value sets in a one-to-one correspondence, each measurement set selected from the group consisting of one or more selectivity measurements and one or more cardinality measurements;
  • determining, by the computing system, a plurality of query execution plans, each query execution plan capable of optimally executing the query with one or more bind value sets of the plurality of bind value sets;
  • determining, by the computing system, a first set of execution costs associated with the query execution plans of the plurality of query execution plans in a one-to-one correspondence, each execution cost of the first set being a cost of optimally executing the query with a bind value set of the plurality of bind value sets;
  • determining, by the computing system, one or more pairs of bind value sets (PI, . . . , pn)i, (q1, . . . , qn)i of the plurality of bind value sets, the determining the one or more pairs of bind value sets including determining one or more distances di between a first measurement set S1 i associated with the bind value set (p1, . . . , pn)i and a second measurement set S2 i associated with the (q1, . . . , qn)i, the S1 i and the S2 i included in the plurality of measurement sets, wherein each distance d1 is a maximum distance between any pair of measurement sets associated with query execution plans Pi and Qi of the plurality of query execution plans, wherein the query execution plan Pi is an optimal query execution plan associated with the bind value set (p1, . . . , pn)i and the query execution plan Qi is an optimal query execution plan associated with the bind value set (p1, . . . , pn)i, and wherein the i≧1;
  • determining, by the computing system, one or more pairs of execution costs C1 i, C2 i of a second set of execution costs, wherein the C1 i is a cost of executing the query via the query execution plan Pi with bind value set (q1, . . . , qn)i and the C2 i is a cost of executing the query via the query execution plan Qi with bind value set (p1, . . . , pn)i;
  • determining, by the computing system, one or more pairs of differences D1 i and D2 i, wherein the D1 i is a difference between the cost C1 i and an optimal execution cost OC1 i of the first set of execution costs and the D2 i is a difference between the cost C2 i and an optimal execution cost OC2 i of the first set of execution costs, wherein the OC1 i is a cost of optimally executing the query via the query execution plan Qi with bind value set (q1, . . . , qn)i, and the OC2 i is a cost of optimally executing the query via the query execution plan Pi with bind value set (p1, . . . , pn)i;
  • automatically selecting, by the computing system, an optimization frequency, wherein the optimization frequency is selected from the group consisting of optimizing the query once and reoptimizing the query each time the query is executed; and
  • storing the optimization frequency in a computer-usable medium,
  • wherein the optimization frequency is the optimizing the query once as a result of a first determination, via the determining the one or more pairs of differences, that no difference of the one or more pairs of differences exceeds a predefined threshold value, and
  • wherein the optimization frequency is the reoptimizing the query each time the query is executed as a result of a second determination, via the determining the one or more pairs of differences, that at least one difference of the one or more pairs of differences exceeds the predefined threshold value.
  • Systems and computer program products corresponding to the above-summarized methods are also described herein.
  • Advantageously, the present invention provides a technique for using selectivity or cardinality information to automatically determine the reoptimization mode of a query having parameter markers, thereby improving the productivity of DBAs and reducing the total cost of ownership of databases.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of a system for automatically determining optimization frequencies of queries with parameter markers, in accordance with embodiments of the present invention.
  • FIG. 2 is a flow diagram of a process of automatically determining optimization frequencies of queries with parameter markers in the system of FIG. 1, in accordance with embodiments of the present invention.
  • FIG. 3A is a flow diagram of a process of collecting bind values of parameter markers in the process of FIG. 2, in accordance with embodiments of the present invention.
  • FIG. 3B is an example of a binary dump file converted into a human readable format from which query information and bind values are extracted in the process of FIG. 3A, in accordance with embodiments of the present invention.
  • FIG. 3C are examples of code for creating database tables that store the query information and bind values in the process of FIG. 3A, in accordance with embodiments of the present invention.
  • FIG. 3D is an entity-relationship model of the database tables that store the query information and bind values in the process of FIG. 3A, in accordance with embodiments of the present invention.
  • FIG. 4A is a process of analyzing a query execution plan space within the process of FIG. 2, in accordance with embodiments of the present invention.
  • FIG. 4B is an example of a table from which query information and bind value sets are requested and obtained in the process of FIG. 4A, in accordance with embodiments of the present invention.
  • FIG. 4C is an example of a query in which parameter markers are replaced by bind values, where the query's text is obtained from the table in FIG. 4B, in accordance with embodiments of the present invention.
  • FIG. 4D is an example of an Explain query generated from the query in FIG. 4C, in accordance with embodiments of the present invention.
  • FIG. 4E is an exemplary plan space diagram illustrating a query execution plan space being analyzed by the process of FIG. 4A, in accordance with embodiments of the present invention.
  • FIG. 4F is an exemplary plan space diagram illustrating a sorting out of bind value sets for determining a cost difference in the process of FIG. 4A, in accordance with embodiments of the present invention.
  • FIG. 4G is an exemplary plan space diagram illustrating a determination of pairs of bind value sets having a maximum distance in selectivity for determining a cost difference in the process of FIG. 4A, in accordance with embodiments of the present invention.
  • FIG. 5 is a computing system that includes components of the system of FIG. 1 and implements the processes of FIGS. 2, 3A and 4A, in accordance with embodiments of the present invention.
  • DETAILED DESCRIPTION OF THE INVENTION
  • The present invention provides a technique for using selectivity information to automatically determine an optimization frequency (a.k.a. reoptimization mode) of a query having one or more parameter markers. The selectivity information is obtained for multiple instances of parameter marker values either through query feedback, the system catalog, or by drawing random samples. The present invention determines whether a query execution plan is sensitive to the selectivity of the parameter marker values. To support this determination of the query execution plan's sensitivity, the technique disclosed herein creates a graph of the selectivity space and associates each parameter marker bind value set in this space with a query execution plan. By taking the largest distances in the selectivity space, the technique disclosed herein determines whether one query execution plan is sufficient to cover the selectivity space, or whether multiple query execution plans are required. Further, the technique provides a recommendation to either optimize the query once or to reoptimize the query for every set of parameter marker bind values.
  • DEFINITIONS
  • Parameter marker: a placeholder for a variable within a query. To provide parameter markers with values, variables are bound to the parameter markers. A bind value replaces a parameter marker at query execution time. A bind value of a parameter marker is known only at query execution time.
  • Plan space diagram: a diagram that shows the location of all different query execution plans for a query having two parameter markers in the space defined by the selectivities of the bind values of the parameter markers.
  • Frequency diagram: a bar chart showing the frequency distribution of the selectivities of the bind values of a parameter marker (e.g., one axis of the chart plots selectivity and the other axis plots frequency in percent).
  • Selectivity: A ratio or percentage of data sets that are sorted out by a predicate of a database query. For example, a predicate is a condition in a WHERE clause applied to a database table and a selectivity indicates the percentage of rows of the table that satisfy the condition.
  • Default selectivity: A default selectivity assumes a uniform data distribution in an affected database column and is defined as:
  • s = 1 ( number of distinct values of the affected column )
  • Reoptimization mode: Defines when and how often a database optimizer is called to select a query execution plan for a query with one or more parameter markers. Reoptimization modes are also referred to as optimization frequencies and include ReOpt None, ReOpt Once and ReOpt Always.
  • ReOpt None: A reoptimization mode in which queries are optimized once before their first execution during a query preparation process. Since parameter marker bind values are unknown at optimization time and known only at query execution time, the optimizer cannot estimate their selectivities using statistical information. Instead, the optimizer determines a default selectivity for each parameter marker predicate. Using the default selectivities, the optimizer selects a query execution plan that is cached and used for all executions of the query. ReOpt None results in an optimal query execution plan only if the data is uniformly distributed in all affected database columns. DB2® uses ReOpt None as the default reoptimization mode.
  • ReOpt Once: A reoptimization mode in which a query is optimized only once, before the query's first execution and using the query's first bind value set to estimate the selectivities of the parameter marker predicates. The optimizer chooses a query execution plan based on execution cost estimations for the estimated selectivities. The resulting query execution plan is cached and used for all executions of the query. ReOpt Once provides a savings in optimization costs, but causes high execution costs if the cached query execution plan is suboptimal for a set of bind values. ReOpt Once is efficient only if the selectivities of the parameter marker predicates for all subsequent bind value sets do not differ significantly from the aforementioned selectivities determined for the first bind value set.
  • ReOpt Always: A reoptimization mode in which a query is reoptimized before every execution of the query, each time using a current bind value set (i.e., one or more bind values associated with the current query execution) to estimate current selectivities of the parameter marker predicates. For each reoptimization, the optimizer selects the query execution plan that is optimal for the current bind value set based on the optimizer's knowledge of the data distribution according to available database statistics. ReOpt Always is expensive in terms of optimization costs.
  • System Overview
  • FIG. 1 is a block diagram of a system for automatically determining optimization frequencies of queries with parameter markers, in accordance with embodiments of the present invention. System 100 includes an operational database 102, a binary dump file 104, and extract & transform software tool 106 and parameter marker (PM) tables 108. Operational database 102 dumps query execution information in binary dump file 104. The query execution information is associated with a query that has one or more parameter markers. Extract & transform tool 106 extracts query text, parameter marker values and query execution time information from binary dump file 104. The extracted data is stored in tables 108.
  • In another embodiment, the query text, parameter marker values and query execution time information are collected from a source other than binary dump file 104 and are stored in a structure (e.g., plain files) other than PM tables 108.
  • System 100 also includes a software-based query execution plan space analyzer 110, database explain tables 112, optional frequency and plan space diagrams 114 and reoptimization advice 116. Hereinafter, a query execution plan space analyzer is also referred to simply as a plan space analyzer. Plan space analyzer 110 combines parameter marker data from tables 108 with explain information from explain tables 112, optionally generates frequency and/or plan space diagrams, and determines a recommendation 116 for a reoptimization mode (i.e., an optimization frequency). The recommended reoptimization mode is stored in a computer-usable or computer-readable medium (not shown), transmitted to a user (not shown) of system 100 or displayed onscreen or in a hard copy format. The process for determining a recommended reoptimization mode is described below relative to FIGS. 2, 3A and 4A.
  • Overview of Determining Optimization Frequency
  • FIG. 2 is a flow diagram of a process of automatically determining optimization frequencies of queries with parameter markers in the system of FIG. 1, in accordance with embodiments of the present invention. The optimization frequency determination process of FIG. 2 begins at step 200. In step 202, a software tool (e.g., extract & transform tool 106 in FIG. 1) collects bind values of each parameter marker of a set of one or more parameter markers included in a query. In a first embodiment, the bind values are written into binary dump file 104 (see FIG. 1) (e.g., a RDSMon dump file provided by a DB2® feedback warehouse architecture) and the bind values are subsequently extracted from the binary dump file in step 202 by extract & transform tool 106 (see FIG. 1).
  • In a second embodiment, a software tool (not shown) that replaces extract & transform tool 106 in FIG. 1 uses single column statistics (e.g., information about frequent values and quantiles of a table's column) from system catalog tables to collect the bind values in step 202. Using single column database statistics in step 202 advantageously allows bind values to be collected according to their selectivities, thereby permitting the bind values to be used to analyze the whole query execution plan space, as described below relative to FIG. 4A. Hereinafter, a query execution plan space is also referred to simply as a plan space. Further, using database statistics in step 202 makes it possible to collect bind values for queries that never executed. On the other hand, using single column statistics in step 202 requires up-to-date statistics and refreshing such statistics is expensive. Further, collecting bind values from single column statistics assumes independence of the columns to which the parameter markers refer. To request values from single column statistics, it is necessary to identify the tables and columns to which the parameter marker predicates of a query refer. Therefore, a complex task of parsing of the query must be performed.
  • In a third embodiment, a software tool (not shown) that replaces extract & transform tool 106 in FIG. 1 randomly selects bind value sets in step 202. The bind value sets are selected from the tables and columns to which the parameter marker predicates of a query refer. Random selection of bind value sets in step 202 advantageously allows bind values for any query to be selected irrespective of whether the query ever executed. The number of collected bind values may be assigned as the values are randomly selected from the whole range of data. Further, using random sampling to collect bind values in step 202 allows whole bind value sets to be collected and not single bind values. Therefore, independence of the columns to which the parameter markers refer does not have to be assumed. However, it is very expensive to collect bind values by random sampling if the query joins large tables because the whole data range has to be scanned to ensure a satisfactory distribution of bind values. Furthermore, the query must be parsed if sampling is used in step 202.
  • In one embodiment, extract & transform tool 106 (see FIG. 1) transforms the data extracted in step 202 into strings to facilitate handling and storage. In one embodiment, after the extraction in step 202, a validity check is performed on the extracted data. This validity check includes a check that the query includes one or more parameter markers. If the validity check determines that a query includes at least one parameter marker, then the process of FIG. 2 continues with step 204. If the validity check determines that a query does not include any parameter markers, then the remaining steps of the process of FIG. 2 are skipped for that query.
  • In step 204, extract & transform tool 106 (see FIG. 1) loads (i.e., stores) the bind values collected in step 202 and related query execution information into PM tables 108 (see FIG. 1). In another embodiment, the PM tables are not part of system 100 (see FIG. 1) and step 204 stores the query execution information and the collected bind values in another data structure (e.g., plain files). The related query execution information loaded in step 204 includes the text of the query, the query execution timestamp, and the duration of the query execution. The collection and storage of bind values and related query execution information in steps 202 and 204 are discussed in more detail below relative to FIG. 3A.
  • In step 206, plan space analyzer 110 (see FIG. 1) combines parameter marker bind value data collected and stored in steps 202 and 204 with database explain information retrieved from explain tables 112 (see FIG. 1). The details of step 206 are discussed below relative to FIG. 4A.
  • In step 208, plan space analyzer 110 (see FIG. 1) optionally creates diagrams 114 (see FIG. 1) such as frequency and/or plan space diagrams. In step 210, plan space analyzer 110 (see FIG. 1) generates a recommendation for an optimal reoptimization mode for the query. The optimal reoptimization mode recommended in step 210 is either ReOpt Once or ReOpt Always. The details of selecting between ReOpt Once and ReOpt Always in step 210 are described below relative to FIG. 4A. The optimization frequency determination process of FIG. 2 ends at step 212.
  • Collecting Bind Values
  • FIG. 3A is a flow diagram of a process of collecting and storing bind values of parameter markers in the process of FIG. 2, in accordance with embodiments of the present invention. The collection and storage process of FIG. 2 incorporates query feedback information and starts at step 300. In step 302, each time a query is executed, a new record associated with the query execution is appended to binary dump file 104 (see FIG. 1). As one example, step 302 appends the new record to a RDSMon binary dump file that provides a DB2® feedback warehouse with information about the execution of queries. The feedback warehouse is filled with information about query executions that had a significant discrepancy between the optimizer's estimated costs and the actual query execution costs.
  • The contents of the record appended in step 302 include, for example, (1) a unique ID for the record, (2) query execution timestamp, (3) the amount of time taken to execute the query (a.k.a. query execution time), (4) the number of parameter markers included in the query, (5) the value type, value length and value of each parameter marker included in the query, (6) the text of the query statement, (7) the length of the query text, (8) the optimizer's estimated information and information length, and (9) the runtime counter's information and information length.
  • In step 304, extract & transform tool 106 (see FIG. 1) extracts information about the executions of queries having one or more parameter markers and the bind values associated with those queries. Step 304 extracts the query execution information and bind values from the records in binary dump file 104 (see FIG. 1). As one example, the step 304 data is extracted from the aforementioned RDSMon binary dump file. The data extracted in step 304 includes the text of the query statement, the query execution timestamp, the query execution time and all parameter marker bind values. A validity check (not shown) includes a check to determine whether the query includes at least one parameter marker and may include one or more other checks of the validity of the query execution information and bind values being extracted based on predefined criteria (e.g., check that the query text does not exceed a predetermined limit on the number of characters). If the validity check determines that the query does not include at least one parameter marker, then the remaining steps of FIG. 3A are not performed.
  • As all parameter marker bind values are saved in their original data type in the RDSMon file, extract & transform tool 106 (see FIG. 1) transforms the extracted parameter marker bind values into strings in step 306. The transformation in step 306 simplifies the handling and storage of the bind values.
  • In step 308, extract & transform tool 106 (see FIG. 1) stores the following items in a table of PM tables 108 (see FIG. 1): the query text, the database on which the query was executed, and a count of how many times the query was executed.
  • In step 310, extract & transform tool 106 (see FIG. 1) stores the following items in a table of PM tables 108 (see FIG. 1): an identifier of each parameter marker within the query, every actual bind value used for each parameter marker within the query, and a count of how many times a bind value was used for a parameter marker.
  • In step 312, extract & transform tool 106 (see FIG. 1) stores the following items in a table of PM tables 108 (see FIG. 1): each combination of bind values (i.e., bind value set) a query has been executed with and a count of how many times each combination occurred.
  • In step 314, extract & transform tool 106 (see FIG. 1) stores the following items in a table of PM tables 108 (see FIG. 1): a query execution timestamp for each query execution and an amount of time taken to execute the query for each query execution.
  • The tables referenced by steps 308, 310, 312 and 314 are described below relative to FIGS. 3C and 3D. The process of FIG. 3A ends at step 316.
  • FIG. 3B is an example of a binary dump file converted into a human readable format from which query information and bind values are extracted in the process of FIG. 3A, in accordance with embodiments of the present invention. A RDSMon binary dump file 320 includes examples of the query execution information, bind value information, optimizer estimations and runtime counter information described above relative to step 302 of FIG. 3A.
  • Storing Query Execution Information
  • FIG. 3C are examples of code for creating database tables that store the query information and bind values in the process of FIG. 3A, in accordance with embodiments of the present invention. Code 340 is an example of Data Definition Language (DDL) statements that generate four relational database tables PMQUERIES, PMVALUES, PMCOMBINATIONS AND PMEXECUTIONS. These relational database tables are examples of PM tables 108 (see FIG. 1). The PMQUERIES table includes columns for storing the data items described above relative to step 308 (see FIG. 3A). The PMVALUES table includes columns for storing the data items described above relative to step 310 (see FIG. 3A). The PMCOMBINATIONS table includes columns for storing the data items described above relative to step 312 (see FIG. 3A). The PMEXECUTIONS table includes columns for storing the data items described above relative to step 314 (see FIG. 3A). Complete descriptions of the four PM tables generated by DDL statements 340 are presented below relative to FIG. 3D.
  • FIG. 3D is an entity-relationship (ER) model of the relational database tables that store the query information and bind values in the process of FIG. 3A, in accordance with embodiments of the present invention. ER model 360 illustrates the four PM tables created by the statements of FIG. 3C, their relationships and their respective columns. Each of the four PM tables in ER model 360 is described below.
  • PMQUERIES: Each row contains information about one query. The columns of PMQUERIES are described in Table 1.
  • TABLE 1
    PMQUERIES
    Column Description
    QID Primary key column
    QTEXT Contains the query text having no more than
    a predefined maximum number of
    characters
    DBNAME Name of the database on which the query
    ran
    CCOUNT Counts how many times information about a
    certain query is added to the PM Tables. The
    counter is initialized with one when a new
    query is inserted.
  • PMVALUES: Each row contains the bind value of one parameter marker. A row is deleted from PMVALUES if the query the row refers to is deleted from the PMQUERIES table. The columns of PMVALUES are described in Table 2.
  • TABLE 2
    PMVALUES
    Column Description
    QID References a row in the PMQUERIES table
    and therefore references a specific query
    PNO Identifies the parameter marker within the
    query. For the first parameter marker, PNO
    has the value of one, for the second
    parameter marker, PNO has the value of
    two, etc.
    PVID Primary key column
    PVAL Stores the bind value of the parameter
    marker as a VARCHAR having no more
    than a predefined maximum number of
    characters
    PVCOUNT Counts how many times the value for a
    parameter marker has occurred, where the
    parameter marker is in the query specified
    by QID and on the parameter marker
    position specified by PNO
  • PMCOMBINATIONS: This table combines one or more bind values to a bind value set. For each bind value set, one or more rows are inserted. A row is deleted from PMCOMBINATIONS if the query to which the row refers is deleted from the PMQUERIES table. The columns of PMCOMBINATIONS are described in Table 3.
  • TABLE 3
    PMCOMBINATIONS
    Column Description
    QID References a row in the PMQUERIES table
    and therefore references a specific query.
    CID Primary key column
    PVID References a bind value in the PMVALUES
    table
    CCOUNT Counts how many times the bind value set
    of this row has occurred.
  • PMEXECUTIONS: Each row stores information about one execution of a query. For every execution, a new row is inserted. The primary key is the combination of CID and EXECUTED. A row is deleted from PMEXECUTIONS if the query to which the row refers is deleted from the PMQUERIES table. The columns of PMEXECUTIONS are described in Table 4.
  • TABLE 4
    PMEXECUTIONS
    Column Description
    QID References a row in the PMQUERIES table
    and therefore references a specific query
    EXECUTED Stores the time when the query was
    executed
    CID Stores the bind value set with which the
    query was executed
    EXEC_TIME Stores the time taken to execute the query
  • In one embodiment, step 204 (see FIG. 2) includes the following storage process performed by extract & transform tool 106 (see FIG. 1) for each set of query execution data collected in step 202 (see FIG. 2). First, the query text and the database name (e.g., provided as a command line parameter) are inserted into the PMQUERIES table. If the combination of both values already exists, the associated counter QCOUNT is incremented by one. Otherwise, a new query identifier QID is received and the data is written to the PMQUERIES table. Second, all parameter marker bind values are inserted into the PMVALUES table according to the insertion of the query text. Third, the information about the combination of the bind values (i.e., the bind value set) is written to the PMCOMBINATIONS table. If the bind value set is already included in the PMCOMBINATIONS table, then the counter CCOUNT is incremented. Otherwise, a new bind value set identifier CID is generated and a row is inserted into the PMCOMBINATIONS table for each bind value of the bind value set. Finally, the query identifier, the bind value set identifier, the timestamp of the query's execution and the amount of time taken to execute the query are inserted into the PMEXECUTIONS table.
  • Plan Space Analysis
  • FIG. 4A is a process of analyzing a query execution plan space within the process of FIG. 2, in accordance with embodiments of the present invention. The query execution plan space analysis process begins at step 400. In step 402, plan space analyzer 110 (see FIG. 1) requests and obtains the query text, the name of the database on which the query executed, and parameter marker bind value sets collected in step 202 (see FIG. 2). In one example, step 402 requests and obtains the query and bind value set information from PM tables 108 (see FIG. 1).
  • Prior to step 404, plan space analyzer 110 (see FIG. 1) employs a database facility to generate Explain information for each query whose query text is obtained in step 402 (hereinafter referred to as the “retrieved queries”). Generating Explain information for a query is also referred to herein as explaining the query (e.g., by composing and executing an EXPLAIN query). Each of the retrieved queries is explained with all of the retrieved query's bind value sets. In order to make each of the retrieved queries explainable, the retrieved query's one or more parameter markers are replaced by the bind values of one of the retrieved query's bind value sets. Via the Explain facility being used, the plan space analyzer assigns a query identifier (e.g., QUERYNO) to each bind value set and generates a query tag (e.g., QUERYTAG) for each of the retrieved queries. The generated query tag includes the current timestamp and a random number. As one example, each retrieved query is explained with its QUERYTAG and its bind value set's QUERYNO. Thus, the query identifier and query tag uniquely identifies the Explain information of every retrieved query for each of the query's bind value sets.
  • As one example, the plan space analyzer uses a DB2® facility to store Explain information in database tables called Explain tables. The Explain information stored in the Explain tables is generated while optimizing a query. If the Explain facility of DB2® is activated with the command SET CURRENT EXPLAIN MODE YES, the execution plan of a query is stored in the Explain tables when the query is executed.
  • As another example, Explain information is generated by composing and executing a Structured Query Language (SQL) statement EXPLAIN, which captures Explain information about the query execution plan for a supplied explainable statement and places the Explain information into Explain tables. An explainable statement is one of the following SQL statements: DELETE, INSERT, SELECT, SELECT INTO, UPDATE, VALUES, or VALUES INTO. The present invention employs the SQL EXPLAIN statement's SET QUERYNO and SET QUERYTAG options to set the aforementioned query identifiers and query tags, respectively, to uniquely identify the Explain information. A sample SQL EXPLAIN statement that explains a query prior to step 404 is shown in FIG. 4D.
  • In steps 404 and 406, plan space analyzer 110 (see FIG. 1) uses the query identifier and query tag to obtain Explain information from the aforementioned Explain tables for each bind value set of a retrieved query. In step 404, the Explain information obtained includes selectivities for all parameter marker predicates for each bind value set of each retrieved query. In another embodiment, cardinality measurements (a.k.a. absolute measurements) for each bind value set are obtained in step 404 instead of selectivities. In step 406, the Explain information obtained includes query execution plan information and estimated execution costs for each bind value set of each retrieved query. Each obtained estimated execution cost is an estimate of the cost of using an obtained query execution plan to optimally execute the retrieved query with a particular bind value set.
  • In one embodiment, the plan space analyzer assigns unique execution plan IDs to the query execution plans, so that all identical query execution plans have the same execution plan ID. In this case, step 406 obtains the execution plan ID instead of all of the query execution plan information.
  • In step 408, plan space analyzer 110 (see FIG. 1) groups parameter marker bind value sets by query execution plan ID. This grouping of parameter marker bind value sets by query execution plan ID is optionally displayed in a plan space diagram 114 (see FIG. 1).
  • In step 410, plan space analyzer 110 (see FIG. 1) determines one or more pairs of parameter marker bind value sets (p1, . . . , pn)i, (q1, . . . , qn)i where Pi is an optimal query execution plan associated with (p1, . . . , pn)i, Qi is an optimal query execution plan associated with (q1, . . . , qn)i and i≧1. The plan space analyzer determines a distance di between a first selectivity measurement set (i.e., a set of one or more selectivity measurements) associated with (p1, . . . , pn)i and a second selectivity measurement set associated with (q1, . . . , qn)i. Each distance di is a maximum distance between any pair of selectivity measurement sets, where the first selectivity measurement set of the pair is associated with query execution plan Pi and the second selectivity measurement set of the pair is associated with Qi. In another embodiment, step 410 determines the bind value set pairs that are associated with maximum distances between cardinality measurements.
  • In step 412, for the one or more bind value set pairs determined in step 410, plan space analyzer 110 (see FIG. 1) determines one or more pairs of execution costs C1 i and C2 i where C1 i is an estimated cost of executing the query via forcing the use of bind value set (q1, . . . , qn)i with query execution plan Pi and where C2 i is an estimated cost of executing the query via forcing the use of bind value set (p1, . . . , pn)i with query execution plan Qi. This forcing of bind value sets with sub-optimal query execution plans is referred to herein as cross-forced plans.
  • In step 414, plan space analyzer 110 (see FIG. 1) compares each of the costs determined in step 412 with the cost of executing the query via a corresponding optimal query execution plan. The plan space analyzer determines one or more pairs of differences D1 i, D2 i, where D1 i is a difference between execution cost C1 i and an optimal execution cost OC1 i and D2 i is a difference between execution cost C2 i and an optimal execution cost OC2 i. Optimal execution costs OC1 i and OC2 i are included in the execution costs determined in step 406. OC1 i is a cost of optimally executing the query via query execution plan Qi with bind value set (q1, . . . , qn)i and OC2 i is a cost of optimally executing the query via query execution plan Pi with bind value set (p1, . . . , pn)i.
  • If none of the differences in the one or more pairs of differences D1 i, D2 i exceed a predefined threshold value (i.e., the costs in each pair of costs compared in step 414 are substantially similar), then plan space analyzer 110 (see FIG. 1) recommends ReOpt Once as the optimization frequency for the query being analyzed. On the other hand, if at least one difference of the one or more pairs of differences D1 i, D2 i exceeds the predefined threshold value (i.e., costs in at least one of the pairs of costs compared in step 414 differ substantially), then plan space analyzer 110 (see FIG. 1) recommends ReOpt Always as the optimization frequency for the query being analyzed.
  • FIG. 4B is an example of a table from which query information and bind value sets are requested and obtained in the process of FIG. 4A, in accordance with embodiments of the present invention. Table 420 includes examples of a query text, a database name and bind value sets requested and obtained by plan space analyzer 110 (see FIG. 1) in step 402 of FIG. 4A. Each parameter marker in the query text in table 420 is indicated by a question mark. In one embodiment, the query text, database name and bind value sets are obtained from PM tables 108 (see FIG. 1).
  • FIG. 4C is an example of a query in which parameter markers are replaced by bind values, where the query's text is obtained from the table in FIG. 4B, in accordance with embodiments of the present invention. After step 402 (see FIG. 4A), the one or more parameter markers of the query being analyzed by the process of FIG. 4A are replaced by the one or more bind values in the bind value set obtained in step 402 (see FIG. 4A) and the resulting query is explained. Query 430 is an example of a query in which the two bind values of the first listed bind value set in table 420 (see FIG. 4B) replace their corresponding parameter markers in the query text of table 420 (see FIG. 4B). That is, the first listed bind value set in FIG. 4B is (1998, 63817) and therefore in query 430, 1998 and 63817 replace the first question mark and second question mark, respectively, in the query text of FIG. 4B. Explaining query 430 is illustrated in FIG. 4D.
  • FIG. 4D is an example of an Explain query generated from the query in FIG. 4C, in accordance with embodiments of the present invention. Explain query 440 is an example of a query used to explain query 430 (see FIG. 4C) subsequent to replacing a query's parameter markers with the bind values of a bind value set obtained in step 402 (see FIG. 4A). For each query being considered by the analysis process of FIG. 4A, an Explain query is composed and executed for each bind value set. The execution of each Explain query results in a query execution plan. The combination of QUERYTAG and QUERYID in Explain query 440 is unique and is used to identify the explain information later in the process of FIG. 4A.
  • Comparing Execution Costs
  • Although not shown in FIG. 4A, an initial criterion in the decision to use ReOpt Once or ReOpt Always is the number of different query execution plans associated with the query whose plan space is being analyzed. If only one query execution plan is determined in step 406 (see FIG. 4A) for all bind value sets obtained in step 402 (see FIG. 4A), then ReOpt Once is selected as the recommended optimal optimization frequency and the subsequent analysis steps of FIG. 4A are not required. In this case, ReOpt Always has the same execution cost as using ReOpt Once because ReOpt Always uses the same query execution plan after each reoptimization, but the optimization cost of ReOpt Always is significantly higher than using ReOpt Once.
  • If more than one query execution plan is determined for a query in step 406 (see FIG. 4A), then plan space analyzer 110 (see FIG. 1) makes an optimization frequency recommendation based on comparisons of estimated execution costs.
  • If a query is explained in step 406 (see FIG. 4A) with more than one query execution plan, then plan space analyzer 110 (see FIG. 1) recommends ReOpt Once in step 414 (see FIG. 4A) if using any of the query execution plans for all bind value sets does not result in execution costs that differ significantly from the execution costs associated with using the optimal query execution plan in each case. Otherwise, the plan space analyzer recommends ReOpt Always as the optimization frequency. That is, ReOpt Always is recommended if using any of the query execution plans for all bind value sets results in execution costs that differ significantly from the execution costs associated with using the optimal query execution plan in each case.
  • Assuming the worst case, the maximum difference in execution costs for a bind value set has to be determined. However, comparing the execution costs of all bind value sets of a query for all query execution plans is very expensive and time consuming. To reduce the number of cost comparisons, plan space analyzer 110 (see FIG. 1) selects suitable bind value sets to be compared. An example of selecting suitable bind value sets and making the cost comparisons is illustrated in FIGS. 4E-4G.
  • The selection of bind value sets for the execution cost comparison is based on the distance of their bind values in selectivity. The optimizer's choice of the optimal execution plan is heavily dependent on the selectivity of local predicates. In another embodiment, the selection of the bind value sets for the execution cost comparison is based on distances between cardinality measurements.
  • In the description of selecting bind value sets that follows, A is a set of the bind value sets A1 to Am and B is a set of bind value sets B1 to Bn, as shown in (1) and (2) presented below. All bind value sets of A and B are bind value sets of the query Q.

  • A={A1,A2, . . . , Am}  (1)

  • B={B1,B2, . . . , Bn}  (2)
  • All bind value sets of A use PA as the optimal execution plan and all bind value sets of B use PB as the optimal execution plan, as shown in (3) and (4) presented below:

  • A→PA  (3)

  • B→PB  (4)
  • C(V,P) is the estimated execution cost of the query Q with the bind value set V and the execution plan P.
  • DA and DB are the differences in execution costs between using PA and PB, as shown in (5) and (6) presented below:

  • D A =C(A i ,P B)−C(A i ,P A)  (5)

  • D B =C(B j ,P A)−C(B j ,P B)  (6)
  • DA and DB are supposed to be maximal if the value sets Ai and Bj have a maximum distance in selectivity to each other because of the execution plan's dependency on selectivity.
  • Therefore, the criterion for selecting bind value sets is the distance between bind value sets in selectivity. For each combination of two query execution plans PA and PB, one pair of bind value sets Ai and Bj is selected. Ai and Bj are the bind value sets with the maximum distance in selectivity.
  • The process of selecting bind value sets is illustrated by plan space diagrams in FIGS. 4E-4G. A plan space diagram shows the distribution of execution plans in the selectivity space of a query. The selectivity measurements of the parameter marker predicates are plotted on the axes of the coordinate system of a plan space diagram. In another embodiment, cardinality measurements of the parameter marker predicates are plotted on the axes. Each point represents one bind value set. In FIGS. 4E-4G, the shape and fill color of a circular or rectangular point on a plan space diagram indicates the query execution plan associated with the bind value set represented by that point. The number of parameter markers in a query whose plan space is being analyzed is equivalent to the number of dimensions in the plan space diagram.
  • FIG. 4E is an exemplary plan space diagram illustrating a query execution plan space being analyzed by the process of FIG. 4A, in accordance with embodiments of the present invention. Plan space diagram 450 illustrates the plan space of a query that has two parameter markers. The bind value sets indicated by the points in plan space diagram 450 are grouped by execution plan ID in step 408 of FIG. 4A. In this case, each bind value set is associated with one of three query execution plans (i.e., Plan 1, Plan 2 and Plan 3), which are indicated in diagram 450 by the white-filled circles (i.e., Plan 1), black-filled circles (i.e., Plan 2) and the black-filled rectangles (i.e., Plan 3).
  • The goal of the selection of the bind value sets is it to determine pairs of points on a plan space diagram where the bind value sets represented by each pair of points are associated with different query execution plans Plan I and Plan II and where the points are at a maximum distance from each other, as compared to other pairs of points also associated with Plan I and Plan II. In one embodiment, distances between all points on a plan space diagram are calculated to determine the aforementioned pairs of points.
  • In equation (7a) presented below, DE is the Euclidean distance between two points P and Q in an n-dimensional space.
  • D E = i = 1 n ( P i - Q i ) 2 ( 7 a )
  • For comparison purposes, a distance measure D is sufficient, as shown below in expression (7b). The cost of the computation of the distance D is denoted by C(D).
  • D = i = 1 n ( P i - Q i ) 2 ( 7 b )
  • The overall cost to select desired points out of m points in an n-dimensional space is shown below in expression (8):
  • ( m * ( m - 1 ) 2 ) * C ( D ) ( 8 )
  • As the overall cost represented by expression (8) rises quadratically for an increasing number of points and linearly for an increasing number of dimensions, the method of selecting bind value sets by calculating the distances between all points in a plan space diagram is expensive.
  • In a second embodiment, in order to reduce the number of distance calculations and male the process less expensive, each point in a plan space diagram that is totally surrounded by points associated with the same query execution plan as the point being considered is ignored (a.k.a. sorted out) in the determination of the maximum distances. Points in plan space diagram 450 that are sorted out are indicated by triangular points in a plan space diagram 460 of FIG. 4F. In other words, points that indicate bind value sets with minimum or maximum selectivities remain as circular or rectangular points in FIG. 4F, but points representing midrange bind value sets are ignored to simplify the maximum distances determination in step 410 of FIG. 4A.
  • A point P is totally surrounded if in every orthant of an n-dimensional Cartesian coordinate system with its origin in the point P, a point with the same shape and fill color as P exists. After a point is classified as surrounded, the point cannot be used to surround other points. Therefore, in an n-dimensional space, at least 2n points are not surrounded by other points with the same shape and fill color.
  • The costs for this algorithm for m points and n dimensions are shown below in expressions (9) and (10), where C(P) denotes the cost to compare the position of two points. In the worst case (i.e., expression (9)), no bind value set is sorted out. In the best case (i.e., expression (10)), all bind value sets are sorted out except for the aforementioned 2n points.
  • worst case : ( m * ( m - 1 ) ) * C ( P ) ( 9 ) best case : ( ( m - n 2 ) * ( m + n 2 - 1 2 ) ) * C ( P ) ( 10 )
  • Using the sort out algorithm to reduce the number of points to be considered for the distance calculation, the final costs for m points and n dimensions are presented below in expressions (11) and (12), in which C(D) is the cost of the distance calculation and C(P) is the cost to compare the position of two points.
  • worst case : [ ( m * ( m - 1 ) ) * C ( P ) ] + [ ( m * ( m - 1 ) 2 ) * C ( D ) ] ( 11 ) best case : [ ( ( m - n 2 ) * ( m + n 2 - 1 2 ) ) * C ( P ) ] + [ ( n 2 * ( n 2 - 1 ) 2 ) * C ( D ) ] ( 12 )
  • In the worst case (i.e., expression (11)), no point is sorted out and every point is combined two times with every other point. Comparing the worst case (i.e., expression (11)) to the original costs in expression (8) presented above, the worst case is three times as expensive as the original cost of determining the distances between all points.
  • Comparing the best case (i.e., expression (12)) with the original costs in expression (8), the number of distance computations is reduced to 2n.
  • After using the sort out algorithm to generate plan space diagram 460 in FIG. 4F, maximum distances between selectivity measurements in the plan space diagram are determined in step 410 (see FIG. 4A). The maximum selectivity distances are between bind value sets associated with different query execution plans. The line segments drawn between the larger circular or rectangular points in plan space diagram 470 in FIG. 4G indicate the maximum selectivity distances. The larger circular or rectangular points at the endpoints of a line segment in FIG. 4G represent a pair of bind value sets used in a determination of estimated execution costs in step 412 (see FIG. 4A). Hereinafter, each pair of bind value sets identified in step 410 (see FIG. 4A) is also referred to as an identified pair. Thus, the three line segments joining points in FIG. 4G identify three pairs of bind value sets. That is, a first line segment joins a first pair of bind value sets associated with Plan 1 and Plan 2, a second line segment joins a second pair of bind value sets associated with Plan 1 and Plan 3, and a third line segment joins a third pair of bind value sets associated with Plan 2 and Plan 3.
  • After identifying the pairs of bind value sets that have a maximum distance in selectivity, the differences in estimated execution costs are determined in step 414 (see FIG. 4A). Each difference determined in step 414 (see FIG. 4A) is between a first execution cost of sub-optimally executing a query with a first query execution plan that is forced to use a bind value set of an identified pair and a second execution cost of optimally executing the query with a second query execution plan using the same bind value set. The first execution plan is the plan for optimally executing the query with the other bind value set in the identified pair.
  • As the optimal execution costs for each bind value set in the identified pairs are already known from the explain information generated in step 206 (see FIG. 2), step 412 (see FIG. 4A) does not need to determine those optimal execution costs. In step 412, for each query execution plan associated with a bind value set in an identified pair, plan space analyzer 110 (see FIG. 1) determines the suboptimal execution cost of forcing the query execution plan to use the other bind value set in the identified pair. This determination of the suboptimal execution costs in step 412 is performed via explaining the query with cross forced plans. Thus, for each identified pair of bind value sets A and B, which are associated with optimal query execution plans P and Q, respectively, the query is explained in step 412 to determine a first suboptimal execution cost of using query execution plan P with bind value set B and a second suboptimal execution cost of using query execution plan Q with bind value set A.
  • In one embodiment, a database hint feature is employed in step 412 (see FIG. 4A) to force the query to use suboptimal query execution plans and subsequently determine the aforementioned suboptimal execution costs. As one example, a database hint is a syntax addition to a SQL query (e.g., embedded within a comment of the SQL query) that overrides the database optimizer and controls the execution plan of the SQL query.
  • After explaining the query with all of the identified pairs of bind value sets using cross forced plans, the differences determined between the suboptimal execution costs and the optimal costs are analyzed in step 414 (see FIG. 4A). If all of the differences do not exceed a predefined threshold value, then plan space analyzer 110 (see FIG. 1) recommends ReOpt Once as the optimization frequency for the query. That is, ReOpt Once is recommended if each suboptimal execution cost and its associated optimal execution cost determined in step 412 (see FIG. 4A) are substantially similar. The predefined threshold value takes into account the optimization costs associated with ReOpt Always.
  • If any of the differences determined in step 414 (see FIG. 4A) exceed the predefined threshold value, then plan space analyzer 110 (see FIG. 1) recommends ReOpt Always as the optimization frequency for the query. That is, ReOpt Always is recommended if any suboptimal execution cost determined in step 412 of FIG. 4A and its associated optimal execution cost determined in step 206 of FIG. 2 differ substantially.
  • In the case of the three identified pairs of bind value sets in plan space diagram 470, six suboptimal execution costs are determined in step 412 (see FIG. 4A) and six comparisons are made in step 414 to determine six differences between the suboptimal execution costs and their associated optimal execution costs.
  • Computing System
  • FIG. 5 is a computing system that includes components of the system of FIG. 1 and implements the processes of FIGS. 2, 3A and 4A, in accordance with embodiments of the present invention. Computing unit 500 is suitable for storing and/or executing program code of a system for automatically determining optimization frequencies of queries with parameter markers 514, and generally comprises a central processing unit (CPU) 502, a memory 504, an input/output (I/O) interface 506, a bus 508, I/O devices 510 and a storage unit 512. Optimization frequency determination system 514 includes, for example, extract & transform tool 106 (see FIG. 1) and plan space analyzer 110 (see FIG. 1). CPU 502 performs computation and control functions of computing unit 500. CPU 502 may comprise a single processing unit, or be distributed across one or more processing units in one or more locations (e.g., on a client and server).
  • Local memory elements of memory 504 are employed during actual execution of the program code of optimization frequency determination system 514. Cache memory elements of memory 504 provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. Further, memory 504 may include other systems not shown in FIG. 5, such as an operating system (e.g., Linux) that runs on CPU 502 and provides control of various components within and/or connected to computing unit 500.
  • Memory 504 may comprise any known type of data storage and/or transmission media, including bulk storage, magnetic media, optical media, random access memory (RAM), read-only memory (ROM), a data cache, a data object, etc. Storage unit 512 is, for example, a magnetic disk drive or an optical disk drive that stores data. Moreover, similar to CPU 502, memory 504 may reside at a single physical location, comprising one or more types of data storage, or be distributed across a plurality of physical systems in various forms. Further, memory 504 can include data distributed across, for example, a LAN, WAN or storage area network (SAN) (not shown).
  • I/O interface 506 comprises any system for exchanging information to or from an external source. I/O devices 510 comprise any known type of external device, including a display monitor, keyboard, mouse, printer, speakers, handheld device, printer, facsimile, etc. Bus 508 provides a communication link between each of the components in computing unit 500, and may comprise any type of transmission link, including electrical, optical, wireless, etc.
  • I/O interface 506 also allows computing unit 500 to store and retrieve information (e.g., program instructions or data) from an auxiliary storage device (e.g., storage unit 512). The auxiliary storage device may be a non-volatile storage device (e.g., a CD-ROM drive which receives a CD-ROM disk). Computing unit 500 can store and retrieve information from other auxiliary storage devices (not shown), which can include a direct access storage device (DASD) (e.g., hard disk or floppy diskette), a magneto-optical disk drive, a tape drive, or a wireless communication device.
  • The invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
  • Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code of optimization frequency determination system 514 for use by or in connection with a computing unit 500 or any instruction execution system to provide and facilitate the capabilities of the present invention. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, RAM 504, ROM, a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read-only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
  • The flow diagrams depicted herein are provided by way of example. There may be variations to these diagrams or the steps (or operations) described herein without departing from the spirit of the invention. For instance, in certain cases, the steps may be performed in differing order, or steps may be added, deleted or modified. All of these variations are considered a part of the present invention as recited in the appended claims.
  • While embodiments of the present invention have been described herein for purposes of illustration, many modifications and changes will become apparent to those skilled in the art. Accordingly, the appended claims are intended to encompass all such modifications and changes as fall within the true spirit and scope of this invention.

Claims (20)

1. A computer-implemented method of automatically determining an optimization frequency of a query having one or more parameter markers, said method comprising:
generating, by a computing system, a plurality of query execution plans for an execution of a query having one or more parameter markers, each query execution plan associated with one or more bind value sets of a plurality of bind values sets;
determining that no difference of a plurality of differences between pairs of execution costs exceeds a predefined threshold value or that at least one difference of said plurality of differences exceeds said predefined threshold value, each pair of execution costs including a first execution cost and a second execution cost, said first execution cost being a cost of executing said query with a bind value set of said plurality of bind value sets via a first query execution plan of said plurality of query execution plans and said second execution cost being a cost of optimally executing said query with said bind value set via a second query execution plan of said plurality of query execution plans;
automatically selecting an optimization frequency by said computing system; and
storing said optimization frequency in a computer-usable medium,
wherein said optimization frequency is optimizing said query once as a result of a first determination by said determining that no difference of said plurality of differences exceeds said predefined threshold value, and
wherein said optimization frequency is reoptimizing said query each time said query is executed as a result of a second determination by said determining that at least one difference of said plurality of differences exceeds said predefined threshold value.
2. The method of claim 1, wherein said determining includes determining that no difference of said plurality of differences exceeds said predefined threshold value, and wherein said optimization frequency is said optimizing said query once.
3. The method of claim 1, wherein said determining includes determining that at least one difference of said plurality of differences exceeds said predefined threshold value, and wherein said optimizing frequency is said reoptimizing said query each time said query is executed.
4. The method of claim 1, further comprising collecting parameter marker data by said computing system, said parameter marker data including said plurality of bind value sets, wherein said collecting said parameter marker data comprises:
extracting, from a record of a binary dump file provided by a database feedback warehouse system, query information and a first bind value set of said plurality of bind value sets, said query information including a text of said query, a timestamp indicating an execution of said query, and an amount of time taken by said execution of said query; and
storing said query information and said first bind value set in a plurality of relational database tables.
5. The method of claim 4, wherein said first bind value set includes one or more actual values of said one or more parameters or one or more selectivities of one or more bind values of said first bind value set.
6. The method of claim 1, further comprising collecting parameter marker data by said computing system, said parameter marker data including said plurality of bind value sets, wherein said collecting said parameter marker data comprises:
parsing said query with a Structured Query Language (SQL) parser;
identifying one or more database columns referred to by one or more predicates that include said one or more parameter markers;
obtaining single-column database statistics associated with said one or more database columns;
collecting said plurality of bind value sets from said single-column database statistics; and
determining selectivities of said plurality of bind value sets from said single-column database statistics.
7. The method of claim 1, further comprising collecting parameter marker data by said computing system, said parameter marker data including said plurality of bind value sets, wherein said collecting said parameter marker data comprises:
parsing said query with a Structured Query Language (SQL) parser;
identifying one or more database columns referred to by one or more predicates that include said one or more parameter markers; and
selecting, via random sampling, said plurality of bind value sets from said one or more database columns.
8. A computer program product, comprising a computer-usable medium having a computer-readable program code embodied therein, said computer-readable program code comprising an algorithm adapted to implement the method of claim 1.
9. A computing system comprising a processor coupled to a computer-readable memory unit, said memory unit comprising a software application and instructions that when executed by said processor implement a method of automatically determining an optimization frequency of a query having one or more parameter markers, said method comprising:
generating, by a computing system, a plurality of query execution plans for an execution of a query having one or more parameter markers, each query execution plan associated with one or more bind value sets of a plurality of bind values sets;
determining that no difference of a plurality of differences between pairs of execution costs exceeds a predefined threshold value or that at least one difference of said plurality of differences exceeds said predefined threshold value, each pair of execution costs including a first execution cost and a second execution cost, said first execution cost being a cost of executing said query with a bind value set of said plurality of bind value sets via a first query execution plan of said plurality of query execution plans and said second execution cost being a cost of optimally executing said query with said bind value set via a second query execution plan of said plurality of query execution plans;
automatically selecting an optimization frequency by said computing system; and
storing said optimization frequency in a computer-usable medium,
wherein said optimization frequency is optimizing said query once as a result of a first determination by said determining that no difference of said plurality of differences exceeds said predefined threshold value, and
wherein said optimization frequency is reoptimizing said query each time said query is executed as a result of a second determination by said determining that at least one difference of said plurality of differences exceeds said predefined threshold value.
10. The computing system of claim 9, wherein said determining includes determining that no difference of said plurality of differences exceeds said predefined threshold value, and wherein said optimization frequency is said optimizing said query once.
11. The computing system of claim 9, wherein said determining includes determining that at least one difference of said plurality of differences exceeds said predefined threshold value, and wherein said optimizing frequency is said reoptimizing said query each time said query is executed.
12. The computing system of claim 9, wherein said method further comprises collecting parameter marker data by said computing system, said parameter marker data including said plurality of bind value sets, wherein said collecting said parameter marker data comprises:
extracting, from a record of a binary dump file provided by a database feedback warehouse system, query information and a first bind value set of said plurality of bind value sets, said query information including a text of said query, a timestamp indicating an execution of said query, and an amount of time taken by said execution of said query; and
storing said query information and said first bind value set in a plurality of relational database tables.
13. The computing system of claim 12, wherein said first bind value set includes one or more actual values of said one or more parameters or one or more selectivities of one or more bind values of said first bind value set.
14. The computing system of claim 9, wherein said method further comprises collecting parameter marker data by said computing system, said parameter marker data including said plurality of bind value sets, wherein said collecting said parameter marker data comprises:
parsing said query with a Structured Query Language (SQL) parser;
identifying one or more database columns referred to by one or more predicates that include said one or more parameter markers;
obtaining single-column database statistics associated with said one or more database columns;
collecting said plurality of bind value sets from said single-column database statistics; and
determining selectivities of said plurality of bind value sets from said single-column database statistics.
15. The computing system of claim 9, wherein said method further comprises collecting parameter marker data by said computing system, said parameter marker data including said plurality of bind value sets, wherein said collecting said parameter marker data comprises:
parsing said query with a Structured Query Language (SQL) parser;
identifying one or more database columns referred to by one or more predicates that include said one or more parameter markers; and
selecting, via random sampling, said plurality of bind value sets from said one or more database columns.
16. A computer-implemented method of determining an optimization frequency of a query having one or more parameter markers, said method comprising:
obtaining, by a computing system, a plurality of bind value sets, each bind value set including one or more bind values and associated with one or more parameter markers of a query;
obtaining, by said computing system, a plurality of measurement sets associated with said bind value sets in a one-to-one correspondence, each measurement set selected from the group consisting of one or more selectivity measurements and one or more cardinality measurements;
determining, by said computing system, a plurality of query execution plans, each query execution plan capable of optimally executing said query with one or more bind value sets of said plurality of bind value sets;
determining, by said computing system, a first set of execution costs associated with said query execution plans of said plurality of query execution plans in a one-to-one correspondence, each execution cost of said first set being a cost of optimally executing said query with a bind value set of said plurality of bind value sets;
determining, by said computing system, one or more pairs of bind value sets (p1, . . . , pn)i, (q1, . . . , qn)i of said plurality of bind value sets, said determining said one or more pairs of bind value sets including determining one or more distances di between a first measurement set S1 i associated with said bind value set (p1, . . . , pn)i and a second measurement set S2 i associated with said (q1, . . . , qn)i, said S1 i and said S2 i included in said plurality of measurement sets, wherein each distance di is a maximum distance between any pair of measurement sets associated with query execution plans Pi and Qi of said plurality of query execution plans, wherein said query execution plan Pi is an optimal query execution plan associated with said bind value set (p1, . . . , pn)i and said query execution plan Qi is an optimal query execution plan associated with said bind value set (q1, . . . , qn)i, and wherein said i≧1;
determining, by said computing system, one or more pairs of execution costs C1 i, C2 i of a second set of execution costs, wherein said C1 i is a cost of executing said query via said query execution plan Pi with bind value set (q1, . . . , qn)i and said C2 i is a cost of executing said query via said query execution plan Qi with bind value set (p1, . . . , pn)i;
determining, by said computing system, one or more pairs of differences D1 i and D2 i, wherein said D1 i is a difference between said cost C1 i and an optimal execution cost OC1 i of said first set of execution costs and said D2 i is a difference between said cost C2 i and an optimal execution cost OC2 i of said first set of execution costs, wherein said OC1 i is a cost of optimally executing said query via said query execution plan Qi with bind value set (q1, . . . , qn)i, and said OC2 i is a cost of optimally executing said query via said query execution plan Pi with bind value set (p1, . . . , pn)i;
automatically selecting, by said computing system, an optimization frequency, wherein said optimization frequency is selected from the group consisting of optimizing said query once and reoptimizing said query each time said query is executed; and
storing said optimization frequency in a computer-usable medium,
wherein said optimization frequency is said optimizing said query once as a result of a first determination, via said determining said one or more pairs of differences, that no difference of said one or more pairs of differences exceeds a predefined threshold value, and
wherein said optimization frequency is said reoptimizing said query each time said query is executed as a result of a second determination, via said determining said one or more pairs of differences, that at least one difference of said one or more pairs of differences exceeds said predefined threshold value.
17. The method of claim 16, wherein each measurement set of said plurality of measurement sets is one or more selectivity measurements.
18. The method of claim 16, wherein each measurement set of said plurality of measurement sets is one or more cardinality measurements.
19. The method of claim 16, wherein said determining said one or more pairs of execution costs C1 i, C2 i of said second set of execution costs includes:
using a first database hint to force said query to use said query execution plan Pi with bind value set (q1, . . . , qn)i; and
using a second database hint to force said query to use said query execution plan Qi with bind value set (p1, . . . , pn)i.
20. A computer program product, comprising a computer-usable medium having a computer-readable program code embodied therein, said computer-readable program code comprising an algorithm adapted to implement the method of claim 16.
US12/125,146 2007-02-09 2008-05-22 Automatically determining optimization frequencies of queries with parameter markers Expired - Fee Related US7987178B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/125,146 US7987178B2 (en) 2007-02-09 2008-05-22 Automatically determining optimization frequencies of queries with parameter markers

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/673,142 US20080195578A1 (en) 2007-02-09 2007-02-09 Automatically determining optimization frequencies of queries with parameter markers
US12/125,146 US7987178B2 (en) 2007-02-09 2008-05-22 Automatically determining optimization frequencies of queries with parameter markers

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/673,142 Continuation US20080195578A1 (en) 2007-02-09 2007-02-09 Automatically determining optimization frequencies of queries with parameter markers

Publications (2)

Publication Number Publication Date
US20080222092A1 true US20080222092A1 (en) 2008-09-11
US7987178B2 US7987178B2 (en) 2011-07-26

Family

ID=39686724

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/673,142 Abandoned US20080195578A1 (en) 2007-02-09 2007-02-09 Automatically determining optimization frequencies of queries with parameter markers
US12/125,146 Expired - Fee Related US7987178B2 (en) 2007-02-09 2008-05-22 Automatically determining optimization frequencies of queries with parameter markers

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US11/673,142 Abandoned US20080195578A1 (en) 2007-02-09 2007-02-09 Automatically determining optimization frequencies of queries with parameter markers

Country Status (1)

Country Link
US (2) US20080195578A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080195578A1 (en) * 2007-02-09 2008-08-14 Fabian Hueske Automatically determining optimization frequencies of queries with parameter markers
US20090319477A1 (en) * 2008-06-23 2009-12-24 Oracle International Corporation Performing cost-based optimizations of authorization checks in database systems
US20100161593A1 (en) * 2008-12-23 2010-06-24 Andrew Paulsen Graphical result set representation and manipulation
US20110060731A1 (en) * 2009-09-04 2011-03-10 Al-Omari Awny K System and method for optimizing queries
US20170024433A1 (en) * 2014-04-29 2017-01-26 Hewlett Packard Enterprise Development Lp Query plan post optimization analysis and reoptimization
US11023476B2 (en) 2016-07-22 2021-06-01 International Business Machines Corporation Testing pairings to determine whether they are publically known
US11334538B2 (en) * 2019-05-31 2022-05-17 Microsoft Technology Licensing, Llc System and method for cardinality estimation feedback loops in query processing

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8805879B2 (en) * 2008-03-11 2014-08-12 Virtualagility Inc. Techniques for integrating parameterized information request into a system for collaborative work
KR101083563B1 (en) * 2009-04-24 2011-11-14 엔에이치엔비즈니스플랫폼 주식회사 Method and System for Managing Database
US8332388B2 (en) 2010-06-18 2012-12-11 Microsoft Corporation Transformation rule profiling for a query optimizer
US8219575B2 (en) * 2010-11-12 2012-07-10 Business Objects Software Ltd. Method and system for specifying, preparing and using parameterized database queries
CN103136260A (en) 2011-11-30 2013-06-05 国际商业机器公司 Method and device for applying filtration factor assessment in optimization of access path in database
US8812489B2 (en) * 2012-10-08 2014-08-19 International Business Machines Corporation Swapping expected and candidate affinities in a query plan cache
US9892159B2 (en) * 2013-03-14 2018-02-13 Microsoft Technology Licensing, Llc Distance-based logical exploration in a relational database query optimizer
US9251213B2 (en) * 2013-03-15 2016-02-02 International Business Machines Corporation Estimating error propagation for database optimizers
US9384236B2 (en) * 2013-06-14 2016-07-05 Sap Se Method and system for operating on database queries
US10019478B2 (en) * 2013-09-05 2018-07-10 Futurewei Technologies, Inc. Mechanism for optimizing parallel execution of queries on symmetric resources
US11061898B2 (en) * 2016-09-15 2021-07-13 Sap Se Uncertainty-aware selection of query execution plan
US10558663B2 (en) * 2017-03-30 2020-02-11 Sap Se Automated application of query hints
US11372858B2 (en) * 2017-05-18 2022-06-28 Oracle International Corporation Estimated query performance
US11875386B2 (en) * 2020-06-24 2024-01-16 Teradata Us, Inc. Estimating as-a-service query prices within optimizer explained plans

Citations (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5412804A (en) * 1992-04-30 1995-05-02 Oracle Corporation Extending the semantics of the outer join operator for un-nesting queries to a data base
US5544355A (en) * 1993-06-14 1996-08-06 Hewlett-Packard Company Method and apparatus for query optimization in a relational database system having foreign functions
US5819255A (en) * 1996-08-23 1998-10-06 Tandem Computers, Inc. System and method for database query optimization
US5822747A (en) * 1996-08-23 1998-10-13 Tandem Computers, Inc. System and method for optimizing database queries
US6021405A (en) * 1996-08-23 2000-02-01 Tandem Computers, Inc. System and method for optimizing database queries with improved performance enhancements
US6061676A (en) * 1996-05-29 2000-05-09 Lucent Technologies Inc. Effecting constraint magic rewriting on a query with the multiset version of the relational algebric theta-semijoin operator
US6205441B1 (en) * 1999-03-31 2001-03-20 Compaq Computer Corporation System and method for reducing compile time in a top down rule based system using rule heuristics based upon the predicted resulting data flow
US6219660B1 (en) * 1997-09-30 2001-04-17 International Business Machines Corporation Access path selection for SQL with variables in a RDBMS
US6330552B1 (en) * 1998-09-28 2001-12-11 Compaq Database query cost model optimizer
US6351742B1 (en) * 1999-03-18 2002-02-26 Oracle Corporation Method and mechanism for database statement optimization
US6353818B1 (en) * 1998-08-19 2002-03-05 Ncr Corporation Plan-per-tuple optimizing of database queries with user-defined functions
US6356891B1 (en) * 2000-04-20 2002-03-12 Microsoft Corporation Identifying indexes on materialized views for database workload
US6510428B2 (en) * 1993-11-16 2003-01-21 Hitachi, Ltd. Method and system of database divisional management for parallel database system
US6529901B1 (en) * 1999-06-29 2003-03-04 Microsoft Corporation Automating statistics management for query optimizers
US6735694B1 (en) * 1997-11-21 2004-05-11 International Business Machines Corporation Method and system for certifying authenticity of a web page copy
US6735594B1 (en) * 2000-06-23 2004-05-11 International Business Machines Corporation Transparent parameter marker support for a relational database over a network
US6757670B1 (en) * 1999-09-13 2004-06-29 Hitachi, Ltd. Method and system for query processing
US6865567B1 (en) * 1999-07-30 2005-03-08 Basantkumar John Oommen Method of generating attribute cardinality maps
US20050065921A1 (en) * 2003-09-22 2005-03-24 International Business Machines Corporation System and method for performing a query in a computer system to retrieve data from a database
US20050071346A1 (en) * 2003-09-26 2005-03-31 International Business Machines Corporation Method, system, and program for optimized parameter binding
US20050097078A1 (en) * 2003-10-31 2005-05-05 Lohman Guy M. System, method, and computer program product for progressive query processing
US6928451B2 (en) * 2001-11-14 2005-08-09 Hitachi, Ltd. Storage system having means for acquiring execution information of database management system
US20050192951A1 (en) * 2001-10-25 2005-09-01 International Business Machines Corporation Method and apparatus for optimizing queries in a logically partitioned computer system
US20050223019A1 (en) * 2004-03-31 2005-10-06 Microsoft Corporation Block-level sampling in statistics estimation
US6957211B1 (en) * 2002-05-06 2005-10-18 Oracle International Corporation Query optimizer cost model
US20050267866A1 (en) * 2004-05-28 2005-12-01 Markl Volker G Determining validity ranges of query plans based on suboptimality
US20060074875A1 (en) * 2004-09-30 2006-04-06 International Business Machines Corporation Method and apparatus for predicting relative selectivity of database query conditions using respective cardinalities associated with different subsets of database records
US7051034B1 (en) * 2002-12-18 2006-05-23 Oracle International Corporation Dynamic optimization for processing a restartable sub-tree of a query execution plan
US20060136396A1 (en) * 2004-12-22 2006-06-22 Ncr Corporation Self-adjusting database-query optimizer
US20060212429A1 (en) * 2005-03-17 2006-09-21 Microsoft Corporation Answering top-K selection queries in a relational engine
US20070179947A1 (en) * 2004-06-22 2007-08-02 Oracle International Corporation Efficient interaction among cost-based transformations
US20070192296A1 (en) * 2005-11-10 2007-08-16 Louis Burger Managing the execution of a query
US7299226B2 (en) * 2003-06-19 2007-11-20 Microsoft Corporation Cardinality estimation of joins
US20080052266A1 (en) * 2006-08-25 2008-02-28 Microsoft Corporation Optimizing parameterized queries in a relational database management system
US20080109424A1 (en) * 2006-11-08 2008-05-08 Paul Reuben Day Method of Querying Relational Database Management Systems
US20080195577A1 (en) * 2007-02-09 2008-08-14 Wei Fan Automatically and adaptively determining execution plans for queries with parameter markers
US20080228831A1 (en) * 2004-12-17 2008-09-18 International Business Machines Method, system and program for prioritizing maintenance of database tables

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000055755A2 (en) 1999-03-18 2000-09-21 Oracle Corporation Method for extending native optimization in a database system
US8775412B2 (en) 2004-01-08 2014-07-08 International Business Machines Corporation Method and system for a self-healing query access plan
US20080195578A1 (en) * 2007-02-09 2008-08-14 Fabian Hueske Automatically determining optimization frequencies of queries with parameter markers

Patent Citations (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5412804A (en) * 1992-04-30 1995-05-02 Oracle Corporation Extending the semantics of the outer join operator for un-nesting queries to a data base
US5544355A (en) * 1993-06-14 1996-08-06 Hewlett-Packard Company Method and apparatus for query optimization in a relational database system having foreign functions
US6510428B2 (en) * 1993-11-16 2003-01-21 Hitachi, Ltd. Method and system of database divisional management for parallel database system
US6061676A (en) * 1996-05-29 2000-05-09 Lucent Technologies Inc. Effecting constraint magic rewriting on a query with the multiset version of the relational algebric theta-semijoin operator
US5819255A (en) * 1996-08-23 1998-10-06 Tandem Computers, Inc. System and method for database query optimization
US5822747A (en) * 1996-08-23 1998-10-13 Tandem Computers, Inc. System and method for optimizing database queries
US6021405A (en) * 1996-08-23 2000-02-01 Tandem Computers, Inc. System and method for optimizing database queries with improved performance enhancements
US6219660B1 (en) * 1997-09-30 2001-04-17 International Business Machines Corporation Access path selection for SQL with variables in a RDBMS
US6735694B1 (en) * 1997-11-21 2004-05-11 International Business Machines Corporation Method and system for certifying authenticity of a web page copy
US6353818B1 (en) * 1998-08-19 2002-03-05 Ncr Corporation Plan-per-tuple optimizing of database queries with user-defined functions
US6330552B1 (en) * 1998-09-28 2001-12-11 Compaq Database query cost model optimizer
US6351742B1 (en) * 1999-03-18 2002-02-26 Oracle Corporation Method and mechanism for database statement optimization
US6205441B1 (en) * 1999-03-31 2001-03-20 Compaq Computer Corporation System and method for reducing compile time in a top down rule based system using rule heuristics based upon the predicted resulting data flow
US6529901B1 (en) * 1999-06-29 2003-03-04 Microsoft Corporation Automating statistics management for query optimizers
US6865567B1 (en) * 1999-07-30 2005-03-08 Basantkumar John Oommen Method of generating attribute cardinality maps
US6757670B1 (en) * 1999-09-13 2004-06-29 Hitachi, Ltd. Method and system for query processing
US6356891B1 (en) * 2000-04-20 2002-03-12 Microsoft Corporation Identifying indexes on materialized views for database workload
US6735594B1 (en) * 2000-06-23 2004-05-11 International Business Machines Corporation Transparent parameter marker support for a relational database over a network
US20050192951A1 (en) * 2001-10-25 2005-09-01 International Business Machines Corporation Method and apparatus for optimizing queries in a logically partitioned computer system
US6928451B2 (en) * 2001-11-14 2005-08-09 Hitachi, Ltd. Storage system having means for acquiring execution information of database management system
US6957211B1 (en) * 2002-05-06 2005-10-18 Oracle International Corporation Query optimizer cost model
US7051034B1 (en) * 2002-12-18 2006-05-23 Oracle International Corporation Dynamic optimization for processing a restartable sub-tree of a query execution plan
US7299226B2 (en) * 2003-06-19 2007-11-20 Microsoft Corporation Cardinality estimation of joins
US20050065921A1 (en) * 2003-09-22 2005-03-24 International Business Machines Corporation System and method for performing a query in a computer system to retrieve data from a database
US20050071346A1 (en) * 2003-09-26 2005-03-31 International Business Machines Corporation Method, system, and program for optimized parameter binding
US20050097078A1 (en) * 2003-10-31 2005-05-05 Lohman Guy M. System, method, and computer program product for progressive query processing
US20050223019A1 (en) * 2004-03-31 2005-10-06 Microsoft Corporation Block-level sampling in statistics estimation
US20050267866A1 (en) * 2004-05-28 2005-12-01 Markl Volker G Determining validity ranges of query plans based on suboptimality
US20080215531A1 (en) * 2004-05-28 2008-09-04 International Business Machines Corporation Determining validity ranges of query plans based on suboptimality
US20070179947A1 (en) * 2004-06-22 2007-08-02 Oracle International Corporation Efficient interaction among cost-based transformations
US20060074875A1 (en) * 2004-09-30 2006-04-06 International Business Machines Corporation Method and apparatus for predicting relative selectivity of database query conditions using respective cardinalities associated with different subsets of database records
US20080228831A1 (en) * 2004-12-17 2008-09-18 International Business Machines Method, system and program for prioritizing maintenance of database tables
US20060136396A1 (en) * 2004-12-22 2006-06-22 Ncr Corporation Self-adjusting database-query optimizer
US20060212429A1 (en) * 2005-03-17 2006-09-21 Microsoft Corporation Answering top-K selection queries in a relational engine
US20070192296A1 (en) * 2005-11-10 2007-08-16 Louis Burger Managing the execution of a query
US20080052266A1 (en) * 2006-08-25 2008-02-28 Microsoft Corporation Optimizing parameterized queries in a relational database management system
US20080109424A1 (en) * 2006-11-08 2008-05-08 Paul Reuben Day Method of Querying Relational Database Management Systems
US20080195577A1 (en) * 2007-02-09 2008-08-14 Wei Fan Automatically and adaptively determining execution plans for queries with parameter markers
US20080222093A1 (en) * 2007-02-09 2008-09-11 Wei Fan Automatically and adaptively determining execution plans for queries with parameter markers

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080195578A1 (en) * 2007-02-09 2008-08-14 Fabian Hueske Automatically determining optimization frequencies of queries with parameter markers
US7987178B2 (en) 2007-02-09 2011-07-26 International Business Machines Corporation Automatically determining optimization frequencies of queries with parameter markers
US8392405B2 (en) * 2008-06-23 2013-03-05 Oracle International Corporation Performing cost-based optimizations of authorization checks in database systems
US20090319477A1 (en) * 2008-06-23 2009-12-24 Oracle International Corporation Performing cost-based optimizations of authorization checks in database systems
US20100161593A1 (en) * 2008-12-23 2010-06-24 Andrew Paulsen Graphical result set representation and manipulation
US8438177B2 (en) * 2008-12-23 2013-05-07 Apple Inc. Graphical result set representation and manipulation
US20110060731A1 (en) * 2009-09-04 2011-03-10 Al-Omari Awny K System and method for optimizing queries
US8380699B2 (en) * 2009-09-04 2013-02-19 Hewlett-Packard Development Company, L.P. System and method for optimizing queries
US20170024433A1 (en) * 2014-04-29 2017-01-26 Hewlett Packard Enterprise Development Lp Query plan post optimization analysis and reoptimization
US11023476B2 (en) 2016-07-22 2021-06-01 International Business Machines Corporation Testing pairings to determine whether they are publically known
US11023475B2 (en) 2016-07-22 2021-06-01 International Business Machines Corporation Testing pairings to determine whether they are publically known
US11334538B2 (en) * 2019-05-31 2022-05-17 Microsoft Technology Licensing, Llc System and method for cardinality estimation feedback loops in query processing
US20220237162A1 (en) * 2019-05-31 2022-07-28 Microsoft Technology Licensing, Llc System and method for cardinality estimation feedback loops in query processing

Also Published As

Publication number Publication date
US7987178B2 (en) 2011-07-26
US20080195578A1 (en) 2008-08-14

Similar Documents

Publication Publication Date Title
US7987178B2 (en) Automatically determining optimization frequencies of queries with parameter markers
US5664171A (en) System and method for query optimization using quantile values of a large unordered data set
US6801903B2 (en) Collecting statistics in a database system
US8078652B2 (en) Virtual columns
US7797286B2 (en) System and method for externally providing database optimizer statistics
US7136850B2 (en) Self tuning database retrieval optimization using regression functions
US7984024B2 (en) Statistics management
US7877373B2 (en) Executing alternative plans for a SQL statement
US8914354B2 (en) Cardinality and selectivity estimation using a single table join index
US20130198165A1 (en) Generating statistical views in a database system
US7509311B2 (en) Use of statistics on views in query optimization
US8122046B2 (en) Method and apparatus for query rewrite with auxiliary attributes in query processing operations
US20070061287A1 (en) Method, apparatus and program storage device for optimizing a data warehouse model and operation
US20050187917A1 (en) Method for index tuning of a SQL statement, and index merging for a multi-statement SQL workload, using a cost-based relational query optimizer
US20090018992A1 (en) Management of interesting database statistics
US20040225639A1 (en) Optimizer dynamic sampling
US8527502B2 (en) Method, system and computer-readable media for software object relationship traversal for object-relational query binding
US9471617B2 (en) Schema evolution via transition information
US20070208695A1 (en) Selective automatic refreshing of stored execution plans
US20030135485A1 (en) Method and system for rowcount estimation with multi-column statistics and histograms
CA2378582A1 (en) Real-time database object statistics collection
US11113277B2 (en) System and method for real-time materialized view maintenance
US12111805B2 (en) Automatic data store architecture detection
US20030167275A1 (en) Computation of frequent data values
US6272486B1 (en) Determining the optimal number of tasks for building a database index

Legal Events

Date Code Title Description
REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Expired due to failure to pay maintenance fee

Effective date: 20150726