US20080168058A1 - System for addressing inefficient query processing - Google Patents

System for addressing inefficient query processing Download PDF

Info

Publication number
US20080168058A1
US20080168058A1 US12052063 US5206308A US2008168058A1 US 20080168058 A1 US20080168058 A1 US 20080168058A1 US 12052063 US12052063 US 12052063 US 5206308 A US5206308 A US 5206308A US 2008168058 A1 US2008168058 A1 US 2008168058A1
Authority
US
Grant status
Application
Patent type
Prior art keywords
query
system
queries
performance
predicate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12052063
Inventor
Mark R. Gordon
Original Assignee
Gordon Mark R
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor ; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor ; File system structures therefor in structured data stores
    • G06F17/30587Details of specialised database models
    • G06F17/30592Multi-dimensional databases and data warehouses, e.g. MOLAP, ROLAP
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor ; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor ; File system structures therefor in structured data stores
    • G06F17/30289Database design, administration or maintenance
    • G06F17/30306Database tuning
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor ; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor ; File system structures therefor in structured data stores
    • G06F17/30386Retrieval requests
    • G06F17/30424Query processing
    • G06F17/30442Query optimisation
    • G06F17/30448Query rewriting and transformation
    • G06F17/30463Plan optimisation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • Y10S707/99932Access augmentation or optimizing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • Y10S707/99933Query processing, i.e. searching

Abstract

A system for analyzing predicate usage and proposing solutions to improve query performance. The system comprises a system for collecting performance statistics for database queries; an aggregation system that organizes the performance statistics based on predicate combinations used in the database queries; and an evaluation system that compares the performances of queries having similar predicate combinations. Further features include a problem identification system that identifies a predicate, which when added to a query, results in a low performing query, and a solution proposal system that automatically proposes a solution to address the low performing query.

Description

  • [0001]
    This continuation application claims priority to co-pending U.S. patent application Ser. No. 10/746,961 entitled SYSTEM AND METHOD FOR ADDRESSING INEFFICIENT QUERY PROCESSING, filed on Dec. 24, 2003, the contents of which are hereby incorporated by reference in their entirety.
  • BACKGROUND OF THE INVENTION
  • [0002]
    1. Technical Field
  • [0003]
    The present invention relates generally to query processing, and more specifically relates to a system and method for identifying predicates causing inefficient query processing, and proposing and implementing solutions to improve query performance.
  • [0004]
    2. Related Art
  • [0005]
    In an environment such as SAP BW (or any environment using star schema or complex SQL), determining the cause of slow query performance can be difficult. For instance, individual queries must be identified for analysis, and then the SQL must be captured, explained, and analyzed to determine the possible reason for the performance problem.
  • [0006]
    When end-users use drill-down in queries, additional predicates are added to the query SQL to support the grouping or selection in the drill-down. Predicates may include any element of a search condition that expresses or implies a comparison operation. This means that a user drill-down creates SQL with similar sets of predicates, which are usually based on a core set of required predicates packaged in the query. The addition (or removal) of predicates can cause changes in query performance. If, for example, a predicate which would filter well is added to a query, but the predicate is not applied early in query processing (due to missing catalog statistics, missing indexes, optimizer problems, etc.), then an automated method of finding the predicate(s) associated with the slow performance can help to reduce the effort necessary to solve the problem.
  • [0007]
    Unfortunately, there exist no automated query tools capable of determining how predicate usage impacts query performance. Accordingly, a need exists for a system and method that can both analyze predicate usage and propose solutions to improve query performance.
  • SUMMARY OF THE INVENTION
  • [0008]
    The present invention addresses the above-mentioned problems, as well as others, by providing a system and method for analyzing predicate usage and proposing solutions to improve query performance. In a first aspect, the invention provides a query predicate analysis system comprising: a system for collecting performance statistics for database queries; an aggregation system that organizes the performance statistics based on predicate combinations used in the database queries; and an evaluation system that compares the performances of queries having similar predicate combinations. Further features include a problem identification system that identifies a predicate, which when added to a query, results in a low performing query; and a solution proposal system that automatically proposes a solution to address the low performing query.
  • [0009]
    In a second aspect, the invention provides a method of analyzing query performance based on predicate usage, comprising: collecting performance statistics for database queries; organizing the performance statistics based on predicate combinations of the database queries; and analyzing performances of queries having similar predicate combinations.
  • [0010]
    In a third aspect, the invention provides a program product stored on a recordable medium for analyzing query performance based on predicate usage, comprising: means for collecting performance statistics for database queries; means for organizing the performance statistics based on predicate combinations of the database queries; and means for analyzing the performances of queries having similar predicate combinations.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • [0011]
    These and other features of this invention will be more readily understood from the following detailed description of the various aspects of the invention taken in conjunction with the accompanying drawings in which:
  • [0012]
    FIG. 1 depicts a query predicate analysis system in accordance with the present invention.
  • [0013]
    FIGS. 2-6 depict a set of tables related to an overview example.
  • [0014]
    FIGS. 7-15 depict a set of tables related to an SAP example.
  • DETAILED DESCRIPTION OF THE INVENTION
  • [0015]
    Referring now to the drawings, FIG. 1 depicts a query predicate analysis system 10 that operates in conjunction with a database system 11. It should be noted that system 10 could be integrated into the database system 11, or be implemented as a separate stand-alone system. Query predicate analysis system 10 examines query data 24 to improve query performance in database system 11. In particular, system 10 analyzes execution data associated with queries (e.g., query processing rate, local predicates, etc.) and compares statistics for queries having similar characteristics. More specifically, queries referencing the same fact table, and having similar predicates are analyzed and compared. System 10 then determines which predicate combinations are associated with a slowdown in query processing rates. That is, system 10 identifies predicates which when added to a query cause the new query to run more slowly than the original query. Potential problems can then be identified, and solutions can be proposed. Thus, decisions regarding candidates for changes (e.g., predicate columns, columns in tables, etc., which may need additional statistics or indexes) are made based on symptoms of problems, as indicated by query performance statistics.
  • [0016]
    The metrics used by system 10 to determine good or bad query performances may include any commonly used performance statistics. A unique feature of this process is that system 10 utilizes a data-mining process that correlates unique queries (that is, unique sets of local predicates) with their performance characteristics, and then looks at the performance of similar queries. If a slow query has predicates (e.g., ABC) that are a superset of a fast query (e.g., AB), this process examines the difference between the two sets to find the cause of the performance degradation, since the introduction of the new predicate must be related to the cause of the performance degradation
  • [0017]
    It is understood that while the invention and exemplary embodiments are described generally with reference to query processing in SAP BW running, e.g., on DB2/390, the invention could be extended to any DB system that maintains statistics similar to those described below. Namely, the present invention recognizes that an important indicator of performance problems (i.e., slow selection rate per second) is the same on any platform. Accordingly, while the specific actions taken to address the identified problem might depend on the DB system being used, the general concepts described herein to identify the problem could be applied to any DB system.
  • [0018]
    The process utilizes several commonly used performance related statistics for queries, including:
  • [0019]
    (1) count(*)—rows that satisfy the predicates;
  • [0020]
    (2) query elapsed time;
  • [0021]
    (3) getpages—(GP or equivalent metric for DB data/index pages referenced); and
  • [0022]
    (4) rows processed—rows returned to the application after GROUP BY.
  • [0000]
    From the above, one derives two more commonly used metrics:
  • [0023]
    (5) query selection rate=count(*)/query elapsed time—this is the key indicator of query performance; and
  • [0024]
    (6) pages per row selected=getpages/count(*).
  • [0025]
    (Note that these statistics are currently not all available via a single interface from DB2 or SAP. At this time, they are individually available, e.g., from catalog statistics, DB2 ifcid 318, SAP query statistics, etc.).
  • Exemplary Overview
  • [0026]
    Consider the following example involving an exemplary star schema object (taken from the SAP BW star schema model) used in the queries below:
      • F fact table with columns DIMID1, DIMID2, DIMID3, f_column1, f_column2
      • D1—dimension table with columns DIMID1, M1SID1, M2SID2
      • D2—dimension table with columns DIMID2, M3SID3, M4SID4
      • D3—dimension table with columns DIMID3, M5SID5, M6SID6
      • M1—master data table with columns MISID1, col1, col1a, . . .
      • M2—master data table with columns M2SID2, col2, col2a, . . . Etc.
      • M6—master data table with columns M6SID6, col6, col6a, . . .
  • [0034]
    The DIMIDx columns join the Fact to Dimension tables, and the MxSIDx columns join master data to dimension tables.
  • [0035]
    In a first step, query data 24 is gathered for each query executed, and includes information about performance and the local predicates in the SQL. For example, if the query is:
  • [0000]
    select f_column1 from F where F.DIMID1=D1.DIMID1 and F.DIMID2=D2.DIMID2 and D1.M1SID1=M1.M1SID1
  • [0036]
    and D2.M3SID3=M3.M3SID3 and M1.col1a=‘findone’ and M3.col3a=‘findtwo’ group by M1.col1, M3.col3,
  • [0037]
    then FIG. 2 depicts an example of the query data 24 saved for each query. Since the column list must contain the set of all columns that appear in either local predicates, or in the GROUP BY clause, when columns appear in the GROUP BY, but not in the local predicates, as here:
  • [0000]
    select f_column1 from F where F.DIMID1=D1.DIMID1 and F.DIMID2=D2.DIMID2 and D1.M1SID1=M1.M1SID1
  • [0038]
    and D2.M3SID3=M3.M3SID3 and M1.col1a=‘findone’ and M3.col3a=‘findtwo’
  • [0039]
    group by M1.col1, M3.col3, M4.col4.
  • [0040]
    The result would be noted in the query performance statistics as shown in FIG. 3. These raw statistics are then aggregated by aggregation system 12, which lists unique sets of local predicates and statistics for each unique predicate combination, such as that shown in FIG. 4. Thus, as shown in FIG. 4, performance statistics are organized by aggregation system 12 based on predicate combinations (e.g., columns).
  • [0041]
    Next, predicate evaluation system 14 compares queries to determine the performance differences when predicates are added to queries. In this example, statistics are gathered for queries that differ by one predicate. That is, the predicates of one query are a proper subset of the predicates in another query, and the superset contains only one additional predicate. This comparison could be done with other variants, such as comparing queries that differ by up to N predicates, or for queries that differ by only one added dimension. If there were more differences, then there would be more hypothetical fixes to be tested, as described below.
  • [0042]
    An ID number column as shown in FIG. 5-7 is added for reference purposes. The other columns are taken or computed from the summarized statistics shown in FIG. 4. Two new values are calculated:
  • [0043]
    (a) Time, which is the sum of time for queries containing the added predicate; and
  • [0044]
    (b) Estimated improvement, which is an estimate for how much time could be saved for queries with the added predicate, if they ran at the same count/second rate as the queries without the added predicate.
  • [0045]
    Estimated improvement=Time−(((count/sec with added predicate)/(count/second))*Time).
  • [0046]
    Since there is additional database (db) activity needed to process the added predicate, the estimated improvement would likely be an overstatement of the improvement, but it is a reasonable starting point for evaluating the impact.
  • [0047]
    As an alternative, statistics could be evaluated by grouping the queries by sets of tables/columns in the predicates, without operators, such as that shown in FIG. 6. If there were few queries, this could help to find patterns in the added predicate more easily. The disadvantage is that there would be more variation in the queries grouped together.
  • [0048]
    Next, for each fact table depicted in FIG. 5, queries can be ordered by “estimated improvement.” Then, problem identification system 16 and solution proposal system 18 can evaluate possible causes and changes for each fact table starting with queries with the largest “estimated improvement.” Problem identification system 16 can utilize any logic to identify the cause of a slowdown. For instance, if count/second has a large decrease when a predicate is added to a query, this can point to an issue such as that the new predicate filters well, but is not applied early enough in the join sequence. This can be confirmed by checking whether the GP/count has a large increase. If there is a large increase in GP/count, it means that the database system 11 had to search more pages to find the qualifying rows. The query statistics in FIG. 5, ID number 1, are an example of this.
  • [0049]
    In one exemplary embodiment, problem identification system 16 may select the identified problem from a “problem list.” An exemplary problem list for a slowdown in count/second may, e.g., include the following:
  • [0050]
    (1) missing or insufficient statistics on the added column (so database system 11 cannot make the best optimizer choice);
  • [0051]
    (2) missing index on new column (again, database system 11 cannot make the best optimizer choice);
  • [0052]
    (3) missing multi-column index on the fact table (in the case there are multiple filtering predicates in different dimensions);
  • [0053]
    (4) predicate operator is not index matching (e.g., “status not delivered,” rather than “status in (backordered, open)”); and
  • [0054]
    (5) there is correlation between the added predicate column and a column in the base comparison set of predicates.
  • [0055]
    Next, solution proposal system 18 could propose possible solutions. Changes could be proposed for administrators, or using feedback outlined below, could be automatically added and tested. Having found a predicate (or table.column) which when added slows the query, problem identification system 16 and solution proposal system 18 can make a hypothesis as to the cause based on the possible problems listed above (or other possible reasons) and then implement a solution. For example, check for column statistics in DB catalog, and create them if they do not exist, or check for indexes containing the column, and add one if it does not exist, and so on.
  • [0056]
    A feedback system 20 can be utilized to assess the efficacy of an implemented solution. As feedback, after the proposed fix has been implemented, query statistics as shown in FIG. 4 for the period before and after the change can be compared. If the count/second rate improves, then the proposed fix would be retained. If the performance did not improve, then the proposed fix would be removed, and another proposed fix could be implemented and evaluated, based on the list of possible solutions to the problem, as outlined above.
  • [0057]
    A regression testing system 22 could also be implemented to assess the overall impact of an implemented solution. As a regression test, one would need to evaluate the performance statistics of queries referencing the changed element. If a new column index on a master data table were created, all queries referencing that column (found in the local predicates in FIG. 4) would be examined. Likewise, if a new multi-column fact table index were created, all queries using the dimensions in the index (found in the dimensions column in FIG. 4) would be examined to determine if they improved, or ran slower.
  • Exemplary SAP Implementation
  • [0058]
    SAP currently provides, via its RSA1 transaction, a method of proposing aggregates (summary tables) based on characteristics (i.e., predicates) used in queries. Characteristics are columns in dimension tables or master data tables of the star schema. Such summary tables can help to improve performance of frequently executed queries that use characteristics matching the aggregates.
  • [0059]
    In an ad-hoc query environment, where an aggregate may not be suitable for improving query performance, due to the wide variety of characteristics used in the queries, adding new multi-column indexes on the fact table can help to improve performance. A multicolumn fact-table index containing only index columns for the dimensions used in a query can provide access to the fact table that is more efficient than access with the standard SAP index structure. The standard SAP index structure is made up of M single column indexes for the M dimensions of the fact table, and a single M column index containing columns for all the M dimensions on the fact table.
  • [0060]
    This method is based on the observation made on SAP systems that when the rate of “rows selected per second” is slow, the cause may be that a bad join order was used by DB2, such that the best filtering dimensions were applied after the fact table is accessed. If the filtering dimensions are applied after the fact table is accessed, DB2 must read many extraneous rows out of the fact table, which are filtered after the fact table is accessed, which reduces the “selected per second” rate. This method recognizes the symptom of a possible join order problem, proposes an index, and then checks the performance of the index to determine whether the index helped and should be kept.
  • [0061]
    Since SAP query statistics are gathered based on characteristics, which reside in dimensions or master data tables, and DB2 indexes are created based on dimensions, a method is needed to determine the dimensions where the characteristics reside, and to determine the sets of dimensions used by the queries. Once the sets of dimensions used by slow queries are found, a method is needed to order the index columns for the dimensions.
  • [0062]
    In the past, this problem was “solved” by examining the SQL for long running queries: explaining the statement, and determining the dimensions used. This method is based on sampling, and can be automated.
  • [0063]
    The present invention addresses this problem as follows. For each query executed, characteristic columns are converted to dimensions to determine a set of dimensions used by the query. Then, all queries in the evaluation interval are summarized to evaluate the total DB time (and then weight) of queries using each unique set of dimensions. If two dimension combinations support queries with similar aggregate db time, but one dimension combination has a slower ‘count(*)/second’ rate, the dimension combination with the slower rate will be favored as an index creation candidate. Based on the weight of each dimension combination, new multi-column indexes on fact tables are proposed to optimize access to the fact table.
  • [0064]
    The process is based on symptoms of access path problems (that is, low ‘rows selected per second’ rate) rather than based on analysis of predicates and database statistics.
  • [0065]
    The process contains a feedback loop, where the benefit of new indexes can be tested, to determine if the problem symptom (slow select rate) was helped by the new index. Query statistics are aggregated before and after creation of an index, and the ‘count(*)/second’ rate is compared for groups of queries using the same set of dimensions. If the count(*)/second rate improves with a new index, then the new index has helped performance.
  • [0066]
    The process contains a weighting system for created indexes, where different indexes can be evaluated, to determine which provides the most benefit to system performance. The performance benefit of a new index is used in conjunction with the query statistics to determine the ‘time saved’ by the new index. Different indexes can be compared, to determine which has the largest ‘time saved’, and thus the largest impact on performance.
  • (A) Proposal of New Multi-Column Fact Table Indexes
  • [0067]
    The first step (Step 1) is to extract information about query statistics over a time interval, and build an array of statistics rows, where each row contains the columns:
    • a) STATUID—the unique query identifier in SAP
    • b) Infocube specified at query execution (from RSDDSTAT.INFOCUBE)
    • c) DB request time (from RSDDSTAT.QTIMEDB)
    • d) Rows satisfying the predicates—that is, count(*) (RSDDSTAT.QDBSEL)
    • e) Rows after grouping (RSDDSTAT.QDBTRANS)
    • f) characteristics (collect all values of RSDDSTATAGGRDEF.IOBJNM where RSDDSTATAGGRDEF.AGGRST NE space)
    • g) querycube used at execution time (RSDDSTATAGGRDEF.QUERYCUBE)
    • h) dimensions (empty at this stage)
  • [0076]
    FIG. 7 depicts a sample from the first step containing four queries. In a second step (step 2), the dimensions are concatenated and sorted as follows:
  • [0077]
    for each row in the array built above:
  • [0078]
    for each unique characteristic (IOBJNM) in the row
  • [0079]
    select dimension from RSDDIMEIOBJ where RSDDIMEIOBJ.INFOCUBE=querycube and RSDDIMEIOBM.IOBJNM=characteristic
  • [0080]
    concatenate the dimension to the current dimension list, if it is not a member of the list
  • [0081]
    endfor characteristic
  • [0082]
    sort the dimension list
  • [0083]
    endfor row
  • [0000]
    If an SAP data dictionary (table RSDDIMEIOBJ) contains:
  • [0000]
    dimension infocube iobjnm
    D1 Z A
    D1 Z B
    D2 Z C
    D2 Z D
    D3 Z E
    D4 Z F

    Then step 2 yields an array containing rows such as these, where there is one row for each SAP query, querycube pair, as shown in FIG. 8.
  • [0084]
    The next step (step 3) is to merge (i.e., aggregate) all rows containing identical dimension lists, summing db time, count(*), and rows after grouping, and creating execution count. “Weighted db time” is (db time/sqrt(count(*)/db time)), in order to give larger weight to dimension combinations that return data slower. For example:
  • [0000]
    db time dimensions count(*) weighted db time
    100 D1 D3 100000 3.16
    60 D3 D4 10000 4.64
  • [0085]
    So in this case, even though D3 D4 has a smaller db time, it has a higher weighted db time, due to its slow ‘count(*)/db time’ ratio.
  • [0086]
    Other formulas could be used to create a “weighted db time,” depending on how much one wants to favor slow dimension combinations. The goal of the formula is that if there are two dimension combinations with similar db time, that the combination with slow count(*)/dbtime will be favored as a candidate for index creation. See for instance FIG. 9.
  • [0087]
    Next, proposed index column order can be assigned in the following way:
  • [0088]
    for each row in array (3) where the count of dimensions is more than one
  • [0089]
    recursively on the dimensions of the row starting with the dimension count (X) until the count of dimensions is 1
  • [0090]
    find the set of X−1 dimensions with the largest “weighted db time” whose dimensions are a proper subset of the set of X dimensions
  • [0091]
    assign position X to the dimension which is in the set of X dimensions, but not in the set of X−1 dimensions with the longest “weighted db time”
  • [0092]
    add the exec count, db time, count(*), rows after grouping, and weighted db time for the set of X−1 dimensions with the longest weighted db time to the statistics for the X dimension set
  • [0093]
    reduce X by 1
  • [0000]
    end recursion
    end for each row
  • [0094]
    This yields the table depicted in FIG. 10. Next, as shown in FIG. 11, the list is ordered by count of dimensions, and weighted db time; the single column indexes is deleted since by default there are already single-column indexes for each dimension of the fact table.
  • [0095]
    For each N, where N is the count of dimensions in the proposed index, the set of dimensions with the longest query database time, and thus the greatest potential for improvement, would be at the top of the list. One could adjust the process to discard any proposed indexes with more than N columns. If an index with N columns were chosen for creation, then any indexes with fewer columns, where the smaller indexes match the leading columns of the N column index, would not be creation candidates, as these smaller indexes are contained in the N column index.
  • (B) Feedback to Evaluate Effectiveness of Indexes
  • [0096]
    After a multi-column index has been created on the fact table, the query statistics for queries before and after the creation of the index can be compared, to determine if the index has improved performance, such as that shown in FIG. 12. Summing the queries by dimension list, before and after the addition of the index, yields the table depicted in FIG. 13.
  • [0097]
    Next, for each unique set of dimensions, the performance benefit of the indexes is calculated, by computing the ratio of count(*)/second after and before for each set of characteristic combinations contained in the indexes:
  • [0000]

    D1 D2 D4->(7300/200)/(7300/830)=36.5/8.79=4.15
  • [0000]
    Three dimension queries execute 415% times faster than before.
  • [0000]

    D1 D2->(85000/580)/(90000/700)=146.55/128.57=1.13
  • [0000]
    Two dimension queries execute 13% faster than before.
  • [0098]
    If the index helped some queries, and hindered others, then the total valuation done below will calculate the total benefit of the index, to determine if the index should be kept.
  • (C) Comparing Multi-Column Indexes, to Determine Which Provides More Benefit:
  • [0099]
    If there are several multi-column indexes that have been created, one can evaluate which provides the largest overall system benefit. Having computed the performance benefit from a new index for individual characteristic combinations, one can compare indexes in this way:
  • [0000]
    Index Performance
    Columns querycube benefit
    D1 D2 D4 Z 4.15
    D1 D2 Z 1.13
    D1 D3 M 1.4
    D1 D3 D4 M 1.7

    First, the saved time based on the query statistics is computed as shown in FIG. 14 wherein: savedtime=((db time*perf benefit)−db time). Next, as shown in FIG. 15, all the characteristic combinations that could be supported by each index are summed. If saved time is negative, then the index has hindered performance, and would be deleted.
  • (D) Note on Symptom Based Evaluation
  • [0100]
    When a new index with high ‘weighted db time’ is proposed in step 1, the index performance benefit can be calculated (Section B) to verify that it helped performance, and then the index can be compared to existing indexes (Section C), to determine from among several indexes which index is least valuable, in order to delete indexes which provide little benefit.
  • [0101]
    The valuation process (Section B) and comparison process (Section C) do not need to know whether the new index was used at execution time. They are symptom based, and not access path based. The feedback uses symptoms (that is a change in the rate of rows per second) to determine whether the index helped or not. It is not necessary to explain the statement and find the access path, to determine whether the change was helpful. In fact, if the new index was used, and caused performance to be worse, the feedback process and index valuation would recognize that so the index could be deleted.
  • [0102]
    This method does not estimate the value of an index before it is created. It looks for index candidates, and evaluates them after creation, to learn what worked, and what did not work.
  • [0103]
    It is understood that the systems, functions, mechanisms, methods, and modules described herein can be implemented in hardware, software, or a combination of hardware and software. They may be implemented by any type of computer system or other apparatus adapted for carrying out the methods described herein. A typical combination of hardware and software could be a general-purpose computer system with a computer program that, when loaded and executed, controls the computer system such that it carries out the methods described herein. Alternatively, a specific use computer, containing specialized hardware for carrying out one or more of the functional tasks of the invention could be utilized. The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods and functions described herein, and which—when loaded in a computer system—is able to carry out these methods and functions. Computer program, software program, program, program product, or software, in the present context mean any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: (a) conversion to another language, code or notation; and/or (b) reproduction in a different material form.
  • [0104]
    The foregoing description of the preferred embodiments of the invention has been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teachings. Such modifications and variations that are apparent to a person skilled in the art are intended to be included within the scope of this invention as defined by the accompanying claims.

Claims (14)

  1. 1. A query predicate analysis system comprising:
    a system for collecting performance statistics for database queries, wherein the database queries include statements submitted by end-users having different predicate combinations;
    an aggregation system that organizes the performance statistics based on the different predicate combinations of the database queries; and
    an evaluation system that analyzes performance of a subset of the database queries having different predicate combinations that include common predicates.
  2. 2. The system of claim 1, further comprising a problem identification system that identifies a first predicate which when added to an identified database query results in a substantial degradation of database query performance.
  3. 3. The system of claim 2, further comprising a solution proposal system that automatically proposes a solution to address the substantial degradation of database query performance.
  4. 4. The system of claim 3, further comprising a feedback system for comparing the performance of the identified database query before and after an implementation of a proposed solution.
  5. 5. The system of claim 3, further comprising a regression testing system for evaluating performance of a second subset of database queries referencing a changed predicate.
  6. 6. The system of claim 1, wherein the performance statistics include: rows that satisfy the predicates, query elapsed time, index pages referenced, and rows processed.
  7. 7. A computer program product comprising a computer usable medium having computer usable program code for analyzing query performance based on predicate usage, which when executed on a computer comprises:
    means for collecting performance statistics for database queries comprising database statements entered by end-users having different predicate combinations;
    means for organizing the performance statistics based on the different predicate combinations of the database queries;
    means for analyzing the performances of a subset of the database queries having different predicate combinations that include common predicates; and
    means for generating analysis results.
  8. 8. The computer program product of claim 7, wherein each predicate combination comprises a set of dimensions.
  9. 9. The computer program product of claim 7, further comprising means for generating a performance improvement measure for each of a second subset of the database queries based on an order of the predicates.
  10. 10. The computer program product of claim 7, further comprising means for proposing a solution to address a database query from the subset of database queries identified as a low performer.
  11. 11. The computer program product of claim 10, further comprising means for comparing the performance of a database query from the subset of database queries before and after an implementation of a proposed solution.
  12. 12. The computer program product of claim 7, wherein the performance statistics are selected from the group consisting of: rows that satisfy a set of predicates, query elapsed time, index pages referenced, and rows processed.
  13. 13. A query predicate analysis system comprising:
    a system for collecting performance statistics for database queries, wherein the database queries include statements submitted by end-users having different predicate combinations;
    an aggregation system that organizes the performance statistics based on the different predicate combinations of the database queries; and
    an evaluation system that analyzes performance of a subset of the database queries having different predicate combinations that include common predicates, wherein the evaluation system calculates an estimate improvement for adding a predicate given by: Estimated improvement=Time−(((count/second with added predicate)/(count/second))*Time), where Time is a sum of time for database queries containing an added predicate, count/second is a rate for running a database query without the added predicate and count/second with added predicate is a rate for running the database query with the added predicate.
  14. 14. A query predicate analysis system comprising:
    a system for collecting performance statistics for database queries, wherein the database queries include statements submitted by end-users having different predicate combinations;
    an aggregation system that organizes the performance statistics based on the different predicate combinations of the database queries;
    an evaluation system that analyzes performance of a subset of the database queries having different predicate combinations that include common predicates; and
    a problem identification system that identifies a first predicate which when added to an identified database query results in a substantial degradation of database query performance, wherein the problem identification system selects a problem from a problem list including: missing statistics on an added column; missing index on a new column; missing multi-column index on a fact table; and the first predicate contains a non-indexable operator.
US12052063 2003-12-24 2008-03-20 System for addressing inefficient query processing Abandoned US20080168058A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US10746961 US7376638B2 (en) 2003-12-24 2003-12-24 System and method for addressing inefficient query processing
US12052063 US20080168058A1 (en) 2003-12-24 2008-03-20 System for addressing inefficient query processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12052063 US20080168058A1 (en) 2003-12-24 2008-03-20 System for addressing inefficient query processing

Publications (1)

Publication Number Publication Date
US20080168058A1 true true US20080168058A1 (en) 2008-07-10

Family

ID=34794650

Family Applications (2)

Application Number Title Priority Date Filing Date
US10746961 Active 2025-03-24 US7376638B2 (en) 2003-12-24 2003-12-24 System and method for addressing inefficient query processing
US12052063 Abandoned US20080168058A1 (en) 2003-12-24 2008-03-20 System for addressing inefficient query processing

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US10746961 Active 2025-03-24 US7376638B2 (en) 2003-12-24 2003-12-24 System and method for addressing inefficient query processing

Country Status (1)

Country Link
US (2) US7376638B2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090077017A1 (en) * 2007-09-18 2009-03-19 Oracle International Corporation Sql performance analyzer
US20090106219A1 (en) * 2007-10-17 2009-04-23 Peter Belknap SQL Execution Plan Verification
US20100082517A1 (en) * 2008-09-29 2010-04-01 Oracle International Corporation Multi-database, runtime database query performance monitoring
US20130018890A1 (en) * 2011-07-13 2013-01-17 Salesforce.Com, Inc. Creating a custom index in a multi-tenant database environment
US8903801B2 (en) 2007-09-14 2014-12-02 Oracle International Corporation Fully automated SQL tuning
US20170147686A1 (en) * 2015-11-25 2017-05-25 International Business Machines Corporation Managing complex queries with predicates
US9971831B2 (en) * 2015-11-25 2018-05-15 International Business Machines Corporation Managing complex queries with predicates

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7809694B2 (en) * 2005-03-15 2010-10-05 International Business Machines Corporation Analysis of performance data from a relational database system for applications using stored procedures or SQL
US7949444B2 (en) * 2005-10-07 2011-05-24 Honeywell International Inc. Aviation field service report natural language processing
US7464083B2 (en) * 2005-10-24 2008-12-09 Wolfgang Otter Combining multi-dimensional data sources using database operations
JP5068062B2 (en) * 2006-10-30 2012-11-07 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Maschines Corporation System for integrating databases, methods, and programs
US20080235182A1 (en) * 2007-03-23 2008-09-25 Baranczyk Shawn J Isolating Database Queries for Performance Processing
US20080313136A1 (en) * 2007-06-13 2008-12-18 International Business Machines Corporation Method and apparatus for producing up-to-date query results from tables including data from a data warehouse
US9692856B2 (en) * 2008-07-25 2017-06-27 Ca, Inc. System and method for filtering and alteration of digital data packets
US8401990B2 (en) * 2008-07-25 2013-03-19 Ca, Inc. System and method for aggregating raw data into a star schema
US8108421B2 (en) * 2009-03-30 2012-01-31 Microsoft Corporation Query throttling during query translation
US8417691B2 (en) * 2009-12-09 2013-04-09 International Business Machines Corporation Client and database problem determination and monitoring
US20130132139A1 (en) * 2011-11-15 2013-05-23 Pvelocity Inc. Method And System For Providing Business Intelligence Data
CN104216891B (en) * 2013-05-30 2018-02-02 国际商业机器公司 Optimization methods and equipment relational database query statements
US9529865B2 (en) 2014-02-12 2016-12-27 Sap Se Interval based fuzzy database search
US9582540B2 (en) 2014-08-21 2017-02-28 International Business Machines Corporation Feedback mechanism providing row-level filtering earlier in a plan
US20160378634A1 (en) * 2015-06-26 2016-12-29 Microsoft Technology Licensing, Llc Automated validation of database index creation

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5778354A (en) * 1995-06-07 1998-07-07 Tandem Computers Incorporated Database management system with improved indexed accessing
US5848408A (en) * 1997-02-28 1998-12-08 Oracle Corporation Method for executing star queries
US5857184A (en) * 1996-05-03 1999-01-05 Walden Media, Inc. Language and method for creating, organizing, and retrieving data from a database
US5864840A (en) * 1997-06-30 1999-01-26 International Business Machines Corporation Evaluation of existential and universal subquery in a relational database management system for increased efficiency
US5899986A (en) * 1997-02-10 1999-05-04 Oracle Corporation Methods for collecting query workload based statistics on column groups identified by RDBMS optimizer
US6189004B1 (en) * 1998-05-06 2001-02-13 E. Piphany, Inc. Method and apparatus for creating a datamart and for creating a query structure for the datamart
US6289334B1 (en) * 1994-01-31 2001-09-11 Sun Microsystems, Inc. Apparatus and method for decomposing database queries for database management system including multiprocessor digital data processing system
US20020198867A1 (en) * 2001-06-06 2002-12-26 International Business Machines Corporation Learning from empirical results in query optimization
US6513028B1 (en) * 1999-06-25 2003-01-28 International Business Machines Corporation Method, system, and program for searching a list of entries when search criteria is provided for less than all of the fields in an entry
US6651055B1 (en) * 2001-03-01 2003-11-18 Lawson Software, Inc. OLAP query generation engine
US20040111408A1 (en) * 2001-01-18 2004-06-10 Science Applications International Corporation Method and system of ranking and clustering for document indexing and retrieval
US20040210563A1 (en) * 2003-04-21 2004-10-21 Oracle International Corporation Method and system of collecting execution statistics of query statements

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6289334B1 (en) * 1994-01-31 2001-09-11 Sun Microsystems, Inc. Apparatus and method for decomposing database queries for database management system including multiprocessor digital data processing system
US5778354A (en) * 1995-06-07 1998-07-07 Tandem Computers Incorporated Database management system with improved indexed accessing
US5857184A (en) * 1996-05-03 1999-01-05 Walden Media, Inc. Language and method for creating, organizing, and retrieving data from a database
US5899986A (en) * 1997-02-10 1999-05-04 Oracle Corporation Methods for collecting query workload based statistics on column groups identified by RDBMS optimizer
US5848408A (en) * 1997-02-28 1998-12-08 Oracle Corporation Method for executing star queries
US5864840A (en) * 1997-06-30 1999-01-26 International Business Machines Corporation Evaluation of existential and universal subquery in a relational database management system for increased efficiency
US6189004B1 (en) * 1998-05-06 2001-02-13 E. Piphany, Inc. Method and apparatus for creating a datamart and for creating a query structure for the datamart
US6513028B1 (en) * 1999-06-25 2003-01-28 International Business Machines Corporation Method, system, and program for searching a list of entries when search criteria is provided for less than all of the fields in an entry
US20040111408A1 (en) * 2001-01-18 2004-06-10 Science Applications International Corporation Method and system of ranking and clustering for document indexing and retrieval
US6651055B1 (en) * 2001-03-01 2003-11-18 Lawson Software, Inc. OLAP query generation engine
US20020198867A1 (en) * 2001-06-06 2002-12-26 International Business Machines Corporation Learning from empirical results in query optimization
US20040210563A1 (en) * 2003-04-21 2004-10-21 Oracle International Corporation Method and system of collecting execution statistics of query statements

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8903801B2 (en) 2007-09-14 2014-12-02 Oracle International Corporation Fully automated SQL tuning
US9720941B2 (en) 2007-09-14 2017-08-01 Oracle International Corporation Fully automated SQL tuning
US9734200B2 (en) 2007-09-14 2017-08-15 Oracle International Corporation Identifying high risk database statements in changing database environments
US20090077017A1 (en) * 2007-09-18 2009-03-19 Oracle International Corporation Sql performance analyzer
US8341178B2 (en) * 2007-09-18 2012-12-25 Oracle International Corporation SQL performance analyzer
US20090106320A1 (en) * 2007-10-17 2009-04-23 Benoit Dageville Automatic Recognition and Capture of SQL Execution Plans
US8335767B2 (en) 2007-10-17 2012-12-18 Oracle International Corporation Maintaining and utilizing SQL execution plan histories
US20090106219A1 (en) * 2007-10-17 2009-04-23 Peter Belknap SQL Execution Plan Verification
US8600977B2 (en) 2007-10-17 2013-12-03 Oracle International Corporation Automatic recognition and capture of SQL execution plans
US20090106321A1 (en) * 2007-10-17 2009-04-23 Dinesh Das Maintaining and Utilizing SQL Execution Plan Histories
US9189522B2 (en) 2007-10-17 2015-11-17 Oracle International Corporation SQL execution plan baselines
US8700608B2 (en) 2007-10-17 2014-04-15 Oracle International Corporation SQL execution plan verification
US8700602B2 (en) * 2008-09-29 2014-04-15 Oracle International Corporation Multi-database, runtime database query performance monitoring
US20100082517A1 (en) * 2008-09-29 2010-04-01 Oracle International Corporation Multi-database, runtime database query performance monitoring
US20130018890A1 (en) * 2011-07-13 2013-01-17 Salesforce.Com, Inc. Creating a custom index in a multi-tenant database environment
US20170147686A1 (en) * 2015-11-25 2017-05-25 International Business Machines Corporation Managing complex queries with predicates
US20170147642A1 (en) * 2015-11-25 2017-05-25 International Business Machines Corporation Managing complex queries with predicates
US9892162B2 (en) * 2015-11-25 2018-02-13 International Business Machines Corporation Managing complex queries with predicates
US9971831B2 (en) * 2015-11-25 2018-05-15 International Business Machines Corporation Managing complex queries with predicates

Also Published As

Publication number Publication date Type
US20050165741A1 (en) 2005-07-28 application
US7376638B2 (en) 2008-05-20 grant

Similar Documents

Publication Publication Date Title
Pei et al. Mining frequent itemsets with convertible constraints
Mehta et al. SLIQ: A fast scalable classifier for data mining
Chaudhuri et al. Optimized stratified sampling for approximate query processing
Dageville et al. Automatic SQL tuning in Oracle 10g
US6105033A (en) Method and apparatus for detecting and removing obsolete cache entries for enhancing cache system operation
US6108647A (en) Method, apparatus and programmed medium for approximating the data cube and obtaining approximate answers to queries in relational databases
US6778977B1 (en) Method and system for creating a database table index using multiple processors
Chaudhuri et al. On random sampling over joins
US7158996B2 (en) Method, system, and program for managing database operations with respect to a database table
US4811199A (en) System for storing and manipulating information in an information base
US6665661B1 (en) System and method for use in text analysis of documents and records
US6553366B1 (en) Analytic logical data model
US5852821A (en) High-speed data base query method and apparatus
US6480836B1 (en) System and method for determining and generating candidate views for a database
US5899992A (en) Scalable set oriented classifier
US7139749B2 (en) Method, system, and program for performance tuning a database query
US7171427B2 (en) Methods of navigating a cube that is implemented as a relational object
US6691120B1 (en) System, method and computer program product for data mining in a normalized relational database
Sullivan et al. A system for managing large databases of network traffic
US20030018629A1 (en) Document clustering device, document searching system, and FAQ preparing system
US20060161559A1 (en) Methods and systems for analyzing XML documents
US20070061287A1 (en) Method, apparatus and program storage device for optimizing a data warehouse model and operation
US6272487B1 (en) Query optimization through the use of multi-column statistics to avoid the problems of non-indexed column correlation
US6301575B1 (en) Using object relational extensions for mining association rules
Tsaparas et al. Ranked join indices