WO2003075174A2 - Programme d'optimisation d'index base sur la charge de travail de la getpage - Google Patents

Programme d'optimisation d'index base sur la charge de travail de la getpage Download PDF

Info

Publication number
WO2003075174A2
WO2003075174A2 PCT/EP2003/002129 EP0302129W WO03075174A2 WO 2003075174 A2 WO2003075174 A2 WO 2003075174A2 EP 0302129 W EP0302129 W EP 0302129W WO 03075174 A2 WO03075174 A2 WO 03075174A2
Authority
WO
WIPO (PCT)
Prior art keywords
activity
queries
read
index
database
Prior art date
Application number
PCT/EP2003/002129
Other languages
English (en)
Other versions
WO2003075174A3 (fr
Inventor
Hans-Peter Grasshoff
Jörg Klosterhalfen
Guido Breuer
Original Assignee
Software Engineering Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/273,976 external-priority patent/US7047231B2/en
Application filed by Software Engineering Gmbh filed Critical Software Engineering Gmbh
Priority to AU2003210396A priority Critical patent/AU2003210396A1/en
Priority to CA002477580A priority patent/CA2477580A1/fr
Publication of WO2003075174A2 publication Critical patent/WO2003075174A2/fr
Publication of WO2003075174A3 publication Critical patent/WO2003075174A3/fr

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures

Definitions

  • the present invention relates to a method for selecting optimized indexes for a relational database.
  • the present invention relates to an index optimization (XOP) for IBM mainframe DB2 database servers and the like. More specifically, the invention relates to an index recommendation program that is based on relevant SQL statements and their getpage activity as monitored by a DB2 trace for a representative period of time.
  • a DB2 database on an IBM mainframe server is a relational database management system.
  • data is stored in one or more tables, each containing a specific number of rows .
  • the rows of a table are divided into columns . Rows can be retrieved and manipulated at the column level .
  • the language for DB2 data retrieval and manipulation is SQL (Structured Query Language) .
  • Instructions to the database for either retrieval or manipulation are hereafter referred to as SQL statements or more generally, as queries.
  • the major SQL operations commands are SELECT, INSERT, UPDATE, and DELETE.
  • the performance of SQL statements can be enhanced through the use of indexes to accessing the rows in the DB2 tables more efficiently.
  • An index is an ordered set of pointers to the data in the table i.e., it orders the values of columns in a table. Indexes are stored separately from the data.
  • the decision on whether and how to use an existing index for a given SQL statement that retrieves/manipulates table data is made internally by the IBM DB2 optimizer. This decision is called the access path of an SQL statement.
  • the access path and the performance of an SQL statement are directly related to index design. In large-scale production environments, thousands of tables may exist.
  • index design is a manual process that includes estimates rather than actual production activity. The manual process of index design in such an environment is both inefficient and slow. To allow an easy and fast access to the physically stored dada, it is appropriate to have an optimized index.
  • An optimized index reduces the time for accessing the data and thus increases the physical computational performance.
  • An SQL statement is internally subdivided into one or more query blocks by the IBM DB2 optimizer. Every query block has its own access path with one or more tables and may use one or more indexes, e.g., the query SELECT * FROM TAB1 UNION SELECT * FROM TAB2 consists of 2 query blocks SELECT * FROM TAB1 and SELECT * FROM TAB2. Other examples of query blocks are VIEW or SUBSELECT MATERIALIZATION.
  • the query blocks and their selected access paths of SQL statements can be analysed using the EXPLAIN interface of DB2.
  • Statistical data about the tables and indexes are a major influence on the access path of an SQL statement selected using the IBM DB2 optimizer.
  • the DB2 catalog is an internal set of tables of data about a DB2 database including information on tables, columns, indexes, key columns, and SQL statements.
  • Buffer pools also known as virtual buffer pools, are areas of virtual storage in which DB2 temporarily stores pages of data from tables and indexes.
  • the DB2 data manager retrieves the pages containing the rows of the table from the DB2 buffer pool manager.
  • the number of pages retrieved for a table or index to satisfy an SQL statement is the "getpages" for the table or index.
  • getpages measured in the buffers of the DB2 database server are always the same for an SQL statement and a given set of indexes on the tables referenced by the SQL statement, regardless of the state of the DB2 database server and the machine.
  • getpages are different from disk I/Os which may vary for the same SQL statement at different times.
  • Predicates are located in the WHERE, HAVING and ON clauses of SQL statements and describe attributes of table data.
  • the IBM DB2 optimizer internally uses predicate filter factors to determine which index is best. Each such index can serve one or more columns of a table.
  • a filter factor is a floating value between 0 and 1 that describes the proportion of rows in a table for which the predicate is true. It is supplied by the IBM DB2 optimizer EXPLAIN interface and stays the same regardless of the indexes deployed, i.e., filter factors are static for the predicates of a given query.
  • a predicate with a small filter factor, i.e., close to zero is very effective because it selects only a few rows out of all rows in a table.
  • a predicate with a high filter factor, i.e., close to 1, is very ineffective since it encompasses most rows in the table .
  • Getp(T,Q) Number of getpages (equates to number of pages read) for table T for all invocations of SQL statement Q (regardless whether Q is a SELECT, UPDATE, INSERT or DELETE statement)
  • Getp(I,Q) Number of getpages (equates to number of pages read) for index I for all invocations of SQL statement Q (regardless whether Q is a SELECT, UPDATE, INSERT or DELETE statement)
  • Pred(T,Q,B) ⁇ P ...,P n ⁇ is the set of predicates in SQL statement Q and query block B with a column in table T
  • indexing n such as T? a , should be understood as a generic placeholder that differs for each table, index and query block.
  • the present invention describes an index recommendation XOP program that computes new and improved performance indexes based on the getpage workload of a given set of SQL statements.
  • the term getpage workload of an SQL statement is defined as the read/change activity
  • the results of the XOP program are pure computations; the XOP program itself does not use the DB2 optimizer to determine the usefulness of potential indexes. By using optimized indexes the number of and time for read activities can be decreased. Thus, the hardware requirements for fast data access can be reduced.
  • Fig. 1 is the first page of the flowchart of the overall processing by the XOP program.
  • Fig. 2 is the second page of the flowchart of the overall processing by the XOP program.
  • Fig. 3 is a flowchart of the table ranking for index recommendation.
  • Fig. 4 is a flowchart showing the retrieval of the filter factors of the SQL statement predicates.
  • Fig. 5 is a flowchart showing the retrieval of the concatenations of the SQL statement predicates .
  • Fig. 6 is a flowchart showing the process of building column sets.
  • Fig. 7 is a flowchart of the recommendation of new indexes .
  • Figr. 8 is a flowchart of column set pruning and the calculation of the total savings in getpages on the table level for each column set permutation.
  • Fig. 9 is a flowchart of building column permutations and calculating their filter factors and savings in getpages on query block level .
  • Fig. 10 is a flowchart showing the decision taking path for index recommendations that support index only access.
  • Fig. 11 is a flowchart of building the best column combinations to be added to existing indexes for index only access .
  • Fig. 12 is a flowchart of column combination pruning .
  • Fig. 13 is a flowchart of building indexes for index only access.
  • Fig 1 & 2 form a high level flowchart of the XOP program.
  • the programs major objective is to recommend new or improved performance indexes based on their ability to minimize the overall SQL statement getpage workload i.e., the read/change activity on each table and each index currently used for all invocations of an SQL statement.
  • the initial step in running the XOP program is to extract production statistics from the existing database.
  • a conventional DB2 trace is run in the production environment 10. This captures SQL statements and their getpage activities on the tables and index levels for a representative time interval.
  • the trace data is expanded with static SQL from the database's catalog 11 and condensed to give the SQL frequency and to allow assignment of getpage counts to the SQL 12.
  • the test environment consist of corresponding tables and indexes to those in the production environment, but no data is required to be in the tables.
  • the trace data is reloaded into the test environment 13.
  • Production catalog statistics must be simulated in order to do a one time EXPLAIN of the relevant SQL statements.
  • the EXPLAIN function of DB2 is used to provide the access path for each query block, the predicate filter factors, and the tables and indexes used by the relevant SQL statements. As a direct result, a comparison of indexes that exist with the ones being used, allows identification of the indexes currently not used 15.
  • the program identifies the indexable predicates in each SQL, it then assigns a getpage count to each table in the SQL 14.
  • indexes to be used in the production database is done through a series of steps 200-800 which will determine the XOP selection process 16. As more fully discussed below, the tables are ranked 200, filter factors are obtained for all predicates 300, concatanation for all predicates is obtained 400, column permutations and filter factors and savings are calculated 500, column permutations are consolidated and indexes recommended 600, indexes for index only access are selected 800. Upon completion the recommended indexes are used to create indexes in the production database 17.
  • Fig. 3 illustrates the process of ranking tables. This ranking identifies the tables most likely to benefit as a result of new performance indexes .
  • a set of tables TAB is introduced in the first step and initialized to empty 201. Then each table T 202 is checked whether its read activity Getp r exceeds the change activity Chng r by a predefined threshold percentage 203. A table T with a high volume of changed data is not considered for further processing because additional indexes on table T increase the change activity Chng r to an amount that is higher than the savings in read activity Getp r . Otherwise, table T is added to the set of tables TAB 204 which collects all tables for ranking.
  • the read activity Getp r of a table T is defined as the total sum of getpage requests of all SQL statement invocations for this table T:
  • the change activity Chng r of a table T is the total sum of pages changed of all UPDATE, DELETE and INSERT statement invocations for this table T:
  • FIG. 4 The flowchart of Fig. 4 shows the process of retrieving filter factors of all predicates.
  • FIG. 5 The flowchart of Fig. 5 illustrates the retrieval of the predicate concatenations, i.e., the boolean operators AND and OR.
  • each predicate out of Pred(T,Q,B) ⁇ P ...,P n ⁇ 404 is investigated to determine how it is concatenated to the next predicate and whether the next predicate in the concatenation of SQL statement Q belongs to Pred(T,Q,B), i.e., contains columns of table T 405. If no such columns are found, such predicates are replaced by TRUE and removed from the predicate expression using boolean rules. Note that expressions in inner brackets are evaluated first.
  • the program verifies whether there is enough read activity Getp r left over for all tables T not yet processed. If the remaining read activity is sufficient according to a predefined threshold, the process of building column sets and the calculation of the total savings in getpages on table level for each column set permutation continues for the next table T 501.
  • T are sorted in descending order 601, so that the permutations Pe with the highest savings in getpages are first.
  • the index key columns E 17 ...,E n are put in either ascending or descending order.
  • the sorted list is searched for elements Sav TPx of the same table T where the permutation Px is covered by the index key (E,,...,E just recommended. Such elements Sav ri - X are ignored for further processing and removed from the sorted list 602.
  • the flowchart 508 of Fig. 8 shows returning the savings Sav rgS)ft for each permutation Pe and query block B inside every SQL statement Q using table T 704 Fig. 7 :
  • the filter factor FF r>e)B) e f° r query block B and permutation Pe and the filter factor FF reB for the complete query block B are calculated 714.
  • the getpage number Getp r ⁇ S on query block level B is calculated based on FF r ⁇ ⁇ 715.
  • the query block B currently uses a matching index access 718:
  • FilterFactorl stands for the filter factor of the used index. It is calculated for a specific query block B with the same AND and OR rule already described. Assume Predi are the predicates in query block B with columns covered by the index, then the FilterFactorl is calculated as follows:
  • FilterFactorl Rule ⁇ r ⁇ flnrfr ⁇ [ FF(Pred. ) Concat(Pred 2 ) FF(Pred 2 ) ...
  • FF(Pred i ) are the filter factors of Predi and Concat (Pred i+I ) are the concatenations AND and OR to the next predicate Pred i+1 .
  • the query block B currently uses a non-matching index access 719:
  • Getp(I,Q) is the number of index getpages. Normally, Getp(I,Q) exceeds Getp re ⁇ by far and will be theoretically eliminated by a new index with matching access .
  • the query block B does not currently use any index 716, i.e., uses a table space scan 720:
  • each table T being processed in ranking order 801 is checked whether its read activity Getp r exceeds the total read activity of all tables by a predefined threshold percentage 802. If this threshold is not reached, this table T and all tables that follow table T in ranking order are not considered any further and processing continues with flowchart Fig. 13 809. Otherwise, if the read activity on table T is still sufficient, there should be no index recommendation already made for this table T by the XOP program 803. If this is true, all existing indexes I for table T are retrieved (804) and sets AddCol T/J are initialized to "empty" for table T and all indexes I.
  • a set AddCol TI will store elements (B, ⁇ C. , ... , C,- ⁇ ,Getp reB ) which represent the column combination ⁇ C.,...,C,. ⁇ and its savings Getp r ⁇ jB in getpages for a query block B 805 inside SQL statements Q using table T (806) that is to be added to index I for index only access.
  • the construction of AddCol TI is illustrated in flowchart Fig. 11 where all indexes 810 are investigated for query block B which columns ⁇ C j ,...,C-- ⁇ are to be added 811 to the index key for index only access.
  • Processing returns to flowchart Fig. 10 at 808 if all indexes I on table T are checked 820 or the sum of getpages in the remaining query blocks is below a predefined threshold 827.
  • a new index should be built with a key containing the columns copied from the existing index I plus the columns ⁇ C.,...,C ⁇ ⁇ 833 or whether the columns ⁇ C.,...,C n ⁇ should be just added to the existing index I 834.
  • the recommendation for a new index is always made if the existing index is unique or clustering 832. Otherwise, recommendations are made to modify the existing index 834.
  • the recommendations can then be used to create indexes in the production database 17 which are accordingly optimized based on getpage work load.

Abstract

Programme d'optimisation d'index (XOP) consistant en un procédé servant à recommander de nouveaux indices de performance, ou des indices de performance améliorés, afin qu'il soient utilisés par un serveur de base de données DB2 d'ordinateur central IBM pour un ensemble donné dénoncés SQL. Ces recommandations seront calculées en fonction d'un appel unique du programme d'optimisation DB2 pour chaque énoncé SQL avec l'ensemble d'indices existant, d'une analyse des énoncés SQL, des facteurs de filtre des prédicats dénoncés SQL et de la charge de travail de la getpage d'un énoncé SQL. Cette dernière consiste en l'activité de lecture/modification de chaque table et de chaque indexe normalement utilisé pour la totalité des appels d'un énoncé SQL. Ce programme XOP calcule de nouveaux indices de performance ou des indices de performance améliorés en fonction de la charge de travail de la getpage de la totalité des énoncés SQL appropriés et n'utilisent pas le programme d'optimisation DB2 pour déterminer l'inutilité des indices potentiels.
PCT/EP2003/002129 2002-03-01 2003-02-28 Programme d'optimisation d'index base sur la charge de travail de la getpage WO2003075174A2 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2003210396A AU2003210396A1 (en) 2002-03-01 2003-02-28 Getpage - workload based index optimizer
CA002477580A CA2477580A1 (fr) 2002-03-01 2003-02-28 Programme d'optimisation d'index base sur la charge de travail de la getpage

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US36091102P 2002-03-01 2002-03-01
US60/360,911 2002-03-01
US10/273,976 US7047231B2 (en) 2002-03-01 2002-10-18 Getpage-workload based index optimizer
US10/273,976 2002-10-18
EP03002348.5 2003-02-04
EP03002348A EP1341098A3 (fr) 2002-03-01 2003-02-04 Getpage - Optimiseur d'indexation basé sur la charge de travail

Publications (2)

Publication Number Publication Date
WO2003075174A2 true WO2003075174A2 (fr) 2003-09-12
WO2003075174A3 WO2003075174A3 (fr) 2004-09-02

Family

ID=27791880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2003/002129 WO2003075174A2 (fr) 2002-03-01 2003-02-28 Programme d'optimisation d'index base sur la charge de travail de la getpage

Country Status (3)

Country Link
AU (1) AU2003210396A1 (fr)
CA (1) CA2477580A1 (fr)
WO (1) WO2003075174A2 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1065606A2 (fr) * 1999-07-02 2001-01-03 Robert S. Lenzie Méthode et appareil pour identifier les index préférés dans des bases de données
US6182079B1 (en) * 1995-12-20 2001-01-30 British Telecommunications Public Limited Company Specifying indexes by evaluating costs savings for improving operation in relational databases

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6182079B1 (en) * 1995-12-20 2001-01-30 British Telecommunications Public Limited Company Specifying indexes by evaluating costs savings for improving operation in relational databases
EP1065606A2 (fr) * 1999-07-02 2001-01-03 Robert S. Lenzie Méthode et appareil pour identifier les index préférés dans des bases de données

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GARY BRONZIET: "Index Optimization White Paper" COGITO LIMITED, [Online] 7 July 2000 (2000-07-07), XP002263073 Retrieved from the Internet: <URL:http://www.cogito.co.uk/Downloads/Doc s/XOP%20Whitepaper.pdf> [retrieved on 2003-11-27] *
ROB LENZIE: "Index Design Analysis - Why and How ?" THE IDUG SOLUTIONS JOURNAL, [Online] vol. 8, no. 1, 1 May 2001 (2001-05-01), XP002263074 Retrieved from the Internet: <URL:http://www.idug.org/idug/member/journ al/may01/articl05.cfm> [retrieved on 2003-11-27] *

Also Published As

Publication number Publication date
CA2477580A1 (fr) 2003-09-12
AU2003210396A1 (en) 2003-09-16
WO2003075174A3 (fr) 2004-09-02

Similar Documents

Publication Publication Date Title
US7047231B2 (en) Getpage-workload based index optimizer
CN101133388B (zh) 基于多索引的信息检索系统
Papadias et al. An optimal and progressive algorithm for skyline queries
US5758144A (en) Database execution cost and system performance estimator
US7272589B1 (en) Database index validation mechanism
US8620900B2 (en) Method for using dual indices to support query expansion, relevance/non-relevance models, blind/relevance feedback and an intelligent search interface
US5924088A (en) Index selection for an index access path
US10762071B2 (en) Value-ID-based sorting in column-store databases
US6801903B2 (en) Collecting statistics in a database system
US8150850B2 (en) Multiple dimensioned database architecture
US7171399B2 (en) Method for efficient query execution using dynamic queries in database environments
US9805077B2 (en) Method and system for optimizing data access in a database using multi-class objects
US8745033B2 (en) Database query optimization using index carryover to subset an index
US20100114976A1 (en) Method For Database Design
CN106991141B (zh) 一种基于深度剪枝策略的关联规则挖掘方法
US20180096006A1 (en) Workload-driven recommendations for columnstore and rowstore indexes in relational databases
US7725448B2 (en) Method and system for disjunctive single index access
US6999967B1 (en) Semantically reducing the number of partitions involved in a join
Feldman et al. A knowledge-based approach for index selection in relational databases
Rupley Jr Introduction to query processing and optimization
WO2003075174A2 (fr) Programme d&#39;optimisation d&#39;index base sur la charge de travail de la getpage
US20090049035A1 (en) System and method for indexing type-annotated web documents
Xu et al. Efficiently answer top-k queries on typed intervals
US6694324B1 (en) Determination of records with a specified number of largest or smallest values in a parallel database system
Gibas et al. Online index recommendations for high-dimensional databases using query workloads

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2477580

Country of ref document: CA

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP