US20080177700A1 - Automated and dynamic management of query views for database workloads - Google Patents
- Publication number
- US20080177700A1 (application Ser. No. 11/624,876)
- Authority
- US
- United States
- Prior art keywords
- cache
- queries
- mqt
- query
- mqts
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24539—Query rewriting; Transformation using cached or materialised query results
- G06F16/2454—Optimisation of common expressions
Definitions
- the embodiments of the invention provide a method, program storage device, etc. for automated and dynamic management of query views for database workloads.
- a materialized view, or materialized query table (also referred to herein as “MQT” or “data table”), is an auxiliary table with precomputed data that can be used to significantly improve the performance of a database query.
- a query rewritten to utilize the MQT has one join operation instead of two, thus allowing its query processing cost to be reduced significantly. Since the creation of MQTs can be expensive compared to the benefit of the MQTs to a single query, MQTs are usually created for the whole batch query workload so that the accumulated benefits exceed the cost of their materialization.
- a method begins by executing queries, which includes accessing a set of data tables (also referred to herein as “materialized views”) for each of the queries.
- the data tables summarize common portions of the queries.
- the method accesses a required data table from a cache if the required data table is present in the cache.
- the method creates the required data table if the required data table is not present in the cache and if a benefit of accessing the required data table exceeds a cost of creating the required data table.
- the accessing of the required data table from the cache has a lower processing cost than accessing the required data table from a base table.
- created data tables are stored in the cache, wherein one or more of the created data tables are removed from the cache when the cache becomes full.
- Prior to the executing of the queries, the cache comprises zero required data tables.
- the method reorders the queries. This can include creating workloads such that each of the workloads represents an ordering of the queries, wherein the workloads are recombined and/or mutated to create new orderings of the queries. Next, one of the new orderings of the queries is identified as an ordering having a lowest processing cost.
- the method also includes calculating a net benefit of a data table by subtracting a cost of executing a query with the data table from a cost of executing the query without the data table and multiplying by a total number of occurrences of the data table within the queries. The reordering of the queries can be based on a ranking of net benefits of the data tables.
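The net-benefit calculation described above can be sketched as follows; this is a minimal illustration, and the table names and costs are hypothetical, not taken from the patent:

```python
def net_benefit(cost_without, cost_with, occurrences):
    # Per-use saving of a data table (MQT): cost of executing the
    # query without the table minus the cost with it, multiplied
    # by the table's total number of occurrences in the workload.
    return (cost_without - cost_with) * occurrences

def rank_by_net_benefit(tables):
    # tables: (name, cost_without, cost_with, occurrences) tuples.
    # Query reordering can be based on this ranking.
    return sorted(tables, key=lambda t: net_benefit(t[1], t[2], t[3]),
                  reverse=True)

tables = [("mqt_a", 600, 100, 5), ("mqt_b", 300, 150, 2)]
ranked = rank_by_net_benefit(tables)  # mqt_a saves 2500, mqt_b saves 300
```

The costs here are in arbitrary time units; the ranking itself is what drives the reordering step.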
- the embodiments of the invention provide an automated, dynamic view management scheme that materializes views on-demand as a workload is executing and manages the views with a least recently used (LRU) cache.
- the scheme makes an adaptive tradeoff between the view materializations, base table accesses, and the benefit of view hits in the cache.
- a genetic method is used to search the N! solution space.
- FIG. 1 illustrates a table classifying MQT management scenarios.
- FIG. 2 illustrates pseudocode for a genetic search method.
- FIG. 3 illustrates MQT creation.
- FIG. 4 illustrates early MQT creation optimization.
- FIG. 5 illustrates query preemption.
- FIG. 6 is a flow diagram illustrating a method for automated and dynamic management of query views for database workloads.
- FIG. 7 is a flow diagram illustrating another method for automated and dynamic management of query views for database workloads.
- FIG. 8 is a diagram illustrating a program storage device for automated and dynamic management of query views for database workloads.
- the embodiments herein provide an automated, dynamic view management scheme that materializes views on-demand as a workload is executing and manages the views with an LRU cache.
- the scheme makes an adaptive tradeoff between the view materializations, base table accesses, and the benefit of view hits in the cache.
- a genetic method is used to search the N! solution space.
- MQTs are required in OLAP (Online Analytical Processing) applications, in which the query workloads tend to have complex structure and syntax.
- a Materialized Query Table Advisor (MQTA), such as the IBM DB2 Design Advisor [1], available from International Business Machines, Armonk, N.Y., USA, is often required to recommend MQTs and appropriate indexes on them. When referring to an MQT, the embodiments herein assume that it includes its appropriate indexes.
- An MQTA takes a workload (the read and write queries to the database system) and the database space size allocated for MQTs (i.e. MQT cache size) as the input. It first performs workload compression to remove those insignificant queries which are inexpensive or infrequent. It then performs multi-query optimization [2] to derive common parts in the workload and generates candidate MQTs.
- the MQTA calculates the benefits of these candidate MQTs in terms of resource time reduction and calculates the overhead (in terms of resource time) for refreshing MQTs by incorporating database updates and estimates the size of the MQTs.
- the MQTA calculates the utility of each MQT by dividing net benefit (i.e. benefit minus overhead of creating the MQT) by the size of the MQT and its index size. The MQTA then recommends the MQTs whose utility values are higher than a given threshold.
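The MQTA's utility-threshold recommendation step can be sketched as below; the candidate names, costs, and sizes are illustrative assumptions, not values from the patent:

```python
def recommend_mqts(candidates, threshold):
    # Recommend candidate MQTs whose utility exceeds a threshold.
    # utility = net benefit (benefit minus refresh/creation overhead)
    # divided by the space the MQT and its indexes occupy.
    picks = []
    for c in candidates:
        utility = (c["benefit"] - c["overhead"]) / (c["size"] + c["index_size"])
        if utility > threshold:
            picks.append(c["name"])
    return picks

candidates = [
    {"name": "mqt_sales", "benefit": 900.0, "overhead": 100.0,
     "size": 50.0, "index_size": 10.0},
    {"name": "mqt_returns", "benefit": 120.0, "overhead": 100.0,
     "size": 40.0, "index_size": 10.0},
]
picked = recommend_mqts(candidates, 1.0)  # only mqt_sales clears the bar
```

Here "mqt_sales" has utility 800/60 and "mqt_returns" only 20/50, so the threshold filters the latter out.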
- the embodiments of the invention follow a complementary approach and present an automated, dynamic view management scheme that materializes MQTs on-demand as a batch workload executes and manages the MQTs with an LRU cache.
- the scheme makes an adaptive tradeoff between the cost of MQT materializations, the cost of accessing base tables in lieu of the MQT, and the benefit of MQT cache hits.
- the order of the queries in the workload are permuted, and the permutation that produces the overall highest benefit is found using a self-adapting genetic method to search the N! permutation solution space.
- MQT management scenarios are classified into four categories based on two conditions, namely whether or not query workloads can be reordered and whether or not MQTs can be materialized and replaced dynamically.
- MQT advisors such as described in [1]
- MQTs are recommended and created to fill the disk space that users specify.
- the recommended MQTs (and their indexes) are materialized in advance before workloads are executed.
- the MQTs are not replaced.
- because the MQTs are fixed during the workload execution, whether or not the workload is reordered will not make any difference.
- Scenarios (3) and (4) represent dynamic materialization and replacement of MQTs. Without the possibility of workload reordering, an MQT (e.g., MQT X) is materialized as long as its net benefit (i.e., total benefit minus materialization cost) is positive, before it is replaced by another MQT Y at time T0.
- With the possibility of reordering the query workload as in scenario (4), if there are queries that arrive after T0 and can benefit from MQT X, it may be desirable to execute these queries before swapping out MQT X. Therefore, scenario (4) offers the highest flexibility for managing MQTs to minimize the response time of a query workload.
- the embodiments of the invention focus on scenario (4), which subsumes scenario (3).
- a batch workload's execution and interaction with the MQTs can be modeled in the following manner.
- the workload is represented as a queue of N queries.
- an MQT Advisor product then generates a list of M candidate MQTs that are beneficial to the workload.
- the M candidate MQTs are then randomly mapped to the N queries to model the situation where each query makes use of a small set of MQTs.
- the number of MQTs per query is randomly chosen for each query based on a nonuniform distribution ranging from 0 MQTs/query to 4 MQTs/query.
- the assignment of which MQTs to belong to a given query is determined with a uniform random distribution.
- the size of individual MQTs is determined by a Gaussian random variable with a varying mean determined by experiment; the sizes are typically on the order of tens to hundreds of MBytes.
- the queries in the workload are executed sequentially in the order that they appear in the queue. For each query, the assigned MQTs on disk are in turn accessed sequentially.
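The workload model above can be sketched as a small generator; the distribution weights, Gaussian mean, and standard deviation are illustrative assumptions (the text only states a nonuniform 0-4 MQTs/query distribution and sizes on the order of tens to hundreds of MBytes):

```python
import random

def build_workload(n_queries, m_candidates, seed=0):
    # Each query gets 0-4 MQTs; the count is drawn from a nonuniform
    # distribution (small counts more likely, per the model), while
    # WHICH MQTs belong to a query is uniform random. MQT sizes are
    # Gaussian, in MBytes.
    rng = random.Random(seed)
    counts = [0, 1, 1, 2, 2, 2, 3, 4]   # assumed nonuniform weights
    workload = [rng.sample(range(m_candidates), rng.choice(counts))
                for _ in range(n_queries)]
    sizes = [max(1.0, rng.gauss(100.0, 30.0)) for _ in range(m_candidates)]
    return workload, sizes

workload, sizes = build_workload(20, 8)
```

Each entry of `workload` is the MQT set of one query, executed sequentially in queue order.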
- Three approaches to MQT management are modeled.
- the MQTs are pre-materialized before the workload begins and are used throughout the workload. If an MQT does not exist for a given query, the query must access the base table to get the data.
- the MQTs are aggressively materialized on-demand and managed in an LRU cache. When a query executes and its needed MQT does not exist (i.e. upon cache miss), the MQT is always materialized.
- the dynamic advanced model is a compromise between the previous models. An LRU cache is still maintained, but MQTs are not always materialized when there is a cache miss. Instead, a subset of available candidate MQTs that are managed via the cache are created, and for those MQTs not in this set, queries access their respective data by reading from the base tables.
- the dynamic simple model is too aggressive in materializing MQTs on-demand; this aggressiveness is throttled in the dynamic advanced model.
- the benefit of an individual MQT is a measure of how much it improves the execution time of a query.
- the embodiments herein follow a simplified benefit model that calculates the benefit B_i of the i-th MQT as follows. First, let α be the cost of a query to execute without an MQT, in units of time. This includes the cost for the query to access the base tables. Second, let β be the cost of a query to execute with an MQT, in units of time.
- the difference α − β is the benefit of one use of the MQT.
- the benefit B_i is then simply the sum of all the benefits of MQT i across all queries in the workload. Once B_i is calculated for all the MQTs, the MQTs are sorted based on this score. The score is the benefit divided by the MQT size.
- the system materializes the MQTs in decreasing benefit order until a disk usage limit is reached. These pre-materialized MQTs are kept on disk throughout the execution of the workload. If a query requires an MQT that has not been materialized, the query must access a base table.
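The static model's selection step can be sketched as a greedy loop; the exact tie-breaking and whether smaller MQTs may still be taken after one fails to fit are assumptions, since the text only says MQTs are materialized in decreasing score order until the disk limit is reached:

```python
def static_materialize(mqts, disk_limit):
    # Static model: score = total benefit / size; materialize in
    # decreasing score order, skipping any MQT that would exceed
    # the disk usage limit.
    chosen = []
    used = 0.0
    for name, benefit, size in sorted(mqts, key=lambda m: m[1] / m[2],
                                      reverse=True):
        if used + size <= disk_limit:
            chosen.append(name)
            used += size
    return chosen

mqts = [("a", 2500.0, 100.0), ("b", 900.0, 50.0), ("c", 300.0, 60.0)]
picked = static_materialize(mqts, 160.0)  # scores: a=25, b=18, c=5
```

With a 160 MB limit, "a" (100 MB) and "b" (50 MB) fit but "c" does not; a query needing "c" would fall back to the base tables.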
- the MQTs are not pre-materialized. Instead, MQTs are materialized on-demand when queries execute and are managed in an LRU cache. Such an approach makes a tradeoff between the negative cost of materialization time and the positive benefit of MQT hits in the cache which obviate the need to access base tables.
- a workload comprises five queries, each of which accesses the same MQT.
- the cost of materializing the MQT is 2000 seconds
- the cost of executing the query with the MQT is 100 seconds
- the on-demand materialization approach provides greater benefit than running the workload without the MQT materialized because MQT hits in the cache make accessing the base tables unnecessary.
- under the static approach, a needed MQT is likely to be unavailable if the disk limit was already reached during the pre-materialization phase.
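The five-query example can be worked through numerically. The text gives the materialization cost (2000 s) and the with-MQT cost (100 s/query) but not the without-MQT cost, so a value of 600 s/query is assumed here purely for illustration:

```python
queries = 5
materialize_cost = 2000   # seconds, from the example
cost_with_mqt = 100       # seconds per query, from the example
cost_without_mqt = 600    # seconds per query -- ASSUMED, not in the text

# On-demand: pay materialization once, then every query hits the cache.
on_demand_total = materialize_cost + queries * cost_with_mqt
# No MQT at all: every query goes to the base tables.
no_mqt_total = queries * cost_without_mqt
saving = no_mqt_total - on_demand_total
```

Under the assumed base-table cost, on-demand materialization costs 2500 s versus 3000 s without the MQT, a 500 s saving; with only one or two queries the 2000 s materialization would not pay off, which is why the tradeoff must be adaptive.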
- the dynamic models execute by sequentially running the queries in the workload queue.
- Each query has its own MQT set, and for each MQT in the set, the MQT's benefit is calculated based upon whether or not the MQT is present in the cache. If there is an MQT hit, the MQT is accessed. If there is an MQT miss, the MQT is materialized at that moment and placed into the cache. If an MQT must be removed from the cache to make room, eviction follows LRU policy; however, if a cached MQT is to be evicted but is in the set of required MQTs for the current query, then the MQT is kept in the cache.
- the quantitative benefit of the MQTs for the workload can be calculated as follows. First, let N be the number of queries in the workload queue; let S_i be the set of MQTs required by query i; let hit(i) be 1 if accessing MQT i incurs a cache hit and 0 otherwise; and let miss(i) be 1 if accessing MQT i incurs a cache miss and 0 otherwise.
- a cache hit and a cache miss are mutually exclusive. It is apparent that hit(i) and miss(i) vary over time based on the cache state. Furthermore, for simplicity, it is assumed that α, β, and γ (the cost of materializing an MQT) are constant for all MQTs.
- the net benefit B of executing the queries with the MQTs over executing without the MQTs can be calculated using the following equation: B = Σ_{q=1..N} Σ_{i∈S_q} [ hit(i)·(α − β) + miss(i)·(α − β − γ) ].
- the inner summation represents the net benefit for executing one query with its set of MQTs.
- the outer summation represents the net benefit for all the queries in the workload in the order that they appear.
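The dynamic simple model and its benefit summation can be sketched as follows; this is a minimal simulation under the simplified assumptions that α, β, and γ are constant, with illustrative values:

```python
from collections import OrderedDict

def dynamic_simple_benefit(workload, capacity, alpha, beta, gamma):
    # workload: list of per-query MQT lists. alpha/beta are query
    # costs without/with an MQT; gamma is the materialization cost.
    cache = OrderedDict()              # key order doubles as LRU order
    total = 0.0
    for needed in workload:
        for m in needed:
            if m in cache:             # cache hit: just access the MQT
                cache.move_to_end(m)
                total += alpha - beta
            else:                      # cache miss: materialize, then use
                total += (alpha - beta) - gamma
                while len(cache) >= capacity:
                    # LRU eviction, but MQTs required by the current
                    # query are pinned and never evicted.
                    victims = [k for k in cache if k not in needed]
                    if not victims:
                        break
                    del cache[victims[0]]
                cache[m] = None
    return total

workload = [["a"], ["a"], ["b"], ["a"]]
b1 = dynamic_simple_benefit(workload, 1, 600.0, 100.0, 2000.0)
b2 = dynamic_simple_benefit(workload, 2, 600.0, 100.0, 2000.0)
```

With capacity 1 the reuse of "a" is destroyed by the eviction for "b" (b1 = −4000), while capacity 2 preserves it (b2 = −2000), showing how cache state drives the net benefit.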
- a dynamic advanced model is also provided wherein a subset of the candidate MQT set is managed via the cache. This subset is found as follows. After the workload is simulated once and negative-valued MQTs are found, a binary search is performed on the size of the candidate MQT set.
- the candidate MQTs are sorted by their net benefit from the previous simulation round and are selected in decreasing order.
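The subset-selection step of the dynamic advanced model can be sketched as below. The patent only states that a binary search is performed on the candidate set size after one simulation round; this sketch additionally assumes the workload benefit is unimodal in that size, which the binary search implicitly relies on:

```python
def choose_candidate_subset(ranked_mqts, simulate):
    # ranked_mqts: candidate MQTs sorted by net benefit, best first.
    # simulate(subset) returns the workload's net benefit when only
    # MQTs in `subset` may be materialized on a cache miss.
    lo, hi = 0, len(ranked_mqts)
    best_k, best_val = 0, simulate([])
    while lo <= hi:
        mid = (lo + hi) // 2
        val = simulate(ranked_mqts[:mid])
        if val >= best_val:            # benefit still rising: grow the set
            best_k, best_val = mid, val
            lo = mid + 1
        else:                          # past the peak: shrink the set
            hi = mid - 1
    return ranked_mqts[:best_k]

ranked = ["a", "b", "c", "d"]
# Toy benefit function that peaks at a subset of size 2.
subset = choose_candidate_subset(ranked, lambda s: -abs(len(s) - 2))
```

Queries whose MQTs fall outside the chosen subset read from the base tables instead, which is exactly the throttling that distinguishes this model from the dynamic simple one.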
- the dynamic advanced model produces a better query workload execution than either the static model (which produces too many base table accesses) or the dynamic simple model (which produces too many materializations).
- the net benefit depends on the occurrence of MQT hits and misses in the cache, which in turn is a consequence of the query order in the workload. Reordering is performed due to the cache's LRU replacement policy; it is desirable to have as many cache hits as possible, which requires that MQT accesses be grouped together to exploit temporal locality before eviction.
- the dynamic MQT management problem lends itself well to a genetic method (GM) because potential solutions can be represented as a permutation of unique integers identifying the queries in the workload. A given ordering of the integers represents a particular query order, which in turn determines the order in which the MQTs are accessed.
- This permutation-based representation is known in GM research and allows the leveraging of prior research in effective chromosome recombination (e.g. [8]).
- a GM proceeds as follows. Initially a random set of chromosomes (also referred to herein as “workloads”) is created for the population. The chromosomes are evaluated (hashed) to some metric, and the best ones are chosen to be parents. Thus, the evaluation produces the net benefit of executing the workload, accessing MQTs, and materializing/evicting MQTs in the cache. The parents recombine to produce children, simulating sexual crossover, and occasionally a mutation may arise which produces new characteristics that were not available in either parent. An adaptive mutation scheme is further provided whereby the mutation rate is increased when the population stagnates (i.e. fails to improve its workload benefit metric) over a prolonged number of generations.
- the children are ranked based on the evaluation function, and the best subset of the children is chosen to be the parents of the next generation, simulating natural selection.
- the generational loop ends after some stopping condition is met, e.g., after 1000 generations have passed; this value is a tradeoff between simulation execution time and thoroughness of the search. Converging toward and finding the global optimum is not guaranteed because the recombination and mutation are stochastic.
- the chromosomes are permutations of unique integers. Recombination of chromosomes is applied to two parents to produce a new child using a two-point crossover scheme [8] where a randomly chosen contiguous subsection of the first parent is copied to the child, and then all remaining items in the second parent (that have not already been taken from the first parent's subsection) are then copied to the child in order of appearance.
- the uni-chromosome mutation scheme chooses two random items from the chromosome and reverses the elements between them, inclusive. Other recombination and mutation schemes may also be utilized.
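The two operators described above can be sketched for integer permutations as follows; the exact placement of the copied slice within the child is one reasonable reading of the two-point crossover description, not a verbatim reproduction of [8]:

```python
import random

def two_point_crossover(p1, p2, rng):
    # Copy a randomly chosen contiguous slice of parent 1 into the
    # child, then fill the remaining positions with parent 2's items
    # in their order of appearance (skipping items already copied).
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    middle = p1[i:j]
    rest = [g for g in p2 if g not in middle]
    return rest[:i] + middle + rest[i:]

def inversion_mutation(perm, rng):
    # Uni-chromosome mutation: choose two random items and reverse
    # the elements between them, inclusive.
    i, j = sorted(rng.sample(range(len(perm)), 2))
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]

rng = random.Random(1)
child = two_point_crossover([0, 1, 2, 3, 4], [4, 3, 2, 1, 0], rng)
mutant = inversion_mutation([0, 1, 2, 3, 4], rng)
```

Both operators preserve the permutation property, so every child remains a valid query ordering.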
- a key GM component is the evaluation function. Given a particular chromosome representing one workload permutation, the function deterministically calculates the net benefit of using MQTs managed by an LRU cache during the workload. The calculation is based on the net benefit equation B given above.
- the evaluation function can be replaced if desired; for example, other evaluation functions can model different cache replacement policies or execution of queries in parallel.
- MQT materialization is considered as the needed “pre-staging” of a query or set of queries.
- the scheduling method defines the partial order of events, namely query execution and MQT materialization/eviction.
- other components such as the query patroller or workload manager will take the schedule and execute them as efficiently as possible (sequentially or in parallel) to yield the shortest elapsed time.
- the system needs to know various parameters such as CPU utilization, I/O bandwidth, and the number of connections.
- the MQTs are materialized dynamically, which imposes a materialization cost at run time.
- FIG. 3 illustrates MQT creation, where the MQTs are materialized on-demand when the query that needs them is scheduled to execute. For example, when query 320 is scheduled, its needed MQT (MQT C) is materialized into the cache.
- FIG. 3 illustrates a query workload having Query 1, Query 2, and Query 3.
- Query 1 accesses MQT A and MQT B;
- Query 2 accesses MQT C;
- Query 3 accesses MQT D.
- the MQT cache (having a maximum capacity of two MQTs) includes MQT A and MQT B.
- the MQT cache includes MQT C and MQT A.
- the MQT cache includes MQT C and MQT D.
- FIG. 4 shows an optimization where the materialization is performed early. For example, before query 3 executes, its needed MQT (MQT D) is materialized while query 2 is executing. Under this approach, the system looks into the cache to see if there is available space to hold the needed MQTs; if there is not, previous MQTs are evicted, but only if they are not currently being used by an executing query. If the needed MQTs cannot be materialized, the system performs materialization at the time the new query executes, essentially falling back to the approach and analysis presented above.
- query preemption relies on an already-generated batch workload schedule; it is diagrammed in FIG. 5 .
- the MQTs are materialized dynamically and reside in the cache according to the query order generated by the genetic method.
- because the query order is known, the period of time that each MQT stays materialized in the cache is also known; this is called the MQT materialization schedule.
- when a new query arrives (e.g., Query 4), its required MQTs are observed (e.g., MQT C) and the MQT materialization schedule is searched for a time slot when the query's MQT is materialized in the cache (e.g., during the execution of Query 2). If no time slot exists that will satisfy all of the query's needed MQTs, the slot with the most beneficial MQTs is chosen.
- the query is then inserted into the query workload schedule, thereby preempting all subsequent queries in the workload (e.g., Query 3 ).
- the tradeoff is that the queries that were already scheduled will be delayed unfairly, but the benefit will be that the incoming queries will be satisfied in this framework.
- new incoming queries can preempt queries in the in-progress workload.
- the position of the preemption in the workload is chosen to maximize MQT use by the incoming query.
- the drawback to this scheme is that in-progress queries are unfairly pushed back.
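The slot-selection step of query preemption can be sketched as below; representing the materialization schedule as a per-slot set of resident MQTs, and breaking ties toward the earliest slot, are both illustrative assumptions:

```python
def best_preemption_slot(schedule, needed):
    # schedule[i]: the set of MQTs resident in the cache while the
    # i-th scheduled query runs. Choose the slot that satisfies the
    # most of the incoming query's needed MQTs (earliest slot wins
    # ties); the new query preempts the queries after that slot.
    best_i, best_overlap = 0, -1
    for i, resident in enumerate(schedule):
        overlap = len(needed & resident)
        if overlap > best_overlap:
            best_i, best_overlap = i, overlap
    return best_i

# Cache contents per slot, mirroring the FIG. 3 example.
schedule = [{"a", "b"}, {"c", "a"}, {"c", "d"}]
slot = best_preemption_slot(schedule, {"c"})  # MQT C is resident from slot 1
```

Inserting the incoming query at the returned slot maximizes its MQT reuse at the cost of delaying the queries behind it.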
- an entirely new schedule for both incoming and existing queries can be created.
- after a batch query workload has been scheduled, when new queries arrive, they can be aggregated together to create a new workload batch; queries are added to this new batch until either a periodic timer runs out or a batch size limit is reached.
- when the new batch is ready, the previous batch may or may not have already ended.
- the new batch is scheduled as in the original method.
- the remainder of the current batch workload can be combined with the new batch, and the genetic method can be run to produce a new schedule.
- the GM will ideally produce a very tight schedule. Running the GM in this case takes into consideration that some MQTs have already been materialized and are already in the cache.
- the above two methods may be combined to suit the nature of the incoming queries.
- a user can switch dynamically between the two strategies.
- a method begins by executing queries, which includes accessing a set of data tables (also referred to herein as “MQTs” or “materialized views”) for each of the queries.
- the data tables summarize common portions of the queries.
- a database query optimizer can explore the possibility of reducing the query processing cost by appropriately replacing parts of a query with existing and matched MQTs.
- the method accesses a required data table from a cache if the required data table is present in the cache.
- the method creates the required data table if the required data table is not present in the cache and if a benefit of accessing the required data table exceeds a cost of creating the required data table.
- the dynamic advanced model is a compromise between the static model and the dynamic simple model. An LRU cache is still maintained, but MQTs are not always materialized when there is a cache miss. Instead, a subset of available candidate MQTs that are managed via the cache are created, and for those MQTs not in this set, queries access their respective data by reading from the base tables. The accessing of the required data table from the cache has a lower processing cost than accessing the required data table from a base table.
- created data tables are stored in the cache, wherein one or more of the created data tables are removed from the cache when the cache becomes full.
- eviction follows LRU policy; however, if a cached MQT is to be evicted but is in the set of required MQTs for the current query, then the MQT is kept in the cache.
- Prior to the executing of the queries, the cache comprises zero required data tables.
- the method reorders the queries.
- the net benefit depends on the occurrence of MQT hits and misses in the cache, which in turn is a consequence of the query order in the workload. Reordering is performed due to the cache's LRU replacement policy; it is desirable to have as many cache hits as possible, which requires that MQT accesses be grouped together to exploit temporal locality before eviction.
- workloads can be created such that each of the workloads represents an ordering of the queries.
- a random set of workloads is initially created for the population.
- the workloads are evaluated (hashed) to some metric, and the best ones are chosen to be parents.
- the evaluation produces the net benefit of executing the workload, accessing MQTs, and materializing/evicting MQTs in the cache.
- the workloads are recombined and/or mutated to create new orderings of the queries.
- the parents recombine to produce children, simulating sexual crossover, and occasionally a mutation may arise which produces new characteristics that were not available in either parent.
- An adaptive mutation scheme is further provided whereby the mutation rate is increased when the population stagnates (i.e. fails to improve its workload benefit metric) over a prolonged number of generations.
- the method also includes calculating a net benefit of a materialized view by subtracting a cost of executing a query with the materialized view from a cost of executing the query without the materialized view and multiplying by a total number of occurrences of the materialized view within the queries.
- the reordering of the queries can be based on a ranking of net benefits of the materialized views.
- FIG. 6 is a flow diagram illustrating a method 600 for automated and dynamic management of query views for database workloads.
- the method 600 begins in item 610 by executing queries, which includes accessing a set of data tables for each of the queries.
- a database query optimizer can explore the possibility of reducing the query processing cost by appropriately replacing parts of a query with existing and matched MQTs.
- the method 600 accesses a required data table from a cache if the required data table is present in the cache, creates the required data table if the required data table is not present in the cache and if a benefit of accessing the required data table exceeds a cost of creating the required data table, and stores created data tables in the cache.
- the dynamic advanced model is a compromise between the static model and the dynamic simple model. An LRU cache is still maintained, but MQTs are not always materialized when there is a cache miss.
- the accessing of the required data table from the cache comprises a lower processing cost than accessing the required data table from a base table.
- Prior to the executing of the queries, the cache comprises zero required data tables. Also during the executing of the queries, in item 626, at least one of the created data tables is removed from the cache when the cache becomes full. As discussed above, if an MQT must be removed from the cache to make room, eviction follows LRU policy; however, if a cached MQT is to be evicted but is in the set of required MQTs for the current query, then the MQT is kept in the cache.
- the method 600 reorders the queries. As discussed above, reordering is performed due to the cache's LRU replacement policy; it is desirable to have as many cache hits as possible, which requires that MQT accesses be grouped together to exploit temporal locality before eviction.
- Reordering includes, in item 632 , creating workloads such that each of the workloads represents an ordering of the queries; and recombining and/or mutating the workloads to create new orderings of the queries.
- a GM simulates Darwinian natural selection by having population members (genetic workloads) compete against one another over successive generations in order to converge toward the best solution. Workloads evolve through multiple generations of adaptation and selection.
- one of the new orderings of the queries is identified as the ordering having the lowest processing cost.
- the evaluation produces the net benefit of executing the workload, accessing MQTs, and materializing/evicting MQTs in the cache.
- FIG. 7 illustrates another flow diagram comprising item 710 , wherein a query workload is submitted to the database and intercepted by the dynamic MQT scheduler (DMS).
- the query workload is sent to the MQT advisor for recommending candidate MQT and indexes (item 720 ); and, the MQT advisor returns candidate MQTs/indexes, their size/benefits/creation cost, as well as queries to MQTs/index mapping (item 730 ).
- based on the information sent by the MQT advisor, in item 740, the DMS generates an execution sequence (query execution and MQT/index creation). Based on the execution sequence in item 740, the DMS issues commands to the database system in item 750.
- the query processor receives the commands from the DMS and executes them. Both base tables (item 770) and MQTs/indexes (item 780) are used in the query processing if the MQTs/indexes generate benefits. If an MQT/index needs to be created (as issued by the DMS) and there is no space left in the LRU cache, the LRU cache evicts the least recently used objects to make space for the requested MQTs/indexes (item 790).
- the embodiments of the invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment including both hardware and software elements.
- the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- the embodiments of the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
- Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk.
- Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
- a data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
- the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
- I/O devices can be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
- A representative hardware environment for practicing the embodiments of the invention is depicted in FIG. 8.
- the system comprises at least one processor or central processing unit (CPU) 10 .
- the CPUs 10 are interconnected via system bus 12 to various devices such as a random access memory (RAM) 14 , read-only memory (ROM) 16 , and an input/output (I/O) adapter 18 .
- the I/O adapter 18 can connect to peripheral devices, such as disk units 11 and tape drives 13 , or other program storage devices that are readable by the system.
- the system can read the inventive instructions on the program storage devices and follow these instructions to execute the methodology of the embodiments of the invention.
- the system further includes a user interface adapter 19 that connects a keyboard 15 , mouse 17 , speaker 24 , microphone 22 , and/or other user interface devices such as a touch screen device (not shown) to the bus 12 to gather user input.
- a communication adapter 20 connects the bus 12 to a data processing network 25.
- a display adapter 21 connects the bus 12 to a display device 23 which may be embodied as an output device such as a monitor, printer, or transmitter, for example.
- the embodiments of the invention provide an automated, dynamic view management scheme that materializes views on-demand as a workload is executing and manages the views with an LRU cache.
- the scheme makes an adaptive tradeoff between the view materializations, base table accesses, and the benefit of view hits in the cache.
- a genetic method is used to search the N! solution space.
Abstract
The embodiments of the invention provide a method, program storage device, etc. for automated and dynamic management of query views for database workloads. More specifically, a method begins by executing queries, which includes accessing a set of data tables for each of the queries. During the executing of the queries, the method accesses a required data table from a cache if the required data table is present in the cache and creates the required data table if the required data table is not present in the cache. The accessing of the required data table from the cache has a lower processing cost than accessing the required data table from a base table. Also during the executing of the queries, created data tables are stored in the cache, wherein one or more of the created data tables are removed from the cache when the cache becomes full.
Description
- 1. Field of the Invention
- The embodiments of the invention provide a method, program storage device, etc. for automated and dynamic management of query views for database workloads.
- 2. Description of the Related Art
- Within this application several publications are referenced by arabic numerals within parentheses. Full citations for these, and other, publications may be found at the end of the specification immediately preceding the claims. The disclosures of all these publications in their entireties are hereby expressly incorporated by reference into the present application for the purposes of indicating the background of the present invention and illustrating the state of the art.
- A materialized view, or materialized query table (also referred to herein as “MQT” or “data table”), is an auxiliary table with precomputed data that can be used to significantly improve the performance of a database query. With its MQT matching capability, a database query optimizer can explore the possibility of reducing the query processing cost by appropriately replacing parts of a query with existing and matched MQTs.
- A query rewritten to utilize the MQT has one join operation instead of two, thus allowing its query processing cost to be reduced significantly. Since the creation of MQTs can be expensive compared to the benefit of the MQTs to a single query, MQTs are usually created for the whole batch query workload so that the accumulated benefits exceed the cost of their materialization.
- The embodiments of the invention provide a method, program storage device, etc. for automated and dynamic management of query views for database workloads. More specifically, a method begins by executing queries, which includes accessing a set of data tables (also referred to herein as “materialized views”) for each of the queries. The data tables summarize common portions of the queries. During the executing of the queries, the method accesses a required data table from a cache if the required data table is present in the cache. The method creates the required data table if the required data table is not present in the cache and if a benefit of accessing the required data table exceeds a cost of creating the required data table. The accessing of the required data table from the cache has a lower processing cost than accessing the required data table from a base table.
- Also during the executing of the queries, created data tables are stored in the cache, wherein one or more of the created data tables are removed from the cache when the cache becomes full. Prior to the executing of the queries, the cache comprises zero required data tables.
- In addition, the method reorders the queries. This can include creating workloads such that each of the workloads represents an ordering of the queries, wherein the workloads are recombined and/or mutated to create new orderings of the queries. Next, one of the new orderings of the queries is identified as an ordering having a lowest processing cost. The method also includes calculating a net benefit of a data table by subtracting a cost of executing a query with the data table from a cost of executing the query without the data table and multiplying by a total number of occurrences of the data table within the queries. The reordering of the queries can be based on a ranking of net benefits of the data tables.
- Accordingly, the embodiments of the invention provide an automated, dynamic view management scheme that materializes views on-demand as a workload is executing and manages the views with an least recently used (LRU) cache. In order to maximize the benefit of executing queries with materialized views, the scheme makes an adaptive tradeoff between the view materializations, base table accesses, and the benefit of view hits in the cache. To find the workload permutation that produces the overall highest net benefit, a genetic method is used to search the N! solution space.
- These and other aspects of the embodiments of the invention will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments of the invention and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments of the invention without departing from the spirit thereof, and the embodiments of the invention include all such modifications.
- The embodiments of the invention will be better understood from the following detailed description with reference to the drawings, in which:
- FIG. 1 illustrates a classification of MQT management scenarios table;
- FIG. 2 illustrates pseudocode for a genetic search method;
- FIG. 3 illustrates MQT creation;
- FIG. 4 illustrates early MQT creation optimization;
- FIG. 5 illustrates query preemption;
- FIG. 6 is a flow diagram illustrating a method for automated and dynamic management of query views for database workloads;
- FIG. 7 is a flow diagram illustrating another method for automated and dynamic management of query views for database workloads; and
- FIG. 8 is a diagram illustrating a program storage device for automated and dynamic management of query views for database workloads.
- The embodiments of the invention and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. It should be noted that the features illustrated in the drawings are not necessarily drawn to scale. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments of the invention. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments of the invention may be practiced and to further enable those of skill in the art to practice the embodiments of the invention. Accordingly, the examples should not be construed as limiting the scope of the embodiments of the invention.
- The embodiments herein provide an automated, dynamic view management scheme that materializes views on-demand as a workload is executing and manages the views with an LRU cache. In order to maximize the benefit of executing queries with materialized views, the scheme makes an adaptive tradeoff between the view materializations, base table accesses, and the benefit of view hits in the cache. To find the workload permutation that produces the overall highest net benefit, a genetic method is used to search the N! solution space.
- As MQTs are required in OLAP (Online Analytical Processing) applications in which the query workloads tend to have complex structure and syntax, a Materialized Query Table Advisor (MQTA), such as the IBM DB2 Design Advisor [1], available from International Business Machines, Armonk, N.Y., USA, is often required to recommend MQTs and appropriate indexes on them. While referring to an MQT, embodiments herein assume that it includes its appropriate indexes.
- An MQTA takes a workload (the read and write queries to the database system) and the database space size allocated for MQTs (i.e. MQT cache size) as the input. It first performs workload compression to remove those insignificant queries which are inexpensive or infrequent. It then performs multi-query optimization [2] to derive common parts in the workload and generates candidate MQTs.
- First, the MQTA calculates the benefits of these candidate MQTs in terms of resource time reduction and calculates the overhead (in terms of resource time) for refreshing MQTs by incorporating database updates and estimates the size of the MQTs. Next, the MQTA calculates the utility of each MQT by dividing net benefit (i.e. benefit minus overhead of creating the MQT) by the size of the MQT and its index size. The MQTA then recommends the MQTs whose utility values are higher than a given threshold.
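The utility computation just described can be sketched minimally as follows; the function names, the tuple layout, and the strict-threshold convention are illustrative assumptions, not details from the text:

```python
def mqt_utility(benefit, refresh_overhead, mqt_size, index_size):
    """Utility of a candidate MQT: net benefit per unit of storage.

    benefit and refresh_overhead are in units of resource time; the
    sizes may be in any consistent storage unit.
    """
    net_benefit = benefit - refresh_overhead
    return net_benefit / (mqt_size + index_size)

def recommend(candidates, threshold):
    """Recommend the MQTs whose utility exceeds the given threshold.

    candidates: list of (name, benefit, overhead, mqt_size, index_size).
    """
    return [name for name, b, o, s, i in candidates
            if mqt_utility(b, o, s, i) > threshold]
```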
- In addition to IBM DB2 Design Advisor [1, 3], several vendors have MQTAs, including Oracle 10g [4] and SQL Server [5]. These advisors deploy a common static approach to managing database views: the views are prematerialized prior to executing the workloads. While this approach is sound when the size of the viewset (i.e., a set of views) on disk is small, it will not be able to materialize all views when faced with real-world constraints (such as view maintenance costs or disk space limits) and thus will fail to exploit the potentially large benefits of those views not selected for materialization.
- Previous industry and academic research efforts in this area have concentrated on the aspect of finding the best candidate MQT set to pre-materialize. The embodiments of the invention follow a complementary approach and present an automated, dynamic view management scheme that materializes MQTs on-demand as a batch workload executes and manages the MQTs with an LRU cache. To maximize the benefit of executing queries with cached MQTs, the scheme makes an adaptive tradeoff between the cost of MQT materializations, the cost of accessing base tables in lieu of the MQTs, and the benefit of MQT cache hits. To achieve high MQT cache hits, the order of the queries in the workload is permuted, and the permutation that produces the overall highest benefit is found using a self-adapting genetic method to search the N! permutation solution space.
- In FIG. 1, MQT management scenarios are classified into four categories based on two conditions, namely whether or not query workloads can be reordered and whether or not MQTs can be materialized and replaced dynamically. Using current MQT advisors, such as described in [1], MQTs are recommended and created to fill the disk space that users specify. The recommended MQTs (and their indexes) are materialized in advance before workloads are executed. During the execution of the workloads, the MQTs are not replaced. As the MQTs are fixed during the workload execution, whether or not the workload is reordered will not make any difference. These are scenarios (1) and (2).
- Scenarios (3) and (4) represent dynamic materialization and replacement of MQTs. Without the possibility of workload reordering, an MQT, e.g., MQTx, is materialized as long as its net benefit (i.e., total benefit minus materialization cost) is positive, before it is replaced by another MQT, MQTy, at time T0.
- With the possibility of reordering the query workload as in scenario (4), if there are queries that arrive after T0 and can benefit from MQTx, it may be desirable to execute these queries before swapping out MQTx. Therefore, in the scenario (4) the highest flexibility for managing MQTs is offered to minimize the response time of a query workload. The embodiments of the invention focus on scenario (4), which subsumes (3).
- For example, a batch workload's execution and interaction with the MQTs can be modeled in the following manner. The workload is represented as a queue of N queries. It is assumed that an MQT Advisor product then generates a list of M candidate MQTs that are beneficial to the workload. N is typically larger than M (for example, in some experiments N=200 and M=20). It is further assumed that the MQTs are read-only and that the queries are mutually independent of one another.
- The M candidate MQTs are then randomly mapped to the N queries to model the situation where each query makes use of a small set of MQTs. The number of MQTs per query is randomly chosen for each query based on a nonuniform distribution ranging from 0 MQTs/query to 4 MQTs/query. The assignment of which MQTs belong to a given query is determined with a uniform random distribution. The size of individual MQTs is determined by a Gaussian random variable with a varying mean determined by experiment; the sizes are typically on the order of tens to hundreds of MBytes. The queries in the workload are executed sequentially in the order that they appear in the queue. For each query, the assigned MQTs on disk are in turn accessed sequentially.
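The randomized query-to-MQT mapping described above might be modeled as follows. The specific weights for the 0–4 MQTs/query distribution and the Gaussian mean/deviation for MQT sizes are illustrative assumptions, not values from the text:

```python
import random

def build_workload(n_queries=200, n_mqts=20, seed=0):
    """Randomly assign a small set of candidate MQTs to each query.

    The number of MQTs per query is drawn from a nonuniform
    distribution over 0..4; which MQTs are assigned is uniform.
    MQT sizes (in MBytes) are Gaussian with an experiment-chosen mean.
    """
    rng = random.Random(seed)
    # Illustrative weights: most queries use 1-2 MQTs, few use 0 or 4.
    counts = rng.choices(range(5), weights=[1, 4, 3, 2, 1], k=n_queries)
    workload = [rng.sample(range(n_mqts), k) for k in counts]
    sizes = [max(1.0, rng.gauss(100.0, 30.0)) for _ in range(n_mqts)]
    return workload, sizes
```

The N=200 and M=20 defaults mirror the experimental values mentioned above.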
- Three approaches to MQT management are modeled. First, in the static model, the MQTs are pre-materialized before the workload begins and are used throughout the workload. If an MQT does not exist for a given query, the query must access the base table to get the data. Next, in the dynamic simple model, the MQTs are aggressively materialized on-demand and managed in an LRU cache. When a query executes and its needed MQT does not exist (i.e., upon a cache miss), the MQT is always materialized. Finally, the dynamic advanced model is a compromise between the previous models: an LRU cache is still maintained, but MQTs are not always materialized when there is a cache miss. Instead, only a subset of the available candidate MQTs is managed via the cache, and for those MQTs not in this set, queries access their respective data by reading from the base tables.
- The dynamic simple model is too aggressive in materializing MQTs on-demand; this aggression is throttled in the dynamic advanced model.
- In the static model commonly used by current commercial database products, a subset of the candidate MQTs is chosen to be pre-materialized before the workload is executed. Previous research has focused on finding the best MQTs to place into the candidate set. The candidate MQTs produced from the MQT advisor are then typically first scanned and then sorted by decreasing benefit. Informally, the benefit of an individual MQT is a measure of how much it improves the execution time of a query. The embodiments herein follow a simplified benefit model that calculates the benefit Bi of the ith MQT as follows. First, let γ be the cost of a query to execute without an MQT in units of time. This includes the cost for the query to access the base tables. Second, let κ be the cost of a query to execute with an MQT in units of time.
- The difference γ−κ is the benefit of one use of the MQT. The benefit Bi is then simply the sum of all the benefits of MQT i across all queries in the workload. Once Bi is calculated for all the MQTs, the MQTs are sorted based on a benefit score, which is Bi divided by the MQT size.
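A minimal sketch of the Bi computation, assuming (as the text does) constant costs γ and κ for all MQTs; the function names are hypothetical:

```python
def mqt_benefit(mqt, workload, gamma, kappa):
    """B_i for one MQT: the per-use benefit (gamma - kappa) summed over
    every query in the workload whose MQT set contains it.

    gamma: cost of a query without the MQT; kappa: cost with the MQT.
    """
    uses = sum(1 for required in workload if mqt in required)
    return (gamma - kappa) * uses

def benefit_score(mqt, workload, gamma, kappa, size):
    """Sorting score for the static model: benefit divided by MQT size."""
    return mqt_benefit(mqt, workload, gamma, kappa) / size
```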
- Given a list of these MQTs sorted on benefit score, the system materializes the MQTs in decreasing benefit order until a disk usage limit is reached. These pre-materialized MQTs are kept on disk throughout the execution of the workload. If a query requires an MQT that has not been materialized, the query must access a base table.
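The greedy pre-materialization loop just described might be sketched as follows, assuming each candidate carries a precomputed benefit and size; the names and tuple layout are illustrative:

```python
def select_static(mqts, disk_limit):
    """Greedy selection used by the static model: sort the candidate
    MQTs by decreasing benefit score (benefit / size) and materialize
    them until the disk usage limit is reached.

    mqts: list of (name, benefit, size) tuples.
    Returns the set of MQT names chosen for pre-materialization.
    """
    chosen, used = set(), 0
    for name, benefit, size in sorted(
            mqts, key=lambda m: m[1] / m[2], reverse=True):
        if used + size > disk_limit:
            break                      # disk limit reached
        chosen.add(name)
        used += size
    return chosen
```

Any candidate not in the returned set forces its queries to fall back to the base tables, which is the weakness the dynamic models address.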
- When the size of the candidate MQT set is small and can fit on disk, this approach is sound. However, due to real-world limits, not all beneficial MQTs will be pre-materialized. For instance, the view maintenance cost may be too high or the disk space may be too small. Those queries whose required MQTs have not been materialized instead incur the cost of accessing the base tables, which can have a substantially negative impact on workload performance. The static approach thus will fail to exploit the potentially large benefits of those MQTs that were not selected for materialization.
- In the dynamic models of the embodiments herein, the MQTs are not pre-materialized. Instead, MQTs are materialized on-demand when queries execute and are managed in an LRU cache. Such an approach makes a tradeoff between the negative cost of materialization time and the positive benefit of MQT hits in the cache which obviate the need to access base tables.
- The rationale for the dynamic models can be seen with the following intuitive example. A workload comprises five queries, each of which accesses the same MQT. The cost of materializing the MQT is 2000 seconds, the cost of executing the query with the MQT is 100 seconds, and the cost of executing the query without the MQT is 500 seconds. If all five queries execute without the MQT, the workload execution time is 500×5=2500 seconds. On the other hand, if the MQT is used and materialized on-demand in the cache, then the execution time is 2000+100×5=2500 seconds, which represents a one-time materialization cost and five successive hits in the cache. It can be seen that if there are six or more hits of the MQT in the cache, then the on-demand materialization approach provides greater benefit than running the workload without the MQT materialized, because MQT hits in the cache make accessing the base tables unnecessary. In the case of the static approach, not having a needed MQT available is likely if the disk limit was already reached during the pre-materialization phase.
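The break-even arithmetic of this example can be checked directly; the constants mirror the example's costs:

```python
MATERIALIZE = 2000   # one-time cost to materialize the MQT (seconds)
WITH_MQT = 100       # per-query cost when the MQT is in the cache
WITHOUT_MQT = 500    # per-query cost when reading the base tables

def cost_without_mqt(n_queries):
    """All queries pay the base-table price."""
    return WITHOUT_MQT * n_queries

def cost_on_demand(n_queries):
    """One materialization followed by n_queries cache hits."""
    return MATERIALIZE + WITH_MQT * n_queries
```

At five queries the two strategies tie at 2500 seconds; from six hits onward, on-demand materialization wins.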
- The dynamic models execute by sequentially running the queries in the workload queue. Each query has its own MQT set, and for each MQT in the set, the MQT's benefit is calculated based upon whether or not the MQT is present in the cache. If there is an MQT hit, the MQT is accessed. If there is an MQT miss, the MQT is materialized at that moment and placed into the cache. If an MQT must be removed from the cache to make room, eviction follows LRU policy; however, if a cached MQT is to be evicted but is in the set of required MQTs for the current query, then the MQT is kept in the cache.
- The quantitative benefit of the MQTs for the workload can be calculated as follows. First, let N be the number of queries in the workload queue; let Si be the set of MQTs required by query i and |Si| be the set's size; and, let γ be the cost of a query to execute without an MQT in units of time. This includes the cost for the query to access the base tables (same as in the static model). Further, let κ be the cost of a query to execute with an MQT in units of time (same as in the static model); and, let λ be the cost to materialize an MQT in units of time. Let hit(i) be 1 if accessing MQT i incurs a cache hit and 0 otherwise; and, let miss(i) be 1 if accessing MQT i incurs a cache miss and 0 otherwise. For a given MQT, a cache hit and a cache miss are mutually exclusive. It is apparent that hit(i) and miss(i) vary over time based on the cache state. Furthermore, for simplicity, it is assumed that γ, κ, and λ are constant for all MQTs.
- The net benefit B of executing the queries with the MQTs over executing without the MQTs can be calculated using the following equation.
- B = Σ_{i=1}^{N} Σ_{j∈Si} [ hit(j)·(γ−κ) + miss(j)·(γ−κ−λ) ]
- The inner summation represents the net benefit for executing one query with its set of MQTs. The difference γ−κ is always positive, whereas −λ is negative. The outer summation represents the net benefit for all the queries in the workload in the order that they appear.
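A sketch of this net-benefit computation, simulating one workload order against an LRU cache that never evicts an MQT required by the currently executing query (the pin rule described earlier). Unit-sized MQTs and constant γ, κ, and λ are simplifying assumptions carried over from the text:

```python
from collections import OrderedDict

def net_benefit(workload, cache_capacity, gamma, kappa, lam):
    """Net benefit B of running the workload with an LRU MQT cache.

    workload: list of per-query MQT lists S_i.
    gamma/kappa: query cost without/with the MQT; lam: materialization
    cost.  A hit contributes (gamma - kappa); a miss additionally pays
    the materialization cost, contributing (gamma - kappa - lam).
    """
    cache = OrderedDict()              # MQT id -> None, in LRU order
    benefit = 0
    for required in workload:
        req = set(required)
        for mqt in required:
            if mqt in cache:                     # cache hit
                cache.move_to_end(mqt)
                benefit += gamma - kappa
            else:                                # miss: materialize
                benefit += (gamma - kappa) - lam
                while len(cache) >= cache_capacity:
                    # evict LRU, but never an MQT the current query needs
                    victim = next((k for k in cache if k not in req), None)
                    if victim is None:           # everything is pinned
                        break
                    del cache[victim]
                cache[mqt] = None
    return benefit
```

With the earlier example's costs, five queries hitting one MQT break even (B = 0), and a sixth yields a positive net benefit.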
- In the dynamic simple model, all the MQTs in the candidate MQT set (as suggested by the MQT Advisor) are managed via the LRU cache. However, this approach may be too aggressive in its materializations: some MQTs suggested by the MQTA are not used often enough during the workload to warrant multiple materializations and evictions via the cache. The end result for these MQTs is that their net benefit across the workload is a negative value. Thus, a dynamic advanced model is also provided wherein a subset of the candidate MQT set is managed via the cache. This subset is found as follows. After the workload is simulated once and negative-valued MQTs are found, a binary search is performed on the size of the candidate MQT set. For these reduced sizes, the candidate MQTs are sorted by their net benefit from the previous simulation round and are selected in decreasing order. The dynamic advanced model produces a better query workload execution than either the static model (which produces too many base table accesses) or the dynamic simple model (which produces too many materializations).
- In both dynamic models, the net benefit depends on the occurrence of MQT hits and misses in the cache, which in turn is a consequence of the query order in the workload. Reordering is performed because of the cache's LRU replacement policy; it is desirable to have as many cache hits as possible, which requires that MQT accesses be grouped together to exploit temporal locality before eviction.
- Given these observations, the complexity of maximizing the benefit attained via the dynamic models reduces to the problem of finding the optimal permutation of the workload queue that produces the highest net benefit of MQT use. Although an LRU cache can be used to manage the MQTs, this selection of replacement policy is only a matter of choice: because the common nature of replacement policies is to exploit locality of reference in the access stream, the fundamental problem is finding an optimal permutation of the workload that can take advantage of whatever policy is being used.
- With a queue size of N queries, there are N! permutations to search. Even with a small workload (e.g. N=20), the search space is prohibitively large for an exhaustive search. Thus, a genetic search heuristic is provided for finding the optimum permutation.
- Given a search space of N! permutations of the query workload, the problem is to find the optimal workload order that produces the highest benefit via the use of the MQTs. To examine this solution space, a self-adapting genetic method (GM) search heuristic is used [6, 7]. A GM simulates Darwinian natural selection by having population members (genetic chromosomes) compete against one another over successive generations in order to converge toward the best solution. As shown in FIG. 2, chromosomes evolve through multiple generations of adaptation and selection. In this code, the variable t represents the current generation and P(t) represents the population at that generation.
- Although other search heuristics exist that can solve optimization problems (e.g., simulated annealing or steepest-ascent hill-climbing), the dynamic MQT management problem lends itself well to a GM because potential solutions can be represented as a permutation of unique integers identifying the queries in the workload. A given ordering of the integers represents a particular query order, which in turn determines the order that the MQTs are accessed. This permutation-based representation is known in GM research and allows the leveraging of prior research in effective chromosome recombination (e.g. [8]).
- A GM proceeds as follows. Initially a random set of chromosomes (also referred to herein as “workloads”) is created for the population. The chromosomes are evaluated (hashed) to some metric, and the best ones are chosen to be parents. Thus, the evaluation produces the net benefit of executing the workload, accessing MQTs, and materializing/evicting MQTs in the cache. The parents recombine to produce children, simulating sexual crossover, and occasionally a mutation may arise which produces new characteristics that were not available in either parent. An adaptive mutation scheme is further provided whereby the mutation rate is increased when the population stagnates (i.e. fails to improve its workload benefit metric) over a prolonged number of generations. The children are ranked based on the evaluation function, and the best subset of the children is chosen to be the parents of the next generation, simulating natural selection. The generational loop ends after some stopping condition is met; e.g., the loop may end after 1000 generations have passed, as this value is a tradeoff between simulation execution time and thoroughness of the search. Converging toward and finding the global optimum is not guaranteed because the recombination and mutation are stochastic.
- As mentioned, the chromosomes are permutations of unique integers. Recombination of chromosomes is applied to two parents to produce a new child using a two-point crossover scheme [8] where a randomly chosen contiguous subsection of the first parent is copied to the child, and then all remaining items in the second parent (that have not already been taken from the first parent's subsection) are then copied to the child in order of appearance. The uni-chromosome mutation scheme chooses two random items from the chromosome and reverses the elements between them, inclusive. Other recombination and mutation schemes may also be utilized.
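The two-point crossover and reversal mutation just described might be implemented as follows; this is a hypothetical minimal sketch, and the patent's actual operators may differ in detail:

```python
import random

def two_point_crossover(p1, p2, rng=random):
    """Copy a randomly chosen contiguous subsection of parent p1 into
    the child at the same positions, then fill the remaining positions
    with p2's items in order of appearance (skipping items already
    taken from p1's subsection)."""
    n = len(p1)
    lo, hi = sorted(rng.sample(range(n + 1), 2))
    middle = p1[lo:hi]
    taken = set(middle)
    rest = [q for q in p2 if q not in taken]
    return rest[:lo] + middle + rest[lo:]

def reversal_mutation(perm, rng=random):
    """Choose two random positions and reverse the elements between
    them, inclusive."""
    n = len(perm)
    lo, hi = sorted(rng.sample(range(n), 2))
    return perm[:lo] + perm[lo:hi + 1][::-1] + perm[hi + 1:]
```

Both operators preserve the permutation property, so every child remains a valid query ordering.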
- A GM component is the evaluation function. Given a particular chromosome representing one workload permutation, the function deterministically calculates the net benefit of using MQTs managed by an LRU cache during the workload. The calculations are based on the following equation:
- B = Σ_{i=1}^{N} Σ_{j∈Si} [ hit(j)·(γ−κ) + miss(j)·(γ−κ−λ) ]
- The evaluation function can be replaced if desired; for example, other evaluation functions can model different cache replacement policies or the execution of queries in parallel.
- The above analysis provides a compact metric for resource utilization. The response time for the user can be further improved with an additional optimization. In the scheduling method, MQT materialization is considered as the needed “pre-staging” of a query or set of queries. The scheduling method defines the partial order of events, namely query execution and MQT materialization/eviction. In terms of the query execution, other components such as the query patroller or workload manager will take the schedule and execute the queries as efficiently as possible (sequentially or in parallel) to yield the shortest elapsed time. To yield the shortest query execution, the system needs to know various parameters such as CPU utilization, I/O bandwidth, and the number of connections.
- The embodiments herein consider the pre-staging of the MQTs in preparation for the queries to execute. In the dynamic models, the MQTs are materialized dynamically, which imposes a materialization time cost (in addition to the resource usage cost noted above).
FIG. 3 illustrates MQT creation, where an MQT is materialized on-demand when the query that needs it is scheduled to execute. For example, when query 320 is scheduled, its needed MQT (MQT C) is materialized into the cache.
- Specifically, FIG. 3 illustrates a query workload having Query 1, Query 2, and Query 3. Query 1 accesses MQT A and MQT B; Query 2 accesses MQT C; and, Query 3 accesses MQT D. Thus, during Query 1, the MQT cache (having a maximum capacity of two MQTs) includes MQT A and MQT B. During Query 2, the MQT cache includes MQT C and MQT A. Finally, during Query 3, the MQT cache includes MQT C and MQT D.
- This materialization time cost can be potentially hidden from the user by having the MQT materialized before the query is executed, so that by the time the query is due to start, all its needed MQTs are already in the cache.
FIG. 4 shows an optimization where the materialization is performed early. For example, before Query 3 executes, its needed MQT (MQT D) is materialized while Query 2 is executing. This approach requires that the system check the cache for available space to hold the needed MQTs; if there is none, previously cached MQTs are evicted, but only if they are not currently being used by an executing query. If the needed MQTs cannot be materialized early, the system performs materialization at the time the new query executes, essentially falling back to the approach and analysis presented above. - The description above naturally supports batch query workloads, which is a common scenario. Incoming queries can be supported in two ways. First, query preemption relies on an already-generated batch workload schedule; it is diagrammed in
FIG. 5. When a batch query workload is scheduled, the MQTs are materialized dynamically and reside in the cache according to the query order generated by the genetic method. Thus, given that the query order is known, the period of time that each MQT stays materialized in the cache is also known; this is called the MQT materialization schedule. When a new query arrives (e.g., Query 4), its required MQTs are identified (e.g., MQT C) and the MQT materialization schedule is searched for a time slot during which the query's MQTs are materialized in the cache (e.g., during the execution of Query 2). If no time slot satisfies all of the query's needed MQTs, the slot with the most beneficial MQTs is chosen. The query is then inserted into the query workload schedule, thereby preempting all subsequent queries in the workload (e.g., Query 3). The tradeoff is that queries that were already scheduled will be delayed unfairly, but the benefit is that incoming queries are satisfied within this framework. - Thus, with query preemption, new incoming queries can preempt queries in the in-progress workload. The position of the preemption in the workload is chosen to maximize MQT use by the incoming query. The drawback of this scheme is that in-progress queries are unfairly pushed back.
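The slot search described above can be sketched as follows. This is an illustrative assumption, not the patent's implementation: the schedule is represented as a list of MQT sets, one per scheduled query, and the function name is hypothetical.

```python
# Hypothetical sketch of searching the MQT materialization schedule for
# the best slot to insert an incoming query. schedule[i] is the set of
# MQTs resident in the cache while the i-th scheduled query executes.

def best_insertion_slot(schedule, required_mqts):
    """Return the index of the slot whose resident MQTs satisfy the
    most of the incoming query's required MQTs (ties: earliest slot)."""
    best_idx, best_hits = 0, -1
    for idx, resident in enumerate(schedule):
        hits = len(required_mqts & resident)
        if hits == len(required_mqts):
            return idx  # every needed MQT is already materialized here
        if hits > best_hits:
            best_idx, best_hits = idx, hits
    return best_idx

# Mirroring the FIG. 5 scenario: incoming Query 4 needs MQT C, which is
# resident while the second scheduled query executes (slot index 1).
schedule = [{"A", "B"}, {"C", "A"}, {"C", "D"}]
print(best_insertion_slot(schedule, {"C"}))  # -> 1
```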
- Secondly, an entirely new schedule for both incoming and existing queries can be created. Considering the same scenario above, where a batch query workload has already been scheduled, newly arriving queries can be aggregated together to create a new workload batch; no further queries are added to this new batch once a periodic timer expires or a batch size limit is reached. When the new batch is ready, the previous batch may or may not have already ended. In the former case, the new batch is scheduled as in the original method. In the latter case, the remainder of the current batch workload can be combined with the new batch, and the genetic method (GM) can be run to produce a new schedule. By working on the aggregate workload that contains both prior and new queries, the GM will ideally produce a very tight schedule. Running the GM in this case takes into consideration that some MQTs have already been materialized and are already in the cache.
- In addition, the above two methods may be combined to suit the nature of the incoming queries. Depending on the distribution of the incoming queries' workload size and arrival rate, a user can switch dynamically between the two strategies.
- Accordingly, the embodiments of the invention provide methods for automated and dynamic management of query views for database workloads. More specifically, a method begins by executing queries, which includes accessing a set of data tables (also referred to herein as “MQTs” or “materialized views”) for each of the queries. The data tables summarize common portions of the queries. As discussed above, with its MQT matching capability, a database query optimizer can explore the possibility of reducing the query processing cost by appropriately replacing parts of a query with existing and matched MQTs.
- During the executing of the queries, the method accesses a required data table from a cache if the required data table is present in the cache. The method creates the required data table if the required data table is not present in the cache and if a benefit of accessing the required data table exceeds a cost of creating the required data table. As discussed above, the dynamic advanced model is a compromise between the static model and the dynamic simple model. An LRU cache is still maintained, but MQTs are not always materialized when there is a cache miss. Instead, only a subset of the available candidate MQTs is created and managed via the cache; for those MQTs not in this set, queries access their respective data by reading from the base tables. Accessing the required data table from the cache has a lower processing cost than accessing it from a base table.
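The cache-miss decision just described can be sketched minimally as follows; all names and cost figures are illustrative assumptions (in practice the optimizer's estimates would supply the benefit and creation-cost values):

```python
# Sketch of the dynamic advanced model's cache-miss decision: on a
# miss, materialize the MQT only when its access benefit exceeds its
# creation cost; otherwise fall back to the base tables. All names
# and numbers here are illustrative assumptions.

def resolve_table(mqt, cache, benefit, creation_cost):
    """Decide where a query reads its data for one required MQT."""
    if mqt in cache:
        return "cache hit"
    if benefit[mqt] > creation_cost[mqt]:
        cache.add(mqt)       # worth materializing on demand
        return "materialized"
    return "base tables"     # cheaper to read the base tables

cache = {"A"}
benefit = {"A": 50, "B": 10, "C": 40}
creation_cost = {"A": 20, "B": 30, "C": 15}
print(resolve_table("A", cache, benefit, creation_cost))  # -> cache hit
print(resolve_table("B", cache, benefit, creation_cost))  # -> base tables
print(resolve_table("C", cache, benefit, creation_cost))  # -> materialized
```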
- Also during the executing of the queries, created data tables are stored in the cache, and one or more of the created data tables are removed from the cache when the cache becomes full. As discussed above, if an MQT must be removed from the cache to make room, eviction follows the LRU policy; however, if a cached MQT selected for eviction is in the set of required MQTs for the current query, then that MQT is kept in the cache. Prior to the executing of the queries, the cache comprises zero required data tables.
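The LRU eviction with pinning of the current query's MQTs can be sketched as follows; this is a hedged illustration, not the patent's code, and capacity is counted in MQT slots for simplicity (a real cache would track sizes):

```python
from collections import OrderedDict

# Illustrative sketch of the LRU MQT cache described above: eviction
# follows LRU order, but MQTs required by the currently executing
# query are pinned and never evicted.

class MQTCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used first

    def access(self, mqt, pinned=frozenset()):
        """Return True on a cache hit; materialize the MQT on a miss."""
        if mqt in self.entries:
            self.entries.move_to_end(mqt)  # refresh recency
            return True
        while len(self.entries) >= self.capacity:
            # evict the least recently used MQT that is not pinned
            # (assumes at least one unpinned entry exists)
            victim = next(m for m in self.entries if m not in pinned)
            del self.entries[victim]
        self.entries[mqt] = True  # materialize into the cache
        return False

# Assuming Query 1 touches MQT B before MQT A, this walk-through
# reproduces the cache states of FIG. 3 (capacity of two MQTs):
cache = MQTCache(2)
cache.access("B", pinned={"A", "B"})  # Query 1
cache.access("A", pinned={"A", "B"})
cache.access("C", pinned={"C"})       # Query 2: evicts B (LRU)
print(sorted(cache.entries))          # -> ['A', 'C']
cache.access("D", pinned={"D"})       # Query 3: evicts A (LRU)
print(sorted(cache.entries))          # -> ['C', 'D']
```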
- In addition, the method reorders the queries. As discussed above, the net benefit depends on the occurrence of MQT hits and misses in the cache, which in turn is a consequence of the query order in the workload. Reordering is performed because of the cache's LRU replacement policy; it is desirable to have as many cache hits as possible, which requires that accesses to an MQT be grouped together to exploit temporal locality before eviction.
- Thus, workloads can be created such that each of the workloads represents an ordering of the queries. As discussed above, a random set of workloads is initially created for the population. The workloads are evaluated to produce a metric, and the best ones are chosen to be parents. Thus, the evaluation produces the net benefit of executing the workload, accessing MQTs, and materializing/evicting MQTs in the cache. Further, the workloads are recombined and/or mutated to create new orderings of the queries. As discussed above, the parents recombine to produce children, simulating sexual crossover, and occasionally a mutation may arise which produces new characteristics that were not available in either parent. An adaptive mutation scheme is further provided whereby the mutation rate is increased when the population stagnates (i.e., fails to improve its workload benefit metric) over a prolonged number of generations. Next, one of the new orderings of the queries is identified as the ordering having the lowest processing cost. As discussed above, the children are ranked based on the evaluation function, and the best subset of the children is chosen to be the parents of the next generation, simulating natural selection. The method also includes calculating the net benefit of a materialized view by subtracting the cost of executing a query with the materialized view from the cost of executing the query without the materialized view and multiplying by the total number of occurrences of the materialized view within the queries. The reordering of the queries can be based on a ranking of the net benefits of the materialized views.
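The net-benefit calculation just stated is simple arithmetic, sketched here with illustrative cost figures (the real values would come from the optimizer's estimates):

```python
# The net-benefit formula stated above, as a one-line function:
# (cost without the view - cost with the view) * number of occurrences
# of the view within the queries. The numeric costs are illustrative.

def net_benefit(cost_without, cost_with, occurrences):
    return (cost_without - cost_with) * occurrences

# An MQT that cuts a query's cost from 120 to 40 units and is used by
# 3 queries in the workload:
print(net_benefit(120, 40, 3))  # -> 240
```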
FIG. 6 is a flow diagram illustrating a method 600 for automated and dynamic management of query views for database workloads. The method 600 begins in item 610 by executing queries, which includes accessing a set of data tables for each of the queries. As discussed above, with its MQT matching capability, a database query optimizer can explore the possibility of reducing the query processing cost by appropriately replacing parts of a query with existing and matched MQTs. - During the executing of the queries, in
item 620, the method 600 accesses a required data table from a cache if the required data table is present in the cache, creates the required data table if the required data table is not present in the cache and if a benefit of accessing the required data table exceeds a cost of creating the required data table, and stores created data tables in the cache. As discussed above, the dynamic advanced model is a compromise between the static model and the dynamic simple model. An LRU cache is still maintained, but MQTs are not always materialized when there is a cache miss. - In
item 622, the accessing of the required data table from the cache comprises a lower processing cost than accessing the required data table from a base table. In item 624, prior to the executing of the queries, the cache comprises zero required data tables. Also during the executing of the queries, in item 626, at least one of the created data tables is removed from the cache when the cache becomes full. As discussed above, if an MQT must be removed from the cache to make room, eviction follows the LRU policy; however, if a cached MQT is to be evicted but is in the set of required MQTs for the current query, then the MQT is kept in the cache. - Following this, in
item 630, the method 600 reorders the queries. As discussed above, reordering is performed due to the cache's LRU replacement policy; it is desirable to have as many cache hits as possible, which requires that MQT accesses be grouped together to exploit temporal locality before eviction. Reordering includes, in item 632, creating workloads such that each of the workloads represents an ordering of the queries, and recombining and/or mutating the workloads to create new orderings of the queries. As discussed above, a GM simulates Darwinian natural selection by having population members (genetic workloads) compete against one another over successive generations in order to converge toward the best solution. Workloads evolve through multiple generations of adaptation and selection. Next, in item 640, one of the new orderings of the queries is identified as an ordering comprising a lowest processing cost. As discussed above, the evaluation produces the net benefit of executing the workload, accessing MQTs, and materializing/evicting MQTs in the cache. -
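The genetic loop described above can be sketched as follows. This is a hedged illustration, not the patent's code: the order-crossover operator, swap mutation, adaptive mutation rate, and toy cost function (a capacity-1 MQT cache) are assumptions standing in for the evaluation described in the text, and all names and data are hypothetical.

```python
import random

# Hedged sketch of the genetic method: workloads are permutations of
# query IDs, ranked by a cost function, recombined with order
# crossover, and mutated by swapping two queries; the mutation rate
# rises when the best cost stagnates.

def order_crossover(p1, p2, rng):
    """Copy a random slice of parent 1; fill the rest in parent-2 order."""
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))
    middle = p1[a:b]
    rest = [q for q in p2 if q not in middle]
    return rest[:a] + middle + rest[a:]

def evolve(queries, cost, generations=60, pop_size=20, seed=1):
    rng = random.Random(seed)
    pop = [rng.sample(queries, len(queries)) for _ in range(pop_size)]
    rate, best_cost = 0.1, float("inf")
    for _ in range(generations):
        pop.sort(key=cost)
        if cost(pop[0]) < best_cost:
            best_cost, rate = cost(pop[0]), 0.1   # progress: reset rate
        else:
            rate = min(0.5, rate * 1.2)           # stagnation: mutate more
        parents = pop[:pop_size // 2]             # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            child = order_crossover(*rng.sample(parents, 2), rng)
            if rng.random() < rate:
                i, j = rng.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]  # swap mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=cost)

# Each query ID maps to the single MQT it needs (hypothetical data).
MQT_OF = {"q1": "A", "q2": "B", "q3": "A", "q4": "C", "q5": "B", "q6": "C"}

def cache_misses(workload):
    """Toy evaluation: capacity-1 cache; each MQT change is a miss."""
    misses, resident = 0, None
    for q in workload:
        if MQT_OF[q] != resident:
            misses, resident = misses + 1, MQT_OF[q]
    return misses

best = evolve(list(MQT_OF), cache_misses)
print(cache_misses(best))  # 3 is optimal: each MQT materialized once
```

The evaluation function here is deliberately crude; grouping queries that share an MQT minimizes re-materializations, which is exactly the temporal locality the reordering step aims to exploit.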
FIG. 7 illustrates another flow diagram comprising item 710, wherein a query workload is submitted to the database and intercepted by the dynamic MQT scheduler (DMS). Next, the query workload is sent to the MQT advisor, which recommends candidate MQTs and indexes (item 720); the MQT advisor returns the candidate MQTs/indexes, their sizes/benefits/creation costs, and the mapping of queries to MQTs/indexes (item 730). Based on the information sent by the MQT advisor, in item 740, the DMS generates an execution sequence (query execution and MQT/index creation). Based on the execution sequence of item 740, the DMS issues commands to the database system in item 750. Following this, in item 760, the query processor receives the commands from the DMS and executes them. Both base tables (item 770) and MQTs/indexes (item 780) are used in the query processing if the MQTs/indexes generate benefits. If an MQT/index needs to be created (as instructed by the DMS) and there is no space left in the LRU cache, the LRU cache evicts the least recently used objects to make space for the requested MQTs/indexes (item 790). - The embodiments of the invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment including both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- Furthermore, the embodiments of the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
- A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
- Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
- A representative hardware environment for practicing the embodiments of the invention is depicted in
FIG. 8. This schematic drawing illustrates a hardware configuration of an information handling/computer system in accordance with the embodiments of the invention. The system comprises at least one processor or central processing unit (CPU) 10. The CPUs 10 are interconnected via system bus 12 to various devices such as a random access memory (RAM) 14, read-only memory (ROM) 16, and an input/output (I/O) adapter 18. The I/O adapter 18 can connect to peripheral devices, such as disk units 11 and tape drives 13, or other program storage devices that are readable by the system. The system can read the inventive instructions on the program storage devices and follow these instructions to execute the methodology of the embodiments of the invention. The system further includes a user interface adapter 19 that connects a keyboard 15, mouse 17, speaker 24, microphone 22, and/or other user interface devices such as a touch screen device (not shown) to the bus 12 to gather user input. Additionally, a communication adapter 20 connects the bus 12 to a data processing network 25, and a display adapter 21 connects the bus 12 to a display device 23, which may be embodied as an output device such as a monitor, printer, or transmitter, for example. - Accordingly, the embodiments of the invention provide an automated, dynamic view management scheme that materializes views on-demand as a workload is executing and manages the views with an LRU cache. In order to maximize the benefit of executing queries with materialized views, the scheme makes an adaptive tradeoff between the view materializations, base table accesses, and the benefit of view hits in the cache. To find the workload permutation that produces the overall highest net benefit, a genetic method is used to search the N! solution space.
- The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments of the invention have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments of the invention can be practiced with modification within the spirit and scope of the appended claims.
- [1] Daniel C. Zilio, Calisto Zuzarte, Sam Lightstone, Wenbin Ma, Roberta Cochrane, Guy M. Lohman, Hamid Pirahesh, Latha S. Colby, Jarek Gryz, Eric Alton, Dongming Liang, and Gary Valentin. Recommending Materialized Views and Indexes with IBM DB2 Design Advisor. In Proceedings of the International Conference on Autonomic Computing, 2004.
- [2] W. Lehner, B. Cochrane, H. Pirahesh, and M. Zaharioudakis. Applying Mass Query Optimization to Speed up Automatic Summary Table Refresh. In Proceedings of the International Conference on Data Engineering, 2001.
- [3] Daniel C. Zilio, Jun Rao, Sam Lightstone, Guy M. Lohman, Adam Storm, Christian Garcia-Arellano, and Scott Fadden. DB2 Design Advisor: Integrated Automatic Physical Database Design. In Proceedings of the International Conference on Very Large Data Bases, 2004.
- [4] Oracle Corp. http://www.oracle.com/.
- [5] S. Agrawal, S. Chaudhuri, and V. Narasayya. Automated Selection of Materialized Views and Indexes for SQL Database. In Proceedings of the International Conference on Very Large Data Bases, 2000.
- [6] J. Holland. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press, 1992.
- [7] D. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
- [8] L. Davis. Job Shop Scheduling with Genetic Algorithms. In Proceedings of the International Conference on Genetic Algorithms, pages 136-140, 1985.
- [9] Elke A. Rundensteiner, Andreas Koeller, and Xin Zhang. Maintaining data warehouses over changing information sources. Communications of the ACM, 43(6):57-62, 2000.
- [10] Latha S. Colby, Timothy Griffin, Leonid Libkin, Inderpal Singh Mumick, and Howard Trickey. Algorithms for Deferred View Maintenance. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 469-480, June 1996.
- [11] Dallan Quass, Ashish Gupta, Inderpal Mumick, and Jennifer Widom. Making Views Self-Maintainable for Data Warehousing. In Proceedings of the International Conference on Parallel and Distributed Information Systems, December 1996.
- [12] D. Agrawal, A. El Abbadi, A. Singh, and T. Yurek. Efficient View Maintenance in Data Warehouses. In Proceedings of the 1997 ACM International Conference on Management of Data, pages 417-427, May 1997.
- [13] K. Salem, K. S. Beyer, R. Cochrane, and B. G. Lindsay. How to roll a join: Asynchronous Incremental View Maintenance. In Proceedings of the 2000 ACM International Conference on Management of Data, pages 129-140, May 2000.
- [14] Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, and Alan Skelley. DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. In Proceedings of the International Conference on Data Engineering, 2000.
- [15] P. Shivam, A. Iamnitchi, A. R. Yumerefendi, and J. S. Chase. Model-driven placement of compute tasks and data in a networked utility. In Proceedings of the International Conference on Autonomic Computing, pages 344-345, 2005.
- [16] S. Adali, K. S. Candan, Y. Papakonstantinou, and V. S. Subrahmanian. Query Caching and Optimization in Distributed Mediator Systems. In Proceedings of the 1996 ACM International Conference on Management of Data, pages 137-148, May 1996.
- [17] J. Shim, P. Scheuermann, and R. Vingralek. Dynamic Caching of Query Results for Decision Support Systems. In Proceedings of the 1999 Symposium on Statistical and Scientific Data Base Management, pages 254-263, 1999.
- [18] Q. Luo, J. F. Naughton, R. Krishnamurthy, P. Cao, and Y. Li. Active Query Caching for Database Web Servers. In Proceedings of the 2000 WebDB (informal proceedings), pages 29-34, 2000.
- [19] C. Chen and N. Roussopoulos. The Implementation and Performance Evaluation of the ADMS Query Optimizer: Integrating Query Result Caching and Matching. In Proceedings of the 1994 Conference on Extending Data Base Technology, pages 323-336, 1994.
- [20] K. Amiri, S. Park, R. Tewari, and S. Padmanabhan. DBProxy: A Dynamic Data Cache for Web Applications. In Proceedings of the International Conference on Data Engineering, 2003.
- [21] The TimesTen Team. Mid-tier Caching: the TimesTen Approach. In Proceedings of 2002 ACM SIGMOD Conference, Madison, Wis., USA, June 2002.
- [22] Paul Larson, Jonathan Goldstein, and Jingren Zhou. MTCache: Transparent Mid-Tier Database Caching in SQL Server. In Proceedings of the International Conference on Data Engineering, 2004.
- [23] Jonathan Goldstein and Paul Larson. Optimizing Queries Using Materialized Views: A Practical, Scalable Solution. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2001.
- [24] M. Gen and R. Cheng. Genetic Algorithms and Engineering Optimization. Wiley-Interscience, 1999.
Claims (2)
1. A method, comprising:
executing queries, comprising accessing a set of data tables for each of said queries, wherein said data tables summarize common portions of said queries; and
during said executing of said queries,
accessing a required data table from a cache if said required data table is present in said cache,
creating said required data table if said required data table is not present in said cache and if a benefit of accessing said required data table exceeds a cost of creating said required data table, and
storing created data tables in said cache, wherein said accessing of said required data table from said cache comprises a lower processing cost than accessing said required data table from a base table, wherein, prior to said executing of said queries, said cache comprises zero required data tables, and
reordering said queries, wherein said reordering of said queries comprises:
creating workloads such that each of said workloads represents an ordering of said queries; and
at least one of recombining and mutating said workloads to create new orderings of said queries;
identifying one of said new orderings of said queries as an ordering comprising a lowest processing cost; and
during said executing of said queries, removing at least one of said created data tables from said cache when said cache becomes full.
2-20. (canceled)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/624,876 US20080177700A1 (en) | 2007-01-19 | 2007-01-19 | Automated and dynamic management of query views for database workloads |
EP08701525A EP2122502B1 (en) | 2007-01-19 | 2008-01-16 | Automated and dynamic management of query views for database workloads |
PCT/EP2008/050458 WO2008087162A1 (en) | 2007-01-19 | 2008-01-16 | Automated and dynamic management of query views for database workloads |
AT08701525T ATE510262T1 (en) | 2007-01-19 | 2008-01-16 | AUTOMATED AND DYNAMIC MANAGEMENT OF REQUEST VIEWS FOR DATABASE LOADWAYS |
US12/055,461 US7716214B2 (en) | 2007-01-19 | 2008-03-26 | Automated and dynamic management of query views for database workloads |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/624,876 US20080177700A1 (en) | 2007-01-19 | 2007-01-19 | Automated and dynamic management of query views for database workloads |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/055,461 Continuation US7716214B2 (en) | 2007-01-19 | 2008-03-26 | Automated and dynamic management of query views for database workloads |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080177700A1 true US20080177700A1 (en) | 2008-07-24 |
Family
ID=39186078
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/624,876 Abandoned US20080177700A1 (en) | 2007-01-19 | 2007-01-19 | Automated and dynamic management of query views for database workloads |
US12/055,461 Active 2027-03-18 US7716214B2 (en) | 2007-01-19 | 2008-03-26 | Automated and dynamic management of query views for database workloads |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/055,461 Active 2027-03-18 US7716214B2 (en) | 2007-01-19 | 2008-03-26 | Automated and dynamic management of query views for database workloads |
Country Status (4)
Country | Link |
---|---|
US (2) | US20080177700A1 (en) |
EP (1) | EP2122502B1 (en) |
AT (1) | ATE510262T1 (en) |
WO (1) | WO2008087162A1 (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080228697A1 (en) * | 2007-03-16 | 2008-09-18 | Microsoft Corporation | View maintenance rules for an update pipeline of an object-relational mapping (ORM) platform |
US20110196857A1 (en) * | 2010-02-09 | 2011-08-11 | International Business Machines Corporation | Generating Materialized Query Table Candidates |
US9208260B1 (en) * | 2010-06-23 | 2015-12-08 | Google Inc. | Query suggestions with high diversity |
US20150363399A1 (en) * | 2014-06-12 | 2015-12-17 | International Business Machines Corporation | Generating and accessing a data table |
US9569477B1 (en) * | 2010-12-29 | 2017-02-14 | EMC IP Holding Company LLC | Managing scanning of databases in data storage systems |
US9740721B2 (en) | 2014-06-12 | 2017-08-22 | International Business Machines Corporation | Generating and accessing a data table |
US10037164B1 (en) | 2016-06-29 | 2018-07-31 | EMC IP Holding Company LLC | Flash interface for processing datasets |
US10055351B1 (en) | 2016-06-29 | 2018-08-21 | EMC IP Holding Company LLC | Low-overhead index for a flash cache |
US10089025B1 (en) | 2016-06-29 | 2018-10-02 | EMC IP Holding Company LLC | Bloom filters in a flash memory |
CN108769729A (en) * | 2018-05-16 | 2018-11-06 | 东南大学 | Caching arrangement system based on genetic algorithm and caching method |
US10146438B1 (en) | 2016-06-29 | 2018-12-04 | EMC IP Holding Company LLC | Additive library for data structures in a flash memory |
US10261704B1 (en) | 2016-06-29 | 2019-04-16 | EMC IP Holding Company LLC | Linked lists in flash memory |
US10331561B1 (en) * | 2016-06-29 | 2019-06-25 | Emc Corporation | Systems and methods for rebuilding a cache index |
US10423620B2 (en) * | 2017-04-22 | 2019-09-24 | International Business Machines Corporation | Runtime creation of remote derived sources for query offload |
US10810196B2 (en) | 2017-12-13 | 2020-10-20 | Hewlett-Packard Development Company, L.P. | Materialized view generation |
US11663179B2 (en) | 2020-12-21 | 2023-05-30 | International Business Machines Corporation | Data simulation for regression analysis |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7917462B1 (en) * | 2007-11-09 | 2011-03-29 | Teradata Us, Inc. | Materializing subsets of a multi-dimensional table |
US8108340B2 (en) * | 2008-03-28 | 2012-01-31 | Yahoo! Inc. | Search engine configured to minimize performance degradation under high load |
US9355129B2 (en) * | 2008-10-14 | 2016-05-31 | Hewlett Packard Enterprise Development Lp | Scheduling queries using a stretch metric |
CN101931609B (en) * | 2009-06-22 | 2014-07-30 | Sap股份公司 | Layout abiding service-level agreement for multiple-tenant database |
US9152668B1 (en) * | 2010-01-29 | 2015-10-06 | Asana, Inc. | Asynchronous computation batching |
US8356027B2 (en) * | 2010-10-07 | 2013-01-15 | Sap Ag | Hybrid query execution plan generation and cost model evaluation |
US9224121B2 (en) | 2011-09-09 | 2015-12-29 | Sap Se | Demand-driven collaborative scheduling for just-in-time manufacturing |
US8660949B2 (en) | 2011-09-09 | 2014-02-25 | Sap Ag | Method and system for working capital management |
US8744888B2 (en) | 2012-04-04 | 2014-06-03 | Sap Ag | Resource allocation management |
US10268721B2 (en) * | 2013-11-07 | 2019-04-23 | Salesforce.Com, Inc | Protected handling of database queries |
EP2966600B1 (en) * | 2014-07-07 | 2018-08-15 | derivo GmbH | Abstraction refinement for scalable type reasoning in ontology-based data repositories |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6026391A (en) * | 1997-10-31 | 2000-02-15 | Oracle Corporation | Systems and methods for estimating query response times in a computer system |
US20030088541A1 (en) * | 2001-06-21 | 2003-05-08 | Zilio Daniel C. | Method for recommending indexes and materialized views for a database workload |
US20040181521A1 (en) * | 1999-12-22 | 2004-09-16 | Simmen David E. | Query optimization technique for obtaining improved cardinality estimates using statistics on pre-defined queries |
US20060036576A1 (en) * | 1999-12-22 | 2006-02-16 | International Business Machines Corporation | Using data in materialized query tables as a source for query optimization statistics |
US20070050328A1 (en) * | 2005-08-29 | 2007-03-01 | International Business Machines Corporation | Query routing of federated information systems for fast response time, load balance, availability, and reliability |
US20070130107A1 (en) * | 2005-12-02 | 2007-06-07 | Microsoft Corporation | Missing index analysis and index useage statistics |
US20070174292A1 (en) * | 2006-01-26 | 2007-07-26 | Wen-Syan Li | Autonomic recommendation and placement of materialized query tables for load distribution |
US20070250524A1 (en) * | 2006-04-19 | 2007-10-25 | Jian Le | Method and apparatus for workload and model based materialized query table or view recommendation technique |
US20080091642A1 (en) * | 2006-10-12 | 2008-04-17 | Robert Joseph Bestgen | Advising the generation of a maintained index over a subset of values in a column of a table |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6487641B1 (en) * | 1999-04-19 | 2002-11-26 | Oracle Corporation | Dynamic caches with miss tables |
US8200659B2 (en) * | 2005-10-07 | 2012-06-12 | Bez Systems, Inc. | Method of incorporating DBMS wizards with analytical models for DBMS servers performance optimization |
US8224813B2 (en) * | 2006-10-20 | 2012-07-17 | Oracle International Corporation | Cost based analysis of direct I/O access |
-
2007
- 2007-01-19 US US11/624,876 patent/US20080177700A1/en not_active Abandoned
-
2008
- 2008-01-16 EP EP08701525A patent/EP2122502B1/en not_active Not-in-force
- 2008-01-16 WO PCT/EP2008/050458 patent/WO2008087162A1/en active Application Filing
- 2008-01-16 AT AT08701525T patent/ATE510262T1/en not_active IP Right Cessation
- 2008-03-26 US US12/055,461 patent/US7716214B2/en active Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6026391A (en) * | 1997-10-31 | 2000-02-15 | Oracle Corporation | Systems and methods for estimating query response times in a computer system |
US20040181521A1 (en) * | 1999-12-22 | 2004-09-16 | Simmen David E. | Query optimization technique for obtaining improved cardinality estimates using statistics on pre-defined queries |
US20060036576A1 (en) * | 1999-12-22 | 2006-02-16 | International Business Machines Corporation | Using data in materialized query tables as a source for query optimization statistics |
US20030088541A1 (en) * | 2001-06-21 | 2003-05-08 | Zilio Daniel C. | Method for recommending indexes and materialized views for a database workload |
US20070050328A1 (en) * | 2005-08-29 | 2007-03-01 | International Business Machines Corporation | Query routing of federated information systems for fast response time, load balance, availability, and reliability |
US20070130107A1 (en) * | 2005-12-02 | 2007-06-07 | Microsoft Corporation | Missing index analysis and index useage statistics |
US20070174292A1 (en) * | 2006-01-26 | 2007-07-26 | Wen-Syan Li | Autonomic recommendation and placement of materialized query tables for load distribution |
US20070250524A1 (en) * | 2006-04-19 | 2007-10-25 | Jian Le | Method and apparatus for workload and model based materialized query table or view recommendation technique |
US20080091642A1 (en) * | 2006-10-12 | 2008-04-17 | Robert Joseph Bestgen | Advising the generation of a maintained index over a subset of values in a column of a table |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9430552B2 (en) * | 2007-03-16 | 2016-08-30 | Microsoft Technology Licensing, Llc | View maintenance rules for an update pipeline of an object-relational mapping (ORM) platform |
US20080228697A1 (en) * | 2007-03-16 | 2008-09-18 | Microsoft Corporation | View maintenance rules for an update pipeline of an object-relational mapping (ORM) platform |
US10268742B2 (en) | 2007-03-16 | 2019-04-23 | Microsoft Technology Licensing, Llc | View maintenance rules for an update pipeline of an object-relational mapping (ORM) platform |
US20110196857A1 (en) * | 2010-02-09 | 2011-08-11 | International Business Machines Corporation | Generating Materialized Query Table Candidates |
US8620899B2 (en) * | 2010-02-09 | 2013-12-31 | International Business Machines Corporation | Generating materialized query table candidates |
US9208260B1 (en) * | 2010-06-23 | 2015-12-08 | Google Inc. | Query suggestions with high diversity |
US9569477B1 (en) * | 2010-12-29 | 2017-02-14 | EMC IP Holding Company LLC | Managing scanning of databases in data storage systems |
US20150363441A1 (en) * | 2014-06-12 | 2015-12-17 | International Business Machines Corporation | Generating and accessing a data table |
US9679013B2 (en) * | 2014-06-12 | 2017-06-13 | International Business Machines Corporation | Generating and accessing a data table |
US9679014B2 (en) * | 2014-06-12 | 2017-06-13 | International Business Machines Corporation | Generating and accessing a data table |
US9740721B2 (en) | 2014-06-12 | 2017-08-22 | International Business Machines Corporation | Generating and accessing a data table |
US9886463B2 (en) | 2014-06-12 | 2018-02-06 | International Business Machines Corporation | Generating and accessing a data table |
US10713228B2 (en) | 2014-06-12 | 2020-07-14 | International Business Machines Corporation | Generating and accessing a data table |
US20150363399A1 (en) * | 2014-06-12 | 2015-12-17 | International Business Machines Corporation | Generating and accessing a data table |
US10146438B1 (en) | 2016-06-29 | 2018-12-04 | EMC IP Holding Company LLC | Additive library for data structures in a flash memory |
US10936207B2 (en) | 2016-06-29 | 2021-03-02 | EMC IP Holding Company LLC | Linked lists in flash memory |
US10089025B1 (en) | 2016-06-29 | 2018-10-02 | EMC IP Holding Company LLC | Bloom filters in a flash memory |
US10261704B1 (en) | 2016-06-29 | 2019-04-16 | EMC IP Holding Company LLC | Linked lists in flash memory |
US10055351B1 (en) | 2016-06-29 | 2018-08-21 | EMC IP Holding Company LLC | Low-overhead index for a flash cache |
US10318201B2 (en) | 2016-06-29 | 2019-06-11 | EMC IP Holding Company LLC | Flash interface for processing datasets |
US10331561B1 (en) * | 2016-06-29 | 2019-06-25 | Emc Corporation | Systems and methods for rebuilding a cache index |
US10353607B2 (en) | 2016-06-29 | 2019-07-16 | EMC IP Holding Company LLC | Bloom filters in a flash memory |
US10353820B2 (en) | 2016-06-29 | 2019-07-16 | EMC IP Holding Company LLC | Low-overhead index for a flash cache |
US11182083B2 (en) | 2016-06-29 | 2021-11-23 | EMC IP Holding Company LLC | Bloom filters in a flash memory |
US10521123B2 (en) | 2016-06-29 | 2019-12-31 | EMC IP Holding Company LLC | Additive library for data structures in a flash memory |
US10037164B1 (en) | 2016-06-29 | 2018-07-31 | EMC IP Holding Company LLC | Flash interface for processing datasets |
US11113199B2 (en) | 2016-06-29 | 2021-09-07 | EMC IP Holding Company LLC | Low-overhead index for a flash cache |
US11106373B2 (en) | 2016-06-29 | 2021-08-31 | EMC IP Holding Company LLC | Flash interface for processing dataset |
US11106586B2 (en) | 2016-06-29 | 2021-08-31 | EMC IP Holding Company LLC | Systems and methods for rebuilding a cache index |
US11106362B2 (en) | 2016-06-29 | 2021-08-31 | EMC IP Holding Company LLC | Additive library for data structures in a flash memory |
US10423620B2 (en) * | 2017-04-22 | 2019-09-24 | International Business Machines Corporation | Runtime creation of remote derived sources for query offload |
US10810196B2 (en) | 2017-12-13 | 2020-10-20 | Hewlett-Packard Development Company, L.P. | Materialized view generation |
CN108769729A (en) * | 2018-05-16 | 2018-11-06 | 东南大学 | Caching arrangement system based on genetic algorithm and caching method |
US11663179B2 (en) | 2020-12-21 | 2023-05-30 | International Business Machines Corporation | Data simulation for regression analysis |
Also Published As
Publication number | Publication date |
---|---|
EP2122502A1 (en) | 2009-11-25 |
ATE510262T1 (en) | 2011-06-15 |
WO2008087162A1 (en) | 2008-07-24 |
EP2122502B1 (en) | 2011-05-18 |
US20080183667A1 (en) | 2008-07-31 |
US7716214B2 (en) | 2010-05-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7716214B2 (en) | 2010-05-11 | Automated and dynamic management of query views for database workloads |
Hernández et al. | | Using machine learning to optimize parallelism in big data applications |
Cheng et al. | | Improving performance of heterogeneous mapreduce clusters with adaptive task tuning |
US9892150B2 (en) | | Unified data management for database systems |
US7917498B2 (en) | | Method and system for dynamic join reordering |
Gounaris et al. | | Adaptive query processing: A survey |
US10922316B2 (en) | | Using computing resources to perform database queries according to a dynamically determined query size |
US9720623B2 (en) | | Management of data in multi-storage systems that can include non-volatile and volatile storages |
US20020087798A1 (en) | | System and method for adaptive data caching |
Hassan et al. | | Optimizing the performance of data warehouse by query cache mechanism |
Zhao et al. | | Automatic database knob tuning: a survey |
Herodotou | | AutoCache: Employing machine learning to automate caching in distributed file systems |
Phan et al. | | Dynamic materialization of query views for data warehouse workloads |
Tos et al. | | Achieving query performance in the cloud via a cost-effective data replication strategy |
US11379375B1 (en) | | System and method for cache management |
Abdul et al. | | Database workload management through CBR and fuzzy based characterization |
Abebe et al. | | Tiresias: enabling predictive autonomous storage and indexing |
Li et al. | | S/C: Speeding up Data Materialization with Bounded Memory |
US11762831B2 (en) | | Adaptive sparse indexing in cloud-based data warehouses |
Madaan et al. | | Prioritized dynamic cube selection in data warehouse |
Bharati et al. | | Hybrid Graph Partitioning with OLB Approach in Distributed Transactions |
Floratou et al. | | Adaptive caching algorithms for big data systems |
Valavala et al. | | A Survey on Database Index Tuning and Defragmentation |
Du et al. | | A web cache replacement strategy for safety-critical systems |
US11874836B2 (en) | | Configuring graph query parallelism for high system throughput |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LI, WEN-SYAN;PHAN, THOMAS;REEL/FRAME:018782/0629;SIGNING DATES FROM 20070102 TO 20070103 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE |