WO2016210200A1 - Automated validation of database index creation - Google Patents

Automated validation of database index creation

Info

Publication number
WO2016210200A1
WO2016210200A1 (PCT/US2016/039126)
Authority
WO
WIPO (PCT)
Prior art keywords
index
validation
validation data
database
data
Prior art date
Application number
PCT/US2016/039126
Other languages
French (fr)
Inventor
Andrija JOVANOVIC
Isidora JOVANDIC
Dejan KRAKOVIC
Milos ACIMOVIC
Zeljko NIKOLICIC
Petar LOTREAN
Miroslav Grbic
Dorde MAKSIMOVIC
Svetlana STANOJEVIC
Original Assignee
Microsoft Technology Licensing, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing, Llc filed Critical Microsoft Technology Licensing, Llc
Publication of WO2016210200A1 publication Critical patent/WO2016210200A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/217Database tuning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2272Management thereof

Definitions

  • Databases allow some order to be imposed upon the vast world of information that is available. Rather than manually scan through all of the available data until the data of interest is found, queries may be issued to the database, allowing for retrieval of only the results of the query.
  • To allow queries to be efficiently processed over database entities (such as tables, views, and so forth), the database entities are often indexed over one or more columns. The index essentially tells where data having parameters of particular values can be found. However, indexing against all columns and values of all database entities becomes an infeasible task.
  • Tuning a database involves adjusting various configurations and structures of the database so that the database responds more efficiently to the queries that tend to be submitted to the database. One aspect of such tuning involves creating indexes on particular columns of particular database entities.
  • At least some embodiments described herein relate to a system that automatically validates index creation on a database entity in an environment that includes a collection of potentially multiple (and potentially many) databases.
  • the validation module validates index impact of a created index by using a validation data store that contains validation data originating from the database collection.
  • the validation data might include, for instance, metadata from queries made to the database collection.
  • Index impact may be estimated by evaluating validation data generated prior to and after the creation of the index to thereby determine whether the created index results in overall improved query performance on the database collection for those queries that target the newly indexed database entity.
  • Such validation data need not even contain private data that was contained within the query itself, and might be, for instance, query performance data, or execution plans associated with the query, with private data redacted.
  • Figure 1 abstractly illustrates a computing system in which some embodiments described herein may be employed
  • Figure 2 illustrates a system in which validation of index creations may occur by verifying improved overall query performance over the database collection for those queries that target the newly indexed database entity;
  • Figure 3 illustrates validation information that is generated by the database collection of Figure 2, which validation information is made available to the tuning portion of Figure 2 and includes missing index data, query performance data, and private data; and
  • Figure 4 illustrates a flowchart of a method for automatically validating the creation of an index.
  • Computing systems are now increasingly taking a wide variety of forms.
  • Computing systems may, for example, be handheld devices, appliances, laptop computers, desktop computers, mainframes, distributed computing systems, datacenters, or even devices that have not conventionally been considered a computing system, such as wearables (e.g., glasses).
  • the term "computing system” is defined broadly as including any device or system (or combination thereof) that includes at least one physical and tangible processor, and a physical and tangible memory capable of having thereon computer-executable instructions that may be executed by a processor to thereby provision the computing system for a special purpose.
  • the memory may take any form and may depend on the nature and form of the computing system.
  • a computing system may be distributed over a network environment and may include multiple constituent computing systems.
  • a computing system 100 typically includes at least one hardware processing unit 102 and memory 104.
  • the memory 104 may be physical system memory, which may be volatile, non-volatile, or some combination of the two.
  • the term “memory” may also be used herein to refer to non-volatile mass storage such as physical storage media. If the computing system is distributed, the processing, memory and/or storage capability may be distributed as well.
  • the term "executable module” or “executable component” can refer to software objects, routines, or methods that may be executed on the computing system.
  • the different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads). With such objects and processes operating upon the computing system, the computing system is the equivalent of a special purpose computer that functions for the special purpose accomplished by the objects.
  • computer-executable instructions may be embodied on one or more computer-readable media that form a computer program product.
  • An example of such an operation involves the manipulation of data.
  • the computer-executable instructions (and the manipulated data) may be stored in the memory 104 of the computing system 100.
  • Computing system 100 may also contain communication channels 108 that allow the computing system 100 to communicate with other computing systems over, for example, network 110.
  • the computing system 100 also includes a display 112, which may be used to display visual representations to a user.
  • Embodiments described herein may comprise or utilize a special purpose or general-purpose computing system including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below.
  • Embodiments described herein also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures.
  • Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computing system.
  • Computer-readable media that store computer-executable instructions are physical storage media.
  • Computer-readable media that carry computer-executable instructions are transmission media.
  • embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: storage media and transmission media.
  • Computer-readable storage media includes RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other physical and tangible storage medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computing system.
  • a "network” is defined as one or more data links that enable the transport of electronic data between computing systems and/or modules and/or other electronic devices.
  • a network or another communications connection can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computing system. Combinations of the above should also be included within the scope of computer-readable media.
  • program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to storage media (or vice versa).
  • computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a "NIC"), and then eventually transferred to computing system RAM and/or to less volatile storage media at a computing system.
  • storage media can be included in computing system components that also (or even primarily) utilize transmission media.
  • Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computing system, special purpose computing system, or special purpose processing device to perform a certain function or group of functions.
  • the computer executable instructions may be, for example, binaries or even instructions that undergo some translation (such as compilation) before direct execution by the processors, such as intermediate format instructions such as assembly language, or even source code.
  • the invention may be practiced in network computing environments with many types of computing system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, datacenters, wearables (such as glasses, watches, and so forth) and the like.
  • the invention may also be practiced in distributed system environments where local and remote computing systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks.
  • program modules may be located in both local and remote memory storage devices.
  • Figure 2 illustrates a system 200 in which the formulation of index recommendations in the context of multiple databases may occur.
  • the system 200 includes a collection 201 of databases. Queries may be submitted (as represented by arrow 202) that target one or more database entities of any, some, or all of the databases, and corresponding query results (as represented by arrow 203) are returned.
  • a "database” is broadly defined as any data store that may be queried against to obtain responses to the query.
  • the database itself may contain structured data, unstructured data, semi-structured data, or combinations thereof.
  • each of the databases generates additional validation data (as represented by arrow 204).
  • Figure 3 illustrates such validation data 300, which includes missing index data 310, query performance data 320, and private data 330. Such information may even be gathered live, such that the information is gathered as it is generated.
  • the validation data 300 may include other information as well as represented by the ellipses 340.
  • Such validation data 300 may be generated on a per-database or on a per-database entity basis. However, the aggregate of such validation data from all databases is represented abstractly by arrow 204 in Figure 2.
  • Each database includes multiple and perhaps very many database entities, such as tables, views, and the like.
  • Each database entity keeps track of which parameters it is indexed on.
  • a database entity may be indexed on a parameter so that matching values of that parameter may be quickly found, whether useful for a final or intermediate result of the query.
  • a parameter corresponds to a column of a database entity, and thus conventional indexes of a database entity are indexed by column.
  • the principles described herein are not limited to how a parameter of a database entity is indexed.
  • a system view is present for each database engine instance, and it is that system view that keeps track of which parameters the database entity is indexed on. If a query is received that targets that database entity, and the query is against a particular parameter which is indexed for that database entity, the parameter index may then be used to allow a response without scanning the database entity in full. However, if a query is received that causes the query engine to look for an index that is missing (i.e., a missing index), the database entity (e.g., the system view corresponding to each database engine) tracks that missing index, counts how many queries resulted in triggering of that missing index, and even estimates how much performance gain might be accomplished if the missing index (a hypothetical index) were provided. Such is collectively referred to herein as "missing index" data 310.
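As a rough illustration of the "missing index" bookkeeping just described, the following sketch counts how many queries trigger a given missing index and accumulates an estimated gain. All names here (`MissingIndexTracker`, `record_miss`, and so forth) are hypothetical and not drawn from the patent; real database engines expose this data through their own system views.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MissingIndexStats:
    """Per-index bookkeeping for one hypothetical (missing) index."""
    query_count: int = 0          # queries that would have used the index
    estimated_gain: float = 0.0   # accumulated estimated cost reduction


class MissingIndexTracker:
    """Tracks 'missing index' data per (entity, parameter) pair."""

    def __init__(self):
        self._stats = defaultdict(MissingIndexStats)

    def record_miss(self, entity, parameter, actual_cost, indexed_cost_estimate):
        """Called when the query engine looks for an index that does not exist."""
        stats = self._stats[(entity, parameter)]
        stats.query_count += 1
        stats.estimated_gain += actual_cost - indexed_cost_estimate

    def candidates(self, min_queries=10):
        """Missing indexes triggered often enough to be worth creating."""
        return {key: s for key, s in self._stats.items()
                if s.query_count >= min_queries}
```

A creation module could then rank `candidates()` by `estimated_gain` when formulating recommended indexing tasks.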
  • the query performance data 320 may include information regarding measured performance in the processing of queries. Such performance information might include, for instance, the following measurements for queries in general (or perhaps broken down by type of query): a number of processor cycles used to process queries, an amount of each type of resource used to process the queries, and so forth. Such information might include, for instance, group statistics (e.g., average, mean, median, extent, and so forth) for the queries. Such group statistics may be grouped over a period of time, such as a given time interval (e.g., hours, days, or the like). Thus, the query performance data 320 may include time-wise measurements of performance of queries in general, or of types of queries.
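The per-interval group statistics described above might be computed along these lines. This is a minimal sketch; the function name `interval_stats` and the choice of bucketing by a fixed interval length are assumptions for illustration, not the patent's implementation.

```python
import statistics
from collections import defaultdict


def interval_stats(samples, interval_seconds=3600):
    """Group (timestamp, cpu_time) query samples into fixed time intervals
    and compute per-interval group statistics (count, average, std dev)."""
    buckets = defaultdict(list)
    for ts, cpu in samples:
        buckets[int(ts // interval_seconds)].append(cpu)
    out = {}
    for bucket, values in sorted(buckets.items()):
        out[bucket] = {
            "count": len(values),
            "avg": statistics.mean(values),
            "stdev": statistics.pstdev(values),  # population std deviation
        }
    return out
```

Storing only such aggregates, rather than raw query text, is one way the validation data can avoid carrying private data.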
  • the database collection 201 is illustrated as including six databases 201A through 201F. However, the ellipses 201G represent that the principles described herein may apply to any collection 201 of multiple databases from as few as two databases, to an unlimited number of databases.
  • the collection 201 may be a cloud computing environment such as a public cloud, a private cloud, or a hybrid cloud that includes a public cloud and a private cloud (and/or an on-premises environment), and potentially include hundreds, thousands, or even millions of databases or more.
  • cloud computing is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services).
  • the definition of “cloud computing” is not limited to any of the other numerous advantages that can be obtained from such a model when properly deployed.
  • cloud computing is currently employed in the marketplace so as to offer ubiquitous and convenient on-demand access to the shared pool of configurable computing resources.
  • the shared pool of configurable computing resources can be rapidly provisioned via virtualization and released with low management effort or service provider interaction, and then scaled accordingly.
  • a cloud computing model can be composed of various characteristics such as on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth.
  • a cloud computing model may also come in the form of various service models such as, for example, Software as a Service (“SaaS”), Platform as a Service (“PaaS”), and Infrastructure as a Service (“IaaS”).
  • the cloud computing model may also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth.
  • a "cloud computing environment” is an environment in which cloud computing is employed.
  • the system 200 also includes a tuning portion 210 that serves to tune the database collection 201 to thereby improve the performance of the database collection.
  • the tuning portion 210 includes a gathering module 211.
  • the gathering module 211 is configured to gather at least a portion of the validation data 300.
  • the gathered information is provided into a validation data store 213.
  • the validation data store 213 may be any type of store, such as a single database, a combination of databases, non-database storage, or any combination of the above.
  • the validation data store 213 may be a single database.
  • the gathering module 211 is configured to apply a privacy policy such that private information (e.g., the private information 330 of Figure 3) generated by the databases within validation data 300 is not gathered by the gathering module 211. Accordingly, the privacy of those making queries (e.g., queries 202) into the database collection 201 is preserved.
  • private data or information is customer sensitive data that is to be kept within a compliance boundary associated with the customer.
  • the gathered information might include metadata associated with the queries, and not include the private data that is often included within the queries themselves.
  • each node may have a dedicated process which periodically collects data from the specified views on that node and pushes the data to a centralized storage location for all telemetry in that corresponding region of a cloud database service. From all regional telemetry locations, the data is automatically extracted to the validation data store 213.
  • the system 200 may include an index creation module 230.
  • the index creation module 230 initiates creation of at least one corresponding index on at least one corresponding database of the database collection.
  • the index creation module 230 may use information gathered by the gathering module 211 into the validation data store 213 to decide which index to create on which database entity of which database.
  • a user may simply instruct that an index be created on a corresponding database entity of a corresponding database.
  • a hybrid of this would be that the index creation module 230 uses the information gathered by the gathering module 211 to formulate one or more recommended indexing tasks, and then the user selects one of the recommended indexing tasks. That said, the principles described herein are about validating an index creation, regardless of the cause of the index creation.
  • the system also includes a validation module 220 that operates to automatically validate index creation on a database entity.
  • that aggregated validation data may be represented by the validation data 300 of Figure 3.
  • the aggregated validation data may be, for instance, the query performance data 320 of Figure 3.
  • This automatic validation automatically validates index impact of the created index by using aggregated validation data gathered by the gathering module 211 into the validation data store 213 to verify that the index creation results in overall improved query performance on the database collection for those queries that target the newly indexed database entity.
  • "overall" improved query performance does not mean that the performance of every query that targets the newly indexed database entity will be improved as a result of the created index. In fact, the index may potentially cause some of such queries to perform worse than without the index.
  • overall improved query performance means that there tends to be more query improvement than there is query regression as a result of the created index for those queries that target the newly indexed database entity.
  • Figure 4 illustrates a flowchart of a method 400 for validating the creation of an index.
  • the method 400 may be performed by, for instance, the validation module 220 of Figure 2.
  • the validation module determines that creation of the index is to be validated (act 401).
  • the validation module 220 may detect that the index creation module 230 is about to create an index.
  • the validation module 220 determines whether validation data has already been gathered (decision block 402). If so ("Yes" in decision block 402), then validation data already exists that can be evaluated as a benchmark for the period prior to the creation of the index. Accordingly, the index may be created (act 404). If the validation data for the period prior to index creation has not yet been gathered ("No" in decision block 402), then the validation data is gathered (act 403) prior to the index being created (act 404).
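The decision flow of acts 401 through 408 can be sketched as follows. The callable parameters are placeholders for the gathering, creation, evaluation, and reversion steps performed by the modules of Figure 2; the function is only an illustration of the control flow, not the patent's implementation.

```python
def validate_index_creation(has_baseline, load_baseline, gather_baseline,
                            create_index, gather_post_data, evaluate_gain,
                            revert_index):
    """Sketch of method 400 from Figure 4 (act/decision numbers in comments)."""
    if has_baseline():                 # decision block 402
        baseline = load_baseline()     # baseline already in the validation store
    else:
        baseline = gather_baseline()   # act 403: gather pre-creation data
    create_index()                     # act 404
    post = gather_post_data()          # act 405: gather post-creation data
    if evaluate_gain(baseline, post):  # decision block 406: significant gain?
        return "keep"                  # act 407
    revert_index()                     # act 408
    return "revert"
```

A caller would supply, for instance, an `evaluate_gain` that compares per-query CPU usage before and after the index was created.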
  • the validation data may have been continuously gathered live as the validation data is generated, with perhaps older validation data being discarded.
  • This scenario is helpful in that the index may be immediately created, but has the downside of having to expend processing, network, and storage resources regardless of whether an index is going to be created, and thus such resources might not ever be used.
  • This scenario may be helpful, however, if indices are frequently being created on the database collection, thereby increasing the likelihood that such resources will be helpful in validating an index.
  • the validation data generated after the index creation is gathered (act 405).
  • the validation data generated before and after the index creation are then evaluated (act 406) to determine whether or not there has been a significant overall query performance gain as applied to the database collection for those queries that target the newly indexed database entity. If there has been a significant overall query performance gain ("Yes" in decision block 406), then the index is kept (act 407). If there has not been significant query performance gain ("No" in decision block 406), then the index is reverted (act 408).
  • the validation module 220 may, for instance, perform this reversion.
  • the validation decision may be referred to as an index impact of the created index.
  • the system 200 also includes a validation control module 240 that permits a user to control whether or not the validation module 220 validates improved overall query performance on the database collection as a result of an index that the user caused to be created. Also, the validation control module 240 may display output of the validation process, and allow a user to determine whether or not to keep or revert based on that result. For instance, the validation control module 240 may display output and controls on, for example, display 112.
  • the validation performed by the validation module 220 may be applied to a wide variety of created indexes.
  • the validation occurs using an impact verification model.
  • the goal of the impact verification model is to observe per query data on how the index is behaving (i.e., observe whether the index improves overall query performance on the database collection) and arrive at the keep or revert decision previously described. Revert decisions are used to improve the model such that, eventually, the model should get to a state where the overwhelming majority of indexes are kept.
  • the index impact is determined by monitoring resource usage of individual customer queries.
  • Monitored resources may include, for instance, CPU time and logical IO.
  • a desirable index would reduce such resource usage (such as CPU usage and reads) by allowing better query plans to be used, at the cost of more writes due to having to update one more structure on disk.
  • Improvements in logical resource usage regularly lead to improvements in physical resource usage (query duration, physical IO).
  • physical resources and/or logical resources may be monitored.
  • physical resource values tend to have inherent noise. Accordingly, some embodiments may simply measure logical resource usage.
  • the impact verification model enables collecting workload metadata, including customer queries, execution plans, and their runtime stats over an extended time period, and storing them within the customer database.
  • workload metadata may be the validation data that is stored within the validation data store 213.
  • the index verification module may use the following information from the workload metadata: 1) for each query, the set of execution plans used for the query, 2) for each execution plan, its entire content, with private data being hidden from the plan to comply with the privacy policy, and 3) for each plan and each periodic interval, the following statistics: number of executions, CPU time used per execution (average and standard deviation), logical reads per execution (average and standard deviation), and logical writes per execution (average and standard deviation).
  • the required workload metadata is automatically exhausted from the customer database, directly from the database engine.
  • an eventing structure may be used to extract this metadata and place such into the validation data store 213.
  • the eventing system may operate per database such that, if turned on, an event generating process on each database starts collecting the workload metadata from the corresponding database, including all the execution plans and their runtime statistics per time interval.
  • An advantage of this approach is that the exhausted data is actually only metadata, with all the private data being effectively hidden to avoid violating any customer privacy policy. This whole process scales well across a large number of databases. It is fully automated and takes a reasonable amount of time to validate, in order to provide a big enough validation data sample both prior to and after index creation, so that the verdict on the created index is extremely accurate.
  • Once the workload metadata reaches its destination store where the verification model is run (i.e., the validation data store 213), the following steps may be executed.
  • noise is reduced by throwing away all queries that do not target the table over which the index was created.
  • a local repository of queries, plans, and runtime performance statistics is merged across different plans and intervals to obtain the execution count, plus two metrics (average and standard deviation) for three different dimensions (CPU time used, logical reads, logical writes) in two distinct periods: before and after index creation.
  • the metric used to provide a verdict (i.e., keep or revert) for the created index, which has empirically proved to be very accurate, is the following: if (EstimatedTotalGain[CPU] > 0) then 'keep', else 'revert'.
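The patent does not spell out how EstimatedTotalGain[CPU] is computed. The sketch below assumes one plausible formulation: per-execution CPU deltas between the before and after periods, weighted by post-creation execution counts so that frequent queries weigh more, summed over queries that target the newly indexed table. Function names and the data layout are illustrative assumptions.

```python
def estimated_total_gain(before, after):
    """Estimate the total CPU gain attributable to a created index.

    `before` and `after` map query_id -> (executions, avg_cpu_per_execution),
    aggregated across plans and intervals for queries targeting the newly
    indexed table (other queries are discarded as noise, as described above).
    A positive result indicates overall improvement.
    """
    gain = 0.0
    for query_id in before.keys() & after.keys():
        _, cpu_before = before[query_id]
        n_after, cpu_after = after[query_id]
        # Per-execution delta, scaled by post-creation execution volume.
        gain += (cpu_before - cpu_after) * n_after
    return gain


def verdict(before, after):
    """Keep the index only if the estimated total CPU gain is positive."""
    return "keep" if estimated_total_gain(before, after) > 0 else "revert"
```

Under this formulation, a query that got slightly slower can be outweighed by another that got much faster, matching the patent's notion of "overall" improvement.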

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Automated validation of the creation of indices in an environment that includes multiple and potentially many databases, such as perhaps a cloud computing environment. A validation module validates index impact of a created index by using a validation data store that contains validation data originating from a database collection. Index impact may be estimated by evaluating validation data generated prior to and after the creation of the index to thereby determine whether the created index results in overall improved query performance on the database collection for those queries that target the newly indexed database entity. Such validation data need not even contain private data that was contained within the query itself, and might be, for instance, query performance data, or execution plans associated with the query, with private data redacted.

Description

AUTOMATED VALIDATION OF DATABASE INDEX CREATION
BACKGROUND
[0001] Computing systems and associated networks have revolutionized the way human beings work, play, and communicate. Nearly every aspect of our lives is affected in some way by computing systems. The proliferation of networks has allowed computing systems to share data and communicate, vastly increasing information access. For this reason, the present age is often referred to as the "information age".
[0002] Databases allow some order to be imposed upon the vast world of information that is available. Rather than manually scan through all of the available data until the data of interest is found, queries may be issued to the database, allowing for retrieval of only the results of the query. To allow queries to be efficiently processed over one or more database entities (such as tables, views, and so forth) of one or more databases, the database entities are often indexed over one or more columns. The index essentially tells where data having parameters of particular values can be found. However, there are often a large variety of parameters represented within a database entity and/or a large number of possible database entities. Accordingly, indexing against all columns and values of all database entities becomes an infeasible task.
[0003] In order to improve performance of a database, databases are often "tuned". Tuning a database involves adjusting various configurations and structures of the database so that the database responds more efficiently to the queries that tend to be submitted to the database. One aspect of such tuning involves creating indexes on particular columns on particular database entities.
[0004] The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.
BRIEF SUMMARY
[0005] At least some embodiments described herein relate to a system that automatically validates index creation on a database entity in an environment that includes a collection of potentially multiple (and potentially many) databases. The validation module validates index impact of a created index by using a validation data store that contains validation data originating from the database collection. The validation data might include, for instance, metadata from queries made to the database collection. Index impact may be estimated by evaluating validation data generated prior to and after the creation of the index to thereby determine whether the created index results in overall improved query performance on the database collection for those queries that target the newly indexed database entity. Such validation data need not even contain private data that was contained within the query itself, and might be, for instance, query performance data, or execution plans associated with the query, with private data redacted.
[0006] Accordingly, an efficient mechanism for substantially automating the tuning of large database collections without requiring significant user time is described herein. During the tuning, if a particular index creation does not actually result in overall improved query performance against the database collection, that index creation may be automatically reverted or a revision recommendation might be issued. The net result is substantial improvement of overall query performance on the database collection through a simplified user interface, achieving what would be virtually impossible if the users themselves had to investigate each database for possible tuning opportunities.
[0007] This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] In order to describe the manner in which the above-recited and other advantages and features can be obtained, a more particular description of various embodiments will be rendered by reference to the appended drawings. Understanding that these drawings depict only sample embodiments and are not therefore to be considered to be limiting of the scope of the invention, the embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
[0009] Figure 1 abstractly illustrates a computing system in which some embodiments described herein may be employed;
[0010] Figure 2 illustrates a system in which validation of index creations may occur by verifying improved overall query performance over the database collection for those queries that target the newly indexed database entity;
[0011] Figure 3 illustrates validation information that is generated by the database collection of Figure 2, which validation information is made available to the tuning portion of Figure 2 and includes missing index data, query performance data, and private data; and
[0012] Figure 4 illustrates a flowchart of a method for automatically validating the creation of an index.
DETAILED DESCRIPTION
[0013] At least some embodiments described herein relate to a system that automatically validates index creation on a database entity in an environment that includes a collection of potentially multiple (and potentially many) databases. The validation module validates index impact of a created index by using a validation data store that contains validation data originating from the database collection. The validation data might include, for instance, metadata from queries made to the database collection. Index impact may be estimated by evaluating validation data generated prior to and after the creation of the index to thereby determine whether the created index results in overall improved query performance on the database collection for those queries that target the newly indexed database entity. Such validation data need not even contain private data that was contained within the query itself, and might be, for instance, query performance data, or execution plans associated with the query, with private data redacted.
[0014] Accordingly, an efficient mechanism for substantially automating the tuning of large database collections without requiring significant user time is described herein. During the tuning, if a particular index creation does not actually result in overall improved query performance against the database collection, that index creation may be automatically reverted or a revision recommendation might be issued. The net result is substantial improvement of overall query performance on the database collection through a simplified user interface, achieving what would be virtually impossible if the users themselves had to investigate each database for possible tuning opportunities.
[0015] Some introductory discussion of a computing system will be described with respect to Figure 1. Then, the automated validation of index creation will be described with respect to subsequent drawings.
[0016] Computing systems are now increasingly taking a wide variety of forms. Computing systems may, for example, be handheld devices, appliances, laptop computers, desktop computers, mainframes, distributed computing systems, datacenters, or even devices that have not conventionally been considered a computing system, such as wearables (e.g., glasses). In this description and in the claims, the term "computing system" is defined broadly as including any device or system (or combination thereof) that includes at least one physical and tangible processor, and a physical and tangible memory capable of having thereon computer-executable instructions that may be executed by a processor to thereby provision the computing system for a special purpose. The memory may take any form and may depend on the nature and form of the computing system. A computing system may be distributed over a network environment and may include multiple constituent computing systems.
[0017] As illustrated in Figure 1, in its most basic configuration, a computing system 100 typically includes at least one hardware processing unit 102 and memory 104. The memory 104 may be physical system memory, which may be volatile, non-volatile, or some combination of the two. The term "memory" may also be used herein to refer to non-volatile mass storage such as physical storage media. If the computing system is distributed, the processing, memory and/or storage capability may be distributed as well. As used herein, the term "executable module" or "executable component" can refer to software objects, routines, or methods that may be executed on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads). With such objects and processes operating upon the computing system, the computing system is the equivalent of a special purpose computer that functions for the special purpose accomplished by the objects.
[0018] In the description that follows, embodiments are described with reference to acts that are performed by one or more computing systems. If such acts are implemented in software, one or more processors (of the associated computing system that performs the act) direct the operation of the computing system in response to having executed computer-executable instructions, thereby converting and configuring the computing system for a more specialized purpose than without such direction. For example, such computer-executable instructions may be embodied on one or more computer-readable media that form a computer program product. An example of such an operation involves the manipulation of data. The computer-executable instructions (and the manipulated data) may be stored in the memory 104 of the computing system 100. Computing system 100 may also contain communication channels 108 that allow the computing system 100 to communicate with other computing systems over, for example, network 110. The computing system 100 also includes a display 112, which may be used to display visual representations to a user.
[0019] Embodiments described herein may comprise or utilize a special purpose or general-purpose computing system including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments described herein also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computing system. Computer-readable media that store computer-executable instructions are physical storage media. Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: storage media and transmission media.
[0020] Computer-readable storage media includes RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other physical and tangible storage medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computing system.
[0021] A "network" is defined as one or more data links that enable the transport of electronic data between computing systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computing system, the computing system properly views the connection as a transmission medium. Transmissions media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computing system. Combinations of the above should also be included within the scope of computer-readable media.
[0022] Further, upon reaching various computing system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a "NIC"), and then eventually transferred to computing system RAM and/or to less volatile storage media at a computing system. Thus, it should be understood that storage media can be included in computing system components that also (or even primarily) utilize transmission media.
[0023] Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computing system, special purpose computing system, or special purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, or even instructions that undergo some translation (such as compilation) before direct execution by the processors, such as intermediate format instructions like assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
[0024] Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computing system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, datacenters, wearables (such as glasses, watches, and so forth) and the like. The invention may also be practiced in distributed system environments where local and remote computing systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
[0025] Figure 2 illustrates a system 200 in which the formulation of index recommendations in the context of multiple databases may occur. The system 200 includes a collection 201 of databases. Queries may be submitted (as represented by arrow 202) that target one or more database entities of any, some, or all of the databases, and corresponding query results (as represented by arrow 203) are returned. In this description and in the claims, a "database" is broadly defined as any data store that may be queried against to obtain responses to the query. The database itself may contain structured data, unstructured data, semi-structured data, or combinations thereof.
[0026] In addition to queries (represented by arrow 202) and responses (represented by arrow 203) thereto, each of the databases generates additional validation data (as represented by arrow 204). Figure 3 illustrates such validation data 300, which includes missing index data 310, query performance data 320, and private data 330. Such information may even be gathered live, such that the information is gathered as it is generated. Of course, the validation data 300 may include other information as well, as represented by the ellipses 340. Such validation data 300 may be generated on a per-database or per-database-entity basis. However, the aggregate of such validation data from all databases is represented abstractly by arrow 204 in Figure 2.
[0027] Each database includes multiple and perhaps very many database entities, such as tables, views, and the like. Each database entity keeps track of which parameters it is indexed on. A database entity may be indexed on a parameter so that matching values of that parameter may be quickly found, whether useful for a final or intermediate result of the query. In conventional databases, a parameter corresponds to a column of a database entity, and thus conventional indexes of a database entity are indexed by column. However, the principles described herein are not limited to how a parameter of a database entity is indexed.
[0028] In one example, a system view is present for each database engine instance, and it is that system view that keeps track of which parameters each database entity is indexed on. If a query is received that targets that database entity, and the query is against a particular parameter which is indexed for that database entity, the parameter index may then be used to allow a response without scanning the database entity in full. However, if a query is received that causes the query engine to look for an index that is missing (i.e., a missing index), the database entity (e.g., the system view corresponding to each database engine) tracks that missing index, counts how many queries resulted in triggering of that missing index, and even estimates how much performance gain may be accomplished if the missing index (a hypothetical index) were provided. Such is collectively referred to herein as "missing index" data 310.
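As an illustrative sketch only (the patent provides no code, and the class and method names here are hypothetical, not taken from any particular database engine), the per-entity missing-index bookkeeping described above might look like:

```python
from collections import defaultdict

class MissingIndexTracker:
    """Tracks hypothetical (missing) indexes for one database entity:
    how many queries triggered each one, and the accumulated estimate
    of the performance gain the index would have provided."""

    def __init__(self):
        self.trigger_count = defaultdict(int)    # index key -> query count
        self.estimated_gain = defaultdict(float) # index key -> summed gain estimate

    def record_missing(self, columns, gain_estimate):
        # Key the hypothetical index by the columns the query needed.
        key = tuple(columns)
        self.trigger_count[key] += 1
        self.estimated_gain[key] += gain_estimate
```

A tuning component could then rank hypothetical indexes by trigger count or accumulated estimated gain when deciding which index to create.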
[0029] The query performance data 320 may include information regarding measured performance in the processing of queries. Such performance information might include, for instance, the following measurements for queries in general (or perhaps broken down by type of query): a number of processor cycles used to process queries, an amount of each type of resource used to process the queries, and so forth. Such information might include, for instance, group statistics (e.g., average, mean, median, extent, and so forth) for the queries. Such group statistics may be grouped over a period of time (such as a given time interval, e.g., in hours, days, or the like). Thus, the query performance data 320 may include time-wise measurements of performance of queries in general, or of types of queries.
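The time-wise grouping of query performance measurements described above can be sketched as follows; this is a minimal illustration (function and field names are assumptions, not taken from the patent):

```python
from statistics import mean, median

def interval_stats(samples, interval_seconds=3600):
    """Group (timestamp_seconds, cpu_ms) query measurements into fixed
    time intervals and compute per-interval group statistics."""
    buckets = {}
    for ts, cpu in samples:
        buckets.setdefault(ts // interval_seconds, []).append(cpu)
    return {bucket: {"count": len(vals),
                     "avg": mean(vals),
                     "median": median(vals)}
            for bucket, vals in buckets.items()}
```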
[0030] The database collection 201 is illustrated as including six databases 201A through 201F. However, the ellipses 201G represent that the principles described herein may apply to any collection 201 of multiple databases from as few as two databases, to an unlimited number of databases. For instance, the collection 201 may be a cloud computing environment such as a public cloud, a private cloud, or a hybrid cloud that includes a public cloud and a private cloud (and/or an on-premises environment), and potentially include hundreds, thousands, or even millions of databases or more.
[0031] In this description and the following claims, "cloud computing" is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services). The definition of "cloud computing" is not limited to any of the other numerous advantages that can be obtained from such a model when properly deployed.
[0032] For instance, cloud computing is currently employed in the marketplace so as to offer ubiquitous and convenient on-demand access to the shared pool of configurable computing resources. Furthermore, the shared pool of configurable computing resources can be rapidly provisioned via virtualization and released with low management effort or service provider interaction, and then scaled accordingly.
[0033] A cloud computing model can be composed of various characteristics such as on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth. A cloud computing model may also come in the form of various service models such as, for example, Software as a Service ("SaaS"), Platform as a Service ("PaaS"), and Infrastructure as a Service ("IaaS"). The cloud computing model may also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth. In this description and in the claims, a "cloud computing environment" is an environment in which cloud computing is employed.
[0034] The system 200 also includes a tuning portion 210 that serves to tune the database collection 201 to thereby improve the performance of the database collection. The tuning portion 210 includes a gathering module 211. The gathering module 211 is configured to gather at least a portion of the validation data 300. In one embodiment, the gathered information is provided into a validation data store 213. The validation data store 213 may be any type of store, such as a single database, or combination of databases, non- database storage, or any combination of the above. In one embodiment, the validation data store 213 may be a single database.
[0035] The gathering module 211 is configured to apply a privacy policy such that private information (e.g., the private information 330 of Figure 3) generated by the databases within validation data 300 is not gathered by the gathering module 211. Accordingly, the privacy of those making queries (e.g., queries 202) into the database collection 201 is preserved. Such private data or information is customer-sensitive data that is to be kept within a compliance boundary associated with the customer. For instance, the gathered information might include metadata associated with the queries, and not include the private data that is often included within the queries themselves.
[0036] As an example, in a cloud computing environment, each node may have a dedicated process which periodically collects data from the specified views on that node and pushes the data to a centralized storage location for all telemetry in that corresponding region of a cloud database service. From all regional telemetry locations the data is automatically extracted to the validation data storage 213.
[0037] The system 200 may include an index creation module 230. The index creation module 230 initiates creation of at least one corresponding index on at least one corresponding database of the database collection. As an example only, the index creation module 230 may use information gathered by the gathering module 211 into the validation data store 213 to decide which index to create on which database entity of which database. Alternatively, a user may simply instruct that an index be created on a corresponding database entity of a corresponding database. A hybrid of this would be that the index creation module 230 uses the information gathered by the gathering module 211 to formulate one or more recommended indexing tasks, and then the user selects one of the recommended indexing tasks. That said, the principles described herein are about validating an index creation, regardless of the cause of the index creation.
[0038] The system also includes a validation module 220 that operates to automatically validate index creation on a database entity. The validation module 220 operates on aggregated validation data, which may be represented by the validation data 300 of Figure 3, and which may be, for instance, the query performance data 320 of Figure 3.
[0039] This automatic validation automatically validates index impact of the created index by using aggregated validation data gathered by the gathering module 211 into the validation data store to verify that the index creation results in overall improved query performance on the database collection for those queries that target the newly indexed database entity. Note that "overall" improved query performance does not mean that the performance of every query that targets the newly indexed database entity will be improved as a result of the created index. In fact, the index may potentially cause some of such queries to perform worse than without the index. However, overall improved query performance means that there tends to be more query improvement than there is query regression as a result of the created index for those queries that target the newly indexed database entity.
[0040] Figure 4 illustrates a flowchart of a method 400 for validating the creation of an index. The method 400 may be performed by, for instance, the validation module 220 of Figure 2. First, the validation module determines that creation of the index is to be validated (act 401). As an example, referring to Figure 2, the validation module 220 may detect that the index creation module 230 is about to create an index.
[0041] The validation module 220 then determines whether validation data has already been gathered (decision block 402). If so ("Yes" in decision block 402), then validation data already exists that can be evaluated as a benchmark for the period prior to the creation of the index. Accordingly, the index may be created (act 404). If the validation data for the period prior to index creation has not yet been gathered ("No" in decision block 402), then the validation data is gathered (act 403) prior to the index being created (act 404).
[0042] In the case of having already gathered the validation data ("Yes" in decision block 402), the validation data may have been continuously gathered live as the validation data is generated, with perhaps older validation data being discarded. This scenario is helpful in that the index may be immediately created, but has the downside of having to expend processing, network, and storage resources regardless of whether an index is going to be created, and thus such resources might never be used. This scenario may be helpful, however, if indices are frequently being created on the database collection, thereby increasing the likelihood that such resources will be helpful in validating an index. In the case of deferring the gathering of the validation data ("No" in decision block 402) until it is determined (act 401) that an index is to be created, there is the advantage that processing, storage, and network resources are utilized only when needed to validate the index creation. However, the disadvantage is that there is some delay prior to the index being created.
[0043] Once the index is created (act 404), the validation data generated after the index creation is gathered (act 405). The validation data generated before and after the index creation are then evaluated (act 406) to be able to determine whether or not there has been a significant overall query performance gain as applied to the database collection for those queries that target the newly indexed database entity. If there has been a significant overall query performance gain ("Yes" in decision block 406), then the index is kept (act 407). If there has not been significant query performance gain ("No" in decision block 406), then the index is reverted (act 408). The validation module 220 may, for instance, perform this reversion. In one embodiment, the validation decision may be referred to as an index impact of the created index.
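The control flow of method 400 can be summarized in a short sketch; the callables here are hypothetical stand-ins for the index creation module 230, the gathering of validation data, and the evaluation step, not an implementation from the patent:

```python
def validate_index_creation(have_pre_data, gather, create_index,
                            revert_index, significant_gain):
    """Mirrors acts 401-408: gather a pre-creation benchmark if needed,
    create the index, gather post-creation data, then keep or revert."""
    if not have_pre_data:          # decision block 402
        gather("before")           # act 403: benchmark the pre-index period
    create_index()                 # act 404
    gather("after")                # act 405: post-creation validation data
    if significant_gain():         # decision block 406
        return "keep"              # act 407
    revert_index()                 # act 408
    return "revert"
```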
[0044] The system 200 also includes a validation control module 240 that permits a user to control whether or not the validation module 220 validates improved overall query performance on the database collection as a result of an index that the user caused to be created. Also, the validation control module 240 may display output of the validation process, and allow a user to determine whether or not to keep or revert based on that result. For instance, the validation control module 240 may display output and controls on, for example, display 112.
[0045] The validation performed by the validation module 220 may be applied to a wide variety of created indexes. In one embodiment, the validation occurs using an impact verification model. The goal of the impact verification model is to observe per-query data on how the index is behaving (i.e., observe whether the index improves overall query performance on the database collection) and arrive at the keep or revert decision previously described. Revert decisions are used to improve the model such that, eventually, the model should get to a state where the overwhelming majority of indexes are kept.
[0046] In accordance with the impact verification model, the index impact is determined by monitoring resource usage of individual customer queries. Monitored resources (such as CPU time, logical IO) are those that show how much work each query is supposed to be doing. A desirable index would reduce such resource usage (such as CPU usage and reads) by allowing better query plans to be used, at the cost of more writes due to having to update one more structure on disk. Improvements in logical resource usage regularly lead to improvements in physical resource usage (query duration, physical IO). However, in accordance with the principles described herein, physical resources and/or logical resources may be monitored. However, physical resource values tend to have inherent noise. Accordingly, some embodiments may simply measure logical resource usage.
[0047] The impact verification model enables collecting workload metadata, including customer queries, execution plans, and their runtime stats over an extended time period, and storing them within the customer database. For instance, in Figure 2, such workload metadata may be the validation data that is stored within the validation data store 213. For instance, the index verification module may use the following information from the workload metadata: 1) for each query, the set of execution plans used for the query, 2) for each execution plan, its entire content, with private data being hidden from the plan to comply with the privacy policy, and 3) for each plan and each periodic interval, the following statistics: number of executions, CPU time used per execution (average and standard deviation), logical reads per execution (average and standard deviation), and logical writes per execution (average and standard deviation).
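One way to picture a single per-plan, per-interval record of this workload metadata is the following sketch; the field names are illustrative assumptions, not the patent's schema:

```python
from dataclasses import dataclass

@dataclass
class PlanIntervalStats:
    """Runtime statistics for one execution plan of one query over one
    periodic interval, with all private data already removed."""
    query_id: str
    plan_id: str        # identifies one execution plan of the query
    executions: int
    cpu_avg: float      # CPU time per execution (average)
    cpu_std: float      # CPU time per execution (standard deviation)
    reads_avg: float    # logical reads per execution (average)
    reads_std: float    # logical reads per execution (standard deviation)
    writes_avg: float   # logical writes per execution (average)
    writes_std: float   # logical writes per execution (standard deviation)
```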
[0048] The required workload metadata is automatically exhausted from the customer database, directly from the database engine. In one embodiment, an eventing structure may be used to extract this metadata and place it into the validation data store 213. The eventing system may operate per database such that, if turned on, an event generating process on each database starts collecting the workload metadata from the corresponding database, including all the execution plans and their runtime statistics per time interval.
[0049] After the workload metadata is exhausted from the database, a dedicated process run on each node collects this metadata and pushes it to the corresponding regional telemetry location (not shown), from where it is extracted to the validation data store 213 over which the impact verification model is run. All the private data is removed from the execution plans within the engine. Each plan is exhausted once, when it is initially executed, while the runtime statistics are aggregated and exhausted periodically. Thus, the query plan is generated as live data.
[0050] An advantage of this approach is that the exhausted data is actually only metadata, with all the private data being effectively hidden to avoid violating any customer privacy policy. The whole process scales well across a large number of databases. It is fully automated and takes a reasonable amount of time to validate, in order to provide a big enough validation data sample both prior to and after index creation, so that the verdict on the created index is extremely accurate.
[0051] Once the workload metadata reaches its destination store where the verification model is run (i.e., the validation data store 213), the following steps may be executed. First, noise is reduced by throwing away all queries that do not target the table over which the index was created. For queries that do target that table, the local repository of queries, plans, and runtime performance statistics is merged across different plans and intervals to get the execution count, plus two metrics (average and standard deviation) for three different dimensions (CPU time used, logical reads, logical writes) in two distinct periods: before and after index creation.
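Merging the per-plan, per-interval (count, average, standard deviation) triples into a single before-period or after-period summary can be done with pooled statistics. The following is a sketch, not the patent's code, under the assumption that each triple summarizes a disjoint set of executions:

```python
import math

def merge_interval_stats(triples):
    """triples: list of (count, avg, std) for one query and one resource
    dimension (e.g., CPU time) across plans and intervals. Returns the
    pooled (count, avg, std) for the whole period."""
    total = sum(n for n, _, _ in triples)
    avg = sum(n * m for n, m, _ in triples) / total
    # Recover E[x^2] per triple as (std^2 + avg^2), then pool.
    ex2 = sum(n * (s * s + m * m) for n, m, s in triples) / total
    return total, avg, math.sqrt(max(ex2 - avg * avg, 0.0))
```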
[0052] These metrics are used to calculate statistics for Welch's t-test. The null hypothesis is that the query performance was not changed by the index operation, and Welch's test gives the level of statistical significance with which it can be discarded. For each of the three resource dimensions, all queries for which Welch's test with a significance level of 0.99 does not reject the null hypothesis are thrown away. For the remaining queries, gains (or losses) are determined as follows: EstimatedQueryGain = ExecutionCount · (AvgUsageBeforeIndex - AvgUsageAfterIndex). The total gain/loss is: EstimatedTotalGain = ∑ EstimatedQueryGain. The metric used to provide a verdict (i.e., keep or revert) for the created index, which empirically proved to be very accurate, is the following: if (EstimatedTotalGain[CPU] > 0) then 'keep' else 'revert'.
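The significance filter and verdict above can be sketched from the summary statistics alone. This is an illustrative reading of the formulas, with assumptions the patent does not pin down: the 0.99 two-sided critical value is taken from the normal approximation (reasonable for large execution counts; an exact quantile would require the Welch-Satterthwaite degrees of freedom), and the post-index execution count stands in for ExecutionCount, which the text leaves unspecified.

```python
import math

# Two-sided 0.99 critical value under the normal approximation (assumption).
CRITICAL_T_99 = 2.576

def welch_t(n1, mean1, std1, n2, mean2, std2):
    """Welch's t statistic computed from summary statistics only."""
    se = math.sqrt(std1 ** 2 / n1 + std2 ** 2 / n2)
    return (mean1 - mean2) / se if se > 0 else 0.0

def estimated_total_gain(queries):
    """Sum EstimatedQueryGain over the queries where the null hypothesis
    ('the index did not change performance') is rejected at 0.99."""
    total = 0.0
    for q in queries:
        t = welch_t(q["n_before"], q["avg_before"], q["std_before"],
                    q["n_after"], q["avg_after"], q["std_after"])
        if abs(t) < CRITICAL_T_99:
            continue  # not significant: throw the query away
        # EstimatedQueryGain = ExecutionCount * (AvgBefore - AvgAfter);
        # the post-index count is used here (an assumption).
        total += q["n_after"] * (q["avg_before"] - q["avg_after"])
    return total

def verdict(total_gain_cpu):
    """if (EstimatedTotalGain[CPU] > 0) then 'keep' else 'revert'."""
    return "keep" if total_gain_cpu > 0 else "revert"
```

Note that a regressed query contributes a negative EstimatedQueryGain, so the sum naturally trades off improved queries against harmed ones before the keep-or-revert decision is made.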
[0053] Accordingly, what has been described is an efficient mechanism for substantially automating the tuning of large collections of databases without requiring significant user time. The tuning is accomplished by the system validating indexes that improve the overall query performance of the collection of databases as a whole. The net result is a substantial improvement in the performance of the database collection through a simplified user interface, achieving what would be virtually impossible if the user had to investigate each database for possible tuning opportunities.
[0054] The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A computing system comprising a validation module configured to (i) operate to automatically validate index creation on a database entity in an environment that includes multiple databases, and (ii) to automatically validate index impact of a created index by using a validation data store that contains validation data originating from a plurality of databases.
2. The system in accordance with Claim 1, wherein the validation data store comprises a database that contains aggregated validation data from the plurality of databases.
3. The system in accordance with Claim 1, wherein the plurality of databases is contained at least in part within a cloud computing environment.
4. The system in accordance with Claim 1, wherein the validation module is further configured to automatically validate performance improvement by evaluating validation data generated prior to creation of the index, evaluating validation data generated after the creation of the index, and comparing the evaluations.
5. The system in accordance with Claim 4, wherein the validation module is further configured to recommend or decide to revert or keep the created index based on the comparison.
6. The system in accordance with Claim 1, wherein the validation data comprises query performance data.
7. The system in accordance with Claim 1, wherein the validation data comprises live workload data.
8. The system in accordance with Claim 1, wherein the validation data comprises metadata from queries issued against the plurality of databases.
9. The system in accordance with Claim 8, wherein the metadata comprises execution plans for performing queries against the database entity.
10. A computer-implemented method for automatically validating index creation on a database entity in an environment that includes multiple databases, the computer-implemented method being performed by one or more processors executing computer executable instructions for the computer-implemented method, and the computer-implemented method comprising:
at a validation module, determining that creation of an index is to be validated; the validation module communicating with a validation data store that contains validation data originating from a plurality of databases; and evaluating validation data from the validation data store that was generated before and after the creation of the index to estimate index impact.
PCT/US2016/039126 2015-06-26 2016-06-24 Automated validation of database index creation WO2016210200A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/752,575 2015-06-26
US14/752,575 US10061678B2 (en) 2015-06-26 2015-06-26 Automated validation of database index creation

Publications (1)

Publication Number Publication Date
WO2016210200A1 true WO2016210200A1 (en) 2016-12-29

Family

ID=56373132

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/039126 WO2016210200A1 (en) 2015-06-26 2016-06-24 Automated validation of database index creation

Country Status (2)

Country Link
US (1) US10061678B2 (en)
WO (1) WO2016210200A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160378822A1 (en) * 2015-06-26 2016-12-29 Microsoft Technology Licensing, Llc Automated recommendation and creation of database index
US10061678B2 (en) 2015-06-26 2018-08-28 Microsoft Technology Licensing, Llc Automated validation of database index creation
CN108804479B (en) * 2017-05-05 2022-08-19 张汉威 Data analysis system and analysis method thereof
US10810256B1 (en) * 2017-06-19 2020-10-20 Amazon Technologies, Inc. Per-user search strategies
US11068460B2 (en) * 2018-08-06 2021-07-20 Oracle International Corporation Automated real-time index management
US11182360B2 (en) * 2019-01-14 2021-11-23 Microsoft Technology Licensing, Llc Database tuning and performance verification using cloned database
US20220083610A1 (en) * 2020-08-17 2022-03-17 Branch Metrics, Inc. On-device functionality using remote system updates
US20240037079A1 (en) * 2022-07-27 2024-02-01 Dell Products L.P. Automated validation of database deployments

Citations (2)

Publication number Priority date Publication date Assignee Title
EP1302870A2 (en) * 2001-10-12 2003-04-16 NCR International, Inc. Index selection in a database system
US20050165741A1 (en) * 2003-12-24 2005-07-28 Gordon Mark R. System and method for addressing inefficient query processing

Family Cites Families (26)

Publication number Priority date Publication date Assignee Title
US6223171B1 (en) * 1998-08-25 2001-04-24 Microsoft Corporation What-if index analysis utility for database systems
US6266658B1 (en) 2000-04-20 2001-07-24 Microsoft Corporation Index tuner for given workload
US7047231B2 (en) 2002-03-01 2006-05-16 Software Engineering Gmbh Getpage-workload based index optimizer
US7769744B2 (en) 2005-12-02 2010-08-03 Microsoft Cororation Missing index analysis and index useage statistics
US8838574B2 (en) * 2006-06-09 2014-09-16 International Business Machines Corporation Autonomic index creation, modification and deletion
US7761445B2 (en) * 2006-12-15 2010-07-20 Teradata Us, Inc. Automated system for identifying and dropping marginal database indexes
US9213740B2 (en) 2007-10-11 2015-12-15 Sybase, Inc. System and methodology for automatic tuning of database query optimizer
US8510290B1 (en) * 2008-12-30 2013-08-13 Teradata Us, Inc. Index selection in a multi-system database management system
US8386502B2 (en) * 2009-03-12 2013-02-26 Bank Of America Corporation Market identification system
US8489565B2 (en) * 2009-03-24 2013-07-16 Microsoft Corporation Dynamic integrated database index management
US8458167B2 (en) 2009-04-01 2013-06-04 International Business Machines Corporation Client-based index advisor
US9135299B2 (en) * 2009-09-01 2015-09-15 Teradata Us, Inc. System, method, and computer-readable medium for automatic index creation to improve the performance of frequently executed queries in a database system
US8626729B2 (en) * 2009-09-28 2014-01-07 Oracle International Corporation Database index monitoring system
US8805800B2 (en) 2010-03-14 2014-08-12 Microsoft Corporation Granular and workload driven index defragmentation
US10452629B2 (en) * 2010-07-13 2019-10-22 International Business Machines Corporation Automatic maintenance of a set of indexes with different currency characteristics in a database management system
US8707300B2 (en) 2010-07-26 2014-04-22 Microsoft Corporation Workload interference estimation and performance optimization
US8478845B2 (en) 2010-08-16 2013-07-02 International Business Machines Corporation End-to-end provisioning of storage clouds
US8706701B1 (en) 2010-11-18 2014-04-22 Emc Corporation Scalable cloud file system with efficient integrity checks
CN102651007A (en) * 2011-02-28 2012-08-29 国际商业机器公司 Method and device for managing database indexes
US8515927B2 (en) * 2011-09-02 2013-08-20 Bbs Technologies, Inc. Determining indexes for improving database system performance
US9218383B2 (en) * 2013-03-15 2015-12-22 International Business Machines Corporation Differentiated secondary index maintenance in log structured NoSQL data stores
US20140278807A1 (en) 2013-03-15 2014-09-18 Cloudamize, Inc. Cloud service optimization for cost, performance and configuration
US9792321B2 (en) * 2013-07-09 2017-10-17 Oracle International Corporation Online database migration
US9454558B2 (en) * 2014-04-23 2016-09-27 International Business Machines Corporation Managing an index of a table of a database
US20160378822A1 (en) 2015-06-26 2016-12-29 Microsoft Technology Licensing, Llc Automated recommendation and creation of database index
US10061678B2 (en) 2015-06-26 2018-08-28 Microsoft Technology Licensing, Llc Automated validation of database index creation

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
EP1302870A2 (en) * 2001-10-12 2003-04-16 NCR International, Inc. Index selection in a database system
US20050165741A1 (en) * 2003-12-24 2005-07-28 Gordon Mark R. System and method for addressing inefficient query processing

Also Published As

Publication number Publication date
US10061678B2 (en) 2018-08-28
US20160378634A1 (en) 2016-12-29

Similar Documents

Publication Publication Date Title
US10061678B2 (en) Automated validation of database index creation
Yadwadkar et al. Selecting the best vm across multiple public clouds: A data-driven performance modeling approach
US11620286B2 (en) Continuous cloud-scale query optimization and processing
EP3259681B1 (en) Method and device for deciding where to execute subqueries of an analytics continuous query
US20160378822A1 (en) Automated recommendation and creation of database index
AU2019280058B2 (en) Cloud-based platform instrumentation and monitoring system for maintenance of user-configured programs
US20170024433A1 (en) Query plan post optimization analysis and reoptimization
US8510721B2 (en) Dynamic calculation of sample profile reports
Ortiz et al. Changing the Face of Database Cloud Services with Personalized Service Level Agreements.
US20170132286A1 (en) Query hint management for a database management system
AU2021244852B2 (en) Offloading statistics collection
US20160292233A1 (en) Discarding data points in a time series
Nguyen et al. Using weaker consistency models with monitoring and recovery for improving performance of key-value stores
Wang et al. Turbo: Dynamic and decentralized global analytics via machine learning
Sangroya et al. Performance assurance model for hiveql on large data volume
US10534774B2 (en) Query performance degradation analysis timing
Ecker Latency-aware stream operator placement optimization in fog computing environments
Napolitano Automated Learning of Quantitative Software Models from System Traces
CN116955388A (en) Query processing based on stochastic prediction model
CN113722141A (en) Method and device for determining delay reason of data task, electronic equipment and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16736672

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16736672

Country of ref document: EP

Kind code of ref document: A1