US12242454B2 - Framework for workload prediction and physical database design - Google Patents
Framework for workload prediction and physical database design Download PDFInfo
- Publication number
- US12242454B2 US12242454B2 US18/462,513 US202318462513A US12242454B2 US 12242454 B2 US12242454 B2 US 12242454B2 US 202318462513 A US202318462513 A US 202318462513A US 12242454 B2 US12242454 B2 US 12242454B2
- Authority
- US
- United States
- Prior art keywords
- workload
- prediction
- sar
- observed
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/211—Schema design and management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
- G06F11/3414—Workload generation, e.g. scripts, playback
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/211—Schema design and management
- G06F16/213—Schema design and management with details for schema evolution support
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/256—Integrating or interfacing systems involving database management systems in federated or virtual databases
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5019—Workload prediction
Definitions
- An enterprise may utilize a cloud computing environment to let users perform tasks. For example, the enterprise might let various users execute an application via the cloud computing environment to process purchase orders, adjust human resources information, generate invoices, etc.
- the cloud computing environment may be associated with one or more databases, and the physical layout of such databases can substantially impact performance and memory consumption. As a result, an inappropriate physical layout can significantly degrade performance or increase the memory consumption and thereby increase the hardware costs of an organization operating the database (the increase of hardware costs may be a consequence of bad performance or increased memory consumption). Because the amount of provisioned Dynamic Random Access Memory (“DRAM”) dominates hardware costs, DataBase-as-a-Service (“DBaaS”) providers in particular must carefully consider the economical tradeoffs between memory consumption and performance.
- DRAM Dynamic Random Access Memory
- DRAM DataBase-as-a-Service
- the physical layout may be timely adapted to workload changes and therefore be optimized for the future workload, such that future high performance and/or low memory consumption is achieved.
- methods and systems may be associated with a cloud computing environment.
- a workload prediction framework may receive observed workload information associated with a database in the cloud computing environment (e.g., a DBaaS). Based on the observed workload information, a Statement Arrival Rate (“SAR”) prediction may be generated. In addition, a host variable assignment prediction may be generated based on the observed workload information. The workload prediction framework may then use the SAR prediction and the host variable assignment prediction to automatically create a workload prediction for the database.
- a physical database design advisor e.g., a table partitioning advisor
- Some embodiments comprise means for receiving, by a computer processor of a workload prediction framework, observed workload information associated with a database in the cloud computing environment; based on the observed workload information, means for generating a SAR prediction; based on the observed workload information, means for generating a host variable assignment prediction; means for automatically creating a workload prediction for the database using the SAR prediction and the host variable assignment prediction; and responsive to the workload prediction, means for automatically generating a recommended physical layout for the database by a physical database design advisor.
- Some technical advantages of some embodiments disclosed herein are improved systems and methods to provide accurate and efficient workload predictions which can serve as an input to a physical database design advisor.
- FIGS. 1 A through 1 B illustrate table partitioning layouts.
- FIG. 2 is a high-level block diagram associated with a cloud-based computing system in accordance with some embodiments.
- FIG. 3 is a more detailed diagram of a system according to some embodiments.
- FIG. 4 is a custom adapter creation method in accordance with some embodiments.
- FIG. 5 illustrates a system according to some embodiments.
- FIG. 6 is a more detailed system in accordance with some embodiments.
- FIG. 7 is an example of workload drift according to some embodiments.
- FIG. 8 illustrates a static workload in accordance with some embodiments.
- FIG. 9 is a SAR method in accordance with some embodiments.
- FIGS. 10 A and 10 B illustrate the relationship between SAR and a discrete Fourier transform of an observed SAR according to some embodiments.
- FIGS. 11 A through 11 F show examples of detection and classification of SAR for various SQL statements in accordance with some embodiments.
- FIG. 12 is a Directed Acyclic Graph (“DAG”) classifier to determine workload drift type according to some embodiments.
- DAG Directed Acyclic Graph
- FIG. 13 is a host variable assignment method according to some embodiments.
- FIGS. 14 A and 14 B illustrates the classification of assignments in accordance with some embodiments.
- FIG. 15 is a probability mass function according to some embodiments.
- FIGS. 16 A through 16 C are examples for the prediction of assignments in accordance with some embodiments.
- FIG. 17 shows the impact of ⁇ on the choice of a future database layout according to some embodiments.
- FIG. 18 is a workload prediction and physical database design display according to some embodiments.
- FIG. 19 is an apparatus or platform according to some embodiments.
- FIG. 20 is a tabular portion of a workload prediction data store in accordance with some embodiments.
- FIG. 21 illustrates a tablet computer according to some embodiments.
- Real-world applications are characterized by workloads where the arrival rate and parameterization of Structured Query Language (“SQL”) statements can change over time. Without continuous adjustments of the physical layout to reflect such workload drifts, memory consumption and performance can deteriorate significantly.
- SQL Structured Query Language
- Existing approaches for automated physical database design, particularly table partitioning advisors fail to address workload drifts because the physical layouts they propose are based on the observed workload.
- Some embodiments described herein provide a framework for the continuous adaptation of physical layout under workload drifts. Some embodiments predict the future workload based on detected workload drifts in the arrival rate and in the parameterization of SQL statements. The predicted workload is then fed into a physical database design advisor. Using a real-world application, accurate predictions of the future workload and a substantial hardware cost reduction on a commercial cloud database can be achieved compared to existing approaches.
- a straightforward method to deal with workload drifts is to repeatedly feed the observed workload into a physical database design advisor, e.g., at fixed intervals.
- a physical database design advisor e.g., at fixed intervals.
- Such an approach is inherently backward-looking as the suggested physical layout lags behind workload drifts and may already be suboptimal when data reorganization (e.g., table repartitioning) starts.
- some embodiments described herein provide a forward-looking approach that determines the new physical layout using a prediction of the future workload.
- Typical workload predictors are only able to forecast a future arrival rate of statements.
- physical database design advisors in particular table partitioning advisors, often rely on fine-granular workload statistics, e.g., the parameterization of SQL statements.
- some embodiments described herein predict the future parameter values assigned to the host variables (in addition to the arrival rate of statements).
- test results that many software development projects store in a database to identify bugs or authorize patches.
- SAP® HANA development project retains statistics on more than 30 billion test runs.
- FIGS. 1 A through 1 B illustrate table partitioning layouts.
- FIG. 1 A is a backward-looking approach 110 where a currently observed workload 111 ( ) is used create partitions P 1 and P 2 in layout L obs 113 (regardless of what the future workload 112 ( ) looks like). Note that most of the statement instantiations are above 72,154,000 in the future workload and only few statement instantiations are below 72,154,000. As a consequence, the four pages of P 2 are hot because only those pages are frequently accessed (and need to stay in DRAM) while all pages of P 1 are cold because those pages are only rarely accessed (and can be evicted to cheaper secondary storage).
- each partition 113 may be associated with multiple columns (e.g., one for status and one for id_test_profile as illustrated by dotted lines in FIG. 1 A ) and the approach 110 is associated with the observed workload during a six-hour time window (from 08:00 to 14:00) on a regular workday.
- Each dot represents a statement execution at a specific time (x-axis) with a specific parameter value (y-axis) assigned to host variable:1. Observe that the workload drifts over time as the later the statement is instantiated, the larger the parameter value becomes.
- FIG. 1 B is a forward-looking approach 120 where a predicted workload 121 ( ) is used to create partitions P 1 , P 2 , and P 3 in layout L obs 123 (closer to what the future workload 122 ( ) looks like as compared to FIG. 1 A ).
- P 1 contains two hot pages (which need to stay in DRAM) to handle statement instantiations above 72,164,000 while P 2 and P 3 have cold pages to handle statement instantiations below 72,164,000.
- the proposed layout L pred 123 groups frequently accessed records into a hot partition P 3 , while rarely and never accessed records are separated into cold partitions P 2 and P 1 .
- the data is stored on pages, not clustered by id_test_profile and no index exists on id_test_profile.
- FIG. 1 B shows whether the pages of the status and id_test_profile columns in L obs 113 and L pred 123 were classified as hot or cold (based on the five-minute rule).
- layout L pred 123 reduces the number of hot pages held in DRAM by a factor of two as compared to L obs 113 as suggested by the backward-looking approach 110 .
- the main reason is that the small, hot partition P 3 of L pred allows for a more aggressive partition pruning during the evaluation of the selection predicate than the large partition P 2 of L obs 113 since most statement instantiations are above 72,164,000 in the actual future workload. Therefore, only partition P 3 in L pred and P 2 in L obs needs to be accessed for most of the statement instantiations. As P 3 in L pred contains only two pages while P 2 in L obs contains four pages the memory consumption with L pred can be reduced by a factor of two as compared to L obs .
- the goal of some embodiments may be to find a predicted workload from an observed workload , such that the predicted workload approximates the future workload .
- various workload drift types e.g., linear, exponential, reoccurring, static, and irregular
- these drift types can also overlap.
- the arrival rate as well as the assignments of parameter values to host variables can be impacted by different drift types. Therefore, a workload predictor must be able to handle multiple drift types (and at least some combinations thereof) at the arrival rate and assignment level.
- a workload prediction framework 210 predicts the future workload.
- the predicted workload is provided to a physical database design advisor 250 along with information about the current physical layout which can then automatically create a recommended future physical layout for a database.
- the current physical layout may be used as an input to the physical database design advisor (since a change in the physical layout might only be advised when the expected benefits of workload cost reductions outweigh the cost of data reorganization).
- the term “automatically” may refer to a process that requires little or no human intervention. As illustrated by the system 300 of FIG.
- a workload prediction framework 310 may predict the future Statement Arrival Rate (“SAR”) 320 and the future assignments of parameter values to host variables 330 to create the predicted workload that is given to a physical database design advisor 350 along with information about the current physical layout. Some embodiments may incorporate table repartitioning costs in the physical design advice phase. Furthermore, the system 300 may allow for the exchanging or extending all components to support workload drifts in novel environments. In addition, both phases may be periodically repeated to ensure a continuous adaption of the physical layout.
- SAR Statement Arrival Rate
- devices may exchange information via any communication network which may be one or more of a Local Area Network (“LAN”), a Metropolitan Area Network (“MAN”), a Wide Area Network (“WAN”), a proprietary network, a Public Switched Telephone Network (“PSTN”), a Wireless Application Protocol (“WAP”) network, a Bluetooth network, a wireless LAN network, and/or an Internet Protocol (“IP”) network such as the Internet, an intranet, or an extranet.
- LAN Local Area Network
- MAN Metropolitan Area Network
- WAN Wide Area Network
- PSTN Public Switched Telephone Network
- WAP Wireless Application Protocol
- Bluetooth a Bluetooth network
- wireless LAN network a wireless LAN network
- IP Internet Protocol
- any devices described herein may communicate via one or more such communication networks.
- the workload prediction framework 310 and physical database design advisor 350 may store information into and/or retrieve information from various data stores (e.g., various predictions or recommendation), which may be locally stored or reside remote from the workload prediction framework 310 and/or physical database design advisor 350 .
- various data stores e.g., various predictions or recommendation
- FIG. 3 a single workload prediction framework 310 and physical database design advisor 350 are shown in FIG. 3 , any number of such devices may be included.
- various devices described herein might be combined according to embodiments of the present invention.
- the workload prediction framework 310 and physical database design advisor 350 might comprise a single apparatus.
- the system 300 functions may be performed by a constellation of networked apparatuses, such as in a distributed processing or cloud-based architecture.
- An administrator may access the system 300 via a remote device (e.g., a Personal Computer (“PC”), tablet, or smartphone) to view information about and/or manage operational information in accordance with any of the embodiments described herein.
- a remote device e.g., a Personal Computer (“PC”), tablet, or smartphone
- GUI Graphical User Interface
- an operator or administrator may let an operator or administrator define and/or adjust certain parameters via the remote device (e.g., to adjust prediction rules or logic, alter ⁇ lin and/or ⁇ exp , map elements to data sources, etc.) and/or provide or receive automatically generated recommendations, results, or alerts associated with the system 300 .
- FIG. 4 is a method 400 that might be performed by some or all of the elements of the systems 200 , 300 described with respect to FIGS. 2 and 3 .
- the flow charts described herein do not imply a fixed order to the steps, and embodiments of the present invention may be practiced in any order that is practicable. Note that any of the methods described herein may be performed by hardware, software, or any combination of these approaches.
- a computer-readable storage medium may store thereon instructions that when executed by a machine result in performance according to any of the embodiments described herein.
- a computer processor of a workload prediction framework may receive observed workload information associated with a database (e.g., a DBaaS), such as one implemented in a cloud computing environment. Based on the observed workload information, the system may generate a SAR prediction at S 420 . Based on the observed workload information, the system may also generate a host variable assignment prediction at S 430 . At S 440 , the system can then automatically create a workload prediction for the database using the SAR prediction and the host variable assignment prediction. Responsive to the workload prediction and a current physical layout, at S 450 a physical database design advisor automatically generates a recommended physical layout for the database.
- a database design advisor Responsive to the workload prediction and a current physical layout, at S 450 a physical database design advisor automatically generates a recommended physical layout for the database.
- q i ⁇ ⁇ q 1 , . . . , q i , . . . , q n ⁇ may be defined as a set of n ⁇ parameterized SQL statements (e.g., SELECT, INSERT, UPDATE, and DELETE statements).
- (t p ,q ip ,v ip ) ⁇ may be defined as a set of p ⁇ statement instantiations.
- o s , o e , f s , f e ⁇ be four timestamps, such that o s ⁇ o e ⁇ f s ⁇ f e .
- an observed workload as , such that ⁇ (t, q i , v i ) ⁇ :o s ⁇ t ⁇ o e
- a future workload as , such that ⁇ (t, q i , v i ) ⁇ :f s ⁇ t ⁇ f e .
- Some embodiments described herein find a predicted workload from an observed workload , where ⁇ (t, q i , v i ) ⁇ :f s ⁇ t ⁇ f e , such that the predicted workload approximates the future workload .
- FIG. 5 is an overview of a system 500 according to some embodiments. It consists of a workload prediction framework 510 and a physical database design advisor 550 .
- the workload prediction framework 510 predicts the future workload based on the observed workload on a per-statement basis. As it may be computationally infeasible to consider all statements in the observed workload for the prediction, some embodiments only utilize statements in the SQL plan cache because they typically represent more than 99% of the workload cost.
- the future workload is predicted in two independent stages.
- SAR detection 521 , SAR classification 522 , and SAR prediction 523 are used to predict the statement arrival rate.
- host variable assignment detection 531 , host variable assignment classification 532 , and host variable assignment prediction 533 are used to predict the future assignments of parameter values to host variables.
- the predicted workload can then be obtained by combining the results of both stages.
- the advisor 550 uses this information along with information about a current physical layout to propose the future physical layout L fut with the smallest combined workload and data reorganization costs based on the predicted workload and the current physical layout L cur .
- the workload prediction framework 510 and advisor 550 periodically repeat these steps to adopt the physical layout in small and cheap adjustments.
- FIG. 6 is a more detailed system 600 in accordance with some embodiments.
- the workload prediction framework 610 predicts the future workload associated with SQL statements based on the observed workload . For each statement q i ⁇ , the future workload is predicted based on a SAR prediction and a host variable assignment prediction.
- the SAR prediction utilizes SAR detection 621 , SAR classification 622 (e.g., via classifier ⁇ ), and SAR prediction 623 (e.g., via predictors ⁇ 1 through ⁇ 2 ).
- the future assignments of parameter values to host variables utilizes host variable assignment detection 631 , host variable assignment classification 632 (e.g., via classifier ⁇ ) and host variable assignment prediction 633 (e.g., via predictors ⁇ 1 through ⁇ 2 ).
- the predicted workload can then be obtained by combining the results of the SAR and host variable assignment predictions.
- the advisor 650 uses along with information about the current physical layout L cur to propose the future physical layout L fut with the smallest combined workload and data reorganization costs. According to some embodiments, the workload prediction framework 610 and advisor 650 periodically repeat these steps to adopt the physical layout.
- workloads can vary over time in various different ways. For example, a real-world application workload might drift in a linear, exponential, reoccurring, static, or irregular fashion.
- a linear or exponential workload drift is characterized by a linearly or exponentially increasing (or decreasing) arrival rate of statements, respectively parameter values assigned to host variables.
- the parameter values of the domain of id_test_profile might grow linearly over time (e.g., as shown by the observed workload 111 in FIG. 1 A ).
- An exponential drift for instance, might occur close to the release of a new version of a system (or the addition of a new feature to a system) when certain statements are instantiated exponentially more frequently than during regular development phases.
- FIG. 7 is an example 700 of workload drift instantiations of this statement between Dec. 11 and 31, 2020 at a certain time (x-axis on graph 710 ) using a specific build ID (y-axis). Two reoccurring drift patterns can be found. First, there is a weekly pattern, where a steep increase of id_make on weekdays alternates with much slighter increases on weekends.
- FIG. 7 illustrates an example 700 of two overlapping drift types.
- a static workload is a workload where no temporal drift can be observed.
- FIG. 8 illustrates 800 a static workload with a graph 810 that shows all instantiations of this statement between 08:00 and 20:00 on a regular workday. No change in the parameter values can be observed during the considered time frame. This might be expected, for example, when the same tests are repeatedly executed over an extended period of time to identify bugs or authorize patches.
- An irregular workload drift is characterized by an abrupt and unexpected change of the statement arrival rate or the parameter values. For example, occasional stress test campaigns, in addition to regular testing, can lead to an irregular drift of the statement arrival rate. Unlike other drift types, irregular workload drifts can neither be modeled nor predicted. Nonetheless, some embodiments described herein may handle irregular workload drifts.
- FIG. 9 is a SAR prediction method 900 in accordance with some embodiments.
- S 910 starts by discretizing the observed workload to reduce noise and other short-term fluctuations.
- a set of equidistant timestamps may be defined using a discretization interval ⁇ >0 between two successive, discrete timestamps.
- ⁇ may be set to five minutes.
- all statement instantiations in the observed workload that fall within interval [t, t+ ⁇ ) may be aggregated for a timestamp t ⁇ (o s , o e ).
- the observed statement arrival rate may be defined as a series of consecutive statement arrival frequencies between the start and end timestamps o s , o e ⁇ N in the observed workload .
- FIGS. 10 A and 10 B illustrate the relationship between SAR and a discrete Fourier transform of an observed SAR according to some embodiments.
- FIG. 10 A shows 1010 the observed SAR( ,q i ) for equidistant timestamps between 08:00 and 14:00 with a discretization interval ⁇ of five minutes, calculated from the observed workload 111 of FIG. 1 A . Note there is a wavelike pattern with peaks at 08:30, 11:00, and 13:00.
- the system may detect which workload drift types are present in the observed SAR.
- a condition-based detector may be used in connection with the observed SAR for each drift type.
- embodiments may use the Pearson correlation between the series of discrete timestamps and the series of statement arrival frequencies (both series being abbreviated as SAR( ,q i )). The condition is satisfied if the Pearson correlation coefficient ⁇ is greater or equal than a threshold ⁇ lin ⁇ [0, 1]: SAR( , q i ) is linear ⁇
- embodiments may take advantage of the fact that a function grows exponentially if its logarithm grows linearly. Therefore, the condition is satisfied if the Pearson correlation coefficient ⁇ between the series of discrete timestamps and the series of the logarithm of each statement arrival frequency is greater or equal than a threshold ⁇ exp ⁇ [0, 1] (note that the logarithm is only applied on the series of statement arrival frequencies F(t, , q i ) in SAR( ,q i ) and not on the series of discrete timestamps): SAR( , q i ) is exponential ⁇
- embodiments may first compute the Discrete Fourier Transform (“DFT”), i.e., convert the observed SAR from the time domain into the frequency domain. Afterwards, the system may evaluate whether a sinusoid with an amplitude greater or equal than a threshold ⁇ ( , q i ) is present in the DFT. To specify the threshold ⁇ ( , q i ), embodiments may first determine the smallest and largest observed statement arrival frequencies lb i and ub i of statement q i in the observed workload.
- DFT Discrete Fourier Transform
- the threshold ⁇ ( , q i ) is obtained as ⁇ cyc times the largest possible amplitude range but at least ⁇ cyc times the smallest statement arrival frequency.
- DFT Discrete Fourier Transform
- embodiments may examine whether the observed SAR is stable, i.e., it fluctuates only within narrow limits around the mean statement arrival frequency ⁇ i . Given a fixed ⁇ static ⁇ [0,1], the condition is satisfied if Root Mean Square Error (“RMSE”) between the observed statement arrival frequencies and u i is less than or equal to ⁇ static times ⁇ i :
- RMSE Root Mean Square Error
- a workload is classified as irregular if none of the above conditions for linear, exponential, reoccurring, or static drifts are met. Note that embodiments described herein are extensible such that other conditions could be plugged in to detect an irregular drift.
- FIG. 12 is a Directed Acyclic Graph (“DAG”) classifier 1200 that uses decisions 1210 determines workload drift type according to some embodiments.
- DAG Directed Acyclic Graph
- the classifier 1200 decides that the observed SAR is the detected drift type. If two or more drift types are detected, the classifier distinguishes compatible and incompatible drift combinations. According to some embodiments, the combination of linear and reoccurring is considered compatible (e.g., FIG. 7 and q 4 in FIG. 11 D ). In contrast, since an exponential drift asymptotically subsumes a linear drift, both together are an incompatible combination. In this case, the classifier 1200 prefers the drift type with the stronger Pearson correlation coefficient ⁇ . Further, an exponential drift is preferred over a reoccurring drift because the exponential drift may appear as a sinusoid in the DFT (e.g., q 2 in FIG. 11 B ).
- Table II shows classification outcomes and the predicted statement arrival frequencies at future discrete timestamps t ⁇ (f s , f e ) for the six SQL statements shown in FIGS. 11 A through 11 F . Because of a 6-hour observation window and a choice of 5 minutes as discretization interval ⁇ , the timestamps of range between 0 and 71, while timestamps of begin at 72 . If none of the conditions are satisfied, the classifier 1200 decides that the observed SAR is irregular.
- the system may predict the future SAR using the observed SAR and the outcome of the prior classification. To do this, the system may calculate the predicted statement arrival frequency F(t, , q i ) at future timestamps t E ⁇ (f s , f e ) for each statement q i ⁇ . The predicted SAR( ,q i ) is then obtained as the series of consecutive statement arrival frequencies between f s and f e . According to some embodiments, the prediction is based on the classification of the observed SAR:
- FIG. 13 is a host variable assignment method 1300 according to some embodiments to predict the future assignments of parameter values to host variables. For pragmatic reasons, note that the system may predict the assignments to a host variable independently of other host variables. To illustrate the main idea of this method 1300 , consider the graph 1410 FIG. 14 A , which is an abstract representation of the observed workload in FIG. 1 A . Each dot represents a single assignment (t,v ij ) ⁇ ( , h ij ) of parameter value v ij (y-axis) to host variable h ij at timestamp t (x-axis).
- an assignment is classified as “uncertain” if it occurs close to the beginning of the observation period, as “regular” if v ij is regularly assigned to h ij (e.g., those assignments are time independent because they occur so often that no periodic/reoccurring pattern can be observed and, therefore, they belong to a static workload), as “fresh” if v ij has not been assigned yet to h ij within a recent time interval, and as “subsequent” otherwise. Furthermore, note that subsequent assignments frequently occur shortly after fresh assignments (and rarely further on). Finally, the regular assignments are distributed across the entire observed workload.
- the system may perform detection and classification.
- a threshold interval of length k ⁇ may be selected, with k ⁇ >0 , as a multiple of the discretization interval ⁇ E >0 .
- the threshold interval k ⁇ is used to distinguish time-dependent and time-independent assignments.
- an assignment (t, v ij ) ⁇ ( , h ij ) is classified as: UNCERTAIN if o s ⁇ t ⁇ o s +k ⁇ REGULAR else if ⁇ ⁇ circumflex over (t) ⁇ [o s ,o e ⁇ k ⁇ ) ⁇ ( t′,v ij ) ⁇ ( , h ij ): ⁇ circumflex over (t) ⁇ t′ ⁇ circumflex over (t) ⁇ +k ⁇ FRESH else if ( t′,v ij ) ⁇ ( , h ij ): t ⁇ k ⁇ t′ ⁇ t SUBSEQUENT else.
- Assignments where the same parameter value is assigned to the same host variable at least once every k ⁇ time units, are independent of the time and thus part of the static workload. Such assignments are classified as regular. For example, in FIG. 14 A this is the case for parameter value v′ ij . In contrast, the absence of a repeated assignment of a parameter value every k ⁇ time units implies that the assignment is time-dependent and thus part of the workload drift. If the parameter value in such assignments is first assigned after k ⁇ time units, the assignment is classified as fresh, otherwise (second, third, etc.) as subsequent. For example, in FIG.
- the system predicts the future fresh assignments and models the distribution of subsequent and regular assignments.
- the system may extrapolate the series of fresh assignments in the observed workload into the future and thereby predict the series of future fresh assignments.
- the system may detect which drift types are present in the observed SFA and then use a classifier (such as the one in FIG. 12 ) to resolve conflicts between multiple detected drift types. For example, the observed SFA in FIG. 14 A is classified as linear (as can be seen in FIG. 14 B ). The system can then predict the future series of fresh assignments SFA(P, h ij ) based on SFA( , h ij ) and the outcome of the prior classification.
- the system may model the distribution of subsequent assignments of parameter values in relation to their corresponding fresh assignments. This model can then be used to predict future subsequent assignments in relation to their predicted fresh assignments. To do this, first formalize the time difference between a subsequent assignment and its corresponding fresh assignment as the temporal offset between the two. Let sub ( , h ij ) be the set of subsequent assignments to host variable h ij in the observed workload .
- Another special case is when no fresh assignment exists for a subsequent assignment because the fresh assignment occurred at the beginning of the observed workload and was classified as uncertain. For example, in FIG. 14 A this happens for v′′ ij . In such cases, the system may estimate a fresh assignment by extrapolating the observed SFA into the past.
- FIG. 15 is a probability mass function 1500 (with the y-axis shown in log scale) according to some embodiments.
- the function 1500 shows the probability mass function p sub , for the observed workload of FIG. 1 A . Observe that the larger time differences between fresh and subsequent assignments are associated with smaller probabilities. Note that x-axis show time intervals I( ⁇ ) for ⁇ s between 0 and 72.
- some embodiments may model the distribution of regular assignments in terms of a probability mass function. Using such a model, the system can then predict future regular assignments.
- reg be the set of regular assignments to host variable h ij in the observed workload .
- p reg the probability of a regular assignment of parameter value v ij to host variable h ij :
- Table III contains an algorithm that can be used to compute a predicted workload .
- the algorithm expects as input for each SQL statement q i ⁇ the predicted statement arrival rate SAR( , q i ), and for each host variable h ij in statement q i the predicted series of fresh assignments SFA(h ij ) and the probability mass functions p sub (h ij , ⁇ ) and p reg (h ij ,v ij ).
- the algorithm first initializes as an empty set (line 1). Next, the algorithm iterates over all SQL statements q i ⁇ (line 2) and all future equidistant timestamps t ⁇ (f s , f e ) (line 3). The system then derives the number of statement instantiations F(t, , q i ) to be predicted in the current interval [t, t+ ⁇ ) from the predicted SAR, where SAR( , q i ) [t] denotes the statement arrival frequency at timestamp t (line 4).
- the algorithm For each statement instantiation to predict, the algorithm draws a random timestamp t rand uniformly from [t, t+ ⁇ ) (lines 5 and 6) and allocates a vector v i with a capacity of m i parameter values (line 7). Finally, the system iterates over all host variables h ij to predict their assignment by parameter values v ij (lines 8 to 20) and adds the predicted statement instantiation (t rand , q i , v i ) to the predicted workload (line 21).
- the system may first draw an assignment category fresh, subsequent, or regular with the same probabilities as they occur in ( ,h ij ) (line 9).
- the probability of generating a subsequent assignment is
- FIGS. 16 A through 16 C illustrate the prediction of regular, fresh, and subsequent assignments to host variable h ij at a random future timestamp t rand ⁇ [t, t+ ⁇ ). Also shown is the predicted SFA( , h ij ) 1612 .
- the value of the fresh assignment v ij is obtained as SFA( , h ij ) [t rand ] as illustrated 1610 by the line 1612 in FIG. 16 A .
- the time difference t diff is shown drawn between t fresh and t rand as illustrated 1620 in FIG. 16 B .
- the value of the subsequent assignment v ij is then obtained as SFA( , h ij ) [t fresh ].
- value v′ ij of a regular assignment is already regularly assigned to h ij as illustrated 1630 in FIG. 16 C (see also v′ ij in FIG. 14 A ).
- the second phase of the framework feeds the predicted workload into a physical database design advisor (e.g., a table partitioning advisor).
- a table partitioning advisor for column stores to minimize main memory costs in $ while meeting all performance guarantees assured in Service-Level Agreements (“SLAs”).
- SLAs Service-Level Agreements
- $ is used herein as an example, note that embodiments may be associated with any other currency or representation of monetary costs. As the amount of provisioned DRAM dominates hardware costs, this is in particular crucial for DBaaS providers.
- the advisor may group frequently accessed data into hot column partitions that remain in main memory, and rarely accessed data into cold column partitions that can be moved to cheaper storage layers.
- the classification of a column partition C determines its cost.
- Cold-classified column partitions can be pruned during the evaluation of the selection predicate (e.g., as described with respect to FIGS. 1 A and 1 B ) and therefore are not required to be held in DRAM during the workload. Because disk IO is performed instead for each access, their memory costs depend on the number of accessed pages by workload W and the costs in $ per disk IOP. In contrast, a hot-classified column partition depends on the DRAM costs in $/byte/second, the column partition size ⁇ C ⁇ in bytes, and the maximum workload execution time SLA(W) in seconds:
- a cost model may include table repartitioning costs.
- a straightforward approach would be to calculate repartitioning costs globally, i.e., at the level of the entire physical layout. However, if an enumeration algorithm operates at the level of column partitions, the system may compute the repartitioning costs per column partition. Further embodiments may model repartitioning costs in $ to be able to compare them to workload costs.
- an objective function may be modified. Contrary to some systems, which minimize the workload costs workload of the observed workload , some embodiments propose the future layout L fut that minimizes the combined workload and repartitioning costs total of the predicted workload P.
- FIG. 17 shows 1700 the impact of ⁇ on the choice of a future database layout L fut according to some embodiments.
- the x-axis shows future timestamp f s , f′ e , and f′′ e depending on the choice of ⁇ .
- the total costs in $ for L cur and two future layouts L fut 1 and L fut 2 are shown on the y-axis.
- the costs for both future layouts are greater than $0 at the y-axis intercept due to repartitioning costs.
- L cur is optimal for a short-time prediction as the repartitioning costs of both future layouts are not yet amortized.
- L fut 1 and L fut 2 respectively, become optimal.
- the advisor For the integration of the advisor into the prediction framework, assume a fixed observation period ⁇ and a fixed prediction confidence factor ⁇ as prerequisites. Also initialize the start timestamp of the observed workload o s as the current timestamp and the layout L cur as the current layout. The system can then start the first iteration of the framework by computing the end timestamp of the observed workload o e as o s + ⁇ and waiting until that time. Using the algorithm of Table III, the system next calculates the predicted workload between start timestamp f s set to o e and end timestamp f e set to f s + ⁇ based on the observed workload between timestamps o s and o e .
- the predicted workload and the current layout L cur can be fed into the physical database design advisor and the system may switch to the proposed future layout L fut .
- the current layout could also be considered as the best alternative.
- a memory costs and performance experiment analyzed how the partitioning layouts proposed by embodiments described herein and a backward-looking approach alter the relation between memory costs and workload performance. As hardware costs are dominated by DRAM costs, one may focus on the memory consumption. It was observed that embodiments described herein always outperform a backward-looking approach in terms of performance. For example, with enough available memory (i.e., high memory costs), the proposed framework can improve execution time by 1.4 compared to a backward-looking approach. The main reason for the observed performance improvements and memory cost savings is a sharper separation of frequently and rarely accessed data into hot and cold partitions as compared to the backward-looking approach. This avoids polluting the buffer cache with cold data, allows for more aggressive partition pruning, and speeds up delta merges.
- the R 2 score for assignments of the predicted workload have a substantially higher R 2 score compared to the observed workload.
- the R 2 score of the assignments of the predicted workload is frequently close to 1.
- a third experiment analyzed the impact of the observation period co and the prediction confidence factor ⁇ on the prediction accuracy.
- the R 2 score for statement 176 for co between 60 and 570 minutes were calculated for ⁇ between 0.1 and 8.0.
- the longer the predicted period i.e., the larger ⁇ is, the less accurate predictions become.
- the poor accuracy resulting from ⁇ larger than 1 is expected as it is no longer possible to detect reoccurring drifts with a period longer than the observation period.
- the choice of ⁇ set to 360 minutes and ⁇ set to 1.0 leads both to accurate predictions and to an amortization of the repartitioning costs due to the long prediction period.
- a fourth experiment evaluated how precisely the modified version of advisor estimates table repartitioning costs. For this purpose, 363 random partitioning layouts for different SQL database tables were generated and then randomly split and merged existing partitions in those layouts. For each considered table repartitioning, R rate was set to 133 MB/sec and R ovrhd was set to 0.16 sec as repartitioning costs were measured. There was a strong correlation between the estimated and actual repartitioning costs and the maximum q-error was 2.27.
- a fifth and final experiment analyzed the time required by the framework to predict the future workload for observation periods co between 60 and 360 minutes and ⁇ was set to 1.0.
- the required prediction time ranges between only 1.2% and 3.2% of the observation period. Since the advisor also has a low optimization time, one can conclude that that framework is suitable for practical usage.
- FIG. 18 is a framework for workload prediction and physical database design display 1800 according to some embodiments.
- the display 1800 includes a graphical representation 1810 or dashboard that might be used by an operator or administrator to configure the components described herein (e.g., for a cloud computing environment). In particular, selection of an element (e.g., via a touchscreen or computer mouse pointer 1890 ) might result in the display of a popup window that contains more detailed data.
- the display 1800 may also include a user selectable “Save” icon 1820 to store configurations and/or system mappings (e.g., to data sources) and an “Update” icon 1830 to adjust values as appropriate.
- FIG. 19 is a block diagram of an apparatus or platform 1900 that may be, for example, associated with the systems 200 , 300 , 500 , and 600 of FIGS. 2 , 3 , 5 , and 6 , respectively (and/or any other system described herein).
- the platform 1900 comprises a processor 1910 , such as one or more commercially available Central Processing Units (“CPUs”) in the form of one-chip microprocessors, coupled to a communication device 1960 configured to communicate via a communication network (not shown in FIG. 19 ).
- the communication device 1960 may be used to communicate, for example, with one or more databases 1924 (e.g., via a communication network 1922 ), system administrators, etc.
- the platform 1900 further includes an input device 1940 (e.g., a computer mouse and/or keyboard to input, create and/or manage prediction algorithm parameters) and/an output device 1950 (e.g., a computer monitor to render a display, transmit recommendations, generate alerts, and/or create reports about workload predictions, physical database design recommendations, etc.).
- an input device 1940 e.g., a computer mouse and/or keyboard to input, create and/or manage prediction algorithm parameters
- an output device 1950 e.g., a computer monitor to render a display, transmit recommendations, generate alerts, and/or create reports about workload predictions, physical database design recommendations, etc.
- a mobile device and/or PC may be used to exchange information with the platform 1900 .
- the processor 1910 also communicates with a storage device 1930 .
- the storage device 1930 may comprise any appropriate information storage device, including combinations of magnetic storage devices (e.g., a hard disk drive), optical storage devices, mobile telephones, and/or semiconductor memory devices.
- the storage device 1930 stores a program 1912 and/or workload prediction framework engine 1914 for controlling the processor 1910 .
- the processor 1910 performs instructions of the programs 1912 , 1914 , and thereby operates in accordance with any of the embodiments described herein.
- the processor 1910 may be associated with a cloud computing environment and receive observed workload information. Based on the observed workload information, a SAR prediction may be generated by the processor 1910 . In addition, a host variable assignment prediction may be generated by the processor 1910 based on the observed workload information.
- the processor 1910 may then use the SAR prediction and the host variable assignment prediction to automatically create a workload prediction for the database.
- a physical database design advisor e.g., a table partitioning advisor
- information may be “received” by or “transmitted” to, for example: (i) the platform 1900 from another device; or (ii) a software application or module within the platform 1900 from another software application, module, or any other source.
- the storage device 1930 further stores a workload prediction data store 2000 .
- a workload prediction data store 2000 An example of a database that may be used in connection with the platform 1900 will now be described in detail with respect to FIG. 20 . Note that the database described herein is only one example, and additional and/or different information may be stored therein. Moreover, various databases might be split or combined in accordance with any of the embodiments described herein.
- a table that represents the workload prediction data store 2000 that may be stored at the platform 1900 according to some embodiments.
- the table may include, for example, entries identifying workload predictions and physical database design changes for a cloud computing environment.
- the table may also define fields 2002 , 2004 , 2006 , 2008 , 2010 , 2012 for each of the entries.
- the fields 2002 , 2004 , 2006 , 2008 , 2010 , 2012 may, according to some embodiments, specify a database identifier 2002 , an observed workload identifier 2004 , SAR prediction data 2006 , host variable assignment prediction data 2008 , a workload prediction 2010 , and a recommended physical layout 2012 for a database.
- the workload prediction data store 2000 may be created and updated, for example, when new observations are received, a new DBaaS is implemented, etc.
- the database identifier 2002 might be a unique alphanumeric label that is associated with an SQL database to be analyzed and/or optimized.
- the observed workload identifier 2004 may be a file associated with a substantial number of historic statements.
- the SAR prediction data 2006 and host variable assignment prediction data 2008 may be automatically generates by any of the embodiments described herein (e.g., based on the observed workload).
- the workload prediction 2010 may be created by combining the SAR prediction data 2006 and host variable assignment prediction data 2008 .
- the recommended physical layout for a database may be generated by an advisor based on the workload prediction and the current physical layout (e.g., an advisor associated with table partitioning, data compression, buffer pool size, another advisor that is based on access counters of a workload, etc.).
- embodiments may provide accurate and efficient workload predictions which can serve as an input to a physical database design advisor (e.g., in a cloud computing environment).
- a physical database design advisor e.g., in a cloud computing environment.
- the following illustrates various additional embodiments of the invention. These do not constitute a definition of all possible embodiments, and those skilled in the art will understand that the present invention is applicable to many other embodiments. Further, although the following embodiments are briefly described for clarity, those skilled in the art will understand how to make any changes, if necessary, to the above-described apparatus and methods to accommodate these and other embodiments and applications.
- FIG. 21 illustrates a tablet computer 2100 providing a framework for workload prediction and physical database design display 2110 .
- the display 2110 might be used, for example, to create recommended database adjustments to improve performance/costs in a cloud computing or other environment.
- the display 2110 might be used to update and/or create prediction algorithms, layout recommendations, etc. via a “Save” icon 2120 .
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
| TABLE I |
| Notation for Workload Prediction |
| = {q1, . . . , qi, . . . , qn} | Set of n parameterized SQL statements |
| hi = [hi1, . . . , hij, . . . , him |
Vector of mi host variables of qi ∈ |
| vi = [vi1, . . . , vij, . . . , vim |
Vector of mi parameter values |
| (t, qi, vi) | Instantiation of statement qi with vi at |
| timestamp t | |
| Workload (set of p statement instantiations) | |
| ( , hij) | All assignments to host variable hij in |
| workload | |
| Observed, future, and predicted workload | |
| os, oe | Start and end timestamps of |
| fs, fe | Start and end timestamps of and |
(,h ij):={(t,v ij)|(t,q i ,v i)∈,v ij is assigned to h ij}.
Δ(o s ,o e):={o s +λ·δ|λ∈ ,o s ≤o s +λ·δ<o e}.
To calculate the observed statement arrival frequency, all statement instantiations in the observed workload that fall within interval [t, t+δ) may be aggregated for a timestamp t ∈Δ(os, oe). The observed statement arrival frequency F(t, , qi) at timestamp t∈Δ(os, oe) for statement qi∈ in the observed workload may be defined as:
F(t, ,q i):=|{(t′,q′ i ,v′ i)|q′ i {circumflex over ( )}t≤t′<t+δ}|.
Finally, the observed statement arrival rate may be defined as a series of consecutive statement arrival frequencies between the start and end timestamps os, oe∈N in the observed workload . The observed statement arrival rate (observed SAR) for statement qi∈ in the observed workload maybe defined as:
SAR(,q i):={(t,F(t, ,q i))|t∈Δ(o s ,o e)}.
SAR(,q i) is linear⇔|ρ(SAR(,q i))|≥φlin.
To detect exponential workload drifts, embodiments may take advantage of the fact that a function grows exponentially if its logarithm grows linearly. Therefore, the condition is satisfied if the Pearson correlation coefficient ρ between the series of discrete timestamps and the series of the logarithm of each statement arrival frequency is greater or equal than a threshold φexp∈[0, 1] (note that the logarithm is only applied on the series of statement arrival frequencies F(t, , qi) in SAR(,qi) and not on the series of discrete timestamps):
SAR(,q i) is exponential⇔|ρ(log(SAR(,q i)))|φexp.
ub i:=maxt∈Δ(o
lb i:=mint∈Δ(o
θ(,q i):=φcyc·max(ub i −lb i ,lb i)
SAR(,q i) is reoccurring⇔∃z∈DFT(SAR(,q i)):|z|≥θ(,q i).
| TABLE II |
| Classification Outcomes |
| Classification | Predicted Statement Arrival Frequency | |
| q1 | Linear | F(t, , q1) = 13t + 7071 |
| q2 | Exponential | F(t, , q2) = e0.09t+1.41 |
| q3 | Reoccurring | F(t, , q3) = 1216 cos(2πt) + 18 sin(2πt) − 1004 |
| cos(2πt) − 1701 sin(2πt) + 4954 | ||
| q4 | Linear and | F(t, , q4) = 61t + 5801 + 185 cos(2πt) + 135 |
| Reoccurring | sin(2πt) − 239 cos(2πt) − 317 sin(2πt) | |
| q5 | Static | F(t, , q5) = 10690 |
| q6 | Irregular | F(t, , q6) = F(t mod 72, , q6) |
-
- Linear: Use a linear approximation function obtained by ordinary least-squares regression on the observed SAR.
- Exponential: First perform ordinary least-squares regression on the natural logarithm of the observed SAR. The future SAR is calculated by raising e to the power of the prediction produced by the linear approximation function.
- Reoccurring: Convert all sinusoids with an amplitude greater or equal than threshold θ(, qi) into a sum of trigonometric functions in the time domain.
- Static: Use the mean observed statement arrival frequency μi as a prediction of future statement arrival frequencies.
- Irregular: We copy the observed SAR and paste it into the future, similar to a backward-looking approach.
- Linear and reoccurring: Compute a sum of a linear function and trigonometric functions. The linear function is obtained by ordinary least-squares regression on the observed SAR. Then calculate a normalized observed SAR by subtracting the linear function from the observed SAR. Finally, obtain the trigonometric functions by transforming all sinusoids in the DFT of the normalized SAR with an amplitude greater or equal than threshold θ(, qi).
Sinusoids with a small amplitude may be filtered out when handling a reoccurring drift because they may result from short-term fluctuations or noise and would therefore lead to overfitting. Since irregular drifts can neither be modeled nor predicted, the framework cannot know the future. Instead, the system may copy the observed SAR into the future (e.g., a backward-looking approach). Because the overhead of the approach is relatively low, the system can run the workload predictor and the advisor periodically (so that it responds timely to the outcome of an irregular drift).
UNCERTAIN if o s ≤t<o s +k·δ
REGULAR else if ∀{circumflex over (t)}∈[o s ,o e −k·δ)∃(t′,v ij)∈(,h ij):{circumflex over (t)}≤t′<{circumflex over (t)}+k·δ
FRESH else if (t′,v ij)∈(,h ij):t−k·δ≤t′<t
SUBSEQUENT else.
SFA(,h ij):={t,v ij)|(t,v ij)∈(,h ij),(t,v ij) is fresh}.
As before, the system may detect which drift types are present in the observed SFA and then use a classifier (such as the one in
τ(t,v ij)=min({t−t′|t′,v ij)∈(,h ij),(t′,v ij) is fresh,t′<t}).
Note that for the same parameter value, more than one fresh assignment can exist. This may happen, for example, with a reoccurring drift. To handle such ambiguities, some embodiments only consider the temporal offset to the most recent corresponding fresh assignment.
The prediction is based on the following two assumptions: first, regular assignments to a host variable in the observed workload will continue in the future workload; and second, regular assignments are uniformly distributed. To motivate both assumptions, consider
| TABLE III |
| Workload Prediction Algorithm |
| Input : | For each statement qi ∈ : SAR( , qi) |
| For each host variable hij in qi: SFA( , hij), psub, |
|
| 1 | := { } // initialize the predicted workload |
| 2 | for 1 ≤ i ≤ n do // iterate over indexes of SQL statements |
| 3 | for t ∈ Δ(fs, fe) do // iterate over future equidistant timestamps |
| 4 | F(t, , qi) := SAR( , qi) [t] // get statement arrival frequency |
| 5 | for 1 . . . F (t, , qi) do// iterate over statement instantiations |
| 6 | draw a random timestamp trand ∈ [t, t + δ) // uniform |
| 7 | allocate vector vi with a capacity of mi parameter values |
| 8 | for 1 ≤ j ≤ mi do // iterate over indexes of host variables |
| 9 | draw assignment category fresh, subsequent, or regular according to |
| their distribution in ( , hij) | |
| 10 | switch assignment category do |
| 11 | case fresh do // generate fresh assignment |
| 12 | vij := SFA( , hij) [trand] // get value from SFA at |
| 13 | case subsequent do // generate |
| 14 | draw an interval [λ·δ, (λ+1)·δ) according to |
| 15 | draw a random time difference tdiff ∈ [λ·δ, (λ+1)·δ) |
| 16 | tfresh := trand−tdiff //calculate time of fresh assignment |
| 17 | vij := SFA( , hij) [tfresh] // get value from SFA at |
| 18 | case regular do // generate regular assignment |
| 19 | vij := draw parameter value according to preg |
| 20 | insert parameter value vij into |
| 21 | = ∪ {(trand, qi, vi)} // add predicted statement instantiation to |
-
- Fresh: generate a fresh assignment with parameter value vij set to the value of the predicted series of fresh assignments SFA(, hij) at timestamp trand (line 12).
- Subsequent: The timestamp and parameter value of subsequent assignments are modeled in terms of their corresponding fresh assignments. Generate a subsequent assignment in two steps: In a first step, draw a time difference tdiff between the subsequent assignment to generate at timestamp tram and its corresponding fresh assignment at tfresh. More specifically, draw an interval [λ·δ, (λ+1)·δ) according to the probability given by psub(line 14), then draw tdiff uniformly from this interval (line 15), and calculate tfresh as difference between trand and tdiff (line 16). In a second step, generate a subsequent assignment with parameter value vij set to the value of the predicted series of fresh assignments SFA(, hij) at tfresh (line 17).
- Regular: Generate a regular assignment with parameter value IA, drawn according to the probability given by preg(line 19).
-
- (1) As column stores often employ read-optimized data structures (e.g., compressed dictionaries) that cannot be modified easily, assume that C is created from scratch.
- (2) Each record written to column partition C was read from an existing column partition in Lcur. Accordingly, assume that the total amount of data accessed during repartitioning is twice the column partition size ∥C∥.
- (3) To reduce the impact on performance SLAs, ensure fast repartitioning times by assuming that all data accessed during repartitioning is held in DRAM.
- (4) Assume that the system exhibits a repartitioning rate Rrate in Byte/sec for creating column partition C, as well as a fixed overhead Rovrhd in sec for initializing C.
o s:=current timestamp
o e :=o s+ω
f s :=o e
f e :=f s+η·ω.
Claims (23)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/462,513 US12242454B2 (en) | 2022-03-28 | 2023-09-07 | Framework for workload prediction and physical database design |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/705,728 US11789920B1 (en) | 2022-03-28 | 2022-03-28 | Framework for workload prediction and physical database design |
| US18/462,513 US12242454B2 (en) | 2022-03-28 | 2023-09-07 | Framework for workload prediction and physical database design |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/705,728 Continuation US11789920B1 (en) | 2022-03-28 | 2022-03-28 | Framework for workload prediction and physical database design |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20240004855A1 US20240004855A1 (en) | 2024-01-04 |
| US12242454B2 true US12242454B2 (en) | 2025-03-04 |
Family
ID=83996884
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/705,728 Active 2042-04-08 US11789920B1 (en) | 2022-03-28 | 2022-03-28 | Framework for workload prediction and physical database design |
| US18/462,513 Active US12242454B2 (en) | 2022-03-28 | 2023-09-07 | Framework for workload prediction and physical database design |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/705,728 Active 2042-04-08 US11789920B1 (en) | 2022-03-28 | 2022-03-28 | Framework for workload prediction and physical database design |
Country Status (3)
| Country | Link |
|---|---|
| US (2) | US11789920B1 (en) |
| EP (1) | EP4254217A1 (en) |
| CN (1) | CN116860713A (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12020417B2 (en) * | 2020-04-24 | 2024-06-25 | Camtek Ltd. | Method and system for classifying defects in wafer using wafer-defect images, based on deep learning |
| CN118689944B (en) * | 2024-06-06 | 2025-04-25 | 吉林大学 | A method and device for constructing a database of associated variables |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060129771A1 (en) * | 2004-12-14 | 2006-06-15 | International Business Machines Corporation | Managing data migration |
| US20210096996A1 (en) * | 2019-10-01 | 2021-04-01 | Microsoft Technology Licensing, Llc | Cache and i/o management for analytics over disaggregated stores |
| US20220129316A1 (en) * | 2020-10-28 | 2022-04-28 | Adobe Inc. | Workload Equivalence Class Identification For Resource Usage Prediction |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2923874B2 (en) * | 1996-12-26 | 1999-07-26 | 日本電気株式会社 | High load emulation performance evaluation method and device |
| CN103176973B (en) * | 2011-12-20 | 2016-04-20 | 国际商业机器公司 | For generating the system and method for the test job load of database |
| US11436207B2 (en) * | 2019-07-05 | 2022-09-06 | Purdue Research Foundation | Clustered database reconfiguration system for time-varying workloads |
| CN113157814B (en) * | 2021-01-29 | 2023-07-18 | 东北大学 | Query-driven intelligent workload analysis method in relational database |
-
2022
- 2022-03-28 US US17/705,728 patent/US11789920B1/en active Active
- 2022-10-25 EP EP22203598.2A patent/EP4254217A1/en active Pending
- 2022-10-26 CN CN202211320568.2A patent/CN116860713A/en active Pending
-
2023
- 2023-09-07 US US18/462,513 patent/US12242454B2/en active Active
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060129771A1 (en) * | 2004-12-14 | 2006-06-15 | International Business Machines Corporation | Managing data migration |
| US20210096996A1 (en) * | 2019-10-01 | 2021-04-01 | Microsoft Technology Licensing, Llc | Cache and i/o management for analytics over disaggregated stores |
| US20220129316A1 (en) * | 2020-10-28 | 2022-04-28 | Adobe Inc. | Workload Equivalence Class Identification For Resource Usage Prediction |
Also Published As
| Publication number | Publication date |
|---|---|
| US20240004855A1 (en) | 2024-01-04 |
| EP4254217A1 (en) | 2023-10-04 |
| US11789920B1 (en) | 2023-10-17 |
| CN116860713A (en) | 2023-10-10 |
| US20230306009A1 (en) | 2023-09-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN105144112B (en) | Seasonal trend, forecast, anomaly detection and endpoint prediction for JAVA heap usage | |
| US10983895B2 (en) | System and method for data application performance management | |
| AU2021359241B2 (en) | Learning-based workload resource optimization for database management systems | |
| EP3289485B1 (en) | Automatic demand-driven resource scaling for relational database-as-a-service | |
| US12242454B2 (en) | Framework for workload prediction and physical database design | |
| US10776354B2 (en) | Efficient processing of data extents | |
| JP7719572B2 (en) | Computer-implemented method, computer program, and system (predicted query processing) | |
| KR102895621B1 (en) | Offloading statistics collection | |
| US11263117B2 (en) | System for intelligent code update for a test automation engine | |
| US20250292142A1 (en) | Model optimization and stabilization using quantum computing | |
| US11514044B2 (en) | Automated plan upgrade system for backing services | |
| US20220261254A1 (en) | Intelligent Partitioning Engine for Cluster Computing | |
| CN118037250B (en) | Data mining method and system applying text informatization system | |
| US12026134B2 (en) | Flow-based data quality monitoring | |
| US20250247308A1 (en) | Service-Level Detection of Latency Anomalies in a Computing Platform | |
| US20250111299A1 (en) | Systems and methods for self-schedule recommendations | |
| CN115827401A (en) | Prediction-based method for analyzing performance impact due to software system component content changes | |
| Sivasatyanarayanareddy | How Embedded AI Enhances Workflow Orchestration in Pega | |
| JP2024505522A (en) | System and method for determining model suitability and stability of model deployment in automatic model generation | |
| CN121073065A (en) | Task matching methods, devices, electronic devices and storage media | |
| CN119226375A (en) | Method, device, equipment and storage medium for processing customer data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AS | Assignment |
Owner name: SAP SE, GERMANY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BRENDLE, MICHAEL;MAY, NORMAN;SCHULZE, ROBERT;AND OTHERS;SIGNING DATES FROM 20220326 TO 20220328;REEL/FRAME:064824/0869 |
|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |