WO2015038224A1 - Systems and methods for tuning multi-store systems to speed up big data query workload - Google Patents
Systems and methods for tuning multi-store systems to speed up big data query workload
- Publication number: WO2015038224A1 (PCT/US2014/045348)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- views
- store
- view
- stores
- multistore
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24539—Query rewriting; Transformation using cached or materialised query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
Definitions
- Multistore systems represent a natural evolution for big data analytics, where query processing may span both stores, transferring data and computation.
- One approach to multistore processing is to transfer and load all of the big data into the RDBMS (i.e., up-front data loading) in order to take advantage of its superior query processing performance relative to the big data store.
- However, this approach is limited by the large size of big data and the high cost of an ETL (Extract-Transform-Load) process.
- Another approach is to utilize both stores during query processing by enabling a query to transfer data on-the-fly (i.e., on-demand data loading).
- a more effective strategy for multistore processing is to make a tradeoff between up-front and on-demand data loading. This is challenging since exploratory queries are ad-hoc in nature and the relevant data changes over time.
- a crucial problem for a multistore system is determining what data to materialize in which store at what time. We refer to this problem as tuning the physical design of a multistore system.
- multistore systems utilize multiple distinct data stores such as Hadoop's HDFS and an RDBMS for query processing by allowing a query to access data and computation in both stores.
- Current approaches to multistore query processing fail to achieve the full potential benefits of utilizing both systems due to the high cost of data movement and loading between the stores.
- Tuning the multistore physical design i.e., deciding what data resides in which store, can reduce the amount of data movement during query processing, which is crucial for good multistore performance.
- Because the stores have very asymmetric performance properties, the data placement problem is not straightforward. Roughly speaking, store 1 is large and slow relative to store 2, which is smaller and faster.
- Store 1 has worse query processing performance but provides much better data loading times
- Store 2 has better query processing performance but suffers from very high data load times.
- MISO (MultiStore Online tuning)
- MISO speeds up big data query processing by selecting the best set of materialized views for each store.
- MISO maximizes the utilization of the existing RDBMS resources, employing some of its spare capacity for processing big data queries.
- the physical design of a multistore system has two decision components: the data to materialize (i.e., which views), and where to materialize the data (i.e., which store). Moreover, neither of these decisions can be made in isolation.
- a method utilizes the by-products of query processing in a multistore system to tune the physical design of the multistore system. These by-products are materializations of intermediate data, which are then placed across the stores by the method. The by-products are termed "views" over the base data. In order to place the views across the stores in a way that is beneficial to query processing, an expected query workload is required. The method considers the recently observed queries as indicative of the future query workload.
- Each store has an allotted view storage budget, and there is a view transfer budget for transferring views between the stores. Each budget may have a unique value. This setup is depicted in Figure 2 in the paper.
- the method then considers transferring views between the stores such that the final view placements fit within all of the budgets and minimize the cost of the future workload.
- the method considers a unified set of views, which is the union of all views present in each store.
- the solutions include a subset of these views to be placed in each store. The method begins by solving the view placement for the high-performance store, store 2.
- the views computed to be most beneficial for the future workload are considered for transfer to (or retained within) store 2.
- a solution is computed for store 2 and the solution is the set of views to place in store 2.
- the solution for store 2 must not exceed the view storage budget for store 2 and must not exceed the view transfer budget. It then similarly computes a solution for store 1.
- the solution for store 1 must not exceed the view storage budget for store 1 and must not exceed the remaining view transfer budget that was not consumed by the solution for store 2.
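The two-phase, budget-constrained placement described above can be sketched as follows. This is a simplified illustration: it uses a greedy benefit-per-GB heuristic in place of the knapsack solves described later, and all names (`place_views`, `benefit`, `size`, `in_dw`) are illustrative rather than from the patent.

```python
def place_views(views, benefit, size, B_d, B_h, B_t, in_dw):
    """Greedy sketch of the two-phase multistore view placement.
    views: view ids; benefit/size: per-view dicts; B_d/B_h: storage
    budgets for store 2 (DW) and store 1 (HV); B_t: transfer budget;
    in_dw: views currently resident in DW (others assumed in HV)."""
    order = sorted(views, key=lambda v: benefit[v] / size[v], reverse=True)
    # Phase 1: place the most beneficial views in the fast store (DW),
    # charging the transfer budget for views not already there.
    dw, used_d, used_t = [], 0, 0
    for v in order:
        t = 0 if v in in_dw else size[v]
        if used_d + size[v] <= B_d and used_t + t <= B_t:
            dw.append(v); used_d += size[v]; used_t += t
    # Phase 2: place remaining views in HV, under HV's storage budget
    # and whatever transfer budget Phase 1 left over (evicted DW views
    # consume transfer budget to move back to HV).
    hv, used_h, rem_t = [], 0, B_t - used_t
    for v in order:
        if v in dw:
            continue
        t = size[v] if v in in_dw else 0
        if used_h + size[v] <= B_h and t <= rem_t:
            hv.append(v); used_h += size[v]; rem_t -= t
    return dw, hv
```

The greedy order is only a stand-in; the patent's tuner solves each phase with a multidimensional knapsack, but the budget accounting (storage per store, shared transfer budget consumed by Phase 1 first) follows the text.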
- the method solves the data placement problem in a multistore system.
- the system results in faster query processing times for a multistore system.
- the system simply utilizes the two stores already in place at many organizations. It couples them together in a way that reduces query execution time.
- the system makes better use of company resources. This has a monetary-cost benefit.
- FIG. 1 shows an exemplary setup with (a) two independent stores and (b) multistore system utilizing both stores.
- FIG. 2 depicts an exemplary system architecture, containing the MISO Tuner, the Query Optimizer, the Execution Layer, and the two stores HV and DW.
- FIG. 3 shows an exemplary process to reconfigure a multistore system.
- FIG. 4 shows another exemplary process to reconfigure a multistore system.
- FIG. 5 shows exemplary processes for handling views as elements of multistores.
- FIG. 6 shows an exemplary process to create a multi-store design.
- FIG. 7 shows in more details another exemplary process to create a multi-store design.
- FIG. 8 shows in more details another exemplary process to create a multi-store design.
- FIG. 9 shows methods used in a system for data base management.
- RDBMS: parallel relational database management systems
- HDFS: high-volume data storage and analysis
- the present system combines a parallel RDBMS and a big data store into a "multistore" system, either for performance reasons or for analysis of data sets that span both stores.
- multistore does not necessarily denote a combination of a big data store and an RDBMS store.
- multistore can involve more than two stores and can involve different combination of stores.
- multistores are not restricted to the combination of RDBMS and Hive.
- a multistore can be made up of a big data store, a column store, and an RDBMS store.
- the techniques we developed here can be trivially extended to these alternative setups. Having noted the generality of the multistore concept, we describe a multistore consisting of Hive and an RDBMS as a representative of this broader class, for ease of exposition.
- MISO, a MultiStore Online tuning algorithm
- MISO maximizes the utilization of the existing high-performance RDBMS resources, employing some of its spare capacity for processing big data queries.
- DBAs are naturally protective of RDBMS resources. To be sensitive to this concern, it is important that our approach achieves this speedup with little impact on the RDBMS reporting queries.
- MISO is an adaptive and lightweight method to tune data placement in a multistore system, and it has the following unique combination of features:
- FIG. 1 shows an exemplary Multistore System Architecture.
- the big data store is Hive (HV) and the RDBMS is a commercial parallel data warehouse (DW) as depicted in Figure 1.
- HV contains the big data log files
- DW contains analytical business data.
- MISO accelerates big data exploratory queries in HV by utilizing some limited spare capacity in DW and tuning the physical design of the stores for a workload of queries. In this way, MISO makes use of existing resources to benefit big data queries.
- DW acts as an accelerator in order to improve the performance of queries in HV; later we show that MISO achieves this speedup with very little impact on DW.
- FIG. 2 depicts our system architecture, containing the MISO Tuner, the Query Optimizer, the Execution Layer, and the two stores HV and DW.
- each store resides on an independent cluster.
- HV contains log data while DW contains analytical data, but in this work our focus is on accelerating big data queries posed only on the log data.
- each store has a set of materialized views of the base data (i.e., logs) stored in HV, and together these views comprise the multistore physical design.
- Hadoop-based systems (e.g., HV) expose such intermediate materializations as a natural by-product of query execution.
- DW products generally do not provide access to their intermediate materialized results. However, if those results become available, then they can also be used for tuning.
- When tuning the physical design, we consider the class of opportunistic materialized views. It is the MISO Tuner's job to determine the placement of these views across the stores to improve workload performance. When placing these views, each store has a view storage budget as indicated in the figure, and there is also a transfer budget for moving views across the stores. These budgets limit the total size of views that can be stored and transferred.
- The primary data source in our system is large log files.
- social media data drawn from sites such as Twitter, Foursquare, Instagram, Yelp, etc.
- This type of data is largely text-based with little structure.
- Logs are stored as flat HDFS files in HV, in text-based formats such as JSON, XML, or CSV, or in completely unstructured formats such as LOG4J files.
- the input to the system is a stream of queries.
- the query language is HiveQL, which represents a subset of SQL implemented by Hive (HV). Queries are declarative and posed directly over the log data, such that the log schema of interest is specified within the query itself and is extracted during query execution.
- extracting flat data from text files is accomplished by a SerDe (serialize/deserialize) function that understands how to extract data fields from a particular flat file format (e.g., json).
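As an illustration of that extraction step, a minimal Python stand-in for a JSON SerDe might look like the following; `json_serde` and its signature are hypothetical illustrations, not Hive's actual SerDe API.

```python
import json

def json_serde(line, fields):
    """Minimal stand-in for a JSON SerDe: pull the requested fields out
    of one flat-file log record, returning None for missing fields."""
    rec = json.loads(line)
    return tuple(rec.get(f) for f in fields)

# The log schema of interest is named in the query itself, so only the
# requested fields are extracted at execution time:
row = json_serde('{"user": "u1", "lat": 40.7, "text": "hi"}', ["user", "lat"])
```

A real SerDe is registered per table format and invoked by the execution engine, but the essence, schema-on-read extraction of named fields from a flat text record, is the same.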
- a query may contain both relational operations and arbitrary code denoting user-defined functions (UDFs). UDFs are arbitrary user code, which may be provided in several languages (e.g., Perl, Python); the UDFs are executed as Hadoop jobs in HV.
- the processing location (DW or HV) is hidden from the end user, who has the impression of querying a single store.
- the Execution Layer component is responsible for forwarding each component of the execution plan generated by the Query Optimizer to the appropriate store.
- a multistore execution plan may contain "split points", denoting a cut in the plan graph whereby data and computation are migrated from one store to the other. Since DW is used as an accelerator for HV queries, the splits in a plan move data and computation in one direction: from HV to DW. It is the multistore query optimizer's job (described next) to select the split points for the plan. As an example, the figure alongside has three panels, showing an execution plan (represented as a DAG) and then two possible split points indicated by the cuts.
- the execution layer migrates the intermediate data (i.e., the working set corresponding to the output of the operators above the cut) and resumes executing the plan on the new store.
- two intermediate data sets need to be migrated.
- intermediate data sets are migrated during query execution, they are stored in temporary DW table space (i.e., not catalogued) and discarded at the end of the query.
- the execution layer is also responsible for orchestrating the view movements when changes to the physical design are requested by the tuner. When views are migrated during tuning, they are stored in permanent DW table space and become part of the physical design until the next tuning phase.
- the Multistore Query Optimizer component takes a query as input and computes a multistore execution plan for the query.
- the plan may span multiple stores, moving data and computation as needed, and utilizes the physical design of each store.
- the design of an optimizer that spans multiple stores must be based on common cost units (expected execution time) between stores, thus some unit normalization is required for each specific store.
- Our multistore cost function considers three components: the cost in HV, the cost to transfer the working set across the stores, and the cost in DW, expressed in normalized units.
- the multistore query optimizer chooses the split points based on the logical execution plan and then delegates the resulting sub-plans to the store-specific optimizers.
- the store in which query subexpressions are executed depends on the materialized views present in each store. Furthermore when determining split points the optimizer must also consider valid operations for each store, such as a UDF that can only be executed in HV. Moving a query sub-expression from one store to another is immediately beneficial only when the cost to transfer and load data from one store to another plus the cost of executing the sub-expression in the other store is less than continuing execution in the current store.
- the primary challenge for the multistore query optimizer is determining the point in an execution plan at which the data size of a query's working set is "small enough" to transfer and load it into DW rather than continue executing in HV.
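The split-point decision above reduces to a cost comparison in the optimizer's normalized units. The sketch below assumes hypothetical per-stage estimates (`hv_rest`, `dw_rest`, `xfer`); the real optimizer also checks operator validity per store (e.g., UDFs must stay in HV).

```python
def choose_split(stages):
    """Pick the first pipeline stage at which finishing in DW, including
    transferring and loading the working set, is cheaper than finishing
    in HV. Each stage is a dict with hypothetical keys:
      hv_rest: cost to finish the remaining plan in HV,
      dw_rest: cost to finish the remaining plan in DW,
      xfer:    cost to move and load the working set into DW,
    all in the optimizer's normalized time units.
    Returns the stage index to split at, or None to stay in HV."""
    for i, s in enumerate(stages):
        if s["xfer"] + s["dw_rest"] < s["hv_rest"]:
            return i
    return None
```

Early in a plan the working set is typically too large (transfer dominates); the split becomes attractive once intermediate results shrink enough, which is exactly the "small enough" determination described above.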
- the MISO Tuner component is invoked periodically to reorganize the materialized views in each store based on the "latest traits" of the workload.
- the tuner examines several candidate designs and analyzes their benefit on a sliding window of recent queries (History in Figure 1) using the what-if optimizer.
- the selected design is then forwarded to the execution layer, which then moves views from HV to DW and from DW to HV (indicated in Figure 1) as per the newly computed design.
- the invocation of the MISO tuner, which we term a reorganization phase, can be time-based (e.g., every hour), query-based (e.g., every n queries), activity-based (e.g., when the system is idle), or a combination of the above. In our system, reorganizations are query-based.
- the View Transfer Budget is indicated by the arrow between the stores. This represents the total size of views in GB transferred between the stores during a reorganization phase and is provided as a constraint to the tuner.
- the HV View Storage Budget and the DW View Storage Budget are also indicated in the figure and similarly provided as constraints to the tuner. These represent the total storage allotted in GB for the views in each store. While DW represents a tightly-managed store, HV deployments are typically less tightly-managed and may have more spare capacity than DW. For these reasons, new opportunistic views created in HV between reconfigurations are retained until the next time the MISO tuner is invoked, while the set of views in DW is not altered except during reorganization phases. In any Hadoop configuration, there must be enough temporary storage space to retain these materializations during normal operation; we only propose to keep them a little while longer, until the next reorganization phase.
- The elements of our design are materialized views, and the universe of views is denoted as V.
- the physical design of each store is denoted as V_h for HV and V_d for DW, where V_h, V_d ⊆ V.
- the values B_h, B_d denote the view storage budget constraints for HV and DW respectively, and the view transfer budget for a reorganization phase is denoted as B_t.
- reorganization phases occur periodically (e.g., after j queries are observed by the system), at which point views may be transferred between the stores.
- the constraints B_h, B_d, and B_t are specified in GB.
- Let M = (V_h, V_d) be a multistore design.
- M is a pair where the first component represents the views in HV, and the second component represents the views in DW.
- a query q_i represents the i-th query in W.
- the cost of a query q given multistore design M, denoted cost(q, M), is the sum of the cost in HV, the cost to transfer the working set of q to DW, and the cost in DW under the hypothetical design M.
- the evaluation metric we use for a multistore design M is the total workload cost, defined as: TotalCost(W, M) = Σ_{q_i ∈ W} cost(q_i, M).
- this metric represents the sum of the total time to process all queries in the workload. This is a reasonable metric, although others are possible. Note that the metric does not include the reorganization constraint B_t since it is provided as an input to the problem.
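The cost model and workload metric can be sketched in code as follows; the per-component estimator callbacks are assumptions standing in for the what-if optimizer, which supplies these estimates in normalized time units.

```python
def query_cost(q, M, hv_cost, xfer_cost, dw_cost):
    """cost(q, M): HV portion + working-set transfer + DW portion,
    all in normalized time units. hv_cost/xfer_cost/dw_cost are
    hypothetical estimators provided by the what-if optimizer."""
    return hv_cost(q, M) + xfer_cost(q, M) + dw_cost(q, M)

def total_workload_cost(W, M, cost):
    """TotalCost(W, M): sum of per-query costs over the workload W
    under the hypothetical multistore design M."""
    return sum(cost(q, M) for q in W)
```

The tuner evaluates candidate designs by computing this total over a sliding window of recent queries, so only the cost function changes between candidates, not the workload.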
- a pair of views (a, b) may interact with one another with respect to their benefit for q.
- the interaction occurs when the benefit of a for q changes when b is present.
- the type of interaction for a view a with respect to b may be positive or negative.
- a positive interaction occurs when the benefit of a increases when b is present.
- a and b are said to interact positively when the benefit of using both is higher than the sum of their individual benefits. In this case, we want to pack both a and b in the knapsack.
- a negative interaction occurs when the benefit of a decreases when b is present.
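The interaction test above can be made concrete with a toy sketch driven by a what-if cost function; the function names and the example cost model are hypothetical.

```python
def benefit(cost, M, v):
    """Benefit of view v for a query under design M: the drop in the
    what-if cost when v is added to the design."""
    return cost(M) - cost(M | {v})

def interaction_type(cost, a, b, M=frozenset()):
    """Classify view a's interaction with respect to b: positive if a's
    benefit grows when b is present, negative if it shrinks."""
    solo = benefit(cost, M, a)
    with_b = benefit(cost, M | {b}, a)
    if with_b > solo:
        return "positive"
    if with_b < solo:
        return "negative"
    return "none"

# Toy what-if cost: each view alone saves 10, but together they save 40,
# so a and b interact positively (benefit of both > sum of individual benefits).
def toy_cost(M):
    if {"a", "b"} <= M:
        return 60
    if M & {"a", "b"}:
        return 90
    return 100
```

Under `toy_cost`, a alone is worth 10, but with b present a is worth 30, so the tuner would prefer to pack a and b into the knapsack together.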
- One embodiment uses a heuristic approach in the way we handle view interactions and solve the physical design of both stores.
- we motivate and develop our heuristics and then later in the experimental section we show that our approach results in significant benefits compared to simpler tuning approaches.
- the workload we address is online in nature, hence reorganization is done periodically in order to tune the design to reflect the recent workload.
- Our approach is to obtain a good multistore design by periodically solving a static optimization problem where the workload is given.
- the MISO Tuner algorithm is presented as Algorithm 1 (MISO Tuner algorithm).
- the tuner algorithm begins by grouping all views in the current designs V_h and V_d into interacting sets. The goal of this step is to identify views that have strong positive or negative interactions with other views in V.
- At the end of this step there are multiple sets of views, where views within a set strongly interact with each other, and views belonging to different sets do not.
- We sparsify each set by retaining some of the views and discarding the others.
- To sparsify a set we consider if the nature of the interacting views within a set is positive or negative. If the nature of the interacting views is strongly positive, then as a heuristic, those views should always be considered together since they provide additional benefit when they are all present.
- V_cands contains views that may be considered independently when computing the new multistore design, which is done by solving two multidimensional knapsack problems in sequence.
- the dimensions of each knapsack are the storage budget and the transfer budget constraints. We solve an instance of the knapsack for DW, using view storage budget B_d and view transfer budget B_t.
- the output of this step is the new DW design, V_d^new.
- We then solve a second m-knapsack instance for HV, using view storage budget B_h and any view transfer budget remaining after the DW solution, B_t^rem; the output is the new HV design, V_h^new.
- the reason we solve the DW design first is that it can offer superior execution performance when the right views are present. With a good DW design, query processing can migrate to DW sooner, taking advantage of its query processing power. For this reason, we focus on the DW design as the primary goal and solve it in the first phase. After the DW design is chosen, the HV design is solved in the second phase. In this two-phase approach, the designs of HV and DW can be viewed as complementary, yet formulated to give DW greater importance than HV as a heuristic.
- the benefit function divides W into a series of non- overlapping epochs, each a fraction of the total length of W . This represents the recent query history.
- the predicted future benefit of each view is computed by applying a decay on the view's benefit per epoch: for each q ∈ W, the benefit of a view v for query q is weighted less as q appears farther in the past.
- the outcome of the computation is a smoothed averaging of the view's benefit over multiple windows of the past. In this way, the benefit computation captures a longer workload history but prefers the recent history as more representative of the benefit for the immediate future workload.
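One possible realization of the decayed per-epoch benefit is a geometric weighting, as sketched below; the decay shape and rate are assumptions, since the text specifies only that older epochs count less.

```python
def decayed_benefit(epoch_benefits, decay=0.5):
    """Smoothed predicted future benefit of a view.
    epoch_benefits[0] is the view's benefit in the most recent epoch of
    the workload window W; older epochs are down-weighted geometrically
    by the hypothetical decay factor in (0, 1)."""
    return sum(b * decay ** i for i, b in enumerate(epoch_benefits))
```

With `decay=0.5`, a view that was useful two epochs ago contributes only a quarter of what the same benefit in the current epoch would, which matches the stated preference for recent history.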
- each part may be considered independently when packing the m-knapsack. This is because the benefit of a view in part P_i is not affected by the benefit of any view in part P_j, where P_i ≠ P_j. At this point, however, some parts may have a cardinality greater than one, so we next describe a method to choose a representative view among views within the same part.
- the MISO Tuner solves two instances of a 0-1 multidimensional knapsack problem (m-knapsack henceforward); the first instance for DW and the second instance for HV. Each instance is solved using a dynamic programming formulation.
- V_h includes all views added to the HV design by the MISO tuner during the previous reorganization window, as well as any new opportunistic views created in HV since the last reorganization.
- the MISO tuner computes the new designs V_h^new and V_d^new for HV and DW respectively. Since DW has better query performance, as a first heuristic MISO solves the DW instance first, resulting in the best views being added to the DW design. Furthermore, we keep V_h^new and V_d^new disjoint to prevent duplicating views across the stores. Although this is also a heuristic, our rationale is that it potentially results in a more "diverse" set of materialized views and hence better utilization of the limited storage budgets in preparation for an unknown future workload. If desired, this property could be relaxed by including all views in V_cands when packing both HV and DW.
- the target design is V_d^new.
- the m-knapsack dimensions are B_d and B_t.
- the variable k denotes the k-th element in V_cands (i.e., view v_k); the order of elements is irrelevant.
- the recurrence relation C is given by the following two cases.
- in Phase 2, the target design is V_h^new, and the m-knapsack dimensions are B_h and B_t^rem.
- the views evicted from DW (those in V_d but not in V_d^new) are now available for transfer back to HV.
- the solution is symmetric to Phase 1, with modified inputs.
- B_t is initialized to B_t^rem.
- We similarly have two cases for the Phase 2 recurrence, defined as:
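Since the recurrence cases themselves are not reproduced here, the following is a generic 0-1 two-dimensional knapsack DP of the kind each phase solves, with integer storage and transfer costs; it is a sketch of the standard technique, not the patent's exact recurrence.

```python
def m_knapsack(items, B_store, B_xfer):
    """0-1 knapsack with two capacity dimensions, one per budget.
    items: list of (value, storage_cost, transfer_cost) triples, where
    value is the view's (decayed) benefit and the costs are view sizes
    in integer GB. Returns (best total value, chosen item indices).
    C[s][t] holds the best value achievable with storage <= s and
    transfer <= t; items are considered once each, in any order."""
    C = [[(0, [])] * (B_xfer + 1) for _ in range(B_store + 1)]
    for k, (val, sc, tc) in enumerate(items):
        # Iterate capacities downward so each item is used at most once.
        for s in range(B_store, sc - 1, -1):
            for t in range(B_xfer, tc - 1, -1):
                cand = C[s - sc][t - tc][0] + val
                if cand > C[s][t][0]:
                    C[s][t] = (cand, C[s - sc][t - tc][1] + [k])
    return C[B_store][B_xfer]
```

Phase 1 would call this with (B_d, B_t); Phase 2 with (B_h, B_t^rem), with transfer cost zero for views already resident in the target store.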
- FIG. 3 shows an exemplary process to reconfigure a multistore system.
- the process first receives a new query Q on the system (102). The process then executes the query Q on the two stores (104). Next, the process checks whether reconfiguration is needed in 106. If not, the process loops back to 102; otherwise, the process performs reconfiguration in 108 and then loops back to 106.
- FIG. 4 shows another exemplary process to reconfigure a multistore system.
- a new query is received by the system 120.
- V is a set of existing views; Q is rewritten using the views; and the system finds a set of split points in the plan of Q where computation can be offloaded to a relational database management system.
- the process then updates V based on the new views. From 122, the process proceeds to 124, where it checks whether reconfiguration is needed. If not, the process loops back to 120. Otherwise, the process moves to 126, where B_h and B_d are the view storage budgets of the two stores and B_t is the transfer budget for moving views between the stores.
- the process chooses the subset of views for each store such that the storage budget and transfer budget constraints are satisfied in 126.
- FIG. 5 shows exemplary processes for handling views as elements of multistores.
- the process includes methods that allow existing query processing systems to speed up big data query processing 140.
- the process also includes a method by which the query processing is offloaded to a high-performance relational database management system 142.
- the process also includes a method of using opportunistic views as the element of physical design of multi-stores in 144.
- the process also includes a method to determine which views should be placed in a particular store in a predetermined order in 146.
- the process also includes a method by which the relational database management system's existing queries are not affected; this is done by restricting the storage budget for views and the transfer budget in 148.
- the process also includes a method of deciding split points of a query where the computation can be moved to the relational database management system 150.
- the process also includes a method of moving views based on the recent past workload in 152.
- FIG. 6 shows an exemplary process to create a multi-store design. The method receives a set of existing views V across the stores 202. Next, using a knapsack-based method, the process chooses a subset of the views to retain in the current store, a subset of views to transfer across the stores, and a subset of views to drop in 204. The process arrives at a new multi-store design in 206.
- FIG. 7 shows in more details another exemplary process to create a multi-store design.
- a set of existing views V is provided across the stores 210.
- the process resolves strongly interacting views, both positive and negative, in 212.
- the process packs the relational database management system with the most beneficial views using a knapsack packing within the transfer budget in 214.
- the process packs big data store with the remaining views and remaining transfer budget in 216.
- the process discards views that were not packed into either of the knapsacks in 218.
- the process then generates a new multi-store design in 220.
- FIG. 8 shows in more details another exemplary process to create a multi-store design.
- the process receives a set of existing views V across stores in 230.
- the process updates budgets and historical data in 232, for example the big data store budget, the RDBMS store budget, the transfer budget, and the history of the last h queries.
- the process also performs the following:
- the value of item V_i is COST(V_i, Q), and the item consumes storage capacity equal to the view's size.
- FIG. 9 shows methods used in a system for database management.
- the system includes a method of deciding which view should be placed in which store and which views should be discarded in 262.
- the process also includes a method of deciding the most beneficial set of views for a given storage budget in 264.
- the process also includes a method of deciding the most beneficial set of views to transfer across stores in 266.
- the process also includes a method by which the most beneficial views for the RDBMS are chosen first in 268.
- the process also includes a method by which the views for the big data store are chosen next in 270.
- the process also includes a method by which interactions between views are handled in 272.
- the process also includes a method of using a dynamic programming solution to pack both stores in 274.
- the invention may be implemented in hardware, firmware or software, or a combination of the three.
- the invention is implemented in a computer program executed on a programmable computer having a processor, a data storage system, volatile and non-volatile memory and/or storage elements, at least one input device and at least one output device.
- the computer preferably includes a processor, random access memory (RAM), a program memory (preferably a writable read-only memory (ROM) such as a flash ROM) and an input/output (I/O) controller coupled by a CPU bus.
- RAM random access memory
- program memory preferably a writable read-only memory (ROM) such as a flash ROM
- I/O controller coupled by a CPU bus.
- the computer may optionally include a hard drive controller which is coupled to a hard disk and CPU bus. Hard disk may be used for storing application programs, such as the present invention, and data. Alternatively, application programs may be stored in RAM or ROM.
- I/O controller is coupled by means of an I/O bus to an I/O interface.
- I/O interface receives and transmits data in analog or digital form over communication links such as a serial link, local area network, wireless link, and parallel link.
- communication links such as a serial link, local area network, wireless link, and parallel link.
- a display, a keyboard and a pointing device may also be connected to I/O bus.
- I/O interface may be used for I/O interface, display, keyboard and pointing device.
- Programmable processing system may be preprogrammed or it may be programmed (and reprogrammed) by downloading a program from another source (e.g., a floppy disk, CD-ROM, or another computer).
- Each computer program is tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein.
- the inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016519729A JP6123028B2 (en) | 2013-09-13 | 2014-07-03 | System and method for tuning multi-store systems and accelerating big data query workloads |
EP14844611.5A EP3044704A4 (en) | 2013-09-13 | 2014-07-03 | Systems and methods for tuning multi-store systems to speed up big data query workload |
Applications Claiming Priority (8)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201361877430P | 2013-09-13 | 2013-09-13 | |
| US201361877423P | 2013-09-13 | 2013-09-13 | |
| US61/877,423 | 2013-09-13 | | |
| US61/877,430 | 2013-09-13 | | |
| US14/321,881 | 2014-07-02 | | |
| US14/321,875 US20150081668A1 (en) | 2013-09-13 | 2014-07-02 | Systems and methods for tuning multi-store systems to speed up big data query workload |
| US14/321,875 | 2014-07-02 | | |
| US14/321,881 US9569491B2 (en) | 2013-09-13 | 2014-07-02 | MISO (multistore-online-tuning) system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2015038224A1 (en) | 2015-03-19 |
Family
ID=52666128
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2014/045348 WO2015038224A1 (en) | 2013-09-13 | 2014-07-03 | Systems and methods for tuning multi-store systems to speed up big data query workload |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2015038224A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100262633A1 (en) * | 2009-04-14 | 2010-10-14 | International Business Machines Corporation | Managing database object placement on multiple storage devices |
| US7958159B1 (en) * | 2005-12-19 | 2011-06-07 | Teradata Us, Inc. | Performing actions based on monitoring execution of a query |
| US20130110872A1 (en) * | 2011-10-28 | 2013-05-02 | Microsoft Corporation | De-focusing over big data for extraction of unknown value |
| US20130124483A1 (en) * | 2011-11-10 | 2013-05-16 | Treasure Data, Inc. | System and method for operating a big-data platform |
- 2014-07-03 WO PCT/US2014/045348 patent/WO2015038224A1/en active Application Filing
Non-Patent Citations (2)
| Title |
|---|
| HERODOTOU, HERODOTOS ET AL.: "Starfish: A Self-tuning System for Big Data Analytics", 5TH BIENNIAL CONFERENCE ON INNOVATIVE DATA SYSTEMS RESEARCH (CIDR '11), 9 January 2011 (2011-01-09), pages 261-272, XP055294861, Retrieved from the Internet <URL:http://x86.cs.duke.edu/-gang/documents/CIDRII_Paper36.pdf> * |
| See also references of EP3044704A4 * |
Similar Documents
| Publication | Title |
|---|---|
| US9569491B2 (en) | MISO (multistore-online-tuning) system |
| US11550791B2 (en) | Table placement in distributed databases |
| Doulkeridis et al. | A survey of large-scale analytical query processing in MapReduce |
| LeFevre et al. | MISO: souping up big data query processing with a multistore system |
| Crotty et al. | Tupleware: "Big" Data, Big Analytics, Small Clusters |
| Aji et al. | Hadoop-GIS: A high performance spatial data warehousing system over MapReduce |
| US10713255B2 (en) | Spool file for optimizing hash join operations in a relational database system |
| US9256633B2 (en) | Partitioning data for parallel processing |
| EP3903205A1 (en) | Technique of comprehensively support autonomous json document object (ajd) cloud service |
| US9830346B2 (en) | Table redistribution in distributed databases |
| US20140344287A1 (en) | Database controller, method, and program for managing a distributed data store |
| US9183253B2 (en) | System for evolutionary analytics |
| US11132366B2 (en) | Transforming directed acyclic graph shaped sub plans to enable late materialization |
| Terlecki et al. | On improving user response times in tableau |
| WO2015038224A1 (en) | Systems and methods for tuning multi-store systems to speed up big data query workload |
| Xu et al. | A dynamic view materialization scheme for sequences of query and update statements |
| Sarkar et al. | MapReduce: A comprehensive study on applications, scope and challenges |
| Sangat et al. | Nimble join: A parallel star join for main memory column-stores |
| US20230281201A1 (en) | On-demand access of database table partitions |
| US11960463B2 (en) | Multi-fragment index scan |
| Arres et al. | A data pre-partitioning and distribution optimization approach for distributed datawarehouses |
| US11775543B1 (en) | Heapsort in a parallel processing framework |
| US20230359671A1 (en) | Reparallelization for workload skewing database operations |
| Lee et al. | Join processing with threshold-based filtering in MapReduce |
| Costa et al. | Data warehouse processing scale-up for massive concurrent queries with SPIN |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14844611; Country of ref document: EP; Kind code of ref document: A1 |
| | REEP | Request for entry into the european phase | Ref document number: 2014844611; Country of ref document: EP |
| | WWE | Wipo information: entry into national phase | Ref document number: 2014844611; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2016519729; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |