EP2976724A1 - Apparatus and method for time series query packaging - Google Patents

Apparatus and method for time series query packaging

Info

Publication number
EP2976724A1
Authority
EP
European Patent Office
Prior art keywords
query
data
time series
series data
overlap
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP13713694.1A
Other languages
German (de)
French (fr)
Inventor
Sunil Mathur
Jerry Lin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intelligent Platforms LLC
Original Assignee
GE Intelligent Platforms Inc
Application filed by GE Intelligent Platforms Inc
Publication of EP2976724A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G06F3/0649 Lifecycle management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/2454 Optimisation of common expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24557 Efficient disk access during query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0605 Improving or facilitating administration, e.g. storage management by facilitating the interaction with a user or administrator
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0685 Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A first query and a second query are received. The first query and the second query are evaluated and, based upon the evaluation, first time series data required to fulfill the first query and second time series data required to fulfill the second query are identified. An extent of overlap of the first time series data and the second time series data is determined. When the extent of overlap exceeds a predetermined threshold, the overlapping data is retrieved from a plurality of data storage devices in parallel, the data being retrieved across all of the plurality of data storage devices via a single read operation.

Description

APPARATUS AND METHOD FOR TIME SERIES QUERY PACKAGING
Cross References to Related Applications
[0001] Utility application entitled "Apparatus and Method for Optimizing Time Series Data Storage Based Upon Prioritization" naming as inventors John A. Interrante, Kareem S. Aggour, Jenny W. Williams, Ward L. Bowman, Jerry Lin, Sunil Mathur, Brian Courtney, and Justin McHugh, and having attorney docket number 265605 (130291);
[0002] Utility application entitled "Apparatus and method for Memory Storage and Analytic Execution of Time Series Data" naming as inventors John A. Interrante, Kareem S. Aggour, Jenny W. Williams, Ward L. Bowman, Sunil Mathur, Brian Courtney, and Justin McHugh, and having attorney docket number 265604 (130292);
[0003] Utility application entitled "Apparatus and Method for Executing Parallel Time Series Data Analytics" naming as inventors Kareem S. Aggour, Ward L. Bowman, Jerry Lin, Sunil Mathur, Michael Solda, Brian Courtney, and Justin McHugh and having attorney docket number 265596 (130294);
[0004] Utility application entitled "Apparatus and Method for Optimizing Time Data Storage" naming as inventors Kareem S. Aggour, Ward L. Bowman, Sunil Mathur, Brian Courtney, and Justin McHugh, and having attorney docket number 265600 (130293);
[0005] Utility application entitled "Apparatus and Method for Optimizing Time Data Store Usage" naming as inventors Kareem S. Aggour, Ward L. Bowman, Sunil Mathur, Justin McHugh, Ryan Cahalane, and John Leppiaho, and having attorney docket number 265599 (130296);
[0006] are being filed on the same date as the present application, the contents of which are incorporated herein by reference in their entireties.
Background of the Invention
Field of the Invention
[0007] The subject matter disclosed herein relates to time series data, and, more specifically, to the efficient retrieval of time series data using queries.
Brief Description of the Related Art
[0008] Data is stored on data storage devices in a variety of different formats. Additionally, various types of data storage devices are used to store data, and these data storage devices may vary in cost. In one example, data may be stored according to certain formats on high cost devices such as random access memories (RAMs). In other examples, data may be stored on low cost devices such as hard disks.
[0009] One type of data that is stored at data storage devices is time series data. In one aspect, time series data is obtained by some type of sensor or measurement device and the data is then stored as a function of time. For example, a measurement sensor may take a reading of a parameter at predetermined time intervals, and each of the measurements is stored in memory. Since large amounts of data are typically involved with time series measurements, the storage and retrieval of this data may become inefficient.
[0010] A typical read query in a traditional time series database includes two properties: a variable identifier to query and a query time range. There has been substantial research into optimizing individual queries in such systems, where multiple queries are run one at a time. However, because the query engine is unaware of the properties that multiple queries have in common, it cannot use system resources most efficiently when processing many queries.
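The two-property read query described above can be modeled as a small value type. This is a minimal sketch; the class name `ReadQuery`, its field names, and the half-open [start, end) interpretation of the time range are assumptions made for illustration, not details given in the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadQuery:
    """A read query: a variable identifier plus a query time range.

    Hypothetical model; the time range is assumed to be half-open [start, end)
    and expressed in arbitrary time units.
    """

    variable_id: str
    start: float
    end: float

    def __post_init__(self) -> None:
        # Reject empty or inverted ranges up front.
        if self.end <= self.start:
            raise ValueError("query time range must satisfy start < end")

    def duration(self) -> float:
        """Length of the requested time window."""
        return self.end - self.start

    def covers(self, t: float) -> bool:
        """True if timestamp t falls inside the half-open range."""
        return self.start <= t < self.end


q = ReadQuery("temperature", 1, 5)
print(q.duration(), q.covers(1), q.covers(5))
```

Making the type immutable lets identical queries be compared and hashed, which is convenient when a planner groups queries.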
[0011] Time series applications most often access the most recent data; that is, multiple queries request data from a recent, overlapping time span. Running such queries independently results in redundant I/O reads when the raw data is read from disk, because each query ends up accessing largely the same data.
[0012] One previous approach that attempted to alleviate this problem was to scan an entire relational table. Instead of executing queries to retrieve the data, the queries were registered to receive raw data. The data was then streamed to these queries, and the data relevant to one or more registered queries was selected. Unfortunately, this technique requires reading the entire table or partition in order to satisfy multiple queries at once.
[0013] For the above-mentioned reasons, previous approaches have suffered from various problems, and users have been dissatisfied with them.
Brief Description of the Invention
[0014] The present approaches package multiple queries as a set, for example, if they span roughly the same time range and/or duration. A single, shared data access operation is then performed before the individual queries are executed. Consequently, multi-query performance is significantly improved.
[0015] For example, a user may want to run several analytics that require retrieving raw values from the last calendar day. Running each of these analytics individually would involve repeatedly retrieving the same 24 hours of raw data. Instead, the present approaches enable the analytics to be run in parallel so that, for instance, the 24 hours of data are retrieved only once and shared among the analytics. As used herein, "analytics" means any operations that analyze or manipulate the time series data, including but not limited to generating averages, calculating means and standard deviations, and identifying minimum and maximum values.
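The saving described in [0015] can be seen by counting reads against a stand-in store. In this sketch, `CountingStore`, `read_last_day`, and the four analytics are hypothetical; the analytics run one after another over the shared data rather than in parallel, which keeps the example short without changing the number of reads.

```python
import statistics
from typing import Callable, Dict, List, Sequence

Analytic = Callable[[Sequence[float]], float]

# Hypothetical analytics matching the examples in the text.
ANALYTICS: Dict[str, Analytic] = {
    "mean": statistics.mean,
    "stdev": statistics.pstdev,
    "min": min,
    "max": max,
}


class CountingStore:
    """Stand-in for a time series store that counts how often it is read."""

    def __init__(self, hourly_values: List[float]) -> None:
        self._values = hourly_values
        self.reads = 0

    def read_last_day(self) -> List[float]:
        self.reads += 1  # each call models one retrieval of 24 hours of raw data
        return list(self._values)


def run_individually(store: CountingStore, analytics: Dict[str, Analytic]) -> Dict[str, float]:
    # Each analytic retrieves its own copy of the same raw data.
    return {name: fn(store.read_last_day()) for name, fn in analytics.items()}


def run_packaged(store: CountingStore, analytics: Dict[str, Analytic]) -> Dict[str, float]:
    # The raw data is retrieved once and shared among all analytics.
    raw = store.read_last_day()
    return {name: fn(raw) for name, fn in analytics.items()}


values = [float(h) for h in range(24)]
separate, packaged = CountingStore(values), CountingStore(values)
run_individually(separate, ANALYTICS)
run_packaged(packaged, ANALYTICS)
print(separate.reads, packaged.reads)
```

Both functions compute the same results; only the number of retrievals differs, growing with the number of analytics in the individual case and staying at one in the packaged case.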
[0016] In many of these embodiments, a first query and a second query are received. The first query and the second query are evaluated and, based upon the evaluation, first time series data required to fulfill the first query and second time series data required to fulfill the second query are identified. An extent of overlap of the first time series data and the second time series data is determined, and the overlapping data is identified. When the extent of overlap exceeds a predetermined threshold, the data required to fulfill both the first query and the second query is retrieved from a plurality of data storage devices in parallel, the data being retrieved across all of the plurality of data storage devices via a single read operation.
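When the data required by each query is identified by its time range, the overlap test and union described in [0016] reduce to simple interval arithmetic. A minimal sketch, assuming half-open [start, end) ranges and a strict "exceeds" comparison against the threshold; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # half-open [start, end)


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlapping time range of a and b, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def overlap_extent(a: Interval, b: Interval) -> float:
    """Length of the overlapping time range (0 when disjoint)."""
    shared = overlap(a, b)
    return 0.0 if shared is None else shared[1] - shared[0]


def plan_reads(a: Interval, b: Interval, threshold: float) -> List[Interval]:
    """Return the time ranges to read: one union range if the overlap exceeds the threshold."""
    if overlap_extent(a, b) > threshold:
        return [(min(a[0], b[0]), max(a[1], b[1]))]
    return [a, b]  # below threshold: the queries are executed separately


print(overlap((1, 5), (3, 7)), plan_reads((1, 5), (3, 7), threshold=1))
```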
[0017] In some aspects, the retrieved data is sorted for disbursement to the first query and the second query. In other aspects, the extent of overlap is determined based upon time ranges specified in the first query and the second query.
[0018] In some aspects, the first query or the second query comprises a read query. In other examples, the first query is from a first analytic and the second query is from a second analytic.
[0019] In other aspects, the query results (e.g., for the first query or the second query) are received. In some examples, a subset of the results is determined.
[0020] In others of these embodiments, an apparatus that is configured to execute multiple time series data queries includes an interface and a processor. The interface has an input and an output, and the input is configured to receive a first query and a second query.
[0021] The processor is coupled to the interface and is configured to evaluate the first query and the second query and, based upon the evaluation, identify first time series data required to fulfill the first query and second time series data required to fulfill the second query. The processor is further configured to determine an extent of overlap of the first time series data and the second time series data and to identify the overlapping data. The processor is further configured to, when the extent of overlap exceeds a predetermined threshold, retrieve the data required to fulfill both the first query and the second query from a plurality of data storage devices in parallel. The data is retrieved across all of the plurality of data storage devices via a single read operation.
Brief Description of the Drawings
[0022] For a more complete understanding of the disclosure, reference should be made to the following detailed description and accompanying drawings wherein:
[0023] FIG. 1 comprises a block diagram of an approach to query packaging according to various embodiments of the present invention;
[0024] FIG. 2 comprises a flow chart of an approach for query packaging according to various embodiments of the present invention; and
[0025] FIG. 3 comprises a block diagram of an apparatus for query packaging according to various embodiments of the present invention.
[0026] Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity. It will further be appreciated that certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art will understand that such specificity with respect to sequence is not actually required. It will also be understood that the terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein.
Detailed Description of the Invention
[0027] The present approaches allow multiple queries to be packaged together providing a more efficient way of accessing data. In this respect, if it is determined that two or more queries overlap significantly, then a query planner apparatus computes the "union" of the individual queries, creating a single query plan that retrieves all the data that is needed by the two or more queries. When the results are returned, the query planner apparatus may select the proper subset of results to pass to each individual query.
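For more than two queries, one plausible way to compute the "union" in [0027] is the classic sort-and-merge of time ranges: overlapping or touching ranges collapse into a single read range, while disjoint ranges stay separate. The application does not specify an algorithm, so this is a swapped-in technique shown for illustration only, assuming half-open ranges.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open [start, end)


def merge_ranges(ranges: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching time ranges into the minimal set of read ranges."""
    merged: List[Interval] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Extends (or lies within) the previous read range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_ranges([(3, 7), (1, 5), (10, 12), (11, 15)]))
```

Each merged range would then be read once, with the planner selecting each query's subset from the returned results.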
[0028] As has been mentioned, running these queries individually, as in previous systems, results in redundant efforts to retrieve similar sets of time series data multiple times. In contrast, the approaches provided herein provide a mechanism to group queries together such that they share a common step of retrieving the time series data, significantly reducing input/output (I/O) processing and thus the overall time to execute the set of queries. Put another way, because data movement and I/O typically account for a significant portion of a query's processing time, the present approaches significantly improve query performance by minimizing redundant I/O operations.
[0029] The present approaches allow for multiple queries to be submitted together to a query planner apparatus. The query planner apparatus evaluates the incoming queries and determines if there is significant overlap between them in terms of the data that will be retrieved. In one example, the determination of significant overlap may be based on the time ranges specified in the queries (e.g., require that queries share at least some minimum percentage of their respective time windows in order to be considered significantly overlapping). The query planner apparatus may, in addition, require that the queries share elements of the data model (e.g., require that the requested data of certain variables be for the same or similar variable groups/partitions in order for the queries to be considered overlapping).
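The two criteria in [0029] can be combined into a single predicate. In this sketch the `partition` field stands in for the variable group or partition, and `min_fraction` for the minimum percentage of a query's time window that must be shared; both names, and the choice to require that fraction of every query's own window, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    partition: str  # hypothetical variable group / partition label
    start: float    # half-open time range [start, end)
    end: float


def significantly_overlapping(a: Query, b: Query, min_fraction: float) -> bool:
    """True if both queries hit the same partition and share enough of their time windows."""
    if a.partition != b.partition:
        return False
    shared = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    # Each query must share at least min_fraction of its own window.
    return all(shared >= min_fraction * (q.end - q.start) for q in (a, b))


a = Query("turbine-1", 1, 5)
b = Query("turbine-1", 3, 7)
print(significantly_overlapping(a, b, 0.5), significantly_overlapping(a, b, 0.6))
```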
[0030] Reducing redundant I/O steps results in shorter average query execution times for time series analytics, enabling analysts and users to identify and solve problems faster, particularly in remote monitoring and diagnostics. The present approaches are also useful for providing very efficient visualization capabilities. Additionally, in many cases this frees up processing resources for other uses. A system implementing the present approaches is faster and uses fewer processing resources than other systems.
[0031] As a further advantage, the present approaches evaluate individually submitted jobs and determine whether their level of overlap meets or exceeds the minimum threshold. If so, the jobs can be repackaged into a single job for execution. This eliminates repetitive I/O and has the added benefit of reducing the number of distinct jobs that must be started within the system, which is another source of processing delay.
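The repackaging step in [0031] could be realized in many ways; the sketch below uses a simple greedy grouping, a swapped-in technique rather than one named in the application. Jobs are sorted by start time, and each job joins the current package when its time range overlaps the package's combined range by at least `min_overlap`; otherwise it starts a new package. The `Job` tuple shape is hypothetical.

```python
from typing import List, Tuple

Job = Tuple[str, float, float]  # (job id, start, end); hypothetical shape


def repackage(jobs: List[Job], min_overlap: float) -> List[List[str]]:
    """Greedily group jobs whose time ranges overlap enough to share one read."""
    packages: List[List[str]] = []
    package_end = float("-inf")  # end of the current package's combined range
    for job_id, start, end in sorted(jobs, key=lambda j: (j[1], j[2])):
        # Jobs are sorted by start, so the package starts no later than this job
        # and the shared span is [start, min(package_end, end)).
        if packages and min(package_end, end) - start >= min_overlap:
            packages[-1].append(job_id)
            package_end = max(package_end, end)
        else:
            packages.append([job_id])
            package_end = end
    return packages


jobs = [("a", 1, 5), ("b", 3, 7), ("c", 20, 24), ("d", 21, 23)]
print(repackage(jobs, min_overlap=1))
```

With these four jobs, only two packaged jobs need to be started; raising `min_overlap` to 3 leaves all four separate.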
[0032] Referring now to FIG. 1, a system 100 that uses a query planner 102 is described. The query planner 102 includes a determine overlap module 104 and a sort overlapping data to query module 106. The determine overlap module 104 and the sort overlapping data to query module 106 may be implemented as programmed software operating on a processing device.
[0033] The query planner 102 receives a first query 108 and a second query 110. The determine overlap module 104 determines the extent of data overlap of the first query 108 and the second query 110. For example, the queries 108 and 110 may each specify a time range of interest. For instance, a time range of 1 to 5 may be specified in the first query 108 and a time range of 3 to 7 in the second query 110 (as used herein, the units for these times are arbitrary and can be, for example, seconds or milliseconds). The time overlap between the queries is 3 to 5. After the overlap is determined, a third query 120 is formed with the 1 to 7 time range. In some examples, it is determined whether the extent of overlap has reached a predetermined threshold. For example, a time overlap of 1 (in the present example) may be required to meet the threshold. If the threshold is not met, then the "union" operation is not performed.
[0034] A first storage device 122 includes first time series data 124 (for times 1 to 3) and second time series data 126 (for times 3 to 5). A second storage device 128 includes third time series data 130 (for times 5 to 7) and fourth time series data 132 (for times 7 to 9). The third query 120 is sent as needed to the first storage device 122 or the second storage device 128 to retrieve, as appropriate, the first time series data 124, the second time series data 126, and the third time series data 130.
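The worked example in [0033] can be written out as interval arithmetic, using the required overlap of 1 given there as the threshold $\theta$ (treating the ranges as half-open is an assumption):

$$
\begin{aligned}
q_{108} \cap q_{110} &= [1,5) \cap [3,7) = [\max(1,3), \min(5,7)) = [3,5) \\
\lvert q_{108} \cap q_{110} \rvert &= 5 - 3 = 2 \ge \theta = 1 \\
q_{120} &= [\min(1,3), \max(5,7)) = [1,7)
\end{aligned}
$$

Had the overlap been shorter than $\theta$, no union would be formed and the two queries would be executed separately.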
[0035] The third query 120 is a union of the first query and the second query. In this respect, the third query 120 includes a first read (to the first data storage device 122, to get the first and second time series data 124 and 126 for t=1 to 5) and a second read (to the second data storage device 128, to get the third time series data 130 for t=5 to 7). In other words, the third query 120 represents the best plan for obtaining the data for both queries. Once the data is received in response to the third query 120, it is sorted by the sort overlapping data to query module 106. For example, the sort overlapping data to query module 106 may receive all of the data (the first time series data 124, the second time series data 126, and the third time series data 130) and distribute it appropriately between the first query 108 and the second query 110.
[0036] Thus, the query 120 has a single read to data storage device 122 and a single read to data storage device 128, and the two reads occur in parallel. This differs from previous approaches, in which two reads would have been made to the first data storage device and another read to the second data storage device. The reduction in the number of reads, from three to two in this example, improves system performance.
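The read savings described in [0036] can be checked with a small count over the storage layout of [0034]. This is a minimal sketch assuming one read per storage device per fetch, and assuming that a segment touching a query range only at a single instant (for example, data 132 starting at t=7) is not needed. `Segment`, `CATALOG`, `devices_touched`, and `count_reads` are hypothetical names; the threshold is assumed to be positive.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical catalog entry: a block of time series data on one device."""
    name: str
    start: float
    end: float
    device: str


# The layout of [0034]: data 124/126 on device 122, data 130/132 on device 128.
CATALOG = [
    Segment("124", 1, 3, "122"), Segment("126", 3, 5, "122"),
    Segment("130", 5, 7, "128"), Segment("132", 7, 9, "128"),
]


def devices_touched(start: float, end: float) -> set[str]:
    """Devices holding a segment that intersects (start, end) by more than a point."""
    return {s.device for s in CATALOG if s.start < end and s.end > start}


def count_reads(q1: tuple[float, float], q2: tuple[float, float], threshold: float) -> int:
    """Number of device reads, assuming one read per device per fetch."""
    overlap = min(q1[1], q2[1]) - max(q1[0], q2[0])
    if overlap >= threshold:
        # Combined query: one fetch over the union of the two ranges.
        return len(devices_touched(min(q1[0], q2[0]), max(q1[1], q2[1])))
    # No packaging: each query fetches its own data.
    return len(devices_touched(*q1)) + len(devices_touched(*q2))


print(count_reads((1, 5), (3, 7), threshold=1))  # 2 (packaged)
print(count_reads((1, 5), (3, 7), threshold=3))  # 3 (separate)
```

With packaging, devices 122 and 128 are each read once; without it, the first query reads device 122 and the second query reads both devices, giving the three reads of the previous approach.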
[0037] As mentioned, once the data is retrieved, the sort overlapping data to query module 106 sorts it and sends the data for the 1 to 5 time period in response to the first query 108 (i.e., the first time series data 124 and the second time series data 126), and the data for the 3 to 7 time period in response to the second query 110 (i.e., the second time series data 126 and the third time series data 130). In this way, the first time series data 124 and the second time series data 126 are returned to the first query 108 as results 140, and the second time series data 126 and the third time series data 130 are returned to the second query 110 as results 142. This is all accomplished with a minimum number of read operations.
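The sorting and distribution performed by the sort overlapping data to query module 106 might look like the following. This is a minimal sketch, not the module's actual algorithm, assuming each retrieved block carries its time span and that a block is routed to every query whose range it intersects by more than a single point. `Block` and `distribute` are hypothetical names.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical retrieved block: a label plus the time span it covers."""
    label: str
    start: float
    end: float


def distribute(blocks: list[Block], queries: dict[str, tuple[float, float]]) -> dict[str, list[str]]:
    """Sort blocks by start time and route each one to every query whose
    range it intersects; shared blocks (such as 126) go to both queries."""
    results: dict[str, list[str]] = {name: [] for name in queries}
    for block in sorted(blocks, key=lambda b: b.start):
        for name, (q_start, q_end) in queries.items():
            if block.start < q_end and block.end > q_start:
                results[name].append(block.label)
    return results


blocks = [Block("130", 5, 7), Block("124", 1, 3), Block("126", 3, 5)]
print(distribute(blocks, {"query 108": (1, 5), "query 110": (3, 7)}))
# {'query 108': ['124', '126'], 'query 110': ['126', '130']}
```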
[0038] It will be appreciated that many different algorithms can be used to implement the modules 104 and 106. The exact algorithms used will depend upon, among other things, the nature of the queries and the nature and identity of any potentially overlapping information.
[0039] Referring now to FIG. 2, an approach for data storage is described. At step 202, a first query and a second query are received. At step 204, the first query and the second query are evaluated. Based upon the evaluation, at step 206, first time series data required to fulfill the first query and second time series data required to fulfill the second query are identified. At step 208, the extent of overlap of the first time series data and the second time series data is determined and the overlapping data is identified. At step 210, when the extent of overlap exceeds a predetermined threshold, the overlapping data is retrieved from a plurality of data storage devices in parallel, the data being retrieved across all of the plurality of data storage devices via a single read operation.
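Step 210 can be pictured as building a read plan with one request per storage device and issuing those requests concurrently. The sketch below is one possible reading, assuming that the "single read operation" corresponds to one read per device issued together; `Segment`, `plan_reads`, `execute_plan`, and `fake_read` are hypothetical names, and `fake_read` merely stands in for real storage I/O.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple


class Segment(NamedTuple):
    """Hypothetical catalog entry: a block of time series data on one device."""
    name: str
    start: float
    end: float
    device: str


def plan_reads(catalog: list[Segment], start: float, end: float) -> dict[str, list[Segment]]:
    """Group the segments intersecting (start, end) by device so that each
    device receives exactly one read request."""
    plan: dict[str, list[Segment]] = defaultdict(list)
    for seg in catalog:
        if seg.start < end and seg.end > start:
            plan[seg.device].append(seg)
    return dict(plan)


def execute_plan(plan: dict[str, list[Segment]],
                 read_device: Callable[[str, list[Segment]], list[str]]) -> list[str]:
    """Issue one read per device, with all devices read concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        futures = [pool.submit(read_device, dev, segs) for dev, segs in plan.items()]
        return [item for future in futures for item in future.result()]


def fake_read(device: str, segments: list[Segment]) -> list[str]:
    """Stand-in for real storage I/O: returns the segment names it was asked for."""
    return [s.name for s in segments]


catalog = [
    Segment("124", 1, 3, "122"), Segment("126", 3, 5, "122"),
    Segment("130", 5, 7, "128"), Segment("132", 7, 9, "128"),
]
plan = plan_reads(catalog, 1, 7)
print({dev: [s.name for s in segs] for dev, segs in plan.items()})
# {'122': ['124', '126'], '128': ['130']}
print(execute_plan(plan, fake_read))  # ['124', '126', '130']
```

A thread pool suits I/O-bound reads; a real system might instead rely on asynchronous I/O or the storage layer's own batching.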
[0040] In some aspects, the retrieved data is sorted for disbursement to the first query and the second query. In other aspects, the extent of overlap is determined based upon time ranges specified in the first query and the second query.
[0041] In some aspects, the first query or the second query comprises a read query. In other examples, the first query is from a first analytic and the second query is from a second analytic. In other aspects, the query results are retrieved. In some examples, a subset of the results is determined.
[0042] Referring now to FIG. 3, a query planner apparatus 300 for executing multiple, time series data queries includes an interface 302 and a processor 304. The interface 302 has an input 306 and an output 308 and the input 306 is configured to receive a first query 310 and a second query 312.
[0043] The processor 304 is coupled to the interface 302 and is configured to evaluate the first query 310 and the second query 312 and, based upon the evaluation, identify first time series data required to fulfill the first query and second time series data required to fulfill the second query. The processor 304 is further configured to determine an extent of overlap of the first time series data and the second time series data and identify the overlapping data. The processor 304 is further configured to, when the extent of overlap exceeds a predetermined threshold, retrieve the overlapping data from a plurality of data storage devices in parallel. The data is retrieved across all of the plurality of data storage devices via a single read operation.
[0044] It will be appreciated by those skilled in the art that modifications to the foregoing embodiments may be made in various aspects. Other variations would also work and are within the scope and spirit of the invention. The present invention is set forth with particularity in the appended claims. It is deemed that the spirit and scope of the invention encompass such modifications and alterations to the embodiments herein as would be apparent to one of ordinary skill in the art familiar with the teachings of the present application.

Claims

What is Claimed Is:
1. A method for executing multiple, time series data queries, the method comprising:
receiving a first query and a second query;
evaluating the first query and the second query and, based upon the evaluating, identifying first time series data required to fulfill the first query and second time series data required to fulfill the second query;
determining an extent of overlap of the first time series data and the second time series data and identifying overlapping data; and
when the extent of overlap exceeds a predetermined threshold, retrieving the overlapping data from a plurality of data storage devices in parallel, the data being retrieved across all of the plurality of data storage devices via a single read operation.
2. The method of claim 1 further comprising sorting the retrieved data for disbursement to the first query and the second query.
3. The method of claim 1 wherein the extent of overlap is determined based upon time ranges specified in the first query and the second query.
4. The method of claim 1 wherein the first query or the second query comprises a read query.
5. The method of claim 1 wherein the first query is from a first analytic and the second query is from a second analytic.
6. The method of claim 1 further comprising receiving a first result for the first query and a second result for the second query.
7. The method of claim 6 further comprising determining a subset of the first result or the second result.
8. An apparatus configured to execute multiple, time series data queries, the apparatus comprising:
an interface having an input and an output, the input configured to receive a first query and a second query; and
a processor coupled to the interface, the processor configured to evaluate the first query and the second query and, based upon the evaluation, identify first time series data required to fulfill the first query and second time series data required to fulfill the second query, the processor further configured to determine an extent of overlap of the first time series data and the second time series data and identify the overlapping data, the processor further configured to, when the extent of overlap exceeds a predetermined threshold, retrieve the overlapping data from a plurality of data storage devices in parallel, the data retrieved across all of the plurality of data storage devices via a single read operation.
9. The apparatus of claim 8 wherein the processor is further configured to sort the retrieved data for disbursement to the first query and the second query.
10. The apparatus of claim 8 wherein the extent of overlap is determined based upon time ranges specified in the first query and the second query.
11. The apparatus of claim 8 wherein the first query or the second query comprises a read query.
12. The apparatus of claim 8 wherein the first query is from a first analytic and the second query is from a second analytic.
13. The apparatus of claim 8 wherein the processor is further configured to receive a first result for the first query and a second result for the second query.
14. The apparatus of claim 13 wherein the processor is further configured to determine a subset of the first result and the second result.
EP13713694.1A 2013-03-18 2013-03-18 Apparatus and method for time series query packaging Withdrawn EP2976724A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2013/032823 WO2014149031A1 (en) 2013-03-18 2013-03-18 Apparatus and method for time series query packaging

Publications (1)

Publication Number Publication Date
EP2976724A1 true EP2976724A1 (en) 2016-01-27

Family

ID=48045120

Family Applications (1)

Application Number Title Priority Date Filing Date
EP13713694.1A Withdrawn EP2976724A1 (en) 2013-03-18 2013-03-18 Apparatus and method for time series query packaging

Country Status (3)

Country Link
US (1) US20160054952A1 (en)
EP (1) EP2976724A1 (en)
WO (1) WO2014149031A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9934275B2 (en) * 2015-01-12 2018-04-03 Red Hat, Inc. Query union and split

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US7146365B2 (en) * 2003-01-27 2006-12-05 International Business Machines Corporation Method, system, and program for optimizing database query execution
IL197961A0 (en) * 2009-04-05 2009-12-24 Guy Shaked Methods for effective processing of time series
SG166014A1 (en) * 2009-04-14 2010-11-29 Electron Database Corp Pte Ltd Server architecture for multi-core systems
US8346758B2 (en) * 2010-08-31 2013-01-01 International Business Machines Corporation Method and system for transmitting a query in a wireless network
US8336051B2 (en) * 2010-11-04 2012-12-18 Electron Database Corporation Systems and methods for grouped request execution

Non-Patent Citations (2)

Title
None *
See also references of WO2014149031A1 *

Also Published As

Publication number Publication date
WO2014149031A1 (en) 2014-09-25
US20160054952A1 (en) 2016-02-25

Similar Documents

Publication Publication Date Title
US9235622B2 (en) System and method for an efficient query sort of a data stream with duplicate key values
EP3117347B1 (en) Systems and methods for rapid data analysis
US20070143246A1 (en) Method and apparatus for analyzing the effect of different execution parameters on the performance of a database query
CN107329983B (en) Machine data distributed storage and reading method and system
US20180173753A1 (en) Database system and method for compiling serial and parallel database query execution plans
US11074242B2 (en) Bulk data insertion in analytical databases
US9235590B1 (en) Selective data compression in a database system
US20090327220A1 (en) Automated client/server operation partitioning
CN107122126B (en) Data migration method, device and system
US10915533B2 (en) Extreme value computation
CN102934097A (en) Data deduplication
WO2017162086A1 (en) Task scheduling method and device
US10176231B2 (en) Estimating most frequent values for a data set
US10877973B2 (en) Method for efficient one-to-one join
US10248618B1 (en) Scheduling snapshots
CN111742309A (en) Automated database query load assessment and adaptive processing
CN111061758A (en) Data storage method, device and storage medium
US10331670B2 (en) Value range synopsis in column-organized analytical databases
US20160054952A1 (en) Apparatus and method for time series query packaging
US9305045B1 (en) Data-temperature-based compression in a database system
CN115576924A (en) Data migration method
Wang et al. Turbo: Dynamic and decentralized global analytics via machine learning
CN113312434A (en) Pre-polymerization treatment method for massive structured data
US20160259703A1 (en) Retrieval control method, and retrieval control device
CN105447137A (en) Algorithm for retrieving data with same master-slave relation from big data on the basis of relational database

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20151019

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAX Request for extension of the european patent (deleted)

17Q First examination report despatched

Effective date: 20190313

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20190724