CA3132004A1 - Sql query optimization method and device - Google Patents
Sql query optimization method and device Download PDFInfo
- Publication number
- CA3132004A1 CA3132004A1 CA3132004A CA3132004A CA3132004A1 CA 3132004 A1 CA3132004 A1 CA 3132004A1 CA 3132004 A CA3132004 A CA 3132004A CA 3132004 A CA3132004 A CA 3132004A CA 3132004 A1 CA3132004 A1 CA 3132004A1
- Authority
- CA
- Canada
- Prior art keywords
- subquery
- query
- statements
- target data
- statement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24539—Query rewriting; Transformation using cached or materialised query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
parsing SQL
query statements classified as UNION type query statements, and counting the number of storage segments associated with target data, wherein a UNION type query includes multiple subquery statements; when the number of associated storage segments exceeds a threshold, dividing query periods of the subquery statements into multiple granularity periods, then adding combined target data dimensions for the subquery statements into the subquery statements as additional limit rules; and sending the subquery statements with limit rules to a computation engine, parallelly accessing target data and dimensions in each granularity period, then returning the query results to users after calculation. The forementioned device is applied to achieve the described method.
Description
Technical Field [0001] The present invention is related to the field of big data technologies, in particular to a SQL query optimization method and device.
Background
Summary
type query includes multiple subquery statements;
Date Recue/Date Received 2021-09-27
accessing the query period corresponding to each described subquery statement, and dividing each query period into multiple granularity periods according to a pre-set mean granularity.
where if the dimensions of the target data corresponding to each subquery statement are identical, selecting one of the described dimension of the target data corresponding to the subquery statement and adding into each subquery statement as an additional limit rule;
and
Preferably, the described UNION type query statement includes at least a primary period query statement and a secondary period query statement, and the described limit rules includes query metrices, a dimension rank, and numbers of target data queries besides the additional rule.
Preferably, the computation engine parallelly accesses the target data and dimension values in each granularity period by:
Furthermore, the query result is return to users after calculation on the processing terminal by:
Preferably, the described subquery statements with limit rules are sent to the computation engine, parallelly accessing target data and dimension values in each granularity period, then returning the query results to the users after the calculation on the processing terminal, is followed by the step of:
Date Recue/Date Received 2021-09-27
Date Recue/Date Received 2021-09-27
a statement parsing unit, configured to parse SQL query statements classified as UNION type query statements, and counting the number of storage segments associated with target data, wherein a UNION type query includes multiple subquery statements;
a detecting and processing unit, configured to divide query periods of the subquery statements into multiple granularity periods when the number of associated storage segments exceeds a threshold, combine the target data dimensions for the subquery statements, and add the combined target data into the subquery statements as an additional limit rule;
and
a searching unit, configured to send the subquery statements with limit rules to a computation engine, parallelly access target data and dimension values in each granularity period, then return the query results to the users after the calculation on the processing terminal.
From the third perspective, a computer readable storage medium is provided in the present invention, wherein the computer programs are stored on. When the described computer programs are executed by the processor, any of the procedures in the forementioned SQL query optimization method are performed.
Compared with the currently available techniques, the computer readable storage medium can provide the same benefits as forementioned SQL query optimization method and is not explained in detail herein.
Brief descriptions of the drawings
To clarify, the described drawings are providing further explanations for the present invention as a part of the present invention. The demonstration embodiments and descriptions are used to explain the present invention and shall not limit the present invention. In the drawings:
Fig. 3 is a schematic diagram of the SQL query optimization method in the present invention.
Date Recue/Date Received 2021-09-27 Detailed descriptions
Based on the embodiments in the present invention, the other applications by those skilled in the art without any creative works are falling within the scope of the present invention.
Referring to Fig. 1, a SQL query optimization method is provided in the present embodiment, comprising:
type query includes multiple subquery statements; when the number of associated storage segments exceeds a threshold, dividing query periods of the subquery statements into multiple granularity periods, and adding the combined target data dimensions for the subquery statements into the subquery statements as additional limit rules; and sending the subquery statements with limit rules to a computation engine, parallelly accessing target data and dimension values in each granularity period, then returning the query results to the users after the calculation on the processing terminal.
Obviously, the present invention can not only solve the problem of directly circuit-breaker mechanism of the computation engine during a full-volume query, but also standardize and yield the uniform dimension of the target data for each subquery statement, to guarantee the query result output.
In practice, the UNION type query statements are represented as UNION ALL, including primary period query statement and a secondary period query statement. The described limit rules include query metrices, dimension rank, and numbers of target data queries besides the additional rule. The computation engine is Druid engine; the source of data is noted as datasource; the storage segments are noted as segment, and the processing terminal is the spark terminal. In particular, the numbers of target data queries in the limit rules are used to restrict the target data reading volume by the computation engine, to prevent initiating the circuit-breaker mechanisms.
In the forementioned embodiment, when the number of associated storage segments exceeds the threshold, the query periods of the subquery statements are divided into multiple granularity periods by:
accessing the query period corresponding to each described subquery statement, and dividing each query period into multiple granularity periods according to a pre-set mean granularity.
Illustratively, the SQL query statement includes inner statement and outer statement.
Taking the year-to-year period query as an example, the inner statement includes the primary subquery statement and the secondary subquery statement, wherein the primary subquery statement and the secondary subquery statement imply the current period query statement and the historical period query statement, individually. The inner statement is used to acquire the target data corresponding to the primary subquery statement and the secondary subquery statement from the Druid, and the outer statement is used to access the dimensions of the target data and perform associated year-to-year calculation from the spark terminal based on the inner statement.
query statements, to extract the original data information able to be sliced, such as the target data Date Recue/Date Received 2021-09-27 associated storage segment query command, target data dimension query command, target data matrix query command, target data period query command, etc. Then, the number of the storage segments associated with the target data is determined to be exceeding the threshold or not.
Where if the storage segments associated with the target data exceeds the threshold, the entire subquery can be sent to the Druid for a full-volume query based on the current techniques.
Where if the storage segments associated with the target data does not exceed the threshold, the query period is evenly divided into multiple granularity periods for parallel queries, to prevent initiating the circuit-breaker mechanisms. For example, the minimum granularity is a day when the datasource is taken, and the current SQL query statement queries the data within one month.
The one-month query period is divided into thirty granularity periods. In the following, the SQL
query statement corresponding to each granularity period are processed, in order to reserve only necessary portion of the target dimension data and filter out the irrelevant derived metrices data.
The post-processing SQL query data corresponding to these granularity periods are converted to json statements receivable by the druid. A request for acquiring the target data is sent to the druid.
After the druid queries the target data for each granularity period, the dimension sets and corresponding dimension values of each granularity period are returned by the druid. The data sets returned from all granularity periods are combined. Then, a data set with less or equal data volume to the target data query is obtained via selection sort algorithm.
where if the dimensions of the target data corresponding to each subquery statement are identical, selecting one of the described dimension of the target data corresponding to the subquery statement and adding into each subquery statement as an additional limit rule;
and where if the dimensions of the target data corresponding to each subquery statement are not identical, combining the target data dimensions for the subquery statements and adding the combined target data into the subquery statements as an additional limit rule.
For example, in terms of the case wherein the pre-set calculation rules are set for growth rate Date Recue/Date Received 2021-09-27 calculation, the growth rate result is obtained via dividing the dimension values of the primary by the dimension values of the primary secondary periods.
query statements, based on the most active datasource and the associated dimension fields, the druid statements are generated dynamically when low query volume. Then the described dimension and dimension values are pre-extracted and stored in the form of materialized views. In the upcoming operations, the materialized views can be extracted immediately once the same datasource or dimension value queries are requested. As a result, it is not required to send the requests to the druid for accessing associated dimension value sets, to achieve the fast calling.
statement includes the history period query results, the present SQL query statement can be split based on the query periods of the query statements. The history period query results can be directly extracted from the databased and the rest of present periods can be obtained from the druid query. For example, Date Recue/Date Received 2021-09-27 the dimension data for day 1 to day 20 has been stored. If the current query requests for dimension from day 1 to day 30, only the granularity periods of day 21 to day 30 are sent to the druid to greatly reduce the pressure over the druid.
3. combining the target data dimensions for the subquery statements and adding the combined target data into the subquery statements as an additional limit rule;
after processing each subquery statement, converting the subqueries into the json statements that are readable by the druid. With the additional limit rules, the current subquery statements request more dimensions than the limit rules. In other words, the druid does not need to scan all the data within the period, wherein no circuit-breaker mechanisms initiation or query time-out, to greatly reduce the pressure over the druid.
a statement parsing unit, configured to parse SQL query statements classified as UNION type query statements, and counting the number of storage segments associated with target data, wherein a UNION type query includes multiple subquery statements;
a detecting and processing unit, configured to divide query periods of the subquery statements into multiple granularity periods when the number of associated storage segments exceeds a threshold, combine the target data dimensions for the subquery statements, and add the combined target data into the subquery statements as an additional limit rule;
and
a searching unit, configured to send the subquery statements with limit rules to a computation engine, parallelly access target data and dimension values in each granularity period, then return the query results to the users after the calculation on the processing terminal.
Date Recue/Date Received 2021-09-27
Date Recue/Date Received 2021-09-27
Claims (10)
parsing SQL query statements classified as UNION type query statements, and counting the number of storage segments associated with target data, wherein a UNION type query includes multiple subquery statements;
when the number of associated storage segments exceeds a threshold, dividing query periods of the subquery statements into multiple granularity periods, and adding the combined target data dimensions for the subquery statements into the subquery statements as additional limit rules;
and sending the subquery statements with limit rules to a computation engine, parallelly accessing target data and dimension values in each granularity period, then returning the query results to the users after the calculation on the processing terminal.
accessing the query period corresponding to each described subquery statement, and dividing each query period into multiple granularity periods according to a pre-set mean granularity.
identifying the dimensions of the target data corresponding to each subquery statement;
where if the dimensions of the target data corresponding to each subquery statement are identical, selecting one of the described dimension of the target data corresponding to the subquery statement and adding into each subquery statement as an additional limit rule;
and Date Recue/Date Received 2021-09-27 where if the dimensions of the target data corresponding to each subquery statement are not identical, combining the target data dimensions for the subquery statements and adding the combined target data into the subquery statements as an additional limit rule.
according to each subquery statement and corresponding limit rules, on the computation engine, searching for target data satisfying the described target data query amount rules from the storage segments corresponding to data sources, and extracting an associated dimension value.
based on the dimension value of each subquery, calculating according to the pre-set rules on the processing terminal and returning query results to users.
counting hot data source and hot data dimensions from the history SQL
statements, and pre-extract the dimension values of the hot data dimensions, for fast calling of the upcoming SQL
query statement.
Date Recue/Date Received 2021-09-27
caching each subquery statement and the target data and corresponding dimension values for each subquery statement in the database, for fast calling of upcoming identical subquery statements from the database.
a statement parsing unit, configured to parse SQL query statements classified as UNION type query statements, and counting the number of storage segments associated with target data, wherein a UNION type query includes multiple subquery statements;
a detecting and processing unit, configured to divide query periods of the subquery statements into multiple granularity periods when the number of associated storage segments exceeds a threshold, combine the target data dimensions for the subquery statements, and add the combined target data into the subquery statements as an additional limit rule;
and a searching unit, configured to send the subquery statements with limit rules to a computation engine, parallelly access target data and dimension values in each granularity period, then return the query results to the users after the calculation on the processing terminal.
Date Recue/Date Received 2021-09-27
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011048287.7A CN112286961B (en) | 2020-09-29 | 2020-09-29 | SQL optimization query method and device |
| CN202011048287.7 | 2020-09-29 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CA3132004A1 true CA3132004A1 (en) | 2022-03-29 |
Family
ID=74422478
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CA3132004A Pending CA3132004A1 (en) | 2020-09-29 | 2021-09-27 | Sql query optimization method and device |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN112286961B (en) |
| CA (1) | CA3132004A1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114860753A (en) * | 2022-03-31 | 2022-08-05 | 中国建设银行股份有限公司 | A method, device, device, medium and product for generating an SQL statement |
| CN117453732A (en) * | 2023-12-25 | 2024-01-26 | 智业软件股份有限公司 | CDSS doctor's advice data query optimization method and system |
Families Citing this family (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112765286A (en) * | 2021-02-01 | 2021-05-07 | 广州海量数据库技术有限公司 | Query method and device based on relational database |
| CN113448983B (en) * | 2021-07-15 | 2024-01-30 | 中国银行股份有限公司 | Knowledge point processing method, device, server, medium and product |
| CN113901118A (en) * | 2021-09-28 | 2022-01-07 | 湖南新云网科技有限公司 | Data export method, terminal equipment and storage medium |
| CN113868285A (en) * | 2021-09-30 | 2021-12-31 | 平安科技(深圳)有限公司 | Data reading method, device, electronic device and storage medium |
| CN116414866A (en) * | 2023-04-18 | 2023-07-11 | 北京人大金仓信息技术股份有限公司 | Optimization method, storage medium and equipment for database operation statement |
| CN117093609A (en) * | 2023-08-24 | 2023-11-21 | 中国工商银行股份有限公司 | Query statement processing method, device, equipment, medium and program product |
| CN117056388A (en) * | 2023-08-30 | 2023-11-14 | 北京自如信息科技有限公司 | Data pushing method and device, computer equipment and storage medium |
| CN118377586B (en) * | 2024-04-02 | 2024-11-29 | 国家计算机网络与信息安全管理中心 | An optimization method for traffic merging and computation merging for SQL tasks |
| CN119358030A (en) * | 2024-12-23 | 2025-01-24 | 易联云计算(杭州)有限责任公司 | A method for protecting security permissions of generative data analysis |
| CN120448417B (en) * | 2025-07-09 | 2025-09-02 | 本溪钢铁(集团)信息自动化有限责任公司 | A query optimization method and device for structured query language |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103714058B (en) * | 2012-09-28 | 2017-05-17 | Sap欧洲公司 | Method for database query optimization and system using same |
| CN110096489A (en) * | 2019-04-30 | 2019-08-06 | 阿里巴巴集团控股有限公司 | A kind of data query method, system, device and electronic equipment |
| CN110515973B (en) * | 2019-08-30 | 2022-02-18 | 上海达梦数据库有限公司 | Data query optimization method, device, equipment and storage medium |
-
2020
- 2020-09-29 CN CN202011048287.7A patent/CN112286961B/en active Active
-
2021
- 2021-09-27 CA CA3132004A patent/CA3132004A1/en active Pending
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114860753A (en) * | 2022-03-31 | 2022-08-05 | 中国建设银行股份有限公司 | A method, device, device, medium and product for generating an SQL statement |
| CN117453732A (en) * | 2023-12-25 | 2024-01-26 | 智业软件股份有限公司 | CDSS doctor's advice data query optimization method and system |
| CN117453732B (en) * | 2023-12-25 | 2024-03-01 | 智业软件股份有限公司 | CDSS doctor's advice data query optimization method and system |
Also Published As
| Publication number | Publication date |
|---|---|
| CN112286961B (en) | 2022-11-18 |
| CN112286961A (en) | 2021-01-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CA3132004A1 (en) | Sql query optimization method and device | |
| US20210319024A1 (en) | Methods and systems for joining indexes for query optimization in a multi-tenant database | |
| US8234268B2 (en) | System, method, and computer-readable medium for optimizing processing of distinct and aggregation queries on skewed data in a database system | |
| Chaudhuri et al. | Evaluating top-k selection queries | |
| US11003649B2 (en) | Index establishment method and device | |
| US20050262158A1 (en) | System and method for externally providing database optimizer statistics | |
| US8423569B2 (en) | Decomposed query conditions | |
| US7921085B2 (en) | Method and system for quantifying a data page repetition pattern for a database index in a database management system | |
| CN107748766B (en) | Big data fast query method based on Presto and elastic search | |
| Durner et al. | Crystal: a unified cache storage system for analytical databases | |
| US20110252020A1 (en) | Enhanced utlization of query optimization | |
| CN117688032B (en) | Query acceleration method and device based on materialized view, electronic equipment and medium | |
| US10795889B2 (en) | Query path with aggregate projection | |
| Behrend et al. | Period index: A learned 2d hash index for range and duration queries | |
| US20200364222A1 (en) | Temporary materialized views | |
| US20180341709A1 (en) | Unstructured search query generation from a set of structured data terms | |
| WO2025026170A1 (en) | Data query method and related device | |
| CN116680284A (en) | Database query processing method, cloud computing platform and cloud computing device | |
| D’silva et al. | Secondary indexing techniques for key-value stores: Two rings to rule them all | |
| US11899665B2 (en) | Data aggregation and pre-positioning for multi-store queries | |
| US8290935B1 (en) | Method and system for optimizing database system queries | |
| Peng et al. | Event detection over live and archived streams | |
| WO2015191033A1 (en) | Top-k projection | |
| US10372708B1 (en) | Statistic normalization in a data store system | |
| US20100332791A1 (en) | System, method, and computer-readable medium for optimizing processing of group-by queries featuring maximum or minimum equality conditions in a parallel processing system |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| EEER | Examination request |
Effective date: 20220506 |
|
| EEER | Examination request |
Effective date: 20220506 |
|
| EEER | Examination request |
Effective date: 20220506 |
|
| EEER | Examination request |
Effective date: 20220506 |
|
| EEER | Examination request |
Effective date: 20220506 |
|
| EEER | Examination request |
Effective date: 20220506 |
|
| EEER | Examination request |
Effective date: 20220506 |
|
| EEER | Examination request |
Effective date: 20220506 |
|
| EEER | Examination request |
Effective date: 20220506 |
|
| EEER | Examination request |
Effective date: 20220506 |
|
| EEER | Examination request |
Effective date: 20220506 |
|
| EEER | Examination request |
Effective date: 20220506 |
|
| EEER | Examination request |
Effective date: 20220506 |