CN112286961A - SQL optimization query method and device - Google Patents

SQL optimization query method and device Download PDF

Info

Publication number
CN112286961A
CN112286961A CN202011048287.7A CN202011048287A CN112286961A CN 112286961 A CN112286961 A CN 112286961A CN 202011048287 A CN202011048287 A CN 202011048287A CN 112286961 A CN112286961 A CN 112286961A
Authority
CN
China
Prior art keywords
query
sub
target data
query statement
statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011048287.7A
Other languages
Chinese (zh)
Other versions
CN112286961B (en
Inventor
卢勇亮
郑尧峰
汤国强
桑强
孙迁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Cloud Computing Co Ltd
Original Assignee
Suning Cloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd filed Critical Suning Cloud Computing Co Ltd
Priority to CN202011048287.7A priority Critical patent/CN112286961B/en
Publication of CN112286961A publication Critical patent/CN112286961A/en
Priority to CA3132004A priority patent/CA3132004A1/en
Application granted granted Critical
Publication of CN112286961B publication Critical patent/CN112286961B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24539Query rewriting; Transformation using cached or materialised query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/248Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an SQL optimization query method and device, relates to the technical field of big data, and can avoid the starting of a fusing mechanism and improve the query efficiency. The method comprises the following steps: when the SQL query statement is analyzed to be a UNION set type query statement, counting the number of storage segments related to target data, wherein the UNION set type query statement comprises a plurality of sub query statements; when the number of related storage segments is larger than a threshold value, splitting the query cycle corresponding to the sub-query statement into a plurality of granularity cycles, summarizing target data dimensions corresponding to the sub-query statements, and adding the summarized target data dimensions as new conditions into the filtering conditions of the sub-query statement; and pushing the sub-query statements containing the filter conditions down to a computing engine, reading the target data corresponding to each granularity period and the dimension value in the target data, and returning a query result to the user after the target data is computed by the processing end. The device is applied with the method provided by the scheme.

Description

SQL optimization query method and device
Technical Field
The invention relates to the technical field of big data, in particular to a SQL optimization query method and a device.
Background
With the continuous increase of the business data volume, the data volume of the data source which is taken into the drive engine by the flow model is huge, and each time when SQL carries out the same-round-ratio query, all data in the SQL specified period need to be searched by the union all clause of the inner layer. In order to protect the effective query of the pipeline engine to other tasks, the pipeline engine can judge the total number of segments in the current SQL period and start a fusing mechanism when the total number of segments exceeds a threshold value, and even if the total number of segments does not exceed the threshold value, the query time can be increased by adopting a full-scale query mode, and the query efficiency is reduced.
Disclosure of Invention
The invention aims to provide a method and a device for SQL optimization query, which can avoid the starting of a fusing mechanism and improve the query efficiency.
In order to achieve the above object, a first aspect of the present invention provides a method for optimizing a query in SQL, comprising:
when the SQL query statement is analyzed to be a UNION set type query statement, counting the number of storage segments related to target data, wherein the UNION set type query statement comprises a plurality of sub query statements;
when the number of the related storage segments is larger than a threshold value, splitting the query cycle corresponding to the sub-query statement into a plurality of granularity cycles, summarizing target data dimensions corresponding to the sub-query statements, and adding the summarized target data dimensions as new conditions into the filtering conditions of the sub-query statement;
and pushing the sub-query statements containing the filter conditions down to a computing engine, reading the target data corresponding to each granularity period and the dimension value in the target data, and returning a query result to a user after the target data is computed by a processing end.
Preferably, when the number of the storage segments is greater than the threshold, the method for splitting the query cycle corresponding to the sub-query statement into a plurality of granularity cycles includes:
extracting a query cycle corresponding to each sub-query statement, and averagely dividing each query cycle into a plurality of granularity cycles according to preset granularity.
Preferably, the method for adding the collected target data dimensions corresponding to the multiple sub-query statements as the added conditions into the filter conditions of the sub-query statements comprises the following steps:
identifying a target data dimension corresponding to each sub-query statement, and when the target data dimensions of the sub-query statements are consistent, adding the target data dimension of any sub-query statement as a newly added condition into the filtering condition of each sub-query statement;
and when the target data dimensions of the sub-query statements are not consistent, summarizing and combining the target data dimensions of the sub-query statements respectively, and adding the combined target data dimensions as newly added conditions into the filtering conditions of each sub-query statement.
Preferably, the query statement of the UNION set type at least includes a first periodic sub-query statement and a second periodic sub-query statement, and the filter condition includes a query metric, a dimension rank and a target data query quantity in addition to the newly added condition.
Preferably, the method for the computing engine to concurrently read the target data and the dimension value thereof corresponding to each granularity period includes:
and respectively querying the target data meeting the target data query quantity condition from the corresponding storage segment of the corresponding data source and extracting the dimension value of the target data according to each sub-query statement and the filter condition in the sub-query statement by the computing engine.
Further, the method for returning the query result to the user after the computation of the processing end comprises the following steps:
and the processing terminal returns the query result to the user after calculating according to a preset rule according to the dimension value returned by each sub-query statement.
Preferably, the step of pushing down the sub-query statement including the filter condition to the calculation engine and reading the target data corresponding to each granularity period and the dimension value therein, and after the target data and the dimension value are calculated by the processing end and the query result is returned to the user, the method further includes:
and counting hot data sources in the historical SQL query statement and hot data dimensions in the historical SQL query statement, and pre-extracting dimensional values corresponding to the hot data dimensions for quick calling of subsequent SQL query statements.
Preferably, the step of pushing down the sub-query statement including the filter condition to the calculation engine and reading the target data corresponding to each granularity period and the dimension value therein, and after the target data and the dimension value are calculated by the processing end and the query result is returned to the user, the method further includes:
and caching each sub-query statement, the corresponding target data and the dimension value in the sub-query statement in the database so as to be quickly called from the database when the same sub-query statement appears in the follow-up process.
Compared with the prior art, the SQL optimization query method provided by the invention has the following beneficial effects:
in the SQL optimization query method provided by the invention, the number of storage segments related to target data is required to be pre-counted before a computing engine queries the target data according to an SQL query statement, when the number of the related storage segments is larger than a threshold value, a fusing mechanism of the computing engine can be triggered if the full-scale query is directly performed, so the invention adopts the scheme for the situation that the query cycle corresponding to the sub-query statement is divided into a plurality of granularity cycles, simultaneously, in order to avoid the problem that the query result cannot be output due to different target data dimensions corresponding to different sub-query statements, the target data dimensions corresponding to a plurality of sub-query statements are required to be summarized as new adding conditions into the filtering conditions of each sub-query statement before the query, after the operation is completed, the sub-query statements containing the filtering conditions are finally pushed down to the computing engine to read the target data corresponding to each granularity cycle and the dimension values therein, and returning a query result to the user after the computation of the processing terminal.
Therefore, the method and the device not only solve the problem that a fusing mechanism of a calculation engine is easy to trigger in the process of full-scale query, but also ensure that the dimensions of the target data corresponding to each sub-query statement are uniform, and ensure that the query result can be output.
A second aspect of the present invention provides an apparatus for SQL optimization query, which is applied to the method for SQL optimization query according to the foregoing technical solution, and the apparatus includes:
the statement analyzing unit is used for counting the number of storage segments related to target data when an SQL query statement is analyzed to be a UNION set type query statement, and the UNION set type query statement comprises a plurality of sub query statements;
the identification processing unit is used for splitting the query cycle corresponding to the sub-query statement into a plurality of granularity cycles when the number of the related storage segments is larger than a threshold value, summarizing target data dimensions corresponding to the sub-query statements and adding the summarized target data dimensions as new conditions into the filter conditions of the sub-query statements;
and the query unit is used for pushing down the sub-query statements containing the filter conditions to a calculation engine, reading the target data corresponding to each granularity period and the dimension value in the target data, calculating by the processing end and returning a query result to the user.
Compared with the prior art, the beneficial effects of the SQL optimizing query device provided by the invention are the same as those of the SQL optimizing query method provided by the technical scheme, and are not described herein again.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described SQL optimized query method.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the invention are the same as the beneficial effects of the SQL optimization query method provided by the technical scheme, and the detailed description is omitted here.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a partial schematic flow chart of an SQL optimization query method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a full flow of an SQL optimization query method in the embodiment of the present invention;
FIG. 3 is a diagram illustrating an example of an SQL query optimization method according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, the present embodiment provides an SQL optimizing query method, including:
when the SQL query statement is analyzed to be a UNION set type query statement, counting the number of storage segments related to target data, wherein the UNION set type query statement comprises a plurality of sub query statements; when the number of related storage segments is larger than a threshold value, splitting the query cycle corresponding to the sub-query statement into a plurality of granularity cycles, summarizing target data dimensions corresponding to the sub-query statements, and adding the summarized target data dimensions as new conditions into the filtering conditions of the sub-query statement; and pushing the sub-query statements containing the filter conditions down to a computing engine, reading the target data corresponding to each granularity period and the dimension value in the target data, and returning a query result to the user after the target data is computed by the processing end.
In the SQL optimized query method provided in this embodiment, the number of storage segments related to target data needs to be pre-counted before a computing engine queries the target data according to an SQL query statement, and when the number of storage segments related to the target data is greater than a threshold, if a full-scale query is directly performed, a fusing mechanism of the computing engine may be triggered, so the embodiment adopts a scheme for such a situation that a query cycle corresponding to a sub-query statement is split into multiple granularity cycles, and meanwhile, in order to avoid a problem that query results cannot be output due to different target data dimensions corresponding to different sub-query statements, a target data dimension summary corresponding to multiple sub-query statements needs to be added as a new condition to a filtering condition of each sub-query statement before querying, after the operation is completed, the sub-query statement including the filtering condition is finally pushed down to the computing engine to read the target data corresponding to each granularity cycle and a dimension value therein, and returning a query result to the user after the computation of the processing terminal.
Therefore, the problem that a computing engine fusing mechanism is easily triggered in the full-scale query process is solved, the dimension of the target data corresponding to each sub-query statement is uniform, and the query result can be output.
In specific implementation, the query statement of the UNION set type is usually represented as UNION ALL, and includes a plurality of sub-query statements, such as a first periodic sub-query statement and a second periodic sub-query statement, and the filter condition (limit condition) includes a query metric, a dimension order, a target data query quantity, and the like, in addition to the new addition condition. The computing engine is a pipeline, the data source is a database, the storage segment is a segment, and the processing end is a spark end, wherein the target data query quantity in the filtering condition is mainly used for limiting the data reading quantity of the computing engine target, so as to avoid triggering a fusing mechanism.
In the above embodiment, when the number of the storage segments is greater than the threshold, the method for splitting the query cycle corresponding to the sub-query statement into multiple granularity cycles includes:
extracting a query cycle corresponding to each sub-query statement, and averagely dividing each query cycle into a plurality of granularity cycles according to preset granularity.
Illustratively, the SQL query statement is composed of an inner layer statement and an outer layer statement, taking a circle ratio cycle query as an example, the inner layer statement includes a first cycle sub-query statement and a second cycle sub-query statement, that is, a current cycle sub-query statement and a historical cycle sub-query statement, the inner layer statement mainly functions to read target data corresponding to the first cycle sub-query statement and the second cycle sub-query statement from the druid, and the outer layer statement mainly functions to perform corresponding circle ratio calculation on target data dimension values read by the inner layer statement at the spark end.
In specific implementation, please refer to fig. 2, the logic plan tree is obtained by parsing the SQL query statement, metadata information that can be used for clipping is extracted, for example, the metadata information includes a storage segment query instruction, a target data dimension query instruction, a target data measurement query instruction, a target data cycle query instruction, and the like, which are related to the target data, and then whether the number of storage segments related to the target data in the query cycle is greater than a threshold is checked, if the number of storage segments is not greater than the threshold, the whole sub-query of the SQL query statement is pushed down to the druid for full-scale query by adopting the prior art, and if the number of storage segments is greater than the threshold, the query cycle is averagely split into multiple granularity cycles for concurrent query, so as to avoid triggering. For example, the minimum granularity is day when the database is ingested, and if the current SQL query statement needs to query data for one month, the query period for one month is split into 30 granularity periods. And then, sorting the SQL query statements corresponding to each granularity period, wherein the purpose is to enable the SQL query statements corresponding to each granularity period to only retain required target dimension data and filter out other irrelevant derived measurement data. After the SQL information corresponding to the granularity periods is sorted, the SQL information is converted into json statements which can be received by the druid, then the druid is requested to ask for target data, after the druid queries the target data corresponding to each granularity period, the dimension set corresponding to each granularity period and the corresponding dimension value are returned, then the data sets returned by all the granularity periods are merged, and a data set which is not more than the query quantity of the target data is obtained through a selection sorting algorithm.
Illustratively, referring to fig. 3, the result of increase of the same-ring ratio in this month and the previous month is summarized through SQL query statements, and the number of target data queries in the filter condition is set to 500 pieces of data. If the full data query with a period of one month is directly performed at the druid, and there are 2 billion pieces of target data in a large druid table, in this embodiment, one month is split into 30 granularity periods, each granularity period corresponds to approximately 10 million pieces of target data, the data sets corresponding to each granularity period in the current month and the previous month are concurrently read by the druid, because the query number of the target data in the filtering condition is 500, only the first 500 pieces of target data are filtered from each granularity period, then the target data in the current month and the previous month belonging to the same granularity period are aggregated into 500 pieces of target data, then the 500 pieces of target data in each granularity period are scanned by the druid, and after the filtering is performed again, the finally obtained 500 query results are returned to the user. Compared with the scheme of full query adopted in the prior art, the embodiment only sets the query quantity of the target data in the filtering condition to be 500 items according to needs, and then only calculates and reads the index data of 500 items in the front of the sequence according to the filtering condition in the drive, so that the calculation pressure of the drive is greatly reduced.
In the above embodiment, the method for adding the collected target data dimensions corresponding to the multiple sub-query statements as the new conditions into the filter conditions of the sub-query statements includes:
identifying the target data dimension corresponding to each sub-query statement, and when the target data dimensions of the sub-query statements are consistent, adding the target data dimension of any sub-query statement as a newly added condition into the filtering condition of each sub-query statement; and when the target data dimensions of the sub-query statements are not consistent, summarizing and combining the target data dimensions of the sub-query statements respectively, and adding the combined target data dimensions as new conditions into the filtering conditions of each sub-query statement.
In specific implementation, in order to prevent the problem that the spark end cannot perform corresponding operation according to the target data acquired by the pipeline based on the first cycle and the target data acquired based on the second cycle due to the fact that the target data dimensions corresponding to the first cycle and the second cycle are not consistent, if the target data dimension number corresponding to the first cycle is 100 and the target data dimension number corresponding to the second cycle is 150, the target data dimension numbers corresponding to the first cycle and the second cycle need to be merged together to be used as a newly added condition to be added to the filtering condition of each sub query statement, so that the merged target data dimension can cover the dimensions of the target data corresponding to the first cycle and the second cycle.
In the above embodiment, the method for concurrently reading, by the compute engine, the target data and the dimension value thereof corresponding to each granularity cycle includes:
and respectively querying the target data meeting the target data query quantity condition from the corresponding storage segment of the corresponding data source and extracting the dimension value of the target data according to each sub-query statement and the filter condition in the sub-query statement by the computing engine.
In the above embodiment, the method for returning the query result to the user after computation by the processing end includes:
and the processing terminal returns the query result to the user after calculating according to the preset rule according to the dimension value returned by each sub-query statement. For example, for the calculation that the preset rule is the growth rate, the dimension value acquired in the first period may be divided by the dimension value acquired in the second period to obtain the growth rate result.
In the above embodiment, the step of pushing down the sub-query statement including the filter condition to the calculation engine and reading the target data and the dimension value in the target data corresponding to each granularity period, and after the target data and the dimension value are calculated by the processing end and the query result is returned to the user, the method further includes:
and counting hot data sources in the historical SQL query statement and hot data dimensions in the historical SQL query statement, and pre-extracting dimensional values corresponding to the hot data dimensions for quick calling of subsequent SQL query statements.
In specific implementation, statistics is continuously carried out in the process of executing the SQL query statement, a most frequently used data source and a dimension field of the data source are used, a pipeline statement is dynamically constructed in the valley period of query, the dimension and the dimension value are pre-extracted, the dimension and the dimension value are pre-stored like a materialized view, and then the fact that if the query requirements of the same data source and the dimension value exist, the data can be quickly extracted from the materialized view without requesting the pipeline to fetch corresponding dimension value set data is discovered, so that quick calling is achieved.
In the above embodiment, the step of pushing down the sub-query statement including the filter condition to the calculation engine and reading the target data and the dimension value in the target data corresponding to each granularity period, and after the target data and the dimension value are calculated by the processing end and the query result is returned to the user, the method further includes:
and caching each sub-query statement, the corresponding target data and the dimension value in the sub-query statement in the database so as to be quickly called from the database when the same sub-query statement appears in the follow-up process.
Specifically, if the current SQL query statement includes the query result of the history cycle, the current SQL query statement query cycle may be split, the query result of the history cycle included therein may be directly retrieved from the cache library, and the current history cycle not included therein is obtained from the druid query at the earlier stage. Such as: dimension data from No. 1 to No. 20 are cached, if the current query is the dimension from No. 1 to No. 30, only the granularity period from No. 21 to No. 30 needs to be pressed down to query the drive, and the pressure of the drive is greatly reduced.
After the target data after the current early stage and the historical stage are combined is obtained, the sub-query statements corresponding to the SQL query statements can be sorted:
1. sorting the group by dimension fields of the sub-query statements;
2. sorting the derived indexes needing to be calculated by the sub query sentences;
3. summarizing target data dimensions corresponding to the sub-query statements and adding the summarized target data dimensions as newly added conditions into the filter conditions of the sub-query statements;
after each sub-query statement is sorted, the sub-query statement is converted into a json statement of a drive computing engine, and because a filter condition is added, the dimension information to be queried by the sub-query statement is larger than the filter condition. That is, the drive does not need to scan all data in the query period, so that the problems of fusing and query timeout abnormality do not occur, and the query pressure of the drive is reduced.
And because the data volume returned by the pipeline is filtered, the spark end does not need to receive all data in the query period, the data processing volume of the spark end is reduced, and the overall efficiency is greatly improved.
Example two
The present embodiment provides an SQL optimizing query apparatus, including:
the statement analyzing unit is used for counting the number of storage segments related to target data when an SQL query statement is analyzed to be a UNION set type query statement, and the UNION set type query statement comprises a plurality of sub query statements;
the identification processing unit is used for splitting the query cycle corresponding to the sub-query statement into a plurality of granularity cycles when the number of the related storage segments is larger than a threshold value, summarizing target data dimensions corresponding to the sub-query statements and adding the summarized target data dimensions as new conditions into the filter conditions of the sub-query statements;
and the query unit is used for pushing down the sub-query statements containing the filter conditions to a calculation engine, reading the target data corresponding to each granularity period and the dimension value in the target data, calculating by the processing end and returning a query result to the user.
Compared with the prior art, the beneficial effects of the SQL optimization query device provided by the embodiment of the invention are the same as the beneficial effects of the SQL optimization query method provided by the first embodiment, and are not described herein again.
EXAMPLE III
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the SQL optimized query method are executed.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the embodiment are the same as the beneficial effects of the SQL optimization query method provided by the above technical scheme, and are not described herein again.
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the invention may be implemented by hardware instructions related to a program, the program may be stored in a computer-readable storage medium, and when executed, the program includes the steps of the method of the embodiment, and the storage medium may be: ROM/RAM, magnetic disks, optical disks, memory cards, and the like.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. An SQL optimization query method is characterized by comprising the following steps:
when the SQL query statement is analyzed to be a UNION set type query statement, counting the number of storage segments related to target data, wherein the UNION set type query statement comprises a plurality of sub query statements;
when the number of the related storage segments is larger than a threshold value, splitting the query cycle corresponding to the sub-query statement into a plurality of granularity cycles, summarizing target data dimensions corresponding to the sub-query statements, and adding the summarized target data dimensions as new conditions into the filtering conditions of the sub-query statement;
and pushing the sub-query statements containing the filter conditions down to a computing engine, reading the target data corresponding to each granularity period and the dimension value in the target data, and returning a query result to a user after the target data is computed by a processing end.
2. The method of claim 1, wherein splitting the query cycle corresponding to the sub-query statement into a plurality of granularity cycles when the number of the storage segments is greater than a threshold value comprises:
extracting a query cycle corresponding to each sub-query statement, and averagely dividing each query cycle into a plurality of granularity cycles according to preset granularity.
3. The method according to claim 1 or 2, wherein the method for adding the collected target data dimensions corresponding to the plurality of sub-query statements as the added conditions into the filter conditions of the sub-query statements comprises:
identifying a target data dimension corresponding to each sub-query statement, and when the target data dimensions of the sub-query statements are consistent, adding the target data dimension of any sub-query statement as a newly added condition into the filtering condition of each sub-query statement;
and when the target data dimensions of the sub-query statements are not consistent, summarizing and combining the target data dimensions of the sub-query statements respectively, and adding the combined target data dimensions as newly added conditions into the filtering conditions of each sub-query statement.
4. The method of claim 3, wherein the UNION set type query statements include at least a first periodic sub-query statement and a second periodic sub-query statement, and wherein the filter conditions include a query metric, a dimension ordering, and a target data query quantity in addition to the new addition condition.
5. The method of claim 4, wherein the method for the compute engine to concurrently read the target data and the dimension value thereof corresponding to each granularity period comprises:
and respectively querying the target data meeting the target data query quantity condition from the corresponding storage segment of the corresponding data source and extracting the dimension value of the target data according to each sub-query statement and the filter condition in the sub-query statement by the computing engine.
6. The method of claim 5, wherein the method for returning the query result to the user after the processing end computing comprises:
and the processing terminal returns the query result to the user after calculating according to a preset rule according to the dimension value returned by each sub-query statement.
7. The method according to claim 1, wherein the pushing down the sub-query statement containing the filter condition to the compute engine and reading the target data and the dimension value in the target data corresponding to each granularity period in step, and after the query result is returned to the user after the computation by the processing end, the method further comprises:
and counting hot data sources in the historical SQL query statement and hot data dimensions in the historical SQL query statement, and pre-extracting dimensional values corresponding to the hot data dimensions for quick calling of subsequent SQL query statements.
8. The method according to claim 1, wherein the pushing down the sub-query statement containing the filter condition to the compute engine and reading the target data and the dimension value in the target data corresponding to each granularity period in step, and after the query result is returned to the user after the computation by the processing end, the method further comprises:
and caching each sub-query statement, the corresponding target data and the dimension value in the sub-query statement in the database so as to be quickly called from the database when the same sub-query statement appears in the follow-up process.
9. An SQL optimized query device, comprising:
the statement analyzing unit is used for counting the number of storage segments related to target data when an SQL query statement is analyzed to be a UNION set type query statement, and the UNION set type query statement comprises a plurality of sub query statements;
the identification processing unit is used for splitting the query cycle corresponding to the sub-query statement into a plurality of granularity cycles when the number of the related storage segments is larger than a threshold value, summarizing target data dimensions corresponding to the sub-query statements and adding the summarized target data dimensions as new conditions into the filter conditions of the sub-query statements;
and the query unit is used for pushing down the sub-query statements containing the filter conditions to a calculation engine, reading the target data corresponding to each granularity period and the dimension value in the target data, calculating by the processing end and returning a query result to the user.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of the claims 1 to 8.
CN202011048287.7A 2020-09-29 2020-09-29 SQL optimization query method and device Active CN112286961B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011048287.7A CN112286961B (en) 2020-09-29 2020-09-29 SQL optimization query method and device
CA3132004A CA3132004A1 (en) 2020-09-29 2021-09-27 Sql query optimization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011048287.7A CN112286961B (en) 2020-09-29 2020-09-29 SQL optimization query method and device

Publications (2)

Publication Number Publication Date
CN112286961A true CN112286961A (en) 2021-01-29
CN112286961B CN112286961B (en) 2022-11-18

Family

ID=74422478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011048287.7A Active CN112286961B (en) 2020-09-29 2020-09-29 SQL optimization query method and device

Country Status (2)

Country Link
CN (1) CN112286961B (en)
CA (1) CA3132004A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765286A (en) * 2021-02-01 2021-05-07 广州海量数据库技术有限公司 Query method and device based on relational database
CN113448983A (en) * 2021-07-15 2021-09-28 中国银行股份有限公司 Knowledge point processing method, knowledge point processing device, knowledge point processing server, knowledge point processing medium and knowledge point processing product

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453732B (en) * 2023-12-25 2024-03-01 智业软件股份有限公司 CDSS doctor's advice data query optimization method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140095469A1 (en) * 2012-09-28 2014-04-03 Sap Ag Optimization of database query
CN110096489A (en) * 2019-04-30 2019-08-06 阿里巴巴集团控股有限公司 A kind of data query method, system, device and electronic equipment
CN110515973A (en) * 2019-08-30 2019-11-29 上海达梦数据库有限公司 A kind of optimization method of data query, device, equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140095469A1 (en) * 2012-09-28 2014-04-03 Sap Ag Optimization of database query
CN110096489A (en) * 2019-04-30 2019-08-06 阿里巴巴集团控股有限公司 A kind of data query method, system, device and electronic equipment
CN110515973A (en) * 2019-08-30 2019-11-29 上海达梦数据库有限公司 A kind of optimization method of data query, device, equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765286A (en) * 2021-02-01 2021-05-07 广州海量数据库技术有限公司 Query method and device based on relational database
CN113448983A (en) * 2021-07-15 2021-09-28 中国银行股份有限公司 Knowledge point processing method, knowledge point processing device, knowledge point processing server, knowledge point processing medium and knowledge point processing product
CN113448983B (en) * 2021-07-15 2024-01-30 中国银行股份有限公司 Knowledge point processing method, device, server, medium and product

Also Published As

Publication number Publication date
CA3132004A1 (en) 2022-03-29
CN112286961B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN112286961B (en) SQL optimization query method and device
CN109241093B (en) Data query method, related device and database system
US10372706B2 (en) Tracking and maintaining expression statistics across database queries
US8396852B2 (en) Evaluating execution plan changes after a wakeup threshold time
CN108376143B (en) Novel OLAP pre-calculation system and method for generating pre-calculation result
EP2605158A1 (en) Mixed join of row and column database tables in native orientation
US20120317094A1 (en) Sideways Information Passing
US8935233B2 (en) Approximate index in relational databases
EP3329393A1 (en) Materializing expressions within in-memory virtual column units to accelerate analytic queries
CN110109923A (en) Storage method, analysis method and the device of time series data
US9218394B2 (en) Reading rows from memory prior to reading rows from secondary storage
US11294816B2 (en) Evaluating SQL expressions on dictionary encoded vectors
CN105069117A (en) Data flow efficiency improving method based on storage process
Chambi et al. Optimizing druid with roaring bitmaps
Le-Phuoc Operator-aware approach for boosting performance in RDF stream processing
CN110704472A (en) Data query statistical method and device
US20200257684A1 (en) Higher-order data sketching for ad-hoc query estimation
Alghazali et al. Development of Intelligent Tools for Detecting Resource-intensive Database Queries
CN110888909B (en) Data statistical processing method and device for evaluation content
Liu et al. Optimizing llm queries in relational workloads
CN112148719B (en) Data processing query method and device based on OLAP pre-calculation model
Morfonios et al. Supporting the data cube lifecycle: the power of ROLAP
CN113297270A (en) Data query method and device, electronic equipment and storage medium
Xie et al. Cool, a COhort OnLine analytical processing system
CN109684419B (en) Big data-based data cube processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant