CN114969101B - SQL statement processing method and device - Google Patents

SQL statement processing method and device Download PDF

Info

Publication number
CN114969101B
CN114969101B CN202210819206.1A CN202210819206A CN114969101B CN 114969101 B CN114969101 B CN 114969101B CN 202210819206 A CN202210819206 A CN 202210819206A CN 114969101 B CN114969101 B CN 114969101B
Authority
CN
China
Prior art keywords
query
target
statement
sub
grouping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210819206.1A
Other languages
Chinese (zh)
Other versions
CN114969101A (en
Inventor
施文俊
王国平
朱涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Oceanbase Technology Co Ltd
Original Assignee
Beijing Oceanbase Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Oceanbase Technology Co Ltd filed Critical Beijing Oceanbase Technology Co Ltd
Priority to CN202210819206.1A priority Critical patent/CN114969101B/en
Publication of CN114969101A publication Critical patent/CN114969101A/en
Application granted granted Critical
Publication of CN114969101B publication Critical patent/CN114969101B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the specification provides a method and a device for processing an SQL statement, wherein in the method for processing the SQL statement, a target SQL statement to be processed is obtained, and the target SQL statement to be processed comprises a main query statement and a sub-query statement. The sub-query statement constructs a target view based on the target table by including a grouping aggregation function in its query selection clause and a target table in its query source table clause. A query source table clause packet for the main query statement. The equivalent rewriting includes at least replacing the grouping aggregation function with the corresponding window function and deleting the target table from the query source table clause of the main query statement.

Description

SQL statement processing method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of databases, and in particular, to a method and an apparatus for processing an SQL statement.
Background
With the development of computers and databases in recent years, SQL (structured Query language) has been widely used as a computer programming language dedicated to databases, and has become the most important and most common relational database Query language.
In the existing database query technology, the execution of some SQL statements requires the establishment of a connection between two objects (e.g., two views or one view and one data table) directly or indirectly containing the same data table, and the establishment of the connection requires repeated scanning of the data table, thereby generating unnecessary overhead, which makes it impossible for the query optimizer to make an optimal execution plan.
Therefore, it is desirable to provide a processing scheme for SQL statements so that the query optimizer can formulate a more efficient execution plan based on the processed SQL statements.
Disclosure of Invention
One or more embodiments of the present specification describe a method and an apparatus for processing an SQL statement, which are helpful for a query optimizer to make a more efficient execution plan.
In a first aspect, a method for processing an SQL statement is provided, which includes:
acquiring target SQL statements to be processed, wherein the target SQL statements comprise main query statements and sub query statements; the sub-query statement comprises a grouping aggregation function in a query selection clause thereof and a target table in a query source table clause thereof, so that a target view based on the target table is constructed; the grouping aggregation function is used for carrying out target aggregation operation; a query source table clause of the main query statement comprises the target view and the target table;
equivalently rewriting the target SQL sentence to obtain a rewritten SQL sentence; the equivalent rewriting at least comprises replacing the grouping aggregation function by using a corresponding window function, and deleting the target table from a query source table clause of the main query statement; the window function is used for carrying out the target aggregation operation on each data row in the window to which the data row belongs according to each data row of the input table, and recording the corresponding aggregation result into the corresponding row of the newly added column.
In a second aspect, a method for processing an SQL statement is provided, including:
acquiring a target SQL statement to be processed, wherein the target SQL statement comprises a main query statement and two sub-query statements; the two sub-query sentences respectively comprise two different grouping aggregation functions in respective query selection clauses and target tables in respective query source table clauses, so that two different views based on the target tables are constructed; the query source table clause of the main query statement comprises the two views;
carrying out equivalent rewriting on the target SQL statement; the equivalent rewriting at least comprises that for a first sub-query statement in the two sub-query statements, a second grouping aggregation function corresponding to a second sub-query statement is added to a query selection clause in the two sub-query statements to obtain an updated first sub-query statement; generating a target sub-query statement for constructing a target view based on the window function corresponding to the second aggregation function and the updated first sub-query statement, and replacing the two sub-query statements with the target sub-query statement;
and the window function is used for carrying out aggregation operation corresponding to the second grouped aggregation function on each data row in the window to which the window function belongs aiming at each data row of the input table, and recording the corresponding aggregation result into the corresponding row of the newly added column.
In a third aspect, a device for processing an SQL statement is provided, which includes:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring target SQL statements to be processed, and the target SQL statements comprise main query statements and sub query statements; the sub-query statement comprises a grouping aggregation function in a query selection clause thereof and a target table in a query source table clause thereof, so that a target view based on the target table is constructed; the grouping aggregation function is used for carrying out target aggregation operation; a query source table clause of the main query statement comprises the target view and the target table;
the rewriting unit is used for equivalently rewriting the target SQL statement to obtain a rewritten SQL statement;
the rewriting unit includes:
a replacement module for replacing the grouping aggregation function with a corresponding window function;
a deletion module for deleting the target table from the query source table clause of the main query statement; the window function is used for carrying out the target aggregation operation on each data row in the window to which the data row belongs according to each data row of the input table, and recording the corresponding aggregation result into the corresponding row of the newly added column.
In a fourth aspect, a device for processing an SQL statement is provided, including:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a target SQL statement to be processed, and the target SQL statement comprises a main query statement and two sub-query statements; the two sub-query sentences respectively comprise two different grouping aggregation functions in respective query selection clauses and comprise target tables in respective query source table clauses, so that two different views based on the target tables are constructed; the query source table clause of the main query statement comprises the two views;
the rewriting unit is used for equivalently rewriting the target SQL statement;
the rewriting unit includes:
the adding module is used for adding a second grouping aggregation function corresponding to a second sub-query statement to a query selection sub-statement in a first sub-query statement of the two sub-query statements to obtain an updated first sub-query statement;
a generating module, configured to generate a target sub-query statement for constructing a target view based on the window function corresponding to the second aggregation function and the updated first sub-query statement;
a replacement module for replacing the two sub-query statements with the target sub-query statement;
and the window function is used for carrying out aggregation operation corresponding to the second grouped aggregation function on each data row in the window to which the window function belongs aiming at each data row of the input table, and recording the corresponding aggregation result into the corresponding row of the newly added column.
In a fifth aspect, there is provided a computer storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.
In a sixth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that when executing the executable code implements the method of the first or second aspect.
In the method and apparatus for processing an SQL statement provided in one or more embodiments of the present specification, when a query source table clause of a main query statement of a target SQL statement includes a target table and a target view constructed based on the target table, and a query selection clause of a sub-query statement used for constructing the target view includes a grouping aggregation function, equivalent rewriting is performed on the target SQL statement by replacing the grouping aggregation function with a window function. Therefore, connection between the target view containing the target table and the target table can be eliminated, and repeated scanning of the target table can be avoided. In other words, the scheme can reduce the scanning times of the target table and also reduce the connection overhead, which is beneficial for a query optimizer to make a more efficient execution plan.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings used in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the description below are only some embodiments of the present specification, and it is obvious for those skilled in the art that other drawings may be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 illustrates a flow diagram of a method of processing an SQL statement, according to one embodiment;
FIG. 3 illustrates a flow diagram of a method of processing an SQL statement according to another embodiment;
FIG. 4 illustrates a flow diagram of a method of processing an SQL statement according to yet another embodiment;
FIG. 5 illustrates a flow diagram of a method of processing an SQL statement according to yet another embodiment;
FIG. 6 shows a schematic diagram of a processing device for an SQL statement according to one embodiment;
fig. 7 shows a schematic diagram of a processing means of an SQL statement according to another embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Before describing the solutions provided by the embodiments of the present specification, a brief description is made of the components of the SQL statement.
Generally, an SQL statement essentially consists of two parts: a query SELECT clause (also known as a SELECT clause) and a query source table clause (also known as a FROM clause).
The query selection clause may include a column in a data table or a view, or may include a grouping aggregation function, etc. The grouping aggregation function is used for performing aggregation operation on one column or several columns in the data table or the view. This may be, for example, a SUM function (SUM ()), an averaging function (AVG ()), a maximum value function (MAX ()), a minimum value function (MIN ()), a COUNT function (COUNT ()), or the like.
The query source table clause may include objects such as a data table or a view (described later). It should be understood that when the query source table clause includes two objects, such as two data tables, two views, or one data table and one view, a connection needs to be established between the two objects.
It should be noted that, by establishing a connection between the two objects, the two objects can be merged to form a new result set. Based on the resulting set of results formed, database operations such as query, add, delete, and change may be implemented.
With respect to the above two objects, a connection can be established by a designated connection type. It should be noted that, when a connection type is also specified, the syntax format of the query source table clause may be expressed as: data table 1 < connection type > JOIN data table 2. Of course, the data table here can also be a view.
As for the above connection type, it may be, for example, an internal connection (INNER JOIN), an external connection (OUTER JOIN), or a CROSS connection (CROSS JOIN). Different connection types can be subdivided, for example, an interconnection can also include an equal connection, a non-equal connection, a natural connection, and the like; the external connection operation may also include left connection, right connection, full connection, etc.
Further, regarding the connection of the above-described two objects, a corresponding connection condition (also referred to as a query condition) may also be specified. In one example, the connection condition may be included in a query source table clause. In another example, the join condition may also be included in a query condition clause (also called WHERE clause) of an SQL statement.
As for the above-described connection conditions, which may also include a plurality of kinds, common connection conditions may be classified as equal-value connection conditions (i.e., the relationship in the connection conditions is an equal-value relationship, such as t1.C1= t2. C2) or non-equal-value connection conditions. In some embodiments, the connection condition may be a simple condition or a complex condition. The complex condition is a combination of a plurality of simple conditions, AND for example, a plurality of simple conditions may be combined by using a logical relationship such as "AND", "OR" to form a complex condition.
It should be noted that the above description is only for two objects as an example, and the corresponding connection type and connection condition are described. In practical applications, the query source table clause may also include more than two objects. When a plurality of objects are included, two of the objects may be connected, and then the result set formed may be connected to a third object, and so on. Based on this, in some embodiments, the query source table clause may also be referred to as a multi-table join clause. For simplicity of description, the connection of two objects is hereinafter described as an example.
It should be understood that, in some scenarios, the connection of the objects has strong data filtering property, and the data volume of the result set after connection is relatively small, because the connection of the objects can filter the data in the objects according to the connection condition to screen out a useful result set.
The following description of the drawings.
A view may be understood as a data table that is a visualization of a result set based on a database query statement. A view contains rows and columns (also called fields), and the columns in the view are columns in a real data table from one or more databases.
The type of view may be various, such as a standard view, an inline view, a materialized view, and the like. Wherein, the inline view can be understood as a sub-query in the query source table clause of the SQL statement. It should be understood that in the query source table clause, a data table of the data to be queried may be specified, and that in addition to the data table, a view (subquery) may be possible. Since the inline view may replace a table in a query, in some embodiments, the inline view may also be referred to as a derived table.
The scheme provided by the embodiment of the present disclosure mainly aims at that a query source table clause of a main query statement includes a target table and a target view constructed based on the target table, and a query selection clause of a sub-query statement used for constructing the target view includes an SQL statement of a grouping aggregation function for equivalent rewriting. Because the query select clause of the sub-query clause includes a grouping aggregation function, it indicates that the target table needs to be scanned once first. In addition, the query source table clause of the main query statement includes a target table and a target view, which indicates that a connection needs to be established for the target table and the target view, and the establishment of the connection needs to scan the target table again or multiple times. It can be seen that the execution of this SQL statement requires repeated scans of the data table, requiring its equivalent rewrite.
In particular, the SQL statement can be equivalently rewritten by replacing the packet aggregation function with a window function.
Taking SQL statements as an example, the SQL syntax format of the window function can be expressed as: < Window function name > OVER (PARTITION BY < Window PARTITION parameter > ORDER BY < sort parameter >, where the window PARTITION parameter is used to PARTITION the window against the input table, and the sort parameter is used to sort rows of data within the same window.
Illustratively, < window division parameter > and < sort parameter > may be column names in a data table or view for dividing windows or sorting according to the column names. The < window function name > may be an aggregated window function such as SUM, COUNT, MAX, MIN, etc.
It should be noted that, when the database query is performed BY using the window function, the grouping (PARTITION BY) and sorting (ORDER BY) functions can be simultaneously implemented without reducing the number of rows of the input table.
This scheme will be described in detail below.
Fig. 1 is a schematic view of an implementation scenario of an embodiment disclosed in this specification. In fig. 1, the query rewrite module may first receive the target SQL statement. Then, the query rewrite module may analyze the target SQL statement and determine whether the target SQL statement needs to be processed based on predefined determination rules. The determination rule is specifically used for determining whether unnecessary overhead is generated for the execution of the SQL statement, for example, whether repeated scanning of the data table is required; and/or whether a connection needs to be established for two objects that directly or indirectly contain the same data table, etc.
And under the condition that the target SQL statement needs to be processed, equivalently rewriting the target SQL statement to obtain a rewritten SQL statement. In one example, the target SQL statement can be equivalently rewritten using a window function.
Then, the query rewrite module may send the rewritten SQL statement to the query optimizer, so that the query optimizer may formulate a target execution plan of the query database based on the rewritten SQL statement.
It should be understood that fig. 1 is only an exemplary illustration, and in practical applications, the query rewrite module may also be integrated into a query optimizer, which is not limited in this specification.
FIG. 2 illustrates a flow diagram of a method of processing an SQL statement, according to one embodiment. The method may be performed by any apparatus, device, platform, cluster of devices having computing, processing capabilities. As shown in fig. 2, the method may include the following steps.
Step 202, obtaining a target SQL statement to be processed, which includes a main query statement and a sub query statement.
It should be noted that the main query/sub-query statement herein at least includes a query selection clause (also referred to as a SELECT clause) and a query source table clause (also referred to as a FROM clause).
The sub-query statement is used to construct a target view based on the target table. In particular, the above-described sub-query statement constructs a target table-based target view, also referred to as an aggregate view, by including the group aggregation function in its query selection clause and the target table in its query source table clause.
The packet aggregation function is used for performing a target aggregation operation, and may be, for example, one of the following: a summation function (SUM ()), an averaging function (AVG ()), a maximum function (MAX ()), a minimum function (MIN ()), a COUNT function (COUNT ()), and the like.
In one specific example, the grouping and aggregation function is used for performing a target aggregation operation on a target column in an input table (e.g., a target table). For example, when the target column is C1, the summation function can be expressed as: SUM (C1), similarly, corresponding target columns may also be added for other grouping aggregation functions.
It should be appreciated that when constructing the target view based on the above sub-query statements, the entire target table needs to be scanned once because the target aggregation operation needs to be performed.
In addition, the query source table clause of the main query statement comprises a target view and a target table. That is, a connection needs to be established between the target view (i.e., aggregated view) and the target table.
In this specification, a connection of two objects that directly or indirectly contains the same data table is referred to as an aggregate connection. Thus, the connection of the target view to the target table is an aggregated connection.
It should be appreciated that to establish the aggregated connection described above, the target table may need to be scanned again or multiple times (the specific number of scans being determined based on the connection type). Thus, the target table needs to be scanned repeatedly for the execution of the target SQL statement. However, repeated scanning of the data table may generate unnecessary overhead, and especially when the data size of the data table is large, computer resources may be greatly wasted, so that the target SQL statement needs to be processed.
In one example, the target SQL statement may be as follows:
select V.agg
from (select sum(t1.c1) agg
from t1) V, t1;
in the target SQL statement, the statement included in the parentheses is a sub-query statement, and the SELECT clause of the sub-query statement includes a grouping aggregation function: sum (t 1. C1), and the FROM clause includes the target table: t1. By this sub-query statement, an aggregated view V based on the target table t1 can be constructed.
In addition, in the target SQL statement, the statement except the sub-query statement is the main query statement, and the FROM sub-statement of the main query statement includes the aggregated view V and the target table t1.
In the above example, the target table t1 needs to be scanned once when constructing the aggregated view V. Thereafter, in establishing a connection (e.g., a Cartesian product connection) of the aggregated view V to the target table t1, the target table t1 needs to be scanned multiple times. Further, since the aggregated view V is constructed based on the target table t1, the connection of the aggregated view V to the target table t1 described above is an aggregated connection.
And step 204, equivalently rewriting the target SQL statement to obtain a rewritten SQL statement.
The equivalent rewriting includes at least replacing the grouping aggregation function with the corresponding window function and deleting the target table from the query source table clause of the main query statement. The window function is used for performing target aggregation operation on each data row in the window to which the data row belongs, and recording a corresponding aggregation result into a corresponding row of a newly added column. It should be understood that the entire target table is treated as one window without specifying window division parameters.
Taking the grouping aggregation function as SUM () for example, the corresponding window function is SUM () OVER (). Taking the packet aggregation function as COUNT () for example, the corresponding window function is COUNT () OVER (), and so on.
When the grouping aggregation function is used to perform the target aggregation operation on the target column in the input table, the corresponding window function is used to perform the target aggregation operation on the target column of each data row in the window to which the grouping aggregation function belongs, and record the corresponding aggregation result into the corresponding row of the newly added column.
It should be appreciated that after the target aggregation operation is performed for the target column of the input table using the window function, the number of rows of the input table does not change.
It should also be noted that after the target table is deleted from the query source table clause of the main query statement, the aggregate join may be eliminated. That is, the rewritten SQL statement only needs to be scanned once for the target table, and thus the execution efficiency of the SQL statement can be improved.
Also taking the target SQL statement in the foregoing example as an example, after it is equivalently rewritten, the rewritten SQL statement may be as follows:
select V.agg
from (select sum(t1.c1) over() agg
from t1) V;
it should be understood that the rewritten SQL statement may be obtained by specifically performing the following equivalent rewriting on the target SQL statement: with the window function: sum (t 1. C1) over () replaces the packet aggregation function: sum (t 1. C1), and deleting target table t1 FROM the FROM clause of the main query statement.
It should be noted that, when the rewritten SQL statement is executed, the aggregated connection does not need to be established, and thus the target table t1 only needs to be scanned once.
The accuracy of the equivalent rewriting method according to this embodiment will be described below with reference to this example.
First, according to the sub-query statement in the target SQL statement in the example of step 202, it can be concluded that the target view V includes only one column, and the column is obtained by summing up the column c1 in the target table t1. Then, a cartesian connection (or a cross connection) is established between the target view V and the target table t1, and the obtained result set is that a column is added for the target table t1, and the summation result of c1 is recorded in each row in the column. The result set is the same as view V constructed by the sub-query statements in the rewritten SQL statement in the example of step 204. That is, the FROM clause of the target SQL statement and the rewritten SQL statement are the same, so that the target SQL statement and the rewritten SQL statement are equivalent.
In summary, the processing method for the SQL statement provided in the embodiment of the present specification may eliminate aggregation connection in the SQL statement based on the window function, thereby avoiding unnecessary overhead, and further facilitating the query optimizer to formulate a more efficient execution plan.
FIG. 3 shows a flow diagram of a method of processing an SQL statement, according to another embodiment. The method may be performed by any apparatus, device, platform, cluster of devices having computing, processing capabilities. As shown in fig. 3, the method may include the following steps.
Step 302, obtaining a target SQL statement to be processed, which includes a main query statement and a sub query statement.
The sub-query statement at least includes a query SELECT clause (also called a SELECT clause), a query source table clause (also called an FROM clause), and a grouping clause (also called a GROUP BY clause), and is used for constructing a target view based on the target table.
In particular, the above-described sub-query statement constructs a target table-based target view, also referred to as an aggregated view, by including a grouping aggregation function in its query selection clause, a target table in its query source table clause, and a grouping expression in its grouping clause. The packet expression here is used to determine the packets of the packet aggregation function.
It should be appreciated that when constructing the target view based on the above sub-query statements, the entire target table needs to be scanned once because the target aggregation operation needs to be performed.
In addition, the main query statement at least includes a query SELECT clause (also called a SELECT clause), a query source table clause (also called a FROM clause), and a query condition clause (also called a WHERE clause).
The query source table clause of the main query statement comprises a target view and a target table, and the query condition clause at least comprises equivalent connection conditions set for the target view and the target table.
That is, when executing the main query statement, an aggregate connection needs to be established for the target view and the target table.
It will be appreciated that to establish the aggregated connection described above, the target table needs to be scanned again. Thus, the target table needs to be scanned repeatedly for the execution of the target SQL statement. However, repeated scanning of the data table may generate unnecessary overhead, and especially when the data size of the data table is large, computer resources may be greatly wasted, so that the target SQL statement needs to be processed.
In one example, the target SQL statement may be as follows:
select V.agg,t1.c2
from (select sum(t1.c1) agg,t1.c2
from t1
group by t1.c2) V, t1
where t1.c2 = V.c2;
in the target SQL statement, the statement included in the parentheses is a sub-query statement, and the SELECT clause of the sub-query statement includes a grouping and aggregating function: sum (t 1. C1) and columns of target table t 1: c2, FROM clause includes the target table: t1, and the GROUP BY clause includes a grouping expression: t1.C2. By this sub-query statement, an aggregated view V based on the target table t1 can be constructed.
In addition, in the target SQL statement, the statement except the sub-query statement is the main query statement, the FROM clause of the query statement includes the aggregated view V and the target table t1, and the WHERE clause includes the equal join condition: t1.C2 = v. C2.
In the above example, the target table t1 needs to be scanned once when constructing the aggregated view V. Thereafter, when establishing the aggregation connection of the aggregation view V and the target table t1, the target table t1 needs to be scanned again. I.e. repeated scanning of the target table t1 is required.
And step 304, judging whether the target SQL statement meets the preset rewriting condition.
The preset rewriting condition at least comprises that two expressions on two sides of a medium sign in the equivalent connection condition in the query condition clause of the main query statement are isomorphic, and the group expression in the group clause of the sub query statement is one of the two expressions. Additionally, it may be included that the grouping expression is included in a query selection clause of the sub-query statement.
The above two expressions are isomorphic, meaning that the values determined based on the two expressions are the same. Taking two expressions as a column Ci and a column Cj as an example, isomorphism means that an attribute value corresponding to the column Ci is the same as an attribute value corresponding to the column Cj.
In one specific example, it may be determined whether two columns are isomorphic by determining whether the column names of the two columns are the same.
In the foregoing example, the iso-join condition: t1.C2 = v. C2 the two columns on either side of the equal sign are of the same name, so the two columns are isomorphic. Further, a group expression: t1.C2 is one of t1.C2 and v.c2, and t1.C2 is included in the SELECT clause of the sub-query statement so that the above-described preset rewrite condition is satisfied.
And step 306, carrying out equivalent rewriting on the target SQL statement under the condition that the preset rewriting condition is met, and obtaining the rewritten SQL statement.
The equivalent rewriting includes at least replacing the packet aggregation function with the corresponding window function and deleting the target table from the query source table clause of the main query statement. And taking the grouping expression as a window division parameter of a window function, deleting the grouping clause, deleting the equivalent connection condition, and adding a non-empty constraint aiming at the isomorphic column corresponding to the column in the grouping expression.
It should be understood that when a window division parameter is specified for a window function, then the window of the window function may be determined based on the window division parameter.
Of course, in practical applications, the above equivalent rewriting may further include adding each column in the target table included in the main query statement and the sub-query statement to the query selection clause of the sub-query statement, and replacing each column in the target table included in the main query statement with a corresponding column in the target view.
Also taking the target SQL statement in the foregoing example as an example, after it is equivalently rewritten, the rewritten SQL statement may be as follows:
select V.agg,V.c2
from (select sum(t1.c1) over(partition by t1.c2) agg,t1.c2
from t1) V
where V.c2 is not null and;
it should be understood that the rewritten SQL statement may be obtained by specifically performing the following equivalent rewriting on the target SQL statement: with the window function: sum (t 1. C1) over (partition by t1. C2) replacement packet aggregation function: sum (t 1. C1), and the window division parameter of the window function is: t1.C2. Delete target table t1 FROM the FROM clause of the main query statement, and delete the group clause: group by t1.C2, delete equal join condition: t1.C2 = v.c2 and adds a non-null constraint for v.c2 (the isomorphic column of t1. C2).
It should be noted that after deleting the target table from the query source table clause of the main query statement, the aggregate join may be eliminated. That is, the rewritten SQL statement only needs to be scanned once for the target table, so that the execution efficiency of the SQL statement can be improved.
Of course, in practical applications, with the complexity of the SQL statement, the preset rewrite conditions and the equivalent rewrite method may also be adjusted accordingly, which is not limited in this specification.
For example, when the query source table clause of the sub-query statement further includes other tables, the preset rewrite condition further includes that the target table and the other tables are lossless connections, where the lossless connections refer to connections without changing the number of rows in the target table.
For another example, when the sub-query statement further includes a query condition clause, and the target query condition is one of the query conditions in the query condition clause of the main query statement, the equivalent rewriting further includes deleting the target query condition from the query condition clause of the main query statement.
In connection with a specific example, the target SQL statement may also be as follows:
select V.agg,t1.c2,t3.pk
from (select sum(t1.c1) agg, t1.c2
from t1, t2
where t1.c3>1 and t1.fk = t2.pk
group by t1.c2) V, t1, t3
where t1.c2 = V.c2 and
t1.c3 > 1 and
t1.pk > 1 and
t3.c1 = t1.c1 and
V.agg > 3;
that is, the FROM clause of the sub-query statement further includes other table t2, and the sub-query statement further includes a WHERE clause, and the query conditions therein are: WHERE t1.C3>1 is one of several query conditions in the WHERE clause of the main query statement.
It should be noted that when fk is the foreign key of target table t1 and pk is the primary key of other table t2, based on the query condition: t1.Fk = t2.Pk, the connections established with the other table t2 for the target table t1 are lossless connections, so that the preset rewriting conditions are satisfied.
The rewritten SQL statement corresponding to the target SQL statement is as follows:
select V.agg,V.c2,t3.pk
from (select sum(t1.c1) over(partition by t1.c2) agg, t1.c2, t1.c3 ,t1.pk ,t1.c1, t1.fk
from t1, t2
where t1.c3>1 and t1.fk = t2.pk) V, t3
where V.c2 is not null and
t3.c1 = V.c1 and
V.pk > 1 and
V.agg > 3;
in the rewritten SQL statement, the query conditions in the WHERE clause of the main query statement are deleted: t1.C3>1.
In summary, the processing method for the SQL statement provided in the embodiment of the present specification may eliminate aggregation connection in the SQL statement based on the window function, so as to avoid repeated scanning on the data table, and further facilitate the query optimizer to formulate a more efficient execution plan.
FIG. 4 shows a flow diagram of a method of processing an SQL statement according to yet another embodiment. The method may be performed by any apparatus, device, platform, cluster of devices having computing, processing capabilities. As shown in fig. 4, the method may include the following steps.
Step 402, obtaining a target SQL statement to be processed, which includes a main query statement and two sub-query statements.
It should be noted that the main query statement/each sub-query statement herein at least includes a query SELECT clause (also referred to as a SELECT clause) and a query source table clause (also referred to as a FROM clause).
The two sub-query statements are respectively used for constructing two different views based on the target table. Specifically, the two sub-query statements respectively include two different grouping aggregation functions in respective query selection clauses and include a target table in respective query source table clauses, so that two different views based on the target table are constructed. Both views may be referred to herein as aggregate views.
The packet aggregation function is used for performing a target aggregation operation, and may be, for example, one of the following: a summation function (SUM ()), an averaging function (AVG ()), a maximum function (MAX ()), a minimum function (MIN ()), a COUNT function (COUNT ()), and the like.
In a specific example, the grouping and aggregation function is used for performing a target aggregation operation on a target column in an input table (e.g., a target table). For example, when the target column is C1, the summation function can be expressed as: SUM (C1), similarly, corresponding target columns may also be added for other grouping aggregation functions.
It should be appreciated that in building two views based on the two sub-query statements described above, the entire target table needs to be scanned twice.
In addition, the query source table clause of the main query statement described above includes two views. That is, a connection needs to be established for both views. Since both views indirectly contain the target table, the join of the two views may also be referred to as an aggregate join.
It will be appreciated that to establish the aggregated connection described above, the target table may need to be scanned again or multiple times (the specific number of scans being determined based on the connection type). Thus, the target table needs to be scanned repeatedly for the execution of the target SQL statement. However, the repeated scanning of the data table may generate unnecessary overhead, and especially when the data size of the data table is large, the computer resources may be greatly wasted, so that the target SQL statement needs to be processed.
And step 404, equivalently rewriting the target SQL statement to obtain a rewritten SQL statement.
The equivalent rewriting at least includes, for a first sub-query statement of the two sub-query statements, adding a second grouping aggregation function corresponding to the second sub-query statement to a query selection clause therein to obtain an updated first sub-query statement. And generating a target sub-query statement for constructing the target view based on the window function corresponding to the second aggregation function and the updated first sub-query statement, and replacing the two sub-query statements with the target sub-query statement to obtain the rewritten SQL statement.
First, when the two sub-query statements do not include a grouping sub-statement, the first sub-query statement may be any one of the two sub-query statements. Second, when one of the sub-query statements includes a grouping clause, then one of the sub-query statements including the grouping clause may be used as the first sub-query statement. Finally, when both the sub-query statements include the grouping clause, the sub-query statement in which the grouping expression includes the largest number of columns may be used as the first sub-query statement.
The window function is used for performing aggregation operation corresponding to the second grouping aggregation function on each data row in the window to which the data row belongs, and recording a corresponding aggregation result into a corresponding row of a newly added column. It should be understood that the entire target table is treated as one window without specifying window division parameters.
In the embodiment of the present specification, other packet aggregation functions than the packet aggregation functions COUNT () and AVG () are the same as the corresponding window function names.
For example, if the grouping aggregation function is SUM (), the corresponding window function is SUM () OVER (). And if the packet aggregation function is COUNT (), the corresponding window function is SUM () OVER (), and if the packet aggregation function is AVG (), the corresponding window function is SUM ()/SUM () OVER ().
The generating of the target sub-query statement for constructing the target view may specifically include generating the target sub-query statement for constructing the target view by including a window function in the query selection clause thereof and including an intermediate view in the query source table clause thereof.
It should be appreciated that after replacing two sub-query statements with the target sub-query statement used to construct the target view, the query source table clause of the main query statement includes only the target view, and thus the aggregate join may be eliminated. That is, the rewritten SQL statement only needs to be scanned once for the target table, so that the execution efficiency of the SQL statement can be improved.
Of course, in practical applications, the prefix name of the column or the column name may also be adaptively adjusted, which is not limited in this specification.
It should be noted that when the rewritten SQL statement is executed, aggregation connection does not need to be established, and only needs to scan the target table t1 once, so that unnecessary overhead can be avoided, and a more efficient execution plan can be conveniently formulated by the query optimizer.
FIG. 5 illustrates a flow diagram of a method for processing an SQL statement, according to yet another embodiment. The method may be performed by any apparatus, device, platform, cluster of devices having computing, processing capabilities. As shown, the method may include the following steps.
Step 502, a target SQL statement to be processed is obtained, which includes a main query statement and two sub query statements.
One of the two sub-query statements (hereinafter referred to as a first sub-query statement) includes at least a query SELECT clause, a query source table clause and a grouping clause. The other sub-query statement (hereinafter referred to as a second sub-query) includes at least a query SELECT clause (also referred to as a SELECT clause) and a query source table clause (also referred to as a FROM clause). The two sub-query statements are used to construct two different views based on the target table, respectively.
Specifically, the two sub-query statements respectively include two different grouping aggregation functions in respective query selection clauses and include a target table in respective query source table clauses, so that two different views based on the target table are constructed. Both views may be referred to herein as aggregate views.
In addition, the first sub-query statement also includes, in its grouping clause (also referred to as a first grouping clause), a first grouping expression for determining a grouping of the first grouping aggregation function in the query selection clause of the first sub-query statement.
It should be appreciated that in building two views based on the two sub-query statements described above, the entire target table needs to be scanned twice.
In addition, the main query statement includes at least a query SELECT clause (also called a SELECT clause), a query source table clause (also called a FROM clause), and a query condition clause (also called a WHERE clause).
The query source table clause of the main query statement comprises two views respectively constructed based on the two sub-query statements, and the query condition clause at least comprises equivalent connection conditions set for the two views.
That is, when executing the main query statement, an aggregate connection needs to be established for the two views. Since both views indirectly contain the target table, the join of the two views may also be referred to as an aggregate join.
It will be appreciated that to establish the aggregated connection described above, the target table needs to be scanned again. Thus, the target table needs to be scanned repeatedly for the execution of the target SQL statement. However, repeated scanning of the data table may generate unnecessary overhead, and especially when the data size of the data table is large, computer resources may be greatly wasted, so that the target SQL statement needs to be processed.
Step 504, judging whether the target SQL statement meets preset rewriting conditions.
The preset rewriting conditions at least comprise that two expressions at two sides of a medium sign in the equivalent connection condition in the query condition clause of the main query statement are isomorphic with the first packet expression.
In the foregoing example, the iso-join condition: v1.C2= v2.C2 the two columns on either side of the equal sign are of the same name, so the two columns are isomorphic. Further, the first packet expression: since the column in t1.C2 is also the same name as the above two columns, the column in t1.C2 is isomorphic with the columns in v1.C2 and v2.C2, and the above-described preset rewrite condition is satisfied.
And step 506, carrying out equivalent rewriting on the target SQL statement under the condition of meeting the preset rewriting conditions to obtain the rewritten SQL statement.
The equivalence rewriting includes, for a first sub-query statement of the two sub-query statements, adding a second grouping aggregation function corresponding to a second sub-query statement to a query selection clause thereof, resulting in an updated first sub-query statement. And generating a target sub-query statement for constructing the target view based on the window function corresponding to the second aggregation function and the updated first sub-query statement, and replacing the two sub-query statements with the target sub-query statement. And deleting the isovalent join condition and adding a non-null constraint to the isomorphic column corresponding to the column in the first packet expression.
Of course, in practical applications, the above equivalent rewriting may further include adding each column in the target table included in the main query statement and the two sub-query statements to the query selection sub-statement of the first sub-query statement, and replacing each column in the two views included in the main query statement with a corresponding column in the target view.
Of course, in practical applications, the prefix name of the column or the column name may also be adaptively adjusted, which is not limited in this specification.
It should be noted that when only the target view is included in the query source table clause of the main query statement, the aggregate join may be eliminated. That is, the rewritten SQL statement only needs to be scanned once for the target table, so that the execution efficiency of the SQL statement can be improved.
Of course, in practical applications, with the complexity of the SQL statement, the preset rewrite conditions and the equivalent rewrite method may also be adjusted accordingly, which is not limited in this specification.
For example, when the second sub-query statement further includes a second grouping clause, and the second grouping expression therein is used to determine a grouping of the second grouping aggregation function in the query selection clause of the second sub-query statement. The preset rewriting condition may be replaced by that the columns in the second grouping expression are subsets of the columns in the first grouping expression, and two expressions on two sides of the equal sign in the query condition clause of the main query statement are isomorphic with the second grouping expression.
Accordingly, the above-mentioned equivalent rewriting step also needs to be adjusted, for example, to replace the adding of the non-null constraint to the isomorphic column corresponding to the column in the first grouping expression with the adding of the non-null constraint to the isomorphic column corresponding to the column in the second grouping expression. In addition, the following rewriting step may be added: and taking the second grouping expression as a window division parameter of the window function.
For another example, when the first sub-query statement further includes a first query condition clause, and the second sub-query statement further includes a second query condition clause, the preset rewrite condition further includes that the query conditions included in the first and second query condition clauses are the same.
With specific examples, the target SQL statement may be as follows:
select v1.agg, v2.agg
from
(select sum(t1.c1) agg, t1.c2 from t1 group by t1.c3, t1.c2) v1,
(select count(t1.c1) agg, t1.c2 from t1 group by t1.c2) v2
where v1.c2=v2.c2;
in the target SQL statement, two statements respectively contained in two brackets are two sub-query statements, where a SELECT clause of one sub-query statement includes a grouping aggregation function: sum (t 1. C1) and column c2, the FROM clause includes the target table: t1, the GROUP BY clause includes a first packet expression: t1.C3, t1.C2. By means of the one sub-query statement, an aggregated view v1 based on the target table t1 can be constructed. The SELECT clause of another sub-query statement includes a grouping aggregation function: count (t 1. C1), and the FROM clause includes the target table: t1, the GROUP BY clause includes a second packet expression: t1.C2. An aggregated view v2 based on the target table t1 can be constructed by the further sub-query statement.
In addition, in the target SQL statement, the statement except for the two sub-query statements is the main query statement, the FROM sub-statement of the main query statement includes aggregated views v1 and v2, and the WHERE sub-statement includes an equal join condition: v1.C2= v2.C2.
Note that, in the above example, when constructing the aggregated views v1 and v2, the target table t1 needs to be scanned twice. Thereafter, when establishing the aggregation connection of the aggregation views v1 and v2, the target table needs to be scanned again.
Regarding the target SQL statement, since the GROUP expression in the GROUP BY clause of the other sub-query statement: the column in c2 is the grouping expression in the GROUP BY clause of the above one sub-query statement: t1.C3, t1. C2; and the columns in t1.C2 are isomorphic with the columns in v1.C2, v2.C2, so that the preset rewrite conditions are satisfied.
The rewritten SQL statement corresponding to the target SQL statement is as follows:
select v.agg1, v.agg2
from
(select sum(vx.agg2) over(partition by t1.c2) as agg2, vx.agg as agg1, vx.c2
from (select sum(t1.c1) agg, count(t1.c1) agg2, t1.c2, t1.c3
from t1
group by t1.c3, t1.c2) vx) v
where v.c2 is not null;
the rewritten SQL statement may be obtained by performing the following equivalent rewriting on the target SQL statement: first, the one sub-query statement is used as a first sub-query statement, and the other sub-query statement is used as a second sub-query statement. Then, adding the second aggregation function count (t 1. C1) in the second sub-query statement to the SELECT clause of the first sub-query statement to obtain an updated first sub-query statement: select sum (t 1. C1) agg, count (t 1. C1) agg2, t1.C2 from t1 group by t1.C3, t1.C2. It should be appreciated that based on this updated first sub-query statement, an intermediate view may be constructed that results in: vx.
Next, by including a window function sum () over () corresponding to sum (t 1. C1) in the SELECT clause, and by grouping the second packet expression: c2 as a window division parameter for the window function. And the FROM clause comprises an intermediate view vx, a target sub-query statement used for constructing a target view v is generated, and the original two sub-query statements are replaced by the target sub-query statement. It should be appreciated that after performing the replacement, the FROM clause of the main query statement includes only the target view v.
Finally, deleting the equivalent connection condition: v1.C2= v2.C2, and adds a non-null constraint for v.c2 (an isomorphic column of t1. C2).
It should be noted that when the rewritten SQL statement is executed, aggregation connection does not need to be established, and only needs to scan the target table t1 once, so that unnecessary overhead can be avoided, and a more efficient execution plan can be conveniently formulated by the query optimizer.
Corresponding to the above processing method of the SQL statement, an embodiment of the present disclosure further provides a processing apparatus of the SQL statement, as shown in fig. 6, where the apparatus may include:
the obtaining unit 602 is configured to obtain a target SQL statement to be processed, where the target SQL statement includes a main query statement and a sub query statement. The sub-query statement constructs a target view based on the target table by including the grouping aggregation function in its query selection clause and the target table in its query source table clause. The packet aggregation function here is used to perform a target aggregation operation. The query source table clause of the main query statement includes a target view and a target table.
The rewriting unit 604 is configured to perform equivalent rewriting on the target SQL statement to obtain a rewritten SQL statement.
The rewriting unit 604 includes:
a replacing module 6042, configured to replace the grouping aggregation function with a corresponding window function;
a deletion module 6044 for deleting the target table from the query source table clause of the main query statement;
wherein the packet aggregation function comprises one of: summing functions, averaging functions, maximizing functions, minimizing functions, and counting functions.
The window function is used for performing target aggregation operation on each data row in the window to which the data row belongs according to each data row of the input table, and recording a corresponding aggregation result into a corresponding row of a newly added column.
Optionally, the main query statement further includes a query condition clause, which at least includes the equivalence connection conditions set for the target view and the target table. The sub query statement also comprises a grouping clause, and a grouping expression in the grouping clause is used for determining grouping of a grouping aggregation function;
the above-mentioned device still includes:
the determining unit 606 is configured to determine whether the target SQL statement satisfies a preset rewrite condition, where the preset rewrite condition at least includes that two expressions on two sides of a medium sign in the equivalent connection condition are isomorphic, and the group expression is one of the two expressions.
The rewrite unit 604 is specifically configured to: and equivalently rewriting the target SQL statement under the condition of meeting the preset rewriting condition.
The rewriting unit further includes:
a determining module 6046, configured to use the grouping expression as a window division parameter of the window function;
the deleting module 6044 is further configured to delete the grouping clause and delete the equivalent connection condition;
an adding module 6048, configured to add a non-null constraint to an isomorphic column corresponding to a column in the grouping expression.
The adding module 6048 is further configured to add each column in the target table included in the main query statement and the sub query statement to the query selection clause of the sub query statement, and replace each column in the target table included in the main query statement with a corresponding column in the target view.
Optionally, the query source table clause of the sub-query statement further includes other tables, and the preset rewrite condition further includes that the target table and the other tables are lossless connections, where the lossless connections refer to connections without changing the number of rows in the target table.
Optionally, the sub-query statement further includes a query condition clause, and the target query condition is one of a plurality of query conditions in the query condition clause of the main query statement;
the deleting module 6044 is further configured to delete the target query condition from the query condition clause of the main query statement.
Optionally, the apparatus may further include:
transmitting unit (not shown in the figure): and the SQL sentence rewriting unit is used for sending the rewritten SQL sentence to the query optimizer so that the query optimizer formulates a target execution plan based on the rewritten SQL sentence.
Optionally, the apparatus may further include:
formulation unit (not shown in the figures): and the system is used for making a target execution plan based on the rewritten SQL statement.
The functions of each functional module of the device in the above embodiments of the present description may be implemented through each step of the above method embodiments, and therefore, a specific working process of the device provided in one embodiment of the present description is not repeated herein.
The processing device for the SQL statement provided in an embodiment of the present specification may eliminate aggregation connection, thereby reducing the number of scanning times for the data table, and further facilitating the query optimizer to make a more efficient execution plan.
Corresponding to the above processing method of the SQL statement, an embodiment of the present disclosure further provides a processing apparatus of the SQL statement, as shown in fig. 7, where the apparatus may include:
the obtaining unit 702 is configured to obtain a target SQL statement to be processed, where the target SQL statement includes a main query statement and two sub query statements. The two sub-query statements respectively comprise two different grouping aggregation functions in respective query selection clauses and comprise target tables in respective query source table clauses, so that two different views based on the target tables are constructed. The query source table clause of the main query statement includes the two views.
And a rewriting unit 704, configured to equivalently rewrite the target SQL statement.
The rewriting unit 704 includes:
an adding module 7042, configured to, for a first sub-query statement in the two sub-query statements, add a second grouping aggregation function corresponding to the second sub-query statement to the query selection clause therein, so as to obtain an updated first sub-query statement;
a generating module 7044, configured to generate a target sub-query statement for constructing the target view based on the window function corresponding to the second aggregation function and the updated first sub-query statement;
a replacement module 7046 for replacing the two sub-query statements with the target sub-query statement;
the window function is used for performing aggregation operation corresponding to the second grouping aggregation function on each data row in the window to which the window function belongs according to each data row of the input table, and recording a corresponding aggregation result into a corresponding row of a newly added column.
Optionally, the updated first sub-query statement is used to construct an intermediate view;
the generating module 7044 is specifically configured to:
generating a target sub-query statement for constructing a target view by including a window function in its query selection clause and an intermediate view in its query source table clause.
Optionally, the main query statement further includes a query condition clause, which at least includes the isovalent join condition set for the two views. The first sub-query statement further includes a first grouping clause, and a first grouping expression in the first grouping clause is used for determining a grouping of the corresponding first grouping aggregation function;
the device also includes:
a determining unit 706, configured to determine whether the target SQL statement satisfies a preset rewrite condition, where the preset rewrite condition at least includes that two expressions on two sides of a medium sign in the equivalence connection condition are isomorphic with the first packet expression;
the rewrite unit 704 is specifically configured to: and equivalently rewriting the target SQL statement under the condition of meeting the preset rewriting conditions.
The rewriting unit 704 further includes:
a deleting module 7048 for deleting the iso-join condition;
an adding module 7042 is further configured to add a non-null constraint to an isomorphic column corresponding to a column in the first packet expression.
Optionally, the main query statement further includes a query condition clause, which at least includes an isojoin condition set for the two views. The first sub-query further includes a first grouping clause, where the first grouping expression is used to determine a grouping of the corresponding first grouping aggregation function. The second sub-query further includes a second grouping clause, wherein the second grouping expression is used for determining a grouping of a second grouping aggregation function;
the device also includes:
the determining unit 706 is configured to determine whether the target SQL statement satisfies a preset rewrite condition, where the preset rewrite condition at least includes that a column in the second grouping expression is a subset of a column in the first grouping expression, and two expressions on two sides of a medium sign in the equivalence connection condition are isomorphic with the second grouping expression.
The rewrite unit 704 is specifically configured to: and equivalently rewriting the target SQL statement under the condition of meeting the preset rewriting conditions.
The rewriting unit 704 further includes:
a deleting module 7048 for deleting the iso-join condition;
an adding module 7042, configured to add a non-null constraint to an isomorphic column corresponding to a column in the second grouping expression;
a determining module 7050 is configured to use the second grouping expression as a window partition parameter of a window function.
Optionally, the adding module 7042 is further configured to add each column in the target table included in the main query statement and the two sub query statements to the query selection clause of the first sub query statement, and replace each column in the two views included in the main query statement with a corresponding column in the target view.
Optionally, the first sub-query statement further includes a first query condition clause, and the second sub-query statement further includes a second query condition clause;
the preset rewriting condition further includes that the query conditions included in the first and second query condition clauses are the same.
The functions of the functional modules of the device in the foregoing embodiments of the present specification may be implemented through the steps of the foregoing method embodiments, and therefore, detailed working processes of the device provided in an embodiment of the present specification are not described herein again.
The processing device for the SQL statement provided in an embodiment of the present specification may eliminate aggregation connection, thereby reducing the number of scanning times for the data table, and further facilitating the query optimizer to make a more efficient execution plan.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with any of fig. 2-5.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in conjunction with any of fig. 2-5.
All the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware or may be embodied in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in a server. Of course, the processor and the storage medium may reside as discrete components in a server.
Those skilled in the art will recognize that the functionality described in this disclosure may be implemented in hardware, software, firmware, or any combination thereof, in one or more of the examples described above. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above-mentioned embodiments, objects, technical solutions and advantages of the present specification are further described in detail, it should be understood that the above-mentioned embodiments are only specific embodiments of the present specification, and are not intended to limit the scope of the present specification, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present specification should be included in the scope of the present specification.

Claims (25)

1. A processing method of SQL statements comprises the following steps:
acquiring a target SQL statement to be processed, wherein the target SQL statement comprises a main query statement and a sub-query statement; the sub-query statement comprises a grouping aggregation function in a query selection clause thereof and a target table in a query source table clause thereof, so that a target view based on the target table is constructed; the grouping aggregation function is used for carrying out target aggregation operation; a query source table clause of the main query statement comprises the target view and the target table;
equivalently rewriting the target SQL sentence to obtain a rewritten SQL sentence; the equivalent rewriting at least comprises replacing the grouping aggregation function by using a corresponding window function, and deleting the target table from a query source table clause of the main query statement; the window function is used for carrying out the target aggregation operation on each data row in the window to which the data row belongs according to each data row of the input table, and recording the corresponding aggregation result into the corresponding row of the newly added column.
2. The method of claim 1, further comprising:
and sending the rewritten SQL statement to a query optimizer, so that the query optimizer formulates a target execution plan based on the rewritten SQL statement.
3. The method of claim 1, wherein the method is performed by a query optimizer; the method further comprises the following steps:
and formulating a target execution plan based on the rewritten SQL statement.
4. The method of claim 1, wherein the packet aggregation function comprises one of: a summation function, an averaging function, a maximum function, a minimum function, and a count function.
5. The method of claim 1, wherein the main query statement further comprises a query condition clause including at least an isojoin condition set for the target view and target table; the sub-query statement also comprises a grouping clause, and a grouping expression in the grouping clause is used for determining the grouping of the grouping aggregation function;
before the equivalently rewriting the target SQL statement, the method further comprises the following steps:
judging whether the target SQL statement meets preset rewriting conditions or not; the preset rewriting conditions at least comprise that two expressions at two sides of a medium number in the equivalent connection condition are isomorphic, and the grouping expression is one of the two expressions;
under the condition that the preset rewriting condition is met, equivalently rewriting the target SQL statement;
the equivalent rewriting further comprises taking the grouping expression as a window division parameter of the window function; deleting the grouping clause, deleting the equivalent connection condition, and adding a non-null constraint to an isomorphic column corresponding to a column in the grouping expression.
6. The method of claim 5, wherein the query source table clauses of the sub-query statement further comprise other tables; the preset rewriting condition further includes that the target table and the other tables are in lossless connection; the lossless connection refers to connection under the condition that the number of rows of the target table is not changed.
7. The method of claim 1, wherein the sub-query statement further comprises a query condition clause, and wherein the target query condition is one of a number of query conditions in the query condition clause of the main query statement;
the equivalent rewriting further includes deleting the target query condition from the query condition clause of the main query statement.
8. The method of claim 1, wherein the equivalent rewrites further comprises adding each column in the target table contained in the main query statement and a sub-query statement to a query selection clause of the sub-query statement and replacing each column in the target table contained in the main query statement with a corresponding column in the target view.
9. A processing method of SQL statements comprises the following steps:
acquiring a target SQL statement to be processed, wherein the target SQL statement comprises a main query statement and two sub-query statements; the two sub-query sentences respectively comprise two different grouping aggregation functions in respective query selection clauses and comprise target tables in respective query source table clauses, so that two different views based on the target tables are constructed; the query source table clause of the main query statement comprises the two views;
carrying out equivalent rewriting on the target SQL statement; the equivalent rewriting at least comprises that for a first sub-query statement in the two sub-query statements, a second grouping aggregation function corresponding to a second sub-query statement is added to a query selection clause in the two sub-query statements to obtain an updated first sub-query statement; generating a target sub-query statement for constructing a target view based on a window function corresponding to the second grouping aggregation function and the updated first sub-query statement, and replacing the two sub-query statements with the target sub-query statement;
and the window function is used for carrying out aggregation operation corresponding to the second grouped aggregation function on each data row in the window to which the window function belongs aiming at each data row of the input table, and recording the corresponding aggregation result into the corresponding row of the newly added column.
10. The method of claim 9, wherein the updated first sub-query statement is used to construct an intermediate view;
the generating of the target sub-query statement for constructing the target view comprises:
and generating a target sub-query sentence for constructing a target view by including the window function in the query selection clause and including the intermediate view in the query source table clause.
11. The method of claim 9, wherein the main query statement further comprises a query condition clause including at least an isojoin condition set for the two views; the first sub-query statement further comprises a first grouping clause, and a first grouping expression in the first grouping clause is used for determining a grouping of a corresponding first grouping aggregation function;
before the equivalently rewriting the target SQL statement, the method further comprises the following steps:
judging whether the target SQL statement meets preset rewriting conditions or not; wherein, the preset rewrite conditions at least comprise that two expressions on both sides of the medium sign in the equivalent connection condition are isomorphic with the first packet expression;
under the condition that a preset rewriting condition is met, equivalently rewriting the target SQL statement;
the equivalent rewrite further includes deleting the equivalent join condition and adding a non-null constraint for an isomorphic column corresponding to a column in the first packet expression.
12. The method of claim 9, wherein the main query statement further comprises a query condition clause including at least an isojoin condition set for the two views; the first sub-query statement further comprises a first grouping clause, wherein the first grouping expression is used for determining a grouping of the corresponding first grouping aggregation function; the second sub-query statement further comprises a second grouping clause, wherein the second grouping expression is used for determining the grouping of the second grouping aggregation function;
before the equivalently rewriting the target SQL statement, the method further comprises the following steps:
judging whether the target SQL statement meets preset rewriting conditions or not; wherein the preset rewrite conditions at least comprise that columns in the second grouping expression are subsets of columns in the first grouping expression, and two expressions on two sides of a medium sign in the equivalent connection condition are isomorphic with the second grouping expression;
under the condition that a preset rewriting condition is met, equivalently rewriting the target SQL statement;
the equivalence rewriting further includes deleting the equivalence connection condition, adding a non-null constraint to an isomorphic column corresponding to a column in the second grouping expression, and using the second grouping expression as a window division parameter of the window function.
13. The method of claim 11 or 12, wherein the first sub-query statement further comprises a first query condition clause; the second sub-query statement further includes a second query condition sub-statement;
the preset rewriting conditions further include that the query conditions included in the first query condition clause and the second query condition clause are the same.
14. The method of claim 9, wherein the equivalent rewriting further comprises adding each column in the target table contained in the main query statement and two sub-query statements to a query selection sub-statement of the first sub-query statement and replacing each column in the two views contained in the main query statement with a corresponding column in the target view.
15. An apparatus for processing an SQL statement, comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a target SQL statement to be processed, and the target SQL statement comprises a main query statement and a sub query statement; the sub-query statement comprises a grouping aggregation function in a query selection clause of the sub-query statement and comprises a target table in a query source table clause of the sub-query statement, so that a target view based on the target table is constructed; the grouping aggregation function is used for carrying out target aggregation operation; a query source table clause of the main query statement comprises the target view and the target table;
the rewriting unit is used for equivalently rewriting the target SQL statement to obtain a rewritten SQL statement;
the rewriting unit includes:
a replacement module for replacing the packet aggregation function with a corresponding window function;
a deletion module for deleting the target table from the query source table clause of the main query statement; the window function is used for carrying out the target aggregation operation on each data row in the window to which the data row belongs according to each data row of the input table, and recording the corresponding aggregation result into the corresponding row of the newly added column.
16. The apparatus of claim 15, wherein the main query statement further comprises a query condition clause including at least an isojoin condition set for the target view and target table; the sub query statement also comprises a grouping clause, and a grouping expression in the grouping clause is used for determining the grouping of the grouping aggregation function;
the device further comprises:
the judging unit is used for judging whether the target SQL statement meets preset rewriting conditions or not; wherein the preset rewrite condition at least comprises that two expressions on two sides of a medium sign in the equivalent connection condition are isomorphic, and the grouping expression is one of the two expressions;
the rewriting unit is specifically configured to: under the condition that the preset rewriting condition is met, equivalently rewriting the target SQL statement;
the rewriting unit further includes:
a determining module, configured to use the grouping expression as a window division parameter of the window function;
the deleting module is further configured to delete the grouping clause and delete the isojunction condition;
and the adding module is used for adding non-empty constraint aiming at the isomorphic column corresponding to the column in the grouping expression.
17. The apparatus of claim 16, wherein a query source table clause of the sub-query statement further comprises other tables; the preset rewriting condition further includes that the target table and the other tables are in lossless connection; the lossless connection refers to connection under the condition that the number of rows of the target table is not changed.
18. The apparatus of claim 15, wherein the sub-query statement further comprises a query condition clause, and wherein the target query condition is one of a plurality of query conditions in the query condition clause of the main query statement;
the deleting module is further configured to delete the target query condition from the query condition clause of the main query statement.
19. An apparatus for processing an SQL statement, comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a target SQL statement to be processed, and the target SQL statement comprises a main query statement and two sub-query statements; the two sub-query sentences respectively comprise two different grouping aggregation functions in respective query selection clauses and comprise target tables in respective query source table clauses, so that two different views based on the target tables are constructed; the query source table clause of the main query statement comprises the two views;
the rewriting unit is used for equivalently rewriting the target SQL statement;
the rewriting unit includes:
the adding module is used for adding a second grouping aggregation function corresponding to a second sub-query statement to a query selection sub-statement in a first sub-query statement of the two sub-query statements to obtain an updated first sub-query statement;
a generating module, configured to generate a target sub-query statement for constructing a target view based on the window function corresponding to the second grouping aggregation function and the updated first sub-query statement;
a replacement module for replacing the two sub-query statements with the target sub-query statement;
and the window function is used for carrying out aggregation operation corresponding to the second grouped aggregation function on each data row in the window to which the window function belongs aiming at each data row of the input table, and recording the corresponding aggregation result into the corresponding row of the newly added column.
20. The apparatus of claim 19, wherein the updated first sub-query statement is used to construct an intermediate view;
the generation module is specifically configured to:
generating a target sub-query statement for constructing a target view by including the window function in its query selection clause and the intermediate view in its query source table clause.
21. The apparatus of claim 19, wherein the main query statement further comprises a query condition clause including at least an isojoin condition set for the two views; the first sub-query statement further comprises a first grouping clause, and a first grouping expression in the first grouping clause is used for determining a grouping of a corresponding first grouping aggregation function;
the device further comprises:
the judging unit is used for judging whether the target SQL statement meets preset rewriting conditions or not; wherein, the preset rewrite conditions at least comprise that two expressions at two sides of a medium sign in the equivalent connection condition are isomorphic with the first packet expression;
the rewriting unit is specifically configured to: under the condition that a preset rewriting condition is met, equivalently rewriting the target SQL statement;
the rewriting unit further includes:
a deleting module for deleting the equivalent connection condition;
the adding module is further configured to add a non-empty constraint to an isomorphic column corresponding to a column in the first packet expression.
22. The apparatus of claim 19, wherein the main query statement further comprises a query condition clause including at least an isojoin condition set for the two views; the first sub-query statement further comprises a first grouping clause, wherein the first grouping expression is used for determining the grouping of the corresponding first grouping aggregation function; the second sub-query statement further comprises a second grouping clause, wherein the second grouping expression is used for determining the grouping of the second grouping aggregation function;
the device further comprises:
the judging unit is used for judging whether the target SQL statement meets preset rewriting conditions or not; the preset rewriting condition at least comprises that columns in the second grouping expression are subsets of columns in the first grouping expression, and two expressions on two sides of a medium sign in the equal connection condition are isomorphic with the second grouping expression;
the rewriting unit is specifically configured to: carrying out equivalent rewriting on the target SQL statement under the condition of meeting a preset rewriting condition;
the rewriting unit further includes:
a deleting module for deleting the equivalent connection condition;
the adding module is further configured to add a non-null constraint to an isomorphic column corresponding to a column in the second grouping expression;
a determining module, configured to use the second grouping expression as a window division parameter of the window function.
23. The apparatus of claim 21, wherein the first sub-query statement further comprises a first query condition sub-statement; the second sub-query statement further comprises a second query condition clause;
the preset rewriting condition further includes that the query conditions included in the first and second query condition clauses are the same.
24. A computer-readable storage medium, on which a computer program is stored, wherein the computer program causes a computer to carry out the method of any one of claims 1-14, when the computer program is carried out in the computer.
25. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that when executed by the processor implements the method of any of claims 1-14.
CN202210819206.1A 2022-07-13 2022-07-13 SQL statement processing method and device Active CN114969101B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210819206.1A CN114969101B (en) 2022-07-13 2022-07-13 SQL statement processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210819206.1A CN114969101B (en) 2022-07-13 2022-07-13 SQL statement processing method and device

Publications (2)

Publication Number Publication Date
CN114969101A CN114969101A (en) 2022-08-30
CN114969101B true CN114969101B (en) 2022-10-21

Family

ID=82968967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210819206.1A Active CN114969101B (en) 2022-07-13 2022-07-13 SQL statement processing method and device

Country Status (1)

Country Link
CN (1) CN114969101B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107315790A (en) * 2017-06-14 2017-11-03 腾讯科技(深圳)有限公司 A kind of optimization method and device of irrelevant subquery
CN112100198A (en) * 2020-09-09 2020-12-18 上海达梦数据库有限公司 Database SQL statement optimization method, device, equipment and storage medium
CN112732752A (en) * 2021-01-29 2021-04-30 上海达梦数据库有限公司 Query statement optimization method, device, equipment and storage medium
CN114490724A (en) * 2022-04-15 2022-05-13 北京奥星贝斯科技有限公司 Method and device for processing database query statement

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2418163A1 (en) * 2003-01-31 2004-07-31 Ibm Canada Limited - Ibm Canada Limitee Method of query transformation using window aggregation
US7945560B2 (en) * 2008-05-22 2011-05-17 Oracle International Corporation Technique for removing subquery in group by—having clauses using window functions

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107315790A (en) * 2017-06-14 2017-11-03 腾讯科技(深圳)有限公司 A kind of optimization method and device of irrelevant subquery
CN112100198A (en) * 2020-09-09 2020-12-18 上海达梦数据库有限公司 Database SQL statement optimization method, device, equipment and storage medium
CN112732752A (en) * 2021-01-29 2021-04-30 上海达梦数据库有限公司 Query statement optimization method, device, equipment and storage medium
CN114490724A (en) * 2022-04-15 2022-05-13 北京奥星贝斯科技有限公司 Method and device for processing database query statement

Also Published As

Publication number Publication date
CN114969101A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
US6850925B2 (en) Query optimization by sub-plan memoization
US6947927B2 (en) Method and apparatus for exploiting statistics on query expressions for optimization
US7177855B2 (en) Compile-time optimizations of queries with SQL spreadsheet
US7761403B2 (en) Run-time optimizations of queries with SQL spreadsheet
US7171399B2 (en) Method for efficient query execution using dynamic queries in database environments
US8396852B2 (en) Evaluating execution plan changes after a wakeup threshold time
US20040249810A1 (en) Small group sampling of data for use in query processing
US20050050041A1 (en) Use of statistic on view in query optimization
US9218394B2 (en) Reading rows from memory prior to reading rows from secondary storage
CN115292350A (en) SQL statement processing method and device
CN112579610A (en) Multi-data source structure analysis method, system, terminal device and storage medium
CN115964374B (en) Query processing method and device based on pre-calculation scene
CN112100198A (en) Database SQL statement optimization method, device, equipment and storage medium
US20040249845A1 (en) Efficient processing of multi-column and function-based in-list predicates
US9117005B2 (en) Statistics collection using path-value pairs for relational databases
CN114969101B (en) SQL statement processing method and device
CN108549688A (en) A kind of optimization method of data manipulation, device, equipment and storage medium
CN110297858B (en) Optimization method and device for execution plan, computer equipment and storage medium
CN110321388B (en) Quick sequencing query method and system based on Greenplus
WO2023134329A1 (en) Index selection method, electronic device, and storage medium
CN114138821A (en) Database query method, system and electronic equipment
Brucato et al. Scalable computation of high-order optimization queries
WO2017156673A1 (en) Processing method and device for stored procedure
CN108052522B (en) Method and system for dynamically optimizing OLAP pre-calculation model
WO2024041221A1 (en) Selection rate estimation method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant