CN114416784B - Method and device for processing database query statement and native distributed database - Google Patents

Method and device for processing database query statement and native distributed database Download PDF

Info

Publication number
CN114416784B
CN114416784B CN202210308440.8A CN202210308440A CN114416784B CN 114416784 B CN114416784 B CN 114416784B CN 202210308440 A CN202210308440 A CN 202210308440A CN 114416784 B CN114416784 B CN 114416784B
Authority
CN
China
Prior art keywords
query
sub
database
statement
query statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210308440.8A
Other languages
Chinese (zh)
Other versions
CN114416784A (en
Inventor
郑振国
王国平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Oceanbase Technology Co Ltd
Original Assignee
Beijing Oceanbase Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Oceanbase Technology Co Ltd filed Critical Beijing Oceanbase Technology Co Ltd
Priority to CN202210308440.8A priority Critical patent/CN114416784B/en
Publication of CN114416784A publication Critical patent/CN114416784A/en
Application granted granted Critical
Publication of CN114416784B publication Critical patent/CN114416784B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24535Query rewriting; Transformation of sub-queries or views
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure discloses a method, an apparatus and a native distributed database for processing database query statements, the method comprising: receiving a first database query statement, wherein the first database query statement comprises an external query and a first sub-query, the query condition of the first sub-query is a relevant condition of the external query, and the first sub-query has a target sub-sentence which enables the first sub-query not to be expanded; generating a second database query statement according to the first database query statement, wherein the second database query statement comprises the external query and a second sub-query equivalent to the first sub-query, the second sub-query is used for querying a first view encapsulating the target sub-sentence, and a query condition of the second sub-query is the relevant condition; and expanding the second sub-query.

Description

Method and device for processing database query statement and native distributed database
Technical Field
The present disclosure relates to the field of databases, and in particular, to a method and an apparatus for processing a database query statement, and a native distributed database.
Background
Since the execution of the related sub-queries depends on the external query, the query efficiency is very low when the computation is executed in the conventional nested loop manner. One solution is that the database system rewrites the related sub-queries to promote half-join, and then the optimizer selects a more efficient join algorithm to compute the related sub-queries, thereby promoting the query efficiency of the related sub-queries.
However, for some complex related sub-queries, since the sub-queries cannot be expanded due to the fact that the related sub-queries contain some specific clauses, the semi-join rewriting cannot be performed, only the calculation can be performed according to the traditional nested loop mode, and the query performance is very low. Especially, when the data size of the driving table is very large, the query execution time often cannot meet the user requirement.
Disclosure of Invention
In view of this, the present disclosure provides a method and an apparatus for processing a database query statement, and a native distributed database, which can rewrite some complex related sub-queries having specific clauses into half-join, thereby greatly improving the execution performance of the related sub-queries.
In a first aspect, a method for processing a database query statement is provided, including: receiving a first database query statement, wherein the first database query statement comprises an external query and a first sub-query, the query condition of the first sub-query is a relevant condition of the external query, and the first sub-query has a target sub-sentence which enables the first sub-query not to be expanded; generating a second database query statement according to the first database query statement, wherein the second database query statement comprises the external query and a second sub-query equivalent to the first sub-query, the second sub-query is used for querying a first view encapsulating the target sub-sentence, and a query condition of the second sub-query is the relevant condition; and expanding the second sub-query.
Optionally, the generating a second database query statement according to the first database query statement includes: generating a third database query statement according to the first database query statement, wherein the third database query statement comprises the external query and a third sub-query equivalent to the first sub-query, and the third sub-query is used for querying a second view encapsulating the first sub-query statement; and extracting the relevant conditions to the outside of the second view to obtain the second sub-query.
Optionally, the generating a second database query statement according to the first database query statement includes: judging whether the first sub-query is an expandable sub-query; and if the first sub-query is a non-expandable sub-query, generating the second database query statement according to the first database query statement.
Optionally, the target clause includes one or more of the following clauses: GROUP BY clause, HAVING clause, WINDOW FUNCTION clause.
Optionally, the performing sub-query expansion on the second sub-query includes: rewriting the second sub-query to a semi-join algorithm; and executing the calculation of the second database query statement according to the semi-join algorithm.
In a second aspect, there is provided a native distributed database comprising: a receiving module, configured to receive a first database query statement, where the first database query statement includes an external query and a first sub-query, a query condition of the first sub-query is a relevant condition of the external query, and the first sub-query has a target sub-sentence that cannot be expanded by the first sub-query; a generating module, configured to generate a second database query statement according to the first database query statement, where the second database query statement includes the external query and a second sub-query equivalent to the first sub-query, the second sub-query is used to query a first view encapsulating the target sub-sentence, and a query condition of the second sub-query is the relevant condition; and the expansion module is used for expanding the second sub-query.
Optionally, the generating module is further configured to: generating a third database query statement according to the first database query statement, wherein the third database query statement comprises the external query and a third sub-query equivalent to the first sub-query, and the third sub-query is used for querying a second view encapsulating the first sub-query statement; the native distributed database further comprises an extraction module for extracting the relevant conditions to the outside of the second view, resulting in the second sub-query.
Optionally, the native distributed database further comprises: the judging module is used for judging whether the first sub-query is an expandable sub-query or not; if the first sub-query is an unexpanded sub-query, the generating module is further configured to generate the second database query statement according to the first database query statement.
Optionally, the native distributed database further comprises: a rewriting module for rewriting the second sub-query into a semi-join algorithm; and the calculation module is used for executing the calculation of the second database query statement according to the semi-connection algorithm.
In a third aspect, an apparatus for processing a database query statement is provided, the apparatus comprising: a receiving module, configured to receive a first database query statement, where the first database query statement includes an external query and a first sub-query, a query condition of the first sub-query is a relevant condition of the external query, and the first sub-query has a target sub-sentence that cannot be expanded by the first sub-query; a generating module, configured to generate a second database query statement according to the first database query statement, where the second database query statement includes the external query and a second sub-query equivalent to the first sub-query, the second sub-query is used to query a first view encapsulating the target sub-sentence, and a query condition of the second sub-query is the relevant condition; and the expansion module is used for expanding the second sub-query.
Optionally, the generating module is further configured to: generating a third database query statement according to the first database query statement, wherein the third database query statement comprises the external query and a third sub-query equivalent to the first sub-query, and the third sub-query is used for querying a second view encapsulating the first sub-query statement; the device further comprises an extraction module for extracting the relevant condition to the outside of the second view, resulting in the second sub-query.
Optionally, the apparatus further comprises: the judging module is used for judging whether the first sub-query is an expandable sub-query or not; if the first sub-query is a non-expandable sub-query, the generating module is further configured to generate the second database query statement according to the first database query statement.
Optionally, the apparatus further comprises: a rewriting module for rewriting the second sub-query into a semi-join algorithm; and the calculation module is used for executing the calculation of the second database query statement according to the semi-connection algorithm.
In a fourth aspect, there is provided a computer readable storage medium storing a computer program which, when executed, implements the method of the first aspect.
In a fifth aspect, there is provided a computer program product comprising executable code which, when executed, is capable of implementing the method of the first aspect.
The embodiment of the disclosure can rewrite the related sub-queries which contain specific clauses (namely target clauses) and cannot be expanded into half-join by providing a new rewriting algorithm, so that an optimizer can select a more efficient execution mode to calculate the related sub-queries, the problem of query timeout execution when the data volume of a driving table is too large is avoided, and the query efficiency and the query performance of the related sub-queries are greatly improved.
Drawings
Fig. 1 is a flowchart illustrating a method for processing a database query statement according to an embodiment of the disclosure.
Fig. 2 is a flowchart illustrating an implementation manner of step S120 in fig. 1.
Fig. 3 is a flowchart illustrating a method for processing a database query statement according to another embodiment of the disclosure.
Fig. 4 is a schematic block diagram of a native distributed database of an embodiment of the present disclosure.
Fig. 5 is a schematic block diagram of an apparatus for processing a database query statement according to an embodiment of the present disclosure.
Fig. 6 is a schematic block diagram of an apparatus for processing a database query statement according to another embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments.
Structured Query Language (SQL) allows multiple levels of nested queries. A query is an operation that retrieves data from a table or view, the top level of statements are called queries, and the statements nested in a query are called subqueries. I.e., when one query is a condition of another query, is referred to as a sub-query. The sub-queries are divided into non-related sub-queries and related sub-queries. If a sub-query is independent, with no dependencies on external queries, it is referred to as an irrelevant sub-query. The non-related sub-queries are SQL statements that may be executed independently. The non-relevant sub-query is executed once, and the execution result is transmitted to the external query after the execution is finished.
Unlike the non-relevant sub-queries, the relevant sub-queries are sub-queries that depend on external query data. In other words, the related sub-queries depend on the result set of the external query, which is passed to the sub-query for execution once after executing a row.
In particular, a related sub-query indicates that there is a certain conditional association between two queries. The following explains the concept of correlation of related sub-queries according to the SQL statement in example (1).
Example (1):
“select count(*) from T1 where exists (select 1 from T2 where T1.C2=T2.C2 group by T2.C2 having sum(T2.C3)>0)”;
the SQL statement of example (1) contains an external query and a sub-query. Wherein, the external query statement is "select count (— from T1 corner exists ()", and the sub-query statement is "(select 1 from T2 corner T1.c2= T2.c2 group by T2.c2 haing sum (T2. c 3) > 0)". The sub-query includes the expression "where T1.C2= T2. C2", and obviously, the field C2 of the external query object T1 is referenced in the expression. I.e., the query results for the sub-query object T2, are dependent on the external query object T1. The sub-query is therefore a related sub-query.
The following describes the execution process of the related sub-query of the SQL statement of example (1):
in the first step, a tuple (i.e., a row) is fetched from the external query object table T1 and the value of the tuple associated column C2 is passed to the sub-query.
Second, the sub-query is executed with the value of column C2 of Table T1 as a condition, resulting in an execution result, and the execution result is passed to the external query.
Third, an external query is executed.
And fourthly, taking out the next tuple from the external query again, and circularly executing the first step to the third step until all the tuples in the external query object table T1 are completely processed.
As can be seen from the above processing, the execution of the related sub-queries is in a nested loop manner. That is, to execute a sub-query, a value needs to be obtained from the external query. That is, when the external lookup table T1 scans a row of data, the sub lookup table T2 needs to go all the way through once. Thus, the amount of data for which the associated sub-query performs the calculations corresponds to the number of rows in Table T1 multiplied by the number of rows in Table T2. Therefore, the query performance is very inefficient for this traditional nested loop execution. Especially when the data size of the driving table is very large, the query execution time is often not sufficient for the user.
In order to solve the problem of low execution efficiency of the related sub-queries, as an implementation manner, a database system usually rewrites the related sub-queries. For example, the sub-query is rewritten as semi-connected/connected.
A semi-connection is one type of database connection. For example, semi-joining Table T1 with Table T2 is such that when a record in Table T1 finds a match in Table T2, a record in Table T1 is returned and no record in Table T2 is returned. Even if multiple matching records are found in Table T2, the record is returned only once in data Table T1. Where data table T1 is referred to as the left node of the semi-join and data table T2 is referred to as the right node of the semi-join. The following example (2) is a half-connection.
Example (2):
select T1.C1,T1.C2 from T1 where T1.C1 in (select T2.C1 from T2);
after the related sub-queries are rewritten into the semi-join query mode, the optimizer can select a more efficient join algorithm to perform the calculation of the related sub-queries, so that the execution efficiency of the related sub-queries is greatly improved. As one example, the optimizer may select a more efficient algorithm from a hashing algorithm, a ranking algorithm, and a nested loop algorithm to execute the related sub-queries. The method of rewriting the related sub-queries into semi-joins is called promotion of the sub-queries. Obviously, the promotion of sub-queries can greatly improve the execution performance and query efficiency of related sub-queries.
However, not all related sub-queries can be converted into a semi-connected query approach. Query rewrite is essentially pattern matching, and can only be promoted if the pattern of the query matches the requirement of the sub-query to promote rewrite. Alternatively, a related sub-query can be rewritten to semi-join only if the sub-query in the related sub-query can be expanded and all related expressions in the sub-query can be extracted to the external query.
In the related sub-queries, statements of the external query are referenced so that expressions that the sub-queries are associated with the external query are related expressions. As an example, the expression "where T1.c2= T2.c 2" in example (1) is an expression that associates the sub-query object T2 with the external query object T1, and thus "where T1.c2= T2.c 2" is a relational expression for the related sub-query, which is also referred to as a relational condition.
With continued reference to example (1), a "group by T2. C2" statement is also included in the related sub-queries. It should be noted that the group by may be to group the results according to one or more columns, and the role is to divide a data set into several small areas by a certain rule, and then perform data processing on the several small areas. The where before the group by statement is to filter the condition before the group by is executed, and the haiving after the group by is to filter the result after the group by is executed, that is, the group by is grouped according to the corresponding limiting condition, so that the system only returns the result to the group meeting the condition.
Therefore, when the SQL statement contains a group by sub-statement, if the sub-query is expanded, the related expression is extracted to the external query, which causes the group by semantic to change and cannot be equivalent to the original SQL statement, therefore, the sub-query containing the group by sub-statement cannot be directly expanded, and the related sub-query cannot be rewritten to half-join.
Optionally, when the related sub-query includes one or more clauses of group by, haiving and windows function, and the related expression is in the where statement block, the related sub-query cannot be expanded.
For such complex related sub-queries, the above sub-query boosting method is not suitable for, and only can adopt the traditional nested loop manner, so that the problems of low execution performance and query efficiency still exist.
From the foregoing analysis, it can be seen that for a related sub-query that is complex like example (1), the sub-query cannot be expanded because it contains some specific clauses, and the related expression cannot extract an external query and cannot be rewritten into a semi-join. Thus, the present disclosure is directed to rewriting such related sub-queries containing specific clauses so that they can be expanded and rewritten as semi-joins.
In order to solve the above problem, the embodiments of the present disclosure provide a new rewrite algorithm, which can rewrite a sub-query into a semi-join when the related sub-query cannot be expanded, thereby improving the execution performance and query efficiency of such complex related sub-queries.
Fig. 1 is a schematic flow chart diagram of a method 100 for processing a database query statement according to an embodiment of the present disclosure. It should be understood that fig. 1 shows the steps or operations of the method 100, but these steps or operations are merely examples, or these steps may be performed in other orders. The method 100 may include steps S110 to S130, as described below.
Step S110, a first database query statement is received, where the first database query statement includes an external query and a first sub-query, a query condition of the first sub-query is a relevant condition of the external query, and the first sub-query has a target sub-sentence which makes the first sub-query unable to expand.
The first database query statement may be any SQL statement. The first database query statement includes an external query and a first sub-query, which may be a related sub-query or a non-related query. When the query condition in the first sub-query is the relevant condition of the external query, the first sub-query is the relevant sub-query.
As one embodiment, a determination is made as to whether the first sub-query is an expandable related sub-query. And if the first sub-query is an expandable related sub-query, directly expanding the first sub-query, and upgrading the first sub-query into half-join to realize the rewriting of the first sub-query.
If the first sub-query is a related sub-query that is not expandable, the following description in step S120 is continued.
As can be seen from the foregoing, when a first sub-query is a complex related sub-query containing a specific clause, the first sub-query cannot be expanded. In other words, when a first sub-query contains a target clause that makes it undeployable, the first sub-query cannot be directly rewritten by the sub-query.
As an example, when the target clause includes one or more of the following clauses: the first sub-query is a non-expandable sub-query when the GROUP BY sub-sentence, the HAVING sub-sentence, and the WINDOW FUNCTION sub-sentence are used.
Step S120, a second database query statement is generated according to the first database query statement, the second database query statement includes an external query and a second sub-query equivalent to the first sub-query, the second sub-query is used for querying the first view in which the target sub-sentence is encapsulated, and the query condition of the second sub-query is the relevant condition.
And when the first sub-query is a related sub-query which cannot be expanded, the first database query statement needs to be rewritten to generate a second database query statement. The second sub-query in the rewritten second database query statement is an expandable related sub-query and can be rewritten to a semi-join query style.
The second database query statement and the first database query statement are equivalent SQL statements, i.e. the second database query language is consistent with the first database query statement output result. Specifically, the second database query statement includes an external query and a second sub-query, and in essence, the second sub-query is generated by rewriting the first sub-query in step S110, and the second sub-query and the first sub-query are equivalent.
The second sub-query comprises a first view used for querying the packaged target clause, and the query condition of the second sub-query is the relevant condition of the first sub-query. It should be noted that a view can also be regarded as a table, and unlike a table, the view stores a select statement, and the table stores actual data after the select is executed. That is, when reading data in the view, the view internally executes the select statement and creates a temporary table.
The second sub-query essentially encapsulates the target clause of the first sub-query that affects its expansion in the first view, while the relevant conditions are outside the first view as query conditions for the second sub-query. In this way, the target clause is separated from the relevant conditions, and it is ensured that the second sub-query queries the first view through the query conditions without being concerned about the execution of the target clause. Therefore, the second sub-query meets the condition of expansion of the related sub-query, and the expansion of the sub-query can be carried out.
Fig. 2 is a schematic flow chart of one implementation of step S120. The illustrated method 200 may include steps S210 through S220. A detailed description will be given below with reference to fig. 2 of a specific implementation process of rewriting the first database into the second database in step S120.
Step S210, a third database query statement is generated according to the first database query statement, where the third database query statement includes the external query and a third sub-query equivalent to the first sub-query, and the third sub-query is used to query a second view encapsulating the first sub-query statement.
Rewriting the first database query statement into the second database query statement includes rewriting the first database query statement into a third database query statement. And the third database query statement, the first data query statement and the second database query statement are equivalent SQL statements.
And when the first sub-query comprises a target clause which influences the non-expandable state of the first sub-query, rewriting the first sub-query to generate a third sub-query, wherein the third sub-query and the first sub-query are equivalent. As an example, the third sub-query statement includes a second view encapsulating the target sub-sentence and the query condition in the first sub-query, in which case the third sub-query is equivalent to only projection computation. In other words, the third sub-query contains only the second view of the query that encapsulates the first sub-query statement. It should be noted that the projection calculation may include, for example, a table (or view) and a filter condition, but has no relationship with other tables.
As an example, target clauses such as from statement, where statement, and group by in the first sub-query may be packaged into the second view, resulting in the third sub-query. The third sub-query has only a from statement and a second view. This process amounts to pushing down target clauses in the first sub-query, such as from statement, where statement, and group by, into the second view. The rewritten third sub-query is a SPJ (simple project) -type query, and thus the rewriting process may also be referred to as split SPJ. The separation SPJ process realizes the separation of target clauses such as a from statement, a where statement and a group by from the first sub-query.
Step S220, extracting the relevant condition to the outside of the second view to obtain a second sub-query.
Because target clauses such as from clause, where clause and group by are encapsulated in the second view, related sub-queries need to be expanded, and related conditions in the second view need to be extracted and separated from the target clauses.
Therefore, the relevant condition (e.g. the where statement) needs to be extracted outside the second view, and after the relevant condition is extracted outside the second view, the second sub-query in step S120 is generated. At this time, the query condition of the second sub-query is the relevant condition.
And step S130, expanding the second sub-query.
As can be seen from the foregoing step S120, the relevant condition of the second sub-query is already extracted outside the first view, and is separated from the target clause, so that the expansion condition of the relevant sub-query is satisfied, and the expansion of the second sub-query can be performed. The sub-query expansion is that the optimizer does not regard the sub-query as an independent processing unit to be executed independently, but converts the sub-query into an equivalent connection relation between the sub-query and an external query, and the sub-query expansion does not need to be executed in a nested loop mode.
As an embodiment, the second sub-query is rewritten to a semi-join algorithm, and the calculation of the generated database query statement is performed according to the semi-join algorithm.
As another embodiment, before rewriting the first database query statement, it is first checked whether the relevant expression can be extracted outside the second view.
According to the steps, only after the related expression is extracted to the outside of the second view, the subsequent second sub-query can be expanded and rewritten into the semi-connection. Therefore, before rewriting the first database query statement, whether the related expression can be extracted outside the second view is judged, so that the execution cost can be saved, and the execution efficiency can be improved.
Specifically, whether the related expression can be extracted to the outside of the second view includes whether the related expression can be pulled through a target clause such as a group by, a hang, a windows function, and the like, that is, the related expression is separated from a sub-query containing the target clause such as the group by, the hang, the windows function, and the like.
Optionally, if the relevant condition can be pulled over the target clauses such as group by, having, and windows function, it is checked whether the first sub-query contains a limit statement block, and if the first sub-query contains a limit statement block, the next rewriting cannot be performed. If the first sub-query does not contain a limit statement block, rewriting may be performed as per step S120.
As an example, when a condition in the target clause statement is referenced by a present level condition in the related expression statement, it indicates that the related expression may be pulled up through the target clause, such as group by, having, windows function, etc. Referring to example (1) in the foregoing, in the related expression statement "where t1.c2= t2.c 2", the query condition of the present layer is t2.c2, and the condition in the target clause statement "group by t2.c 2" is t2.c 2. Obviously, the column C2 referenced by the related expression is a subset of the column of the target clause group by, and thus the related expression where in example (1) may be pulled up through the group by target clause. As another example, when the target clause sentence in example (1) is "group by t2. C1", the column C2 referenced in the related expression sentence "where t1.C2= t2. C2" is not in the subset of the target clause "group by t2. C1" column, the related expression "where t1.C2= t2. C1" cannot be pulled up through the target clause sentence "group by t2. C1".
The following describes a complete process of rewriting a database query statement according to the present disclosure with reference to fig. 3 and example (2). The method 300 shown in fig. 3 includes steps S310 to S360.
The SQL (1) statement of example (2) is as follows:
CREATE TABLE T1(C1 INT,C2 INT);
CREATE TABLE T2(C1 INT,C2 INT);
SELECT C1
FROM T1
WHERE C1 = ANY(
SELECT COUNT(*)
FROM T2
WHERE T1.C2 = T2.C2
GROUP BY T2.C2);
step S310, judging whether the related sub-query in the SQL sentence containing the complex related sub-query can be expanded.
If the related sub-query can be directly expanded, step S311 and step S360 are executed to directly complete the promotion of the related sub-query and the rewriting of the semi-connected SQL statement by the related sub-query.
If the related sub-query cannot be directly expanded, the process continues to step S320. As shown in the foregoing, whether the related sub-query in the SQL statement can be expanded can be determined by checking whether the related sub-query statement includes target sub-sentences such as group by, having, and windows function.
Referring to example (2), there is one related sub-query in the SQL (1) statement that contains the target sub-statement GROUP BY, and the first related expression is in the WHERE statement block: WHERE t1.c2= t2.c 2. Therefore, related sub-queries in SQL (1) statements cannot be directly expanded.
Step S320, check whether the related expression in the related sub-query can be pulled up.
If the related expression is not pullable, the related sub-query cannot be further rewritten, and step S321 is executed, that is, the query is executed in a conventional nested loop manner.
If the dependent expression can be pulled up, a further overwriting operation is performed.
Since the column C2 of T2 referred to BY the relational expression "WHERE T1.C2= T2. C2" of SQL (1) in example (2) is a subset of columns in the condition of the target clause "GROUP BY T2. C2", satisfying the pull-up condition of the relational expression, the rewrite operation can be performed.
Step S330, the related sub-queries are subjected to separation SPJ operation.
The SPJ splitting operation is performed on the related sub-queries in the SQL (1) statement, that is, the related sub-queries are packaged as the view V1, so that the related sub-query statement only contains projection calculation. The SQL (2) statement obtained after SPJ separation is performed on SQL (1) in example (2) is as follows:
CREATE TABLE T1(C1 INT,C2 INT);
CREATE TABLE T2(C1 INT,C2 INT);
SELECT C1
FROM T1
WHERE C1 = ANY(
SELECT CNT
FROM(
SELECT COUNT(*) CNT
FROM T2
WHERE T1.C2 = T2.C2
GROUP BY T2.C2)V1);
the rewritten SQL (2) statement shows that the rewriting process encapsulates the "FROM T2 WHERE T1.c2= T2.c2 GROUP BY T2.c 2" in the related sub-query in the view V1, and the sub-query statement in SQL (1) is equivalent to the sub-query statement in SQL (2) being "SELECT CNT FROM V1". When executing a sub-query of SQL (2), the system does not care how inside view V1 is executed. Thus, for the rewritten SQL (2) statement, the sub-query includes the projection calculation FROM V1.
Step S340, pull up the correlation expression.
It has been determined from step S320 that example (2) satisfies the pull-up condition of the relational expression, and thus the relational expression in SQL (2) can be directly pulled up. That is, the related expression WHERE in the related sub-query in the SQL (2) statement is extracted outside of view V1. The SQL (3) statement is obtained after the expression is pulled up as follows:
CREATE TABLE T1(C1 INT,C2 INT);
CREATE TABLE T2(C1 INT,C2 INT);
SELECT C1
FROM T1
WHERE C1 = ANY(
SELECT CNT
FROM(
SELECT T2.C2,COUNT(*) CNT
FROM T2
GROUP BY T2.C2)V1
WHERE T1.C2 = T2.C2);
in step S350, the related sub-queries are expanded.
In step S350, the relational expression "WHERE t1.c2= t2.c 2" in the related sub-query of SQL (3) has been extracted to the outside of the view V1, and thus expansion of the related sub-query and rewriting of the half join can be performed.
Step S360, rewriting the related sub-query into a semi-connected SQL statement.
The SQL (3) sub-query is rewritten into a semi-join SQL (4) statement as follows:
SELECT T1.C1
FROM T1 SEMI JOIN(
SELECT T2.C2,COUNT(*) CNT
FROM T2
GROUP BY T2.C2)V1
ON T1.C1=V1.CNT AND T1.C2=V1.C2;
alternatively, the syntax "T1 SEMI JOIN V1" in the rewrite process of step S360 is not a flag SQL syntax, the query cannot be executed directly in the database system, the sub-query can only be described as being rewritten to a SEMI-JOIN with a pseudo syntax, and the optimizer will compute the relevant sub-query in a SEMI-JOIN manner.
In summary, the method for processing a query statement of a database according to the embodiment of the present disclosure rewrites a complex related sub-query into a semi-join by encapsulating a target sub-sentence that affects expansion of the related sub-query and extracting related conditions, thereby allowing an optimizer to select multiple execution modes to calculate the sub-query, such as selecting a plan with the minimum execution time to execute, avoiding a problem that the execution time of the query cannot meet a user requirement when the data amount of a driving table of the query is too large, and greatly improving the execution performance and the query efficiency of the related sub-query.
Method embodiments of the present disclosure are described in detail above in conjunction with fig. 1-3. The following describes embodiments of the apparatus provided by the present disclosure with reference to fig. 4 to 6. It is to be understood that the description of the apparatus embodiments corresponds to the description of the method embodiments, and therefore reference may be made to the preceding method embodiments for parts not described in detail.
In some scenarios, the method of the embodiment of the present disclosure may be applied to a native distributed database, where the native distributed database may be an autonomously developed distributed database, and the native distributed database is not obtained by performing secondary development or encapsulation on an existing distributed database. It should be noted that the solution of the embodiment of the present disclosure may also be applied to other databases, and the embodiment of the present disclosure does not limit this. A native distributed database to which the method of embodiments of the present disclosure applies is described below in conjunction with fig. 4.
Fig. 4 is a schematic diagram of a native distributed database of an embodiment of the present disclosure. The native distributed database 400 in fig. 4 may include a receiving module 410, a generating module 420, and an expanding module 430.
A receiving module 410, configured to receive a first database query statement, where the first database query statement includes an external query and a first sub-query, a query condition of the first sub-query is a relevant condition of the external query, and the first sub-query has a target clause that makes the first sub-query unable to be expanded;
a generating module 420, configured to generate a second database query statement according to the first database query statement, where the second database query statement includes an external query and a second sub-query that is equivalent to the first sub-query, the second sub-query is used to query the first view in which the target sub-sentence is encapsulated, and a query condition of the second sub-query is a relevant condition;
an expansion module 430 for expanding the second sub-query.
Optionally, the generating module 420 is further configured to generate a third database query statement according to the first database query statement, where the third database query statement includes the external query and a third sub-query equivalent to the first sub-query, and the third sub-query is used for querying a second view encapsulating the first sub-query statement; the native distributed database 400 further comprises an extraction module for extracting relevant conditions outside the second view, resulting in a second sub-query.
Optionally, the native distributed database 400 further comprises: the judging module is used for judging whether the first sub-query is an expandable sub-query or not; the generating module 420 is further configured to generate a second database query statement according to the first database query statement if the first sub-query is a non-expandable sub-query.
Optionally, the expansion module 430 in the native distributed database 400 is further configured to: rewriting the second sub-query into a semi-join algorithm; and the calculation module is used for executing the calculation of the second database query statement according to the semi-connection algorithm.
Fig. 5 is a schematic structural diagram of an apparatus for processing data according to an embodiment of the disclosure. The apparatus 500 may include a receiving module 510, a generating module 520, and an unfolding module 530.
A receiving module 510, configured to receive a first database query statement, where the first database query statement includes an external query and a first sub-query, a query condition of the first sub-query is a relevant condition of the external query, and the first sub-query has a target sub-sentence that cannot be expanded by the first sub-query;
a generating module 520, configured to generate a second database query statement according to the first database query statement, where the second database query statement includes an external query and a second sub-query equivalent to the first sub-query, the second sub-query is used to query the first view in which the target sub-sentence is encapsulated, and a query condition of the second sub-query is a relevant condition;
an expansion module 530 for expanding the second sub-query.
Optionally, the generating module is further configured to: generating a third database query statement according to the first database query statement, wherein the third database query statement comprises the external query and a third sub-query equivalent to the first sub-query, and the third sub-query is used for querying a second view encapsulating the first sub-query statement; the device further comprises an extraction module for extracting the relevant condition to the outside of the second view, resulting in the second sub-query.
Optionally, the apparatus further comprises: the judging module is used for judging whether the first sub-query is an expandable sub-query or not; if the first sub-query is an unexpanded sub-query, the generating module is further configured to generate the second database query statement according to the first database query statement.
Optionally, the apparatus further comprises: a rewriting module for rewriting the second sub-query into a semi-join algorithm; and the calculation module is used for executing the calculation of the second database query statement according to the semi-connection algorithm.
Fig. 6 is a schematic structural diagram of an apparatus for processing data according to another embodiment of the present disclosure. The apparatus 600 shown in fig. 6 may be a server, a user terminal, or a portable device. The apparatus 600 may include a memory 610 and a processor 620.
The memory 610 may be used to store executable code. The processor 620 may be configured to execute executable code stored in the memory 610 to implement the steps of the various methods described previously.
In some embodiments, the apparatus 600 may further include a network interface 630, and the data exchange between the processor 620 and the external device may be implemented through the network interface 630.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any other combination. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the disclosure are, in whole or in part, generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., Digital Video Disk (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (8)

1. A method of processing a database query statement, comprising:
receiving a first database query statement, wherein the first database query statement comprises an external query and a first sub-query, the query condition of the first sub-query is a relevant condition of the external query, and the first sub-query has a target sub-sentence which enables the first sub-query not to be expanded;
generating a second database query statement according to the first database query statement, wherein the second database query statement comprises the external query and a second sub-query equivalent to the first sub-query, the second sub-query is used for querying a first view encapsulating the target sub-sentence, and a query condition of the second sub-query is the relevant condition;
expanding the second sub-query;
wherein generating a second database query statement from the first database query statement comprises: generating a third database query statement according to the first database query statement, wherein the third database query statement comprises the external query and a third sub-query equivalent to the first sub-query, and the third sub-query is used for querying a second view encapsulating the first sub-query statement; extracting the relevant condition to the outside of the second view to obtain the second sub-query;
the first sub-query includes at least one of the following target clauses: from statements, where statements, and group by statements.
2. The method of claim 1, the generating a second database query statement from the first database query statement, comprising:
judging whether the first sub-query is an expandable sub-query;
and if the first sub-query is a non-expandable sub-query, generating the second database query statement according to the first database query statement.
3. The method of claim 1, the target clause including one or more of the following clauses: GROUP BY clause, HAVING clause, WINDOW FUNCTION clause.
4. The method of claim 1, the sub-query expansion of the second sub-query, comprising:
rewriting the second sub-query to a semi-join algorithm;
and executing the calculation of the second database query statement according to the semi-join algorithm.
5. An apparatus for processing a database query statement, comprising:
a receiving module, configured to receive a first database query statement, where the first database query statement includes an external query and a first sub-query, a query condition of the first sub-query is a relevant condition of the external query, and the first sub-query has a target sub-sentence that cannot be expanded by the first sub-query;
a generating module, configured to generate a second database query statement according to the first database query statement, where the second database query statement includes the external query and a second sub-query equivalent to the first sub-query, the second sub-query is used to query a first view encapsulating the target sub-sentence, and a query condition of the second sub-query is the relevant condition;
an expansion module for expanding the second sub-query;
wherein generating a second database query statement from the first database query statement comprises: generating a third database query statement according to the first database query statement, wherein the third database query statement comprises the external query and a third sub-query equivalent to the first sub-query, and the third sub-query is used for querying a second view encapsulating the first sub-query statement; extracting the relevant condition to the outside of the second view to obtain the second sub-query;
the first sub-query includes at least one of the following target clauses: from statements, where statements, and group by statements.
6. The apparatus of claim 5, the apparatus further comprising:
the judging module is used for judging whether the first sub-query is an expandable sub-query or not;
if the first sub-query is an unexpanded sub-query, the generating module is further configured to generate the second database query statement according to the first database query statement.
7. The apparatus of claim 5, the apparatus further comprising:
a rewriting module for rewriting the second sub-query into a semi-join algorithm;
and the calculation module is used for executing the calculation of the second database query statement according to the semi-connection algorithm.
8. A computer readable storage medium having stored thereon executable code which when executed is capable of implementing the method of any one of claims 1-4.
CN202210308440.8A 2022-03-28 2022-03-28 Method and device for processing database query statement and native distributed database Active CN114416784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210308440.8A CN114416784B (en) 2022-03-28 2022-03-28 Method and device for processing database query statement and native distributed database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210308440.8A CN114416784B (en) 2022-03-28 2022-03-28 Method and device for processing database query statement and native distributed database

Publications (2)

Publication Number Publication Date
CN114416784A CN114416784A (en) 2022-04-29
CN114416784B true CN114416784B (en) 2022-07-08

Family

ID=81262898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210308440.8A Active CN114416784B (en) 2022-03-28 2022-03-28 Method and device for processing database query statement and native distributed database

Country Status (1)

Country Link
CN (1) CN114416784B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2427202A1 (en) * 2003-04-30 2004-10-30 Ibm Canada Limited - Ibm Canada Limitee Method and system for aggregation subquery join elimination
CN104036007B (en) * 2014-06-23 2017-12-12 北京京东尚科信息技术有限公司 A kind of distributed networks database query method and device
US10129256B2 (en) * 2015-01-08 2018-11-13 BlueTalon, Inc. Distributed storage and distributed processing query statement reconstruction in accordance with a policy
CN109241093B (en) * 2017-06-30 2021-06-08 华为技术有限公司 Data query method, related device and database system
CN113010539A (en) * 2019-12-20 2021-06-22 北京奇艺世纪科技有限公司 Data processing method and device

Also Published As

Publication number Publication date
CN114416784A (en) 2022-04-29

Similar Documents

Publication Publication Date Title
US10970270B2 (en) Unified data organization for multi-model distributed databases
US11681702B2 (en) Conversion of model views into relational models
US7895187B2 (en) Hybrid evaluation of expressions in DBMS
US8204903B2 (en) Expressing and executing semantic queries within a relational database
US9471711B2 (en) Schema-less access to stored data
US7233944B2 (en) Determining query cost based on subquery filtering factor
US7246108B2 (en) Reusing optimized query blocks in query processing
JP3297403B2 (en) Method and apparatus for query optimization
US20170147644A1 (en) Query optimization using join cardinality
US9760571B1 (en) Tabular DB interface for unstructured data
US20060041537A1 (en) Selecting candidate queries
US20160063063A1 (en) Database query system
US20200265028A1 (en) Method and systems for mapping object oriented/functional languages to database languages
US6748388B1 (en) Method and mechanism for storing and managing self-descriptive heterogeneous data
CN112970011A (en) Recording pedigrees in query optimization
EP3293645B1 (en) Iterative evaluation of data through simd processor registers
EP3293644B1 (en) Loading data for iterative evaluation through simd registers
CN114416784B (en) Method and device for processing database query statement and native distributed database
CN107622070B (en) Database management method and device
KR102646588B1 (en) System and method for generating customized knowledge graph
US20070299837A1 (en) Data Processing Method and System Based on Networked Relational Dimension
WO2016197924A1 (en) Data preprocessing method and device
WO2024082881A2 (en) Database query method and apparatus
US20220075778A1 (en) Transforming operations of a computer program for execution at a database
US20140280308A1 (en) Flexible Column Selection in Relational Databases

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant