WO2024082881A2 - Database query method and apparatus - Google Patents

Database query method and apparatus Download PDF

Info

Publication number
WO2024082881A2
WO2024082881A2 PCT/CN2023/118705 CN2023118705W WO2024082881A2 WO 2024082881 A2 WO2024082881 A2 WO 2024082881A2 CN 2023118705 W CN2023118705 W CN 2023118705W WO 2024082881 A2 WO2024082881 A2 WO 2024082881A2
Authority
WO
WIPO (PCT)
Prior art keywords
database query
query statement
query
database
predicate
Prior art date
Application number
PCT/CN2023/118705
Other languages
French (fr)
Chinese (zh)
Inventor
王国平
朱涛
赵占越
Original Assignee
北京奥星贝斯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京奥星贝斯科技有限公司 filed Critical 北京奥星贝斯科技有限公司
Publication of WO2024082881A2 publication Critical patent/WO2024082881A2/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • G06F16/2445Data retrieval commands; View definitions

Definitions

  • the present disclosure relates to the field of databases, and more specifically, to a database query method and device.
  • a database query statement may contain an OR predicate.
  • the database query statement containing the OR predicate is usually rewritten to expand the OR predicate.
  • the related technology expands the OR predicate based on Union All. However, in certain scenarios, after the OR predicate is expanded based on Union All, the execution performance of the database query statement is still poor.
  • the present disclosure provides a database query method and device to improve the execution performance of database query statements.
  • a database query method comprising: receiving a first database query statement, the first database query statement comprising a first query branch and a second query branch, the first query branch and the second query branch being connected by an OR predicate; expanding the OR predicate in the first database query statement to obtain a second database query statement, the second database query statement comprising a third database query statement and a fourth database query statement, the third database query statement being used to execute the first query branch, the fourth database query statement being used to execute the second query branch, and the third database query statement and the fourth database query statement being connected based on UNION DISTINCT; and querying data in a database according to the second database query statement.
  • the method before expanding the OR predicate in the first database query statement, the method also includes: checking whether the query branches in the first database query statement can output a unique column set; expanding the OR predicate in the first database query statement includes: if all query branches in the first database query statement can output a unique column set, expanding the OR predicate in the first database query statement.
  • the method before expanding the OR predicate in the first database query statement, the method further includes: performing SPJ separation on the query branches in the first database query statement.
  • the predicate is a subquery predicate.
  • the database query statement is a SQL statement.
  • a database query device including: a receiving module, used to receive a first database query statement, the first database query statement includes a first query branch and a second query branch, the first query branch and the second query branch are connected by an OR predicate; an expansion module, used to expand the OR predicate in the first database query statement to obtain a second database query statement, the second database query statement includes a third database query statement and a fourth database query statement, the third database query statement is used to execute the first query branch, the fourth database query statement is used to execute the second query branch, and the third database query statement and the fourth database query statement are connected based on UNION DISTINCT; a query module, used to query data in a database according to the second database query statement.
  • the device also includes: a checking module, used to check whether the query branches in the first database query statement can output a unique column set before expanding the OR predicate in the first database query statement; the expansion module is used to: if the query branches in the first database query statement can all output a unique column set, then expand the OR predicate in the first database query statement.
  • a checking module used to check whether the query branches in the first database query statement can output a unique column set before expanding the OR predicate in the first database query statement
  • the expansion module is used to: if the query branches in the first database query statement can all output a unique column set, then expand the OR predicate in the first database query statement.
  • the device further includes: a separation module, configured to perform SPJ separation on query branches in the first database query statement before expanding the OR predicate in the first database query statement.
  • a separation module configured to perform SPJ separation on query branches in the first database query statement before expanding the OR predicate in the first database query statement.
  • the predicate in the first query branch and/or the second query branch is a sub-query predicate.
  • the database query statement is a SQL statement.
  • a database query device comprising: a memory for storing instructions; and a processor for executing the instructions stored in the memory to execute the method as described in the first aspect or any possible implementation method of the first aspect.
  • a computer-readable storage medium on which instructions for executing the method described in the first aspect or any possible implementation of the first aspect are stored.
  • a computer program product comprising instructions for executing the method described in the first aspect or any possible implementation manner of the first aspect.
  • the LLNVL predicate When expanding the OR predicate based on Union All, it is necessary to use the LLNVL predicate to perform deduplication operations (i.e., remove duplicate data in the query results of multiple query branches of the OR predicate). If the query branch connected by the OR predicate contains a complex predicate (subquery predicate), it will cause the LLNVL predicate to contain a subquery (i.e., the parameter of the LLNVL predicate is a subquery). Since the LLNVL predicate cannot perform subquery promotion (the execution efficiency of the subquery is very low, and it is usually hoped to convert the subquery and the outer query into a connection operation of two database tables to avoid executing the subquery.
  • deduplication operations i.e., remove duplicate data in the query results of multiple query branches of the OR predicate.
  • the embodiment of the present disclosure implements OR predicate expansion based on UNION DISTINCT. Since UNION DISTINCT itself has a deduplication function, there is no need to introduce the LLNVL predicate, thus avoiding the above problem. In other words, even if the query branches connected by the OR predicate contain complex predicates (sub-query predicates), the embodiments of the present disclosure can still well improve the execution efficiency of the database query statement.
  • FIG1 is a schematic diagram of the structure of a database query method provided by an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of the structure of a database query method provided by another embodiment of the present disclosure.
  • FIG3 is a schematic diagram of the structure of a database query device provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of the structure of a database query device provided by another embodiment of the present disclosure.
  • database systems such as OceanBase database
  • database systems have their own supported database query statements, so that users of the database system can easily access the database system (such as querying, adding, deleting, etc. data in the database system).
  • the database query language mentioned here can be, for example, structured query language (SQL).
  • the execution performance of database query statements may be different if the expressions of database query statements are different. Therefore, in order to improve the execution performance of database query statements, the database system can query the received database query statements.
  • Query rewriting query rewriting of database query statements is usually an equivalence rewriting, that is, the query rewriting does not change the query results of the database query statement).
  • OR predicate expansion is a common query rewriting method. Taking the database query statement as a SELECT statement as an example, OR predicate expansion refers to splitting a SELECT statement with an OR predicate into multiple SELECT statements. Each of the multiple SELECT statements can be used to execute a query branch of the OR predicate. OR predicate expansion splits the OR predicate into each SELECT statement, so that the split SELECT statements may trigger more optimization space (that is, the split SELECT statements may be executed with an execution algorithm with better execution performance), thereby achieving the purpose of improving the query performance of database statements.
  • Nested loop join is a general but inefficient join algorithm. Specifically, the nested loop join consists of two FOR loops, hence the name. Assuming that the two tables to be joined are T1 and T2, and the join condition is P, the nested loop join selects one as the outer loop and the other as the inner loop, and compares each tuple in table T1 and table T2 one by one to find all matching tuples.
  • Nested loop join is relatively general. Like linear scan, nested loop join does not require indexes, and the algorithm is applicable to any join condition. In other words, for any type of join operation, the algorithm only needs to be slightly adjusted to perform the operation. However, the execution performance of nested loop join is relatively poor.
  • the database query statement Q1 can be rewritten, that is, the OR predicate in the database query statement Q1 can be expanded.
  • the database query statement Q1 can be rewritten as The database query statement Q2 is as follows:
  • the OR predicate is split into multiple simple SELECT statements.
  • merge join the merge join algorithm is also called sort merge join, which can be used to calculate natural joins and equivalue joins. Assuming that the two tables to be connected are T1 and T2, make tables T1 and T2 sorted before connecting. Then scan the two tables separately to complete.
  • Merge joins are very efficient to execute, and their time complexity is linear O(n), where n is the number of tuples in the table with the largest number of tuples in tables T1 and T2) and hash joins (HASH JOIN, hash joins are similar to merge joins.
  • hash join algorithm hash functions are used to partition two tables. The basic idea of this algorithm is to divide the two tables into sets of tuples with the same hash value according to the connection attributes. Hash joins do not require indexes, and compared with nested loop joins, hash joins are easier to handle large result sets) to execute, thereby improving the execution performance of database statements.
  • the "Union All”-based OR predicate expansion method is to use the "Union All” operator to organize multiple database query statements (each database query statement is used to execute a query branch of the OR predicate) to calculate the union of the query structures of the multiple database query statements. If the query results of the multiple database query statements contain multiple duplicate data, the "Union All" operator will return the multiple duplicate data.
  • the introduction of the LLNVL predicate will lead to suboptimal performance of database query statements in certain specific scenarios.
  • the specific scenario mentioned here may be, for example, a scenario where the query branch of the OR predicate contains complex predicates, such as a subquery predicate (that is, the predicate parameter contains a subquery).
  • the database query statement Q3 below contains an OR predicate, which contains two query branches, and each query branch contains a complex predicate (that is, the EXISTS predicate with a select subquery as a parameter below):
  • the predicate EXISTS of the first SELECT statement in the database query statement Q4 contains a subquery, it can be converted into a join (JOIN) through subquery lifting (subquery lifting is also a query rewriting method, which aims to express a subquery in the form of multi-table joins and merge it into the main query to improve query performance), so that more efficient join algorithms can be used to optimize query performance.
  • the EXISTS of the second SELECT statement in the database query statement Q4 can also be converted into a join (JOIN) through subquery lifting, so that more efficient join algorithms can be used to optimize query performance.
  • JOIN join
  • none of the database systems support the lifting operation of subqueries in LLNVL.
  • the predicate LLNVL can only be executed in a manner similar to Nested Loop Join, and the execution performance is low. It can be seen that when the query branch of the OR predicate contains complex predicates (such as subquery predicates), even if the OR predicate is expanded, the query performance of the database query statement cannot be greatly improved.
  • the disclosed embodiment proposes an OR predicate expansion method based on "UNION DISTINCT". Since "UNION DISTINCT” itself has the function of data deduplication, there is no need to use the LLNVL predicate, which avoids the calculation of such complex predicates, thereby improving the execution performance of database query statements.
  • the OR predicate in Q3 can be expanded based on "UNION DISTINCT" to convert the database query statement Q3 into the following database query statement Q5:
  • database query statement Q5 only contains the EXISTS predicate that can perform subquery lifting, and does not contain the LLNVL predicate that cannot perform subquery lifting. Therefore, database query statement Q5 does not need to execute Nested Loop Join, and thus has higher execution performance.
  • the following is a more detailed example of the database query method provided by the embodiment of the present disclosure in conjunction with Figure 1.
  • the method 100 of Figure 1 can be executed by a database system, specifically, by an optimizer in the database system.
  • the embodiment of the present disclosure does not specifically limit the type of the database system, for example, it can be a distributed database.
  • a first database query statement is received.
  • the first database query statement may also be an original database query statement, such as a database query statement input by a user of a database system, or a database query statement after rewriting the original database query statement.
  • the first database query statement may be, for example, an SQL statement.
  • the first database query statement may include a first query branch and a second query branch, and the first query branch and the second query branch are connected by an OR predicate. Therefore, in order to improve the execution performance of the first database query statement, it may be considered to perform an OR predicate expansion on the first database query statement.
  • the predicate in the first query branch can be a complex predicate.
  • the predicate in the first query branch can be a subquery predicate, that is, the parameter of the predicate includes a subquery.
  • the predicate in the second query branch can be a complex predicate.
  • the predicate in the second query branch can be a subquery predicate, that is, the parameter of the predicate includes a subquery.
  • step S120 the OR predicate in the first database query statement is expanded to obtain a second database query statement.
  • the second database query statement may be equivalent to the first database query statement, that is, the first database query statement and the second database query statement may correspond to the same query result.
  • the second database query statement may include a third database query statement and a fourth database query statement.
  • the third database query statement may be used to execute the first query branch of the OR predicate.
  • the fourth database query statement may be used to execute the second query branch of the OR predicate.
  • the third database query statement and the fourth database query statement are based on UNION DISTINCT. That is, the UNION DISTINCT operator can be used to calculate the union of the query results of the third database query statement and the fourth database query statement. Since the UNION DISTINCT operator itself has a deduplication function, there is no need to add a predicate specifically for deduplication, such as the LLNVL predicate mentioned above.
  • the query results of the third database query statement and the fourth database query statement may include a unique column set (or, the third database query statement and the fourth database query statement both project a unique column set).
  • the so-called unique column set refers to a column set that can uniquely identify the rows queried by the third database query statement and the fourth database query statement. Taking the example that the third database query statement and the fourth database query statement both query a database table, the unique column set may be the primary key of the database table.
  • UNION DISTINCT can perform a deduplication operation based on the unique column sets projected by the third database query statement and the fourth database query statement.
  • the unique column set can be deleted from the query results.
  • step S130 the data in the database is queried according to the second database query statement.
  • an execution plan corresponding to the second database query statement may be generated, and then the database query task is executed based on the execution plan, and the query result corresponding to the second database query statement is returned.
  • the embodiment corresponding to Figure 1 performs OR predicate expansion based on UNION DISTINCT. Since UNION DISTINCT itself has the function of data deduplication, there is no need to use the predicate LLNVL, thus avoiding the calculation of such complex predicates, thereby improving the execution performance of the database query statement after the OR predicate expansion.
  • the method of FIG. 1 may further include: checking the first database query statement to determine whether the first database query statement satisfies the expansion condition. For example, it may be checked whether the query branch in the first database query statement can output a unique column set. If the query branch in the first database query statement cannot output a unique column set, deduplication cannot be performed based on UNION DISTINCT (because UNION DISTINCT needs to implement deduplication based on a unique column set, which is the main difference between UNION DISTINCT and UNION ALL). In this case, OR predicate expansion may not be performed, or OR predicate expansion may be performed in a traditional UNION ALL-based manner.
  • OR predicate expansion can be omitted, or the traditional UNION ALL-based method can be used to expand the OR predicate.
  • the OR predicate in the first database query statement is expanded in UNION DISTINCT.
  • the method of FIG. 1 may further include: performing SPJ (Selection-Projection-Join) separation on the query branches in the first database query statement.
  • SPJ query refers to a query that only includes Selection, Projection, and Join.
  • SPJ separation refers to converting the query branches into queries that only include Selection, Projection, and Join.
  • each query branch in the first database query statement is an SPJ query. If a query branch itself is already an SPJ query, then there is no need to perform SPJ separation on the query branch; if a query branch is not an SPJ query (for example, if a query branch contains a Group-by clause, then the query branch is not an SPJ query), then it is necessary to perform SPJ separation on the query branch.
  • step S210 it is checked whether the database query statement can be expanded with an OR predicate. For example, it can be checked whether the OR predicate satisfies the expansion condition, whether the query branch can output a unique column set, and so on.
  • the database query statement is Q6 shown below:
  • the query branch where the OR predicate in the database query statement Q6 is located has two tables: t1 and t4. As can be seen from the creation process, both t1 and t4 have primary keys, so (t1.c1, t4.c1) can be used as the output unique column set, and the value of this set can be used to identify the only row in the table.
  • step S220 SPJ separation is performed on the query branch in the database query statement. If the query branch itself is an SPJ query, then no separation is required, otherwise SPJ separation is required.
  • Q6 includes a Group-by clause, which is not an SPJ query. Therefore, the query branch can be SPJ separated to convert Q6 into Q7 shown below, where temp is the SPJ query branch after separation.
  • step S230 the separated SPJ branches are expanded with an OR predicate.
  • the OR condition is split into multiple simple conditions, and corresponding SELECT statements are created.
  • a corresponding output unique column set is added to each SELECT statement, and finally these queries are combined in the form of UNION DISTINCT.
  • Q8 below is the result of OR predicate expansion on the temp table in Q7, where t1.c1 and t4.c1 are the unique column sets added in the query branch.
  • step S240 the expanded UNION DISTINCT query is encapsulated, and the columns that need to be projected are projected.
  • Q9 below shows a query encapsulation of Q8.
  • This example proposes a method for expanding OR predicates based on Union Distinc. Different from the method for expanding OR predicates based on Union All, the method for expanding OR predicates based on Union Distinc does not need to add LLNVL predicates to avoid duplicate data. For OR predicates that include complex predicates (such as subquery predicates), since the method provided in this example can avoid adding LLNVL predicates, it can reduce the calculation of complex predicates and increase the optimization space for expanding OR predicates, thereby achieving the purpose of improving the execution performance of database query statements.
  • complex predicates such as subquery predicates
  • FIG3 is a schematic diagram of a database query device according to an embodiment of the present disclosure.
  • the database query device 300 shown in FIG3 includes a receiving module 310 , an expansion module 320 and a query module 330 .
  • the receiving module 310 may be configured to receive a first database query statement, wherein the first database query statement includes a first query branch and a second query branch, wherein the first query branch and the second query branch are connected by an OR predicate.
  • the expansion module 320 can be used to expand the OR predicate in the first database query statement to obtain a second database query statement, wherein the second database query statement includes a third database query statement and a fourth database query statement, wherein the third database query statement is used to execute the first query branch, and the fourth database query statement
  • the method is used to execute the second query branch, and the third database query statement and the fourth database query statement are connected based on UNION DISTINCT.
  • the query module 330 may be used to query data in the database according to the second database query statement.
  • the device 300 may also include: a checking module, used to check whether the query branches in the first database query statement can output a unique column set before expanding the OR predicate in the first database query statement; the expansion module 320 can be used to expand the OR predicate in the first database query statement if all query branches in the first database query statement can output a unique column set.
  • a checking module used to check whether the query branches in the first database query statement can output a unique column set before expanding the OR predicate in the first database query statement
  • the expansion module 320 can be used to expand the OR predicate in the first database query statement if all query branches in the first database query statement can output a unique column set.
  • the apparatus 300 may further include: a separation module, configured to perform SPJ separation on the query branches in the first database query statement before expanding the OR predicate in the first database query statement.
  • a separation module configured to perform SPJ separation on the query branches in the first database query statement before expanding the OR predicate in the first database query statement.
  • the predicate in the first query branch and/or the second query branch is a sub-query predicate.
  • the database query statement is a SQL statement.
  • FIG4 is a schematic diagram of the structure of a database query device provided by another embodiment of the present disclosure.
  • the database query device 400 described in FIG4 may include a memory 410 and a processor 420, and the memory 410 may be used to store instructions.
  • the processor 420 may be used to execute the instructions stored in the memory 410 to implement the steps in the various methods described above.
  • the device 400 may also include a network interface 430, and data exchange between the processor 420 and an external device may be implemented through the network interface 430.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions can be transmitted from one website site, computer, server or data center to another website site, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a medium containing one or more available media.
  • the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)).
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are a database query method and apparatus. The database query method comprises: receiving a first database query statement, wherein the first database query statement comprises a first query branch and a second query branch, and the first query branch and the second query branch are joined by means of a predicate OR; expanding the predicate OR in the first database query statement, so as to obtain a second database query statement, wherein the second database query statement comprises a third database query statement and a fourth database query statement, the third database query statement is used for executing the first query branch, the fourth database query statement is used for executing the second query branch, and the third database query statement and the fourth database query statement are connected on the basis of UNION DISTINCT; and querying data in a database according to the second database query statement.

Description

数据库查询方法和装置Database query method and device 技术领域Technical Field
本公开涉及数据库领域,并且更为具体地,涉及一种数据库查询方法和装置。The present disclosure relates to the field of databases, and more specifically, to a database query method and device.
背景技术Background technique
数据库查询语句可能包含OR谓词。为了提高数据库查询语句的执行性能,通常会对包含OR谓词的数据库查询语句进行查询改写,以将OR谓词展开。A database query statement may contain an OR predicate. In order to improve the execution performance of the database query statement, the database query statement containing the OR predicate is usually rewritten to expand the OR predicate.
相关技术基于Union All对OR谓词进行展开。但在特定场景下,基于Union All对OR谓词进行展开之后,数据库查询语句的执行性能仍然较差。The related technology expands the OR predicate based on Union All. However, in certain scenarios, after the OR predicate is expanded based on Union All, the execution performance of the database query statement is still poor.
发明内容Summary of the invention
本公开提供一种数据库查询方法和装置,以提升数据库查询语句的执行性能。The present disclosure provides a database query method and device to improve the execution performance of database query statements.
第一方面,提供一种数据库查询方法,包括:接收第一数据库查询语句,所述第一数据库查询语句包括第一查询分支和第二查询分支,所述第一查询分支和所述第二查询分支通过OR谓词连接;将所述第一数据库查询语句中的所述OR谓词展开,得到第二数据库查询语句,所述第二数据库查询语句包括第三数据库查询语句和第四数据库查询语句,所述第三数据库查询语句用于执行所述第一查询分支,所述第四数据库查询语句用于执行所述第二查询分支,且所述第三数据库查询语句和所述第四数据库查询语句基于UNION DISTINCT相连;根据所述第二数据库查询语句查询数据库中的数据。In a first aspect, a database query method is provided, comprising: receiving a first database query statement, the first database query statement comprising a first query branch and a second query branch, the first query branch and the second query branch being connected by an OR predicate; expanding the OR predicate in the first database query statement to obtain a second database query statement, the second database query statement comprising a third database query statement and a fourth database query statement, the third database query statement being used to execute the first query branch, the fourth database query statement being used to execute the second query branch, and the third database query statement and the fourth database query statement being connected based on UNION DISTINCT; and querying data in a database according to the second database query statement.
可选地,作为一种可能的实现方式,在所述将所述第一数据库查询语句中的所述OR谓词展开之前,所述方法还包括:检查所述第一数据库查询语句中的查询分支是否能够输出唯一列集合;所述将所述第一数据库查询语句中的所述OR谓词展开,包括:如果所述第一数据库查询语句中的查询分支均能够输出唯一列集合,则将所述第一数据库查询语句中的所述OR谓词展开。Optionally, as a possible implementation method, before expanding the OR predicate in the first database query statement, the method also includes: checking whether the query branches in the first database query statement can output a unique column set; expanding the OR predicate in the first database query statement includes: if all query branches in the first database query statement can output a unique column set, expanding the OR predicate in the first database query statement.
可选地,作为一种可能的实现方式,在所述将所述第一数据库查询语句中的所述OR谓词展开之前,所述方法还包括:对所述第一数据库查询语句中的查询分支进行SPJ分离。Optionally, as a possible implementation manner, before expanding the OR predicate in the first database query statement, the method further includes: performing SPJ separation on the query branches in the first database query statement.
可选地,作为一种可能的实现方式,所述第一查询分支和/或所述第二查询分支中的 谓词为子查询谓词。Optionally, as a possible implementation manner, in the first query branch and/or the second query branch The predicate is a subquery predicate.
可选地,作为一种可能的实现方式,所述数据库查询语句为SQL语句。Optionally, as a possible implementation manner, the database query statement is a SQL statement.
第二方面,提供一种数据库查询装置,包括:接收模块,用于接收第一数据库查询语句,所述第一数据库查询语句包括第一查询分支和第二查询分支,所述第一查询分支和所述第二查询分支通过OR谓词连接;展开模块,用于将所述第一数据库查询语句中的所述OR谓词展开,得到第二数据库查询语句,所述第二数据库查询语句包括第三数据库查询语句和第四数据库查询语句,所述第三数据库查询语句用于执行所述第一查询分支,所述第四数据库查询语句用于执行所述第二查询分支,且所述第三数据库查询语句和所述第四数据库查询语句基于UNION DISTINCT相连;查询模块,用于根据所述第二数据库查询语句查询数据库中的数据。In a second aspect, a database query device is provided, including: a receiving module, used to receive a first database query statement, the first database query statement includes a first query branch and a second query branch, the first query branch and the second query branch are connected by an OR predicate; an expansion module, used to expand the OR predicate in the first database query statement to obtain a second database query statement, the second database query statement includes a third database query statement and a fourth database query statement, the third database query statement is used to execute the first query branch, the fourth database query statement is used to execute the second query branch, and the third database query statement and the fourth database query statement are connected based on UNION DISTINCT; a query module, used to query data in a database according to the second database query statement.
可选地,作为一种可能的实现方式,所述装置还包括:检查模块,用于在所述将所述第一数据库查询语句中的所述OR谓词展开之前,检查所述第一数据库查询语句中的查询分支是否能够输出唯一列集合;所述展开模块用于:如果所述第一数据库查询语句中的查询分支均能够输出唯一列集合,则将所述第一数据库查询语句中的所述OR谓词展开。Optionally, as a possible implementation method, the device also includes: a checking module, used to check whether the query branches in the first database query statement can output a unique column set before expanding the OR predicate in the first database query statement; the expansion module is used to: if the query branches in the first database query statement can all output a unique column set, then expand the OR predicate in the first database query statement.
可选地,作为一种可能的实现方式,所述装置还包括:分离模块,用于在所述将所述第一数据库查询语句中的所述OR谓词展开之前,对所述第一数据库查询语句中的查询分支进行SPJ分离。Optionally, as a possible implementation manner, the device further includes: a separation module, configured to perform SPJ separation on query branches in the first database query statement before expanding the OR predicate in the first database query statement.
可选地,作为一种可能的实现方式,所述第一查询分支和/或所述第二查询分支中的谓词为子查询谓词。Optionally, as a possible implementation manner, the predicate in the first query branch and/or the second query branch is a sub-query predicate.
可选地,作为一种可能的实现方式,所述数据库查询语句为SQL语句。Optionally, as a possible implementation manner, the database query statement is a SQL statement.
第三方面,提供一种数据库查询装置,包括:存储器,用于存储指令;处理器,用于执行所述存储器中存储的指令,以执行如第一方面或第一方面中的任意一种可能的实现方式所述的方法。According to a third aspect, a database query device is provided, comprising: a memory for storing instructions; and a processor for executing the instructions stored in the memory to execute the method as described in the first aspect or any possible implementation method of the first aspect.
第四方面,提供一种计算机可读存储介质,其上存储有用于执行第一方面或第一方面中的任意一种可能的实现方式所述的方法的指令。According to a fourth aspect, a computer-readable storage medium is provided, on which instructions for executing the method described in the first aspect or any possible implementation of the first aspect are stored.
第五方面,提供一种计算机程序产品,包括用于执行第一方面或第一方面中的任意一种可能的实现方式所述的方法的指令。 According to a fifth aspect, a computer program product is provided, comprising instructions for executing the method described in the first aspect or any possible implementation manner of the first aspect.
相关技术在基于Union All对OR谓词展开时,需要使用LLNVL谓词进行去重操作(即去除OR谓词的多个查询分支的查询结果中的重复数据)。如果OR谓词连接的查询分支包含复杂谓词(子查询谓词),则会导致LLNVL谓词包含子查询(即LLNVL谓词的参数为子查询)。由于LLNVL谓词无法进行子查询提升(子查询执行效率很低,通常希望将子查询与外层查询转换成两个数据库表的连接操作,从而避免执行子查询,这种转换即为子查询提升,但并非所有的谓词均可以进行子查询提升,LLNVL谓词就无法进行子查询提升),因此会降低数据库查询语句在OR谓词展开之后的执行效率。本公开实施例基于UNION DISTINCT实现OR谓词展开,由于UNION DISTINCT本身具有去重功能,因此无需引入LLNVL谓词,也就规避了上述问题。换句话说,即使OR谓词连接的查询分支包含复杂谓词(子查询谓词),本公开实施例仍然能够很好地提升数据库查询语句的执行效率。Related technology When expanding the OR predicate based on Union All, it is necessary to use the LLNVL predicate to perform deduplication operations (i.e., remove duplicate data in the query results of multiple query branches of the OR predicate). If the query branch connected by the OR predicate contains a complex predicate (subquery predicate), it will cause the LLNVL predicate to contain a subquery (i.e., the parameter of the LLNVL predicate is a subquery). Since the LLNVL predicate cannot perform subquery promotion (the execution efficiency of the subquery is very low, and it is usually hoped to convert the subquery and the outer query into a connection operation of two database tables to avoid executing the subquery. This conversion is called subquery promotion, but not all predicates can perform subquery promotion, and the LLNVL predicate cannot perform subquery promotion), it will reduce the execution efficiency of the database query statement after the OR predicate is expanded. The embodiment of the present disclosure implements OR predicate expansion based on UNION DISTINCT. Since UNION DISTINCT itself has a deduplication function, there is no need to introduce the LLNVL predicate, thus avoiding the above problem. In other words, even if the query branches connected by the OR predicate contain complex predicates (sub-query predicates), the embodiments of the present disclosure can still well improve the execution efficiency of the database query statement.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
为了更清楚地说明本公开实施例或背景技术中的技术方案,下面将对本公开实施例或背景技术中所需要使用的附图进行说明。In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or the background technology, the drawings required for use in the embodiments of the present disclosure or the background technology will be described below.
图1是本公开一个实施例提供的数据库查询方法的结构示意图。FIG1 is a schematic diagram of the structure of a database query method provided by an embodiment of the present disclosure.
图2是本公开另一实施例提供的数据库查询方法的结构示意图。FIG. 2 is a schematic diagram of the structure of a database query method provided by another embodiment of the present disclosure.
图3是本公开一个实施例提供的数据库查询装置的结构示意图。FIG3 is a schematic diagram of the structure of a database query device provided by an embodiment of the present disclosure.
图4是本公开另一实施例提供的数据库查询装置的结构示意图。FIG. 4 is a schematic diagram of the structure of a database query device provided by another embodiment of the present disclosure.
具体实施方式Detailed ways
下面结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整的描述,显然,所描述的实施例仅是本公开一部分实施例,而不是全部实施例。The technical solutions in the embodiments of the present disclosure are clearly and completely described below in conjunction with the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only part of the embodiments of the present disclosure, rather than all the embodiments.
随着技术的发展,数据库系统(如OceanBase数据库)的应用越来越广泛。不同类型的数据库系统具有各自支持的数据库查询语句,使得数据库系统的用户可以方便地对该数据库系统进行访问(如对数据库系统中的数据进行查询、增、删等操作)。这里提到的数据库查询语言例如可以是结构化查询语言(structured query language,SQL)。With the development of technology, the application of database systems (such as OceanBase database) is becoming more and more widespread. Different types of database systems have their own supported database query statements, so that users of the database system can easily access the database system (such as querying, adding, deleting, etc. data in the database system). The database query language mentioned here can be, for example, structured query language (SQL).
数据库查询语句的表达式不同,则数据库查询语句的执行性能可能不同。因此,为了提升数据库查询语句的执行性能,数据库系统可以对接收到的数据库查询语句进行查 询改写(数据库查询语句的查询改写通常是一种等价性改写,即该查询改写并不会改变数据库查询语句的查询结果)。The execution performance of database query statements may be different if the expressions of database query statements are different. Therefore, in order to improve the execution performance of database query statements, the database system can query the received database query statements. Query rewriting (query rewriting of database query statements is usually an equivalence rewriting, that is, the query rewriting does not change the query results of the database query statement).
OR谓词展开是一种常见的查询改写方式。以数据库查询语句为SELECT语句为例,OR谓词展开指的是将一个具有OR谓词的SELECT语句拆分成多个SELECT语句。该多个SELECT语句中的每个SELECT语句可用于执行OR谓词的一个查询分支。OR谓词展开通过拆分OR谓词到每个SELECT语句,使得被拆分后的SELECT语句可能可以触发更多的优化空间(即被拆分后的SELECT语句可能以执行性能更优的执行算法被执行),从而达到提升数据库语句的查询性能的目的。OR predicate expansion is a common query rewriting method. Taking the database query statement as a SELECT statement as an example, OR predicate expansion refers to splitting a SELECT statement with an OR predicate into multiple SELECT statements. Each of the multiple SELECT statements can be used to execute a query branch of the OR predicate. OR predicate expansion splits the OR predicate into each SELECT statement, so that the split SELECT statements may trigger more optimization space (that is, the split SELECT statements may be executed with an execution algorithm with better execution performance), thereby achieving the purpose of improving the query performance of database statements.
作为一个示例,首先为数据库系统创建两张表:As an example, first create two tables for the database system:
create table t1(c1 int primary key,c2 int,c3 int,c4 int);create table t1(c1 int primary key,c2 int,c3 int,c4 int);
create table t2(c1 int primary key,c2 int,c3 int,c4 int).create table t2(c1 int primary key,c2 int,c3 int,c4 int).
然后,接收原始数据库查询语句Q1:Then, receive the original database query statement Q1:
select*from t1,t2 where t1.c2=t2.c2 or t1.c3=t2.c3.select * from t1,t2 where t1.c2=t2.c2 or t1.c3=t2.c3.
在数据库查询语句Q1中,“t1.c2=t2.c2”和“t1.c3=t2.c3”分别为两个条件,该两个条件采用OR谓词连接。针对基于OR谓词的连接操作,如果不对数据库查询语句Q1进行查询改写,则只能使用嵌套循环连接(NESTED LOOP JOIN)算法来执行该数据库语句Q1。In the database query statement Q1, "t1.c2 = t2.c2" and "t1.c3 = t2.c3" are two conditions, which are connected by the OR predicate. For the connection operation based on the OR predicate, if the database query statement Q1 is not rewritten, the only way to execute the database statement Q1 is to use the nested loop join (NESTED LOOP JOIN) algorithm.
嵌套循环连接是一种通用、但效率很低的连接算法。具体而言,嵌套循环连接由两个FOR循环构成,并因此而得名。假设要连接的两个表为T1和T2,连接条件为P,则嵌套循环连接就选择其中一个作为外层循环,另一个作为内层循环,逐一比较表T1和表T2中的每个元组,从而找出所有匹配的元组。嵌套循环连接比较通用。嵌套循环连接与线性扫描一样,不需要索引,且不管是什么样的连接条件,该算法都适用。也就是说,对于任何类型的连接操作,该算法都只需要做稍微的调整就能进行运算。但是,嵌套循环连接的执行性能比较差。因为该算法过程中需要逐一比较表T1和表T2中的每一个元组,当数据规模较大而不能完全放入内存中时其引起的磁盘与内存交换比较频繁,即使数据能够完全放入内存,则嵌套循环连接执行过程中也会引起CPU的CACHE命中率低下,从而严重影响系统效率。Nested loop join is a general but inefficient join algorithm. Specifically, the nested loop join consists of two FOR loops, hence the name. Assuming that the two tables to be joined are T1 and T2, and the join condition is P, the nested loop join selects one as the outer loop and the other as the inner loop, and compares each tuple in table T1 and table T2 one by one to find all matching tuples. Nested loop join is relatively general. Like linear scan, nested loop join does not require indexes, and the algorithm is applicable to any join condition. In other words, for any type of join operation, the algorithm only needs to be slightly adjusted to perform the operation. However, the execution performance of nested loop join is relatively poor. Because the algorithm needs to compare each tuple in table T1 and table T2 one by one, when the data size is large and cannot be completely put into the memory, the disk and memory exchange caused by it is relatively frequent. Even if the data can be completely put into the memory, the CPU cache hit rate will be low during the execution of the nested loop join, which seriously affects the system efficiency.
为了能够避免使用嵌套循环连接,可以对数据库查询语句Q1进行查询改写,即对数据库查询语句Q1中的OR谓词进行展开。例如,可以将数据库查询语句Q1改写成 如下所示的数据库查询语句Q2:In order to avoid using nested loop joins, the database query statement Q1 can be rewritten, that is, the OR predicate in the database query statement Q1 can be expanded. For example, the database query statement Q1 can be rewritten as The database query statement Q2 is as follows:
select*from t1,t2 where t1.c2=t2.c2;select * from t1,t2 where t1.c2=t2.c2;
union allunion all
select*from t1,t2 where t1.c3=t2.c3 and llnvl(t1.c2=t2.c2).select * from t1,t2 where t1.c3=t2.c3 and llnvl(t1.c2=t2.c2).
对于数据库查询语句Q2,OR谓词被拆分成了多个简单的SELECT语句,而每个SELECT语句除了可以使用嵌套循环连接的方式来执行,也可以选择更加高效的合并连接(MERGE JOIN,合并连接算法又称排序合并连接,可以用于计算自然连接和等值连接。假设要连接的两个表为T1和T2,则在进行连接之前先让表T1和表T2是有序的。然后分别对两个表进行扫描一遍即可完成。合并连接执行起来非常高效,其时间复杂度是线性的O(n),其中n为表T1和表T2中元组数最多的表的元组个数)和哈希连接(HASH JOIN,哈希连接类似合并连接。在哈希连接算法中,哈希函数用于对两个表进行划分。此算法的基本思想是把两个表按连接属性划分成具有相同哈希值的元组集合。哈希连接不需要索引,并且与嵌套循环连接相比,哈希连接更容易处理大结果集)来执行,从而提升数据库语句的执行性能。For the database query statement Q2, the OR predicate is split into multiple simple SELECT statements. In addition to using the nested loop join method to execute each SELECT statement, you can also choose the more efficient merge join (MERGE JOIN, the merge join algorithm is also called sort merge join, which can be used to calculate natural joins and equivalue joins. Assuming that the two tables to be connected are T1 and T2, make tables T1 and T2 sorted before connecting. Then scan the two tables separately to complete. Merge joins are very efficient to execute, and their time complexity is linear O(n), where n is the number of tuples in the table with the largest number of tuples in tables T1 and T2) and hash joins (HASH JOIN, hash joins are similar to merge joins. In the hash join algorithm, hash functions are used to partition two tables. The basic idea of this algorithm is to divide the two tables into sets of tuples with the same hash value according to the connection attributes. Hash joins do not require indexes, and compared with nested loop joins, hash joins are easier to handle large result sets) to execute, thereby improving the execution performance of database statements.
目前,在支持OR谓词展开的数据库系统中,均采用基于“Union All”的形式执行OR谓词展开。这种OR谓词展开形式对于一些包含复杂谓词(比如子查询谓词)的OR谓词是非常低效的,从而可能导致OR谓词展开带来的效率方面的优势丧失。下面对这一问题产生的原因进行分析。At present, in database systems that support OR predicate expansion, all use the form based on "Union All" to perform OR predicate expansion. This form of OR predicate expansion is very inefficient for some OR predicates containing complex predicates (such as subquery predicates), which may lead to the loss of the efficiency advantage brought by OR predicate expansion. The following analyzes the cause of this problem.
基于“Union All”的OR谓词展开方式是利用“Union All”操作符将多个数据库查询语句(每个据库查询语句用于执行OR谓词的一个查询分支)组织起来,以计算该多个数据库查询语句的查询结构的并集。如果该多个数据库查询语句的查询结果包含多个重复的数据,“Union All”操作符会返回该多个重复的数据。The "Union All"-based OR predicate expansion method is to use the "Union All" operator to organize multiple database query statements (each database query statement is used to execute a query branch of the OR predicate) to calculate the union of the query structures of the multiple database query statements. If the query results of the multiple database query statements contain multiple duplicate data, the "Union All" operator will return the multiple duplicate data.
为了避免多个数据库查询语句产生重复数据,基于“Union All”的OR谓词展开方式会引入LLNVL谓词(如上文提及的数据库查询语句Q2中的“llnvl(t1.c2=t2.c2)”),以去除重复数据。LLNVL谓词的引入在某些特定的场景下会导致数据库查询语句的性能次优。这里提到的特定场景,例如可以是OR谓词的查询分支包含复杂谓词,如子查询谓词的场景(即谓词的参数包含子查询)。In order to avoid duplicate data generated by multiple database query statements, the "Union All"-based OR predicate expansion method introduces the LLNVL predicate (such as "llnvl(t1.c2=t2.c2)" in the database query statement Q2 mentioned above) to remove duplicate data. The introduction of the LLNVL predicate will lead to suboptimal performance of database query statements in certain specific scenarios. The specific scenario mentioned here may be, for example, a scenario where the query branch of the OR predicate contains complex predicates, such as a subquery predicate (that is, the predicate parameter contains a subquery).
作为示例,下文中的数据库查询语句Q3包含OR谓词,该OR谓词包含两个查询分支,且每个查询分支包含复杂谓词(即下文中的以select子查询为参数的EXISTS谓词):
As an example, the database query statement Q3 below contains an OR predicate, which contains two query branches, and each query branch contains a complex predicate (that is, the EXISTS predicate with a select subquery as a parameter below):
对数据库查询语句Q3进行OR谓词展开之后,得到如下数据库查询语句Q4:
After performing OR predicate expansion on database query statement Q3, the following database query statement Q4 is obtained:
数据库查询语句Q4中的第一个SELECT语句的谓词EXISTS虽然包含子查询,但可以通过子查询提升(子查询提升也是一种查询改写方式,旨在将一个子查询表示成多表连接的形式,从而合并到主查询中,从而提升查询性能)转化成连接(JOIN),从而可以使用更多的高效连接算法对查询性能进行优化。同理,数据库查询语句Q4中的第二个SELECT语句的EXISTS也可以通过子查询提升的方式转化成连接(JOIN),从而可以使用更多的高效连接算法对查询性能进行优化。但是,目前数据库系统均不支持LLNVL中的子查询的提升操作,因此,针对谓词LLNVL,只能使用类似Nested Loop Join的方式来执行,执行性能较低。由此可见,当OR谓词的查询分支包含复杂谓词(如子查询谓词)的情况下,即使进行了OR谓词展开,数据库查询语句的查询性能也无法得到较大的提升。Although the predicate EXISTS of the first SELECT statement in the database query statement Q4 contains a subquery, it can be converted into a join (JOIN) through subquery lifting (subquery lifting is also a query rewriting method, which aims to express a subquery in the form of multi-table joins and merge it into the main query to improve query performance), so that more efficient join algorithms can be used to optimize query performance. Similarly, the EXISTS of the second SELECT statement in the database query statement Q4 can also be converted into a join (JOIN) through subquery lifting, so that more efficient join algorithms can be used to optimize query performance. However, currently, none of the database systems support the lifting operation of subqueries in LLNVL. Therefore, for the predicate LLNVL, it can only be executed in a manner similar to Nested Loop Join, and the execution performance is low. It can be seen that when the query branch of the OR predicate contains complex predicates (such as subquery predicates), even if the OR predicate is expanded, the query performance of the database query statement cannot be greatly improved.
针对上述问题,本公开实施例提出一种基于“UNION DISTINCT”的OR谓词展开方式。由于“UNION DISTINCT”自身具有数据去重功能,因此无需使用LLNVL谓词,也就避免了此类复杂谓词的计算,从而可以提升数据库查询语句的执行性能。In view of the above problems, the disclosed embodiment proposes an OR predicate expansion method based on "UNION DISTINCT". Since "UNION DISTINCT" itself has the function of data deduplication, there is no need to use the LLNVL predicate, which avoids the calculation of such complex predicates, thereby improving the execution performance of database query statements.
仍以上文提到的数据库查询语句Q3为例,可以基于“UNION DISTINCT”对Q3中的OR谓词展开,从而将该数据库查询语句Q3转换成如下数据库查询语句Q5:

Still taking the database query statement Q3 mentioned above as an example, the OR predicate in Q3 can be expanded based on "UNION DISTINCT" to convert the database query statement Q3 into the following database query statement Q5:

对比数据库查询语句Q4和数据库查询语句Q5可以看出,数据库查询语句Q5仅包含可以进行子查询提升的EXISTS谓词,并不包含无法进行子查询提升的LLNVL谓词,因此数据库查询语句Q5无需执行Nested Loop Join,从而具有更高的执行性能。By comparing database query statements Q4 and Q5, it can be seen that database query statement Q5 only contains the EXISTS predicate that can perform subquery lifting, and does not contain the LLNVL predicate that cannot perform subquery lifting. Therefore, database query statement Q5 does not need to execute Nested Loop Join, and thus has higher execution performance.
下面结合图1,对本公开实施例提供的数据库查询方法进行更为详细地举例说明。图1的方法100可以由数据库系统执行,具体可以由数据库系统中的优化器执行。本公开实施例对该数据库系统的类型不作具体限定,例如可以是分布式的数据库。The following is a more detailed example of the database query method provided by the embodiment of the present disclosure in conjunction with Figure 1. The method 100 of Figure 1 can be executed by a database system, specifically, by an optimizer in the database system. The embodiment of the present disclosure does not specifically limit the type of the database system, for example, it can be a distributed database.
参见图1,在步骤S110,接收第一数据库查询语句。该第一数据库查询语句也可以是原始数据库查询语句,如数据库系统的用户输入的数据库查询语句,也可以是对原始数据库查询语句进行改写之后的数据库查询语句。该第一数据库查询语句例如可以是SQL语句。Referring to FIG. 1 , in step S110 , a first database query statement is received. The first database query statement may also be an original database query statement, such as a database query statement input by a user of a database system, or a database query statement after rewriting the original database query statement. The first database query statement may be, for example, an SQL statement.
该第一数据库查询语句可以包括第一查询分支和第二查询分支,且第一查询分支和第二查询分支通过OR谓词连接。因此,为了提升第一数据库查询语句的执行性能,可以考虑对第一数据库查询语句进行OR谓词展开。The first database query statement may include a first query branch and a second query branch, and the first query branch and the second query branch are connected by an OR predicate. Therefore, in order to improve the execution performance of the first database query statement, it may be considered to perform an OR predicate expansion on the first database query statement.
该第一查询分支中的谓词可以为复杂谓词。例如,该第一查询分支中的谓词可以为子查询谓词,即谓词的参数包含子查询。以第一数据库查询语句为前文提到的Q3为例,该第一数据库查询语句的第一查询分支包含子查询谓词,即exists(select 1 from t2 where t2.c2=t1.c2)。The predicate in the first query branch can be a complex predicate. For example, the predicate in the first query branch can be a subquery predicate, that is, the parameter of the predicate includes a subquery. Taking the first database query statement as Q3 mentioned above as an example, the first query branch of the first database query statement includes a subquery predicate, that is, exists(select 1 from t2 where t2.c2=t1.c2).
该第二查询分支中的谓词可以为复杂谓词。例如,该第二查询分支中的谓词可以为子查询谓词,即谓词的参数包含子查询。以第一数据库查询语句为前文提到的Q3为例,该第一数据库查询语句的第二查询分支包含子查询谓词,即exsits(select 1 from t3 where t3.c3=t1.c3)。The predicate in the second query branch can be a complex predicate. For example, the predicate in the second query branch can be a subquery predicate, that is, the parameter of the predicate includes a subquery. Taking the first database query statement as Q3 mentioned above as an example, the second query branch of the first database query statement includes a subquery predicate, that is, exsits(select 1 from t3 where t3.c3=t1.c3).
在步骤S120,将第一数据库查询语句中的OR谓词展开,得到第二数据库查询语句。第二数据库查询语句与第一数据库查询语句可以是等价的,即第一数据库语句与第二数据库语句可以对应相同的查询结果。In step S120, the OR predicate in the first database query statement is expanded to obtain a second database query statement. The second database query statement may be equivalent to the first database query statement, that is, the first database query statement and the second database query statement may correspond to the same query result.
第二数据库查询语句可以包括第三数据库查询语句和第四数据库查询语句。第三数据库查询语句可用于执行OR谓词的第一查询分支。第四数据库查询语句可用于执行OR谓词的第二查询分支。第三数据库查询语句和第四数据库查询语句基于UNION  DISTINCT相连。也就是说,可以利用UNION DISTINCT操作符计算第三数据库查询语句和第四数据库查询语句的查询结果的并集。由于UNION DISTINCT操作符本身具有去重功能,因此,无需添加专门用于去重的谓词,如前文提到的LLNVL谓词。The second database query statement may include a third database query statement and a fourth database query statement. The third database query statement may be used to execute the first query branch of the OR predicate. The fourth database query statement may be used to execute the second query branch of the OR predicate. The third database query statement and the fourth database query statement are based on UNION DISTINCT. That is, the UNION DISTINCT operator can be used to calculate the union of the query results of the third database query statement and the fourth database query statement. Since the UNION DISTINCT operator itself has a deduplication function, there is no need to add a predicate specifically for deduplication, such as the LLNVL predicate mentioned above.
第三数据库查询语句和第四数据库语句的查询结果可以包含唯一列集合(或者说,第三数据库查询语句和第四数据库语句均投影出唯一列集合)。所谓唯一列集合,指的是能够唯一标识第三数据库查询语句和第四数据库查询语句查询出的行的列集合。以第三数据库查询语句和第四数据库查询语句均查询数据库表为例,该唯一列集合可以是该数据库表的主键。UNION DISTINCT基于第三数据库查询语句和第四数据库查询语句各自投影出的唯一列集合,可以执行去重操作。以主键作为唯一列集合为例,如果第三数据库查询语句和第四数据库查询语句的查询结果包括相同主键对应的数据,则仅保留一份数据即可。待去重结束之后,可以将唯一列集合从查询结果中删除即可。The query results of the third database query statement and the fourth database query statement may include a unique column set (or, the third database query statement and the fourth database query statement both project a unique column set). The so-called unique column set refers to a column set that can uniquely identify the rows queried by the third database query statement and the fourth database query statement. Taking the example that the third database query statement and the fourth database query statement both query a database table, the unique column set may be the primary key of the database table. UNION DISTINCT can perform a deduplication operation based on the unique column sets projected by the third database query statement and the fourth database query statement. Taking the primary key as the unique column set as an example, if the query results of the third database query statement and the fourth database query statement include data corresponding to the same primary key, only one copy of the data is retained. After the deduplication is completed, the unique column set can be deleted from the query results.
在步骤S130,根据第二数据库查询语句查询数据库中的数据。例如,可以生成第二数据库查询语句对应的执行计划,然后基于该执行计划执行数据库查询任务,并返回第二数据库查询语句对应的查询结果。In step S130, the data in the database is queried according to the second database query statement. For example, an execution plan corresponding to the second database query statement may be generated, and then the database query task is executed based on the execution plan, and the query result corresponding to the second database query statement is returned.
图1对应的实施例基于UNION DISTINCT执行OR谓词展开,由于UNION DISTINCT自身具有数据去重功能,因此无需使用谓词LLNVL,也就避免了此类复杂谓词的计算,从而可以提升OR谓词展开之后的数据库查询语句的执行性能。The embodiment corresponding to Figure 1 performs OR predicate expansion based on UNION DISTINCT. Since UNION DISTINCT itself has the function of data deduplication, there is no need to use the predicate LLNVL, thus avoiding the calculation of such complex predicates, thereby improving the execution performance of the database query statement after the OR predicate expansion.
在一些实施例中,在执行步骤S120之前,图1的方法还可以包括:对第一数据库查询语句进行检查,以确定第一数据库查询语句是否满足展开条件。例如,可以检查第一数据库查询语句中的查询分支是否能够输出唯一列集合。如果第一数据库查询语句中的查询分支无法输出唯一列集合,则无法基于UNION DISTINCT进行去重操作(因为UNION DISTINCT需要基于唯一列集合实现去重功能,这点是UNION DISTINCT与UNION ALL的主要不同)。在这种情况下,可以不进行OR谓词展开,或者,也可以采用传统的基于UNION ALL的方式进行OR谓词展开。In some embodiments, before executing step S120, the method of FIG. 1 may further include: checking the first database query statement to determine whether the first database query statement satisfies the expansion condition. For example, it may be checked whether the query branch in the first database query statement can output a unique column set. If the query branch in the first database query statement cannot output a unique column set, deduplication cannot be performed based on UNION DISTINCT (because UNION DISTINCT needs to implement deduplication based on a unique column set, which is the main difference between UNION DISTINCT and UNION ALL). In this case, OR predicate expansion may not be performed, or OR predicate expansion may be performed in a traditional UNION ALL-based manner.
作为一个示例,假设第一数据库查询语句查询的对象并非真实的数据库表,而是一视图(view),且该视图里面包含了层次查询,那么该视图是没办法输出唯一列集合的。因此,在这种情况下,可以不进行OR谓词展开,或者,也可以采用传统的基于UNION ALL的方式进行OR谓词展开。As an example, suppose the object queried by the first database query statement is not a real database table, but a view, and the view contains a hierarchical query, then the view cannot output a unique column set. Therefore, in this case, OR predicate expansion can be omitted, or the traditional UNION ALL-based method can be used to expand the OR predicate.
相应地,如果第一数据库查询语句中的查询分支均能够输出唯一列集合,则可以基 于UNION DISTINCT将第一数据库查询语句中的OR谓词展开。Correspondingly, if the query branches in the first database query statement can all output unique column sets, The OR predicate in the first database query statement is expanded in UNION DISTINCT.
在一些实施例中,在执行步骤S120之前,图1的方法还可以包括:对所述第一数据库查询语句中的查询分支进行SPJ(Selection-Projection-Join)分离。SPJ查询是指只包括Selection,Projection和Join的查询。SPJ分离指的是将查询分支转换成只包括Selection,Projection和Join的查询。In some embodiments, before executing step S120, the method of FIG. 1 may further include: performing SPJ (Selection-Projection-Join) separation on the query branches in the first database query statement. SPJ query refers to a query that only includes Selection, Projection, and Join. SPJ separation refers to converting the query branches into queries that only include Selection, Projection, and Join.
作为一种可能的实现方式,可以先确定第一数据库查询语句中的每个查询分支是否为SPJ查询。如果某个查询分支本身已经是SPJ查询,那么不需要对该查询分支进行SPJ分离;如果某个查询分支并非SPJ查询(例如,如果某个查询分支包含Group-by子句,则该查询分支不是SPJ查询),则需要对该查询分支进行SPJ分离。As a possible implementation, it can be determined whether each query branch in the first database query statement is an SPJ query. If a query branch itself is already an SPJ query, then there is no need to perform SPJ separation on the query branch; if a query branch is not an SPJ query (for example, if a query branch contains a Group-by clause, then the query branch is not an SPJ query), then it is necessary to perform SPJ separation on the query branch.
下面结合图2,更加详细地描述本公开实施例。应注意,图2所示的方法200仅仅是为了帮助本领域技术人员理解本公开实施例,而非要将本公开实施例限于所例示的具体数值或具体场景。本领域技术人员根据所给出的例子,显然可以进行各种等价的修改或变化,这样的修改或变化也落入本公开实施例的范围内。The following is a more detailed description of the embodiment of the present disclosure in conjunction with FIG2. It should be noted that the method 200 shown in FIG2 is only to help those skilled in the art understand the embodiment of the present disclosure, and is not intended to limit the embodiment of the present disclosure to the specific numerical values or specific scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes based on the examples given, and such modifications or changes also fall within the scope of the embodiment of the present disclosure.
参见图2,在步骤S210,检查数据库查询语句是否能够进行OR谓词展开。例如,可以检查OR谓词是否满足展开条件、检查查询分支是否能够输出唯一列集合等。2 , in step S210 , it is checked whether the database query statement can be expanded with an OR predicate. For example, it can be checked whether the OR predicate satisfies the expansion condition, whether the query branch can output a unique column set, and so on.
例如,数据库查询语句为下文所示的Q6:
For example, the database query statement is Q6 shown below:
数据库查询语句Q6中的OR谓词所在的查询分支存在两个表:t1和t4。从t1和t4 的创建过程可以看出,t1和t4都具有主键,所以(t1.c1,t4.c1)可以作为输出唯一列集合,这个集合的值可以用来标识表中的唯一的一行。The query branch where the OR predicate in the database query statement Q6 is located has two tables: t1 and t4. As can be seen from the creation process, both t1 and t4 have primary keys, so (t1.c1, t4.c1) can be used as the output unique column set, and the value of this set can be used to identify the only row in the table.
在步骤S220,对数据库查询语句中的查询分支进行SPJ分离。如果查询分支本身就是SPJ查询,那么不需要进行分离,否则就需要SPJ分离。In step S220, SPJ separation is performed on the query branch in the database query statement. If the query branch itself is an SPJ query, then no separation is required, otherwise SPJ separation is required.
继续以数据库查询为上文提到的Q6为例,Q6包括了Group-by子句,该子句并不是SPJ查询。因此,可以对该查询分支进行SPJ分离,从而将Q6转换为下文所示的Q7,其中temp为分离之后的SPJ查询分支。
Taking the database query Q6 mentioned above as an example, Q6 includes a Group-by clause, which is not an SPJ query. Therefore, the query branch can be SPJ separated to convert Q6 into Q7 shown below, where temp is the SPJ query branch after separation.
在步骤S230,对分离后的SPJ分支进行OR谓词展开。将OR条件拆分为多个简单条件,并且创建相对应的SELECT语句,同时为每个SELECT语句添加相对应的输出唯一列集合,最终把这些查询以UNION DISTINCT形式组合起来。下文中的Q8即为对Q7中temp表进行OR谓词展开之后的结果,其中t1.c1和t4.c1为在查询分支中添加的唯一列集合。
In step S230, the separated SPJ branches are expanded with an OR predicate. The OR condition is split into multiple simple conditions, and corresponding SELECT statements are created. At the same time, a corresponding output unique column set is added to each SELECT statement, and finally these queries are combined in the form of UNION DISTINCT. Q8 below is the result of OR predicate expansion on the temp table in Q7, where t1.c1 and t4.c1 are the unique column sets added in the query branch.
在步骤S240,对展开后的UNION DISTINCT查询进行查询封装,并对最终需要投影出来的列进行投影。下文中的Q9展示了对Q8的一次查询封装。
In step S240, the expanded UNION DISTINCT query is encapsulated, and the columns that need to be projected are projected. Q9 below shows a query encapsulation of Q8.
本示例提出了一种基于Union Distinc的OR谓词展开方法,不同于基于Union All的OR谓词展开方法,基于Union Distinc的OR谓词展开方式无需添加LLNVL谓词来避免产生重复数据。对于包括复杂谓词(比如子查询谓词)的OR谓词,由于本示例提供的方法可以避免添加LLNVL谓词,因此可以减少复杂谓词的计算,提升OR谓词展开的优化空间,进而达到提升数据库查询语句的执行性能的目的。This example proposes a method for expanding OR predicates based on Union Distinc. Different from the method for expanding OR predicates based on Union All, the method for expanding OR predicates based on Union Distinc does not need to add LLNVL predicates to avoid duplicate data. For OR predicates that include complex predicates (such as subquery predicates), since the method provided in this example can avoid adding LLNVL predicates, it can reduce the calculation of complex predicates and increase the optimization space for expanding OR predicates, thereby achieving the purpose of improving the execution performance of database query statements.
需要说明的是,本公开的说明书和权利要求书中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同的对象,而不是用于描述特定顺序。It should be noted that the terms "first", "second", "third" and "fourth" etc. in the specification and claims of the present disclosure are used to distinguish different objects rather than to describe a specific order.
上文结合图1和图2,详细描述了本公开的方法实施例,下面结合图3和图4,详细描述本公开的装置实施例。应理解,方法实施例的描述与装置实施例的描述相互对应,因此,未详细描述的部分可以参见前面方法实施例。The above describes in detail the method embodiment of the present disclosure in conjunction with Figures 1 and 2, and the following describes in detail the device embodiment of the present disclosure in conjunction with Figures 3 and 4. It should be understood that the description of the method embodiment corresponds to the description of the device embodiment, and therefore, the part not described in detail can refer to the previous method embodiment.
图3是本公开实施例提供的一种数据库查询装置的结构示意图。图3所示的数据库查询装置300包括接收模块310,展开模块320以及查询模块330。FIG3 is a schematic diagram of a database query device according to an embodiment of the present disclosure. The database query device 300 shown in FIG3 includes a receiving module 310 , an expansion module 320 and a query module 330 .
接收模块310可用于接收第一数据库查询语句。所述第一数据库查询语句包括第一查询分支和第二查询分支,所述第一查询分支和所述第二查询分支通过OR谓词连接。The receiving module 310 may be configured to receive a first database query statement, wherein the first database query statement includes a first query branch and a second query branch, wherein the first query branch and the second query branch are connected by an OR predicate.
展开模块320可用于将所述第一数据库查询语句中的所述OR谓词展开,得到第二数据库查询语句,所述第二数据库查询语句包括第三数据库查询语句和第四数据库查询语句,所述第三数据库查询语句用于执行所述第一查询分支,所述第四数据库查询语句 用于执行所述第二查询分支,且所述第三数据库查询语句和所述第四数据库查询语句基于UNION DISTINCT相连。The expansion module 320 can be used to expand the OR predicate in the first database query statement to obtain a second database query statement, wherein the second database query statement includes a third database query statement and a fourth database query statement, wherein the third database query statement is used to execute the first query branch, and the fourth database query statement The method is used to execute the second query branch, and the third database query statement and the fourth database query statement are connected based on UNION DISTINCT.
查询模块330可用于根据所述第二数据库查询语句查询数据库中的数据。The query module 330 may be used to query data in the database according to the second database query statement.
可选地,在一些实施例中,所述装置300还可包括:检查模块,用于在所述将所述第一数据库查询语句中的所述OR谓词展开之前,检查所述第一数据库查询语句中的查询分支是否能够输出唯一列集合;所述展开模块320可用于如果所述第一数据库查询语句中的查询分支均能够输出唯一列集合,则将所述第一数据库查询语句中的所述OR谓词展开。Optionally, in some embodiments, the device 300 may also include: a checking module, used to check whether the query branches in the first database query statement can output a unique column set before expanding the OR predicate in the first database query statement; the expansion module 320 can be used to expand the OR predicate in the first database query statement if all query branches in the first database query statement can output a unique column set.
可选地,在一些实施例中,所述装置300还可包括:分离模块,用于在所述将所述第一数据库查询语句中的所述OR谓词展开之前,对所述第一数据库查询语句中的查询分支进行SPJ分离。Optionally, in some embodiments, the apparatus 300 may further include: a separation module, configured to perform SPJ separation on the query branches in the first database query statement before expanding the OR predicate in the first database query statement.
可选地,在一些实施例中,所述第一查询分支和/或所述第二查询分支中的谓词为子查询谓词。Optionally, in some embodiments, the predicate in the first query branch and/or the second query branch is a sub-query predicate.
可选地,在一些实施例中,所述数据库查询语句为SQL语句。Optionally, in some embodiments, the database query statement is a SQL statement.
图4是本公开另一实施例提供的数据库查询装置的结构示意图。图4所述的数据库查询装置400可以包括存储器410和处理器420,存储器410可以用于存储指令。处理器420可以用于执行存储器410中存储的指令,以实现前文描述的各个方法中的步骤。在一些实施例中,该装置400还可以包括网络接口430,处理器420与外部设备的数据交换可以通过该网络接口430实现。FIG4 is a schematic diagram of the structure of a database query device provided by another embodiment of the present disclosure. The database query device 400 described in FIG4 may include a memory 410 and a processor 420, and the memory 410 may be used to store instructions. The processor 420 may be used to execute the instructions stored in the memory 410 to implement the steps in the various methods described above. In some embodiments, the device 400 may also include a network interface 430, and data exchange between the processor 420 and an external device may be implemented through the network interface 430.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其他任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本公开实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介 质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如数字视频光盘(digital video disc,DVD))、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。In the above embodiments, it can be implemented in whole or in part by software, hardware, firmware or any other combination. When implemented by software, it can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the process or function described in the embodiment of the present disclosure is generated in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions can be transmitted from one website site, computer, server or data center to another website site, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a medium containing one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)).
本领域普通技术人员可以意识到,结合本公开实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本公开的范围。Those skilled in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments of the present disclosure can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional and technical personnel can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the present disclosure.
在本公开所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of the units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
以上所述,仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应以所述权利要求的保护范围为准。 The above is only a specific embodiment of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art who is familiar with the technical field can easily think of changes or substitutions within the technical scope disclosed in the present disclosure, which should be included in the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure should be based on the protection scope of the claims.

Claims (11)

  1. 一种数据库查询方法,包括:A database query method, comprising:
    接收第一数据库查询语句,所述第一数据库查询语句包括第一查询分支和第二查询分支,所述第一查询分支和所述第二查询分支通过OR谓词连接;Receive a first database query statement, the first database query statement includes a first query branch and a second query branch, the first query branch and the second query branch are connected by an OR predicate;
    将所述第一数据库查询语句中的所述OR谓词展开,得到第二数据库查询语句,所述第二数据库查询语句包括第三数据库查询语句和第四数据库查询语句,所述第三数据库查询语句用于执行所述第一查询分支,所述第四数据库查询语句用于执行所述第二查询分支,且所述第三数据库查询语句和所述第四数据库查询语句基于UNION DISTINCT相连;Expand the OR predicate in the first database query statement to obtain a second database query statement, wherein the second database query statement includes a third database query statement and a fourth database query statement, wherein the third database query statement is used to execute the first query branch, and the fourth database query statement is used to execute the second query branch, and the third database query statement and the fourth database query statement are connected based on UNION DISTINCT;
    根据所述第二数据库查询语句查询数据库中的数据。The data in the database is queried according to the second database query statement.
  2. 根据权利要求1所述的方法,在所述将所述第一数据库查询语句中的所述OR谓词展开之前,所述方法还包括:According to the method of claim 1, before expanding the OR predicate in the first database query statement, the method further comprises:
    检查所述第一数据库查询语句中的查询分支是否能够输出唯一列集合;Checking whether the query branch in the first database query statement can output a unique column set;
    所述将所述第一数据库查询语句中的所述OR谓词展开,包括:The step of expanding the OR predicate in the first database query statement includes:
    如果所述第一数据库查询语句中的查询分支均能够输出唯一列集合,则将所述第一数据库查询语句中的所述OR谓词展开。If all query branches in the first database query statement can output a unique column set, the OR predicate in the first database query statement is expanded.
  3. 根据权利要求1所述的方法,在所述将所述第一数据库查询语句中的所述OR谓词展开之前,所述方法还包括:According to the method of claim 1, before expanding the OR predicate in the first database query statement, the method further comprises:
    对所述第一数据库查询语句中的查询分支进行SPJ分离。Perform SPJ separation on the query branches in the first database query statement.
  4. 根据权利要求1所述的方法,所述第一查询分支和/或所述第二查询分支中的谓词为子查询谓词。According to the method of claim 1, the predicate in the first query branch and/or the second query branch is a subquery predicate.
  5. 根据权利要求1所述的方法,所述数据库查询语句为SQL语句。According to the method of claim 1, the database query statement is an SQL statement.
  6. 一种数据库查询装置,包括:A database query device, comprising:
    接收模块,用于接收第一数据库查询语句,所述第一数据库查询语句包括第一查询分支和第二查询分支,所述第一查询分支和所述第二查询分支通过OR谓词连接;A receiving module, configured to receive a first database query statement, wherein the first database query statement includes a first query branch and a second query branch, wherein the first query branch and the second query branch are connected by an OR predicate;
    展开模块,用于将所述第一数据库查询语句中的所述OR谓词展开,得到第二数据库查询语句,所述第二数据库查询语句包括第三数据库查询语句和第四数据库查询语句,所述第三数据库查询语句用于执行所述第一查询分支,所述第四数据库查询语句用于执行所述第二查询分支,且所述第三数据库查询语句和所述第四数据库查询语句基于UNION DISTINCT相连;an expansion module, configured to expand the OR predicate in the first database query statement to obtain a second database query statement, wherein the second database query statement includes a third database query statement and a fourth database query statement, wherein the third database query statement is used to execute the first query branch, and the fourth database query statement is used to execute the second query branch, and the third database query statement and the fourth database query statement are connected based on UNION DISTINCT;
    查询模块,用于根据所述第二数据库查询语句查询数据库中的数据。 A query module is used to query data in the database according to the second database query statement.
  7. 根据权利要求6所述的装置,所述装置还包括:The device according to claim 6, further comprising:
    检查模块,用于在所述将所述第一数据库查询语句中的所述OR谓词展开之前,检查所述第一数据库查询语句中的查询分支是否能够输出唯一列集合;A checking module, configured to check whether a query branch in the first database query statement can output a unique column set before expanding the OR predicate in the first database query statement;
    所述展开模块用于:The expansion module is used to:
    如果所述第一数据库查询语句中的查询分支均能够输出唯一列集合,则将所述第一数据库查询语句中的所述OR谓词展开。If all query branches in the first database query statement can output a unique column set, the OR predicate in the first database query statement is expanded.
  8. 根据权利要求6所述的装置,所述装置还包括:The device according to claim 6, further comprising:
    分离模块,用于在所述将所述第一数据库查询语句中的所述OR谓词展开之前,对所述第一数据库查询语句中的查询分支进行SPJ分离。A separation module is used to perform SPJ separation on the query branches in the first database query statement before expanding the OR predicate in the first database query statement.
  9. 根据权利要求6所述的装置,所述第一查询分支和/或所述第二查询分支中的谓词为子查询谓词。According to the apparatus according to claim 6, the predicate in the first query branch and/or the second query branch is a sub-query predicate.
  10. 根据权利要求6所述的装置,所述数据库查询语句为SQL语句。According to the device according to claim 6, the database query statement is a SQL statement.
  11. 一种数据库查询装置,包括:A database query device, comprising:
    存储器,用于存储指令;A memory for storing instructions;
    处理器,用于执行所述存储器中存储的指令,以执行如权利要求1-5中任一项所述的方法。 A processor, configured to execute instructions stored in the memory to perform the method according to any one of claims 1 to 5.
PCT/CN2023/118705 2022-10-20 2023-09-14 Database query method and apparatus WO2024082881A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211289477.7A CN115563148A (en) 2022-10-20 2022-10-20 Database query method and device
CN202211289477.7 2022-10-20

Publications (1)

Publication Number Publication Date
WO2024082881A2 true WO2024082881A2 (en) 2024-04-25

Family

ID=84746151

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/118705 WO2024082881A2 (en) 2022-10-20 2023-09-14 Database query method and apparatus

Country Status (2)

Country Link
CN (1) CN115563148A (en)
WO (1) WO2024082881A2 (en)

Also Published As

Publication number Publication date
CN115563148A (en) 2023-01-03

Similar Documents

Publication Publication Date Title
CN109299102B (en) HBase secondary index system and method based on Elastcissearch
US9390115B2 (en) Tables with unlimited number of sparse columns and techniques for an efficient implementation
US11893022B2 (en) Computer-implemented method for improving query execution in relational databases normalized at level 4 and above
US8612421B2 (en) Efficient processing of relational joins of multidimensional data
US10572484B2 (en) Duplicate reduction or elimination with hash join operations
US7814042B2 (en) Selecting candidate queries
US8332389B2 (en) Join order for a database query
US6032143A (en) Evaluation of existential and universal subquery in a relational database management system for increased efficiency
US7814091B2 (en) Multi-tiered query processing techniques for minus and intersect operators
US8965918B2 (en) Decomposed query conditions
US8160999B2 (en) Method and apparatus for using set based structured query language (SQL) to implement extract, transform, and load (ETL) splitter operation
US20090119247A1 (en) Efficient hash based full-outer join
US20140067789A1 (en) Optimization technique of generalized disjunctive semi/anti join
CN109885585B (en) Distributed database system and method supporting stored procedures, triggers and views
US6502089B1 (en) Generating restriction queries using tensor representations
US20190026332A1 (en) Redundant group by and distinct removal
CN112231321B (en) Oracle secondary index and index real-time synchronization method
US11416458B2 (en) Efficient indexing for querying arrays in databases
US9047354B2 (en) Statement categorization and normalization
US6745173B1 (en) Generating in and exists queries using tensor representations
US6487549B1 (en) Generating union queries using tensor representations
US20210286817A1 (en) System and method for disjunctive joins using a lookup table
US20100030756A1 (en) Techniques for coalescing subqueries
WO2024082881A2 (en) Database query method and apparatus
US20230205769A1 (en) System and method for disjunctive joins