CN114265874B

CN114265874B - Method and device for querying data

Info

Publication number: CN114265874B
Application number: CN202210198352.7A
Authority: CN
Inventors: 赵占越; 朱涛; 王国平
Original assignee: Beijing Oceanbase Technology Co Ltd
Current assignee: Beijing Oceanbase Technology Co Ltd
Priority date: 2022-03-02
Filing date: 2022-03-02
Publication date: 2022-05-03
Anticipated expiration: 2042-03-02
Also published as: CN114265874A

Abstract

A method and apparatus for querying data are provided. The method comprises the following steps: receiving a database query statement, wherein the database query statement comprises a connection operation of a first data set and a second data set, the connection condition corresponding to the connection operation comprises a first condition and a second condition, and a logic relationship between the first condition and the second condition is OR; splitting the database query statement into a first sub-query statement and a second sub-query statement, wherein the first sub-query statement comprises the join operation, a join condition of the join operation in the first sub-query statement is the first condition, the second sub-query statement comprises the join operation, and a join condition of the join operation in the second sub-query statement is the second condition; and combining the query result of the first sub-query statement and the query result of the second sub-query statement to determine the query result of the database query statement.

Description

Method and device for querying data

Technical Field

The present disclosure relates to the field of databases, and more particularly, to a method and apparatus for querying data.

Background

Join operations are operations that are often used when accessing databases. In the database query statement, if the join condition of a join operation includes a plurality of conditions, and the plurality of conditions are joined by a predicate OR, when the join operation is executed, only a nested loop join algorithm can be used for join operation at present. However, the execution efficiency of the nested loop join algorithm is low, which results in poor execution performance of the database query statement.

Disclosure of Invention

The disclosure provides a method and a device for querying data, which are used for improving the execution performance of database query statements.

In a first aspect, a method for querying data is provided, including: receiving a database query statement, wherein the database query statement comprises a connection operation of a first data set and a second data set, the connection condition corresponding to the connection operation comprises a first condition and a second condition, and a logical relationship between the first condition and the second condition is OR; splitting the database query statement into a first sub-query statement and a second sub-query statement, wherein the first sub-query statement comprises the join operation, a join condition of the join operation in the first sub-query statement is the first condition, the second sub-query statement comprises the join operation, and a join condition of the join operation in the second sub-query statement is the second condition; and combining the query result of the first sub-query statement and the query result of the second sub-query statement to determine the query result of the database query statement.

Optionally, as a possible implementation, the connection operation is an external connection operation, a semi-connection operation, or an inverse connection operation.

Optionally, as a possible implementation manner, the connecting operation is an external connecting operation, the query result of the first sub-query statement includes first null data generated by executing the external connecting operation according to the first condition, the query result of the second sub-query statement includes second null data generated by executing the external connecting operation according to the second condition, and the merging the query result of the first sub-query statement and the query result of the second sub-query statement includes: and removing repeated data in the first gap filling data and the second gap filling data.

Optionally, as a possible implementation manner, the connecting operation is an external connecting operation, and the splitting the database query statement into a first sub-query statement and a second sub-query statement includes: creating an inline view for the out-connect operation; splitting the inline view into the first sub-query statement and the second sub-query statement.

Optionally, as a possible implementation manner, each of the selection statements in the first sub-query statement and the second sub-query statement includes a primary key of the first data set and/or the second data set.

Optionally, as a possible implementation manner, the join operation of the first sub-query statement and/or the second sub-query statement is performed based on a target join algorithm, where the target join algorithm includes other join algorithms except a nested loop join algorithm.

Optionally, as a possible implementation manner, the other connection algorithm includes a hash connection algorithm and/or a merge connection algorithm.

In a second aspect, an apparatus for querying data is provided, including: the system comprises a receiving module, a judging module and a sending module, wherein the receiving module is used for receiving a database query statement, the database query statement comprises a connection operation of a first data set and a second data set, the connection condition corresponding to the connection operation comprises a first condition and a second condition, and the logic relationship between the first condition and the second condition is OR; a splitting module, configured to split the database query statement into a first sub-query statement and a second sub-query statement, where the first sub-query statement includes the join operation, a join condition of the join operation in the first sub-query statement is the first condition, the second sub-query statement includes the join operation, and a join condition of the join operation in the second sub-query statement is the second condition; and the merging module is used for merging the query result of the first sub-query statement and the query result of the second sub-query statement to determine the query result of the database query statement.

Optionally, as a possible implementation manner, the join operation is an outer join operation, the query result of the first sub-query statement includes first null data generated by performing the outer join operation according to the first condition, the query result of the second sub-query statement includes second null data generated by performing the outer join operation according to the second condition, and the merging module is configured to remove duplicate data in the first null data and the second null data.

Optionally, as a possible implementation manner, the splitting module is configured to create an inline view for the external connection operation; splitting the inline view into the first sub-query statement and the second sub-query statement.

In a third aspect, an apparatus for querying data is provided, including: a memory to store instructions; a processor configured to execute instructions stored in the memory to perform a method according to the first aspect or any one of the possible implementations of the first aspect.

In a fourth aspect, a computer-readable storage medium is provided, on which instructions for performing the method of the first aspect or any one of the possible implementations of the first aspect are stored.

In a fifth aspect, a computer program product is provided, which comprises instructions for performing the method of the first aspect or any one of the possible implementations of the first aspect.

The disclosed embodiments expand the OR conditional join, splitting the database query statement into multiple sub-query statements. Because the expanded connection conditions are all simple connections, a plurality of connection algorithms (such as a Hash connection algorithm or a combination connection algorithm with higher execution efficiency) can be flexibly selected to operate the connection operation, and therefore the execution time of the database query statement can be obviously reduced.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present disclosure, the drawings required to be used in the embodiments or the background art of the present disclosure will be described below.

Fig. 1 is a schematic structural diagram of a data query method provided in an embodiment of the present disclosure.

Fig. 2 is a schematic structural diagram of a data query apparatus according to an embodiment of the present disclosure.

Fig. 3 is a schematic structural diagram of a data query apparatus according to another embodiment of the present disclosure.

Detailed Description

The embodiments of the present disclosure are described below with reference to the drawings in the embodiments of the present disclosure. In the following description, reference is made to the accompanying drawings which form a part hereof and in which is shown by way of illustration specific aspects of embodiments of the disclosure or in which aspects of embodiments of the disclosure may be practiced. It should be understood that the disclosed embodiments may be used in other respects, and may include structural or logical changes not depicted in the drawings. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims. For example, it should be understood that the disclosure in connection with the described methods may equally apply to the corresponding apparatus or system for performing the methods, and vice versa. For example, if one or more particular method steps are described, the corresponding apparatus may comprise one or more units, such as functional units, to perform the described one or more method steps (e.g., a unit performs one or more steps, or multiple units, each of which performs one or more of the multiple steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a particular apparatus is described based on one or more units, such as functional units, the corresponding method may comprise one step to perform the functionality of the one or more units (e.g., one step performs the functionality of the one or more units, or multiple steps, each of which performs the functionality of one or more of the units), even if such one or more steps are not explicitly described or illustrated in the figures. Further, it is to be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless explicitly stated otherwise.

For ease of understanding, some relevant technical knowledge related to the embodiments of the present disclosure will be introduced. The following related arts as alternatives can be arbitrarily combined with the technical solutions of the embodiments of the present disclosure, which all belong to the scope of protection of the embodiments of the present disclosure. Embodiments of the present disclosure include at least some of the following.

Connecting operation

In relational databases, database query statements (e.g., SQL statements) often contain JOIN Operations (JOIN). Based on join operations, two data sets (e.g., two database tables) may be merged according to a certain condition to form a new data set.

Databases typically support multiple types of join operations, differing primarily between them: the method used to select rows for connection from different sets of data that overlap each other is different. Taking SQL statements as an example, the SQL syntax format of JOIN Operations (JOIN) is: data set 1 [ connection type ] JOIN data set 2 [ connection condition ].

There are a variety of connection types. For example, the connection type may include an INNER connection (INNER JOIN), an OUTER connection (OUTER JOIN), a SEMI-connection (SEMI-JOIN), an ANTI-connection (ANTI-JOIN), and the like. The meaning of these connections is briefly described below.

The inner join only connects matching rows in the two data tables. Assuming tables a and B, the result of (a INNER JOIN B ON a.a = b.b) is such: a and B are first subjected to a cartesian product and then a selection operation is performed, only the rows of the cartesian product that satisfy a.a = b.b are retained.

And returning the outer connection to the row matched with the two tables, and filling the empty table on one side of the unmatched row and then outputting the empty table. The OUTER connections may include LEFT OUTER connections (LEFT OUTER connections), RIGHT OUTER connections (RIGHT OUTER connections), and FULL OUTER connections (FULL OUTER connections). The left outer join means that each tuple in the left table is taken as one tuple of the result, and the attribute corresponding to the tuple meeting the condition in the right table is added to the result. For tuples in the right table that cannot satisfy the predicate condition, their corresponding attributes are set to NULL (NULL) in the final result. From the results produced, the left outer join is equivalent to adding such tuples on the basis of the inner join: their attribute values cannot match any tuple of the right table. The right outer connection is symmetrical to the left outer connection. And a full outer connection is the union of the left outer connection and the right outer connection.

Semi-join refers to data that returns only on a match in two tables and only one table (e.g., the left table) when the two tables are associated. The semi-join is typically represented by key words such as IN and EXISTS that appear IN the database query statement.

Anti-join refers to returning only data of the main table (e.g., left table) when two tables are associated, and returning only data on which the main table is not associated with a sub table. The anti-join is generally represented by keywords such as NOT IN and NOT EXISTS appearing IN the database query statement.

Connection condition

The joining operation is generally performed under a certain joining condition. Common ligation conditions include three: NATURAL JOIN (NATURAL JOIN), conditional JOIN (ON predicate condition), and JOIN of specified attributes (USING attribute).

For the conditional connection, the condition of the connection operation may be a simple condition or may be a complex condition. The complex condition is a combination of a plurality of simple conditions. That is, a plurality of simple elements may be combined together using a logical relationship such as "AND", "OR", OR ", etc., to form a more complex condition.

Connection algorithm

In the database, the join operation may be implemented using one or more join algorithms. The connection algorithm may include one or more of the following algorithms: nested circular join (NESTEDLOOPJOIN), merge join (MERGEJOIN) and hash join (HASHJOIN).

A nested loop join consists of two FOR loops and is therefore named. Assuming that the two tables to be connected are T1 and T2 and the connection condition is P, the nested loop connection selects one of them as the outer loop and the other as the inner loop, and compares each tuple in the tables T1 and T2 one by one to find out all matched tuples. Nested loop connections are relatively versatile. Nested loop joining, like linear scanning, does not require indexing, and the algorithm works regardless of the joining conditions. That is, the algorithm needs only slight adjustments to operate for any type of join operation. However, nested loop joins are less efficient to execute. Because each tuple in the table T1 and the table T2 needs to be compared one by one in the algorithm process, the exchange between a disk and a memory caused by the tuple is more frequent when the data size is large and cannot be completely put into the memory, and even if the data can be completely put into the memory, the CACHE hit rate of a CPU is low in the nested loop connection execution process, thereby seriously affecting the system efficiency.

The join algorithm, also known as sort join, can be used to compute natural joins and isojoins. Assuming that the two tables to be joined are T1 and T2, then tables T1 and T2 are ordered before joining. And then scanning the two tables respectively once. The merge join is very efficient to perform, with a linear time complexity of O (n), where n is the number of tuples in the table with the largest number of tuples in tables T1 and T2.

The hash join is similar to the merge join. In the hash join algorithm, a hash function is used to divide two tables. The basic idea of this algorithm is to divide the two tables by join attributes into sets of tuples having the same hash value. Hash joins do not require indexing and are easier to process with large result sets than nested loop joins.

For a join operation in a database query statement, if the join condition of the join operation includes a plurality of conditions, and the plurality of conditions are joined in a logical relationship (OR predicate) OR, the join operation can only be performed using a nested loop join algorithm when executed. As described above, the execution efficiency of the nested loop join algorithm is poor.

In view of the above problems, the embodiments of the present disclosure OR develop the connection conditions of the connection operation. By OR expansion, we mean splitting a database query statement into multiple sub-query statements, each of which executes one branch of the predicate OR. Through OR unfolding, a complex OR conditional join can be converted into a plurality of simple conditional joins. Different from the OR conditional join, which can only be executed by using a nested loop join algorithm with low efficiency, the simple conditional join can select a suitable one OR more join algorithms (such as a join algorithm with high efficiency OR excellent performance) from a plurality of join algorithms, thereby improving the execution performance of the database query language.

The embodiments of the present disclosure are described in detail below.

FIG. 1 is a method of querying data. The method 100 of FIG. 1 may be performed by a query optimizer in a database. The database may be a general database or a distributed database, such as an OceanBase database. A user of the database may access the database through a database query statement. For example, a user may read data from the database using a database query statement. Alternatively, the user may write data to the database using a database query statement. The database query statement referred to herein may be, for example, an SQL statement. One or more data sets may be stored in the database. The data collection is typically in the form of database tables.

Referring to fig. 1, in step S110, a database query statement is received. The database query statement includes a join operation of a first data set (e.g., a first data table) and a second data set (e.g., a second data table).

The embodiments of the present disclosure do not specifically limit the type of the connection operation. In some embodiments, the connection operation may be an external connection operation, a semi-connection operation, or an inverse connection operation. Taking the connection operation as an external connection operation as an example, the connection operation may be a left external connection operation, a right external connection operation, or a full external connection operation.

The connection condition corresponding to the connection operation may include a first condition and a second condition. The logical relationship between the first condition and the second condition is OR, OR the first condition and the second condition are connected by predicate OR. In some embodiments, the first condition and the second condition may be referred to as a simple condition, which may be an equal value join condition, for example.

As an example, the join operation in the database query statement is an external join operation, and the specific form of the database query statement is as follows:

SELECT T1.C1 FROM T1 LEFT JOIN T2 ON (T1.C1 = T2.C1 OR T1.C2 = T2.C2) LEFT JOIN T3 ON T1.C1=T3.C1.

in the above example, the left join represents a left outer join operation whose join condition (i.e., the join condition after the ON predicate) includes a first condition "t 1.c1 = t2.c 1" and a second condition "t 1.c2 = t2.c 2", and the first condition and the second condition are connected by the predicate OR.

As another example, the join operation in the database query statement is a semi-join operation, and the specific form of the database query statement is as follows:

SELECT C1, C2 FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T1.C1 = T2.C1 OR T1.C2 = T2.C2).

in the above example, EXISTS represents a half join operation whose join condition includes a first condition "t 1.c1 = t2.c 1" and a second condition "t 1.c2 = t2.c 2", and the first condition and the second condition are connected by the predicate OR.

As yet another example, the join operation in the database query statement is a reverse join operation, and the specific form of the database query statement is as follows:

SELECT C1, C2 FROM T1 WHERE NOT EXISTS (SELECT 1 FROM T2 WHERE T1.C1 = T2.C1 OR T1.C2 = T2.C2).

in the above example, NOT EXISTS represents a half-join operation whose join condition includes a first condition "t 1.c1 = t2.c 1" and a second condition "t 1.c2 = t2.c 2", and the first condition and the second condition are connected by the predicate OR.

In step S120, the database query statement is split into a first sub-query statement (or first query branch) and a second sub-query statement (or second query branch).

The first sub-query statement comprises a join operation, and the join operation is the same as the join operation in the database query statement before splitting. In addition, the join condition of the join operation in the first sub-query statement is a first condition. In other words, the join operation in the first sub-query statement is consistent with the join operation in the database query statement, except that the join condition corresponding to the join operation in the first sub-query statement becomes one of the simple join conditions of the OR join conditions, i.e., the first condition mentioned above.

The second sub-query statement includes a join operation, and the join operation is the same as the join operation in the database query statement before splitting. In addition, the join condition of the join operation in the second sub-query statement is a second condition. In other words, the join operation in the second sub-query statement is consistent with the join operation in the database query statement, except that the join condition corresponding to the join operation in the second sub-query statement becomes one of the simple join conditions, i.e., the second condition mentioned above, in the OR join condition.

In some embodiments, a selection statement in the first and second sub-query statements may include a primary key of the first data set (or first data table) and/or the second data set (or second data table) to facilitate subsequent merging of the query structures of the two sub-query statements.

It should be understood that the type of join operation included in the database query statement may be different, and the implementation of step S120 may be different. For example, for some join operations, some pre-processing of the database query statement may be required first. For example, for an outerjoin operation, an inline view (which may be understood as one sub-statement in a database query statement) may be created first, and then split into a first sub-query statement and a second sub-query statement.

In some embodiments, prior to unrolling the OR connection condition, the OR connection condition may be examined to determine whether the OR connection condition satisfies the unrollable base condition. For example, the OR connection condition may be checked to determine if the connection table can create a non-empty/unique output column, and if so, the OR connection condition is expanded; if not, the database query statement can be directly executed without OR expansion.

In step S130, the query result of the first sub-query statement and the query result of the second sub-query statement are merged to obtain the query result of the database query statement.

Since the join condition in the first sub-query statement and/or the second sub-query statement is a simple join condition, the first sub-query statement and/or the second sub-query statement may be implemented by using a plurality of join algorithms. For example, other join algorithms besides the nested loop join algorithm may be employed to compute the query results for the first sub-query statement and/or the second sub-query statement. The other algorithms may include, for example, the hash join algorithm and/or the merge join algorithm mentioned earlier. The other algorithm has higher execution efficiency relative to the nested loop join algorithm.

In summary, the embodiment of the present disclosure expands OR condition connection, and because the condition obtained after expansion is a simple connection condition, it is possible to flexibly select multiple connection algorithms including hash connection to perform operations on connection operations, so that the execution time of such database query statements can be significantly reduced.

Step S130 may include, for example: a UNION (UNION) of the query results of the first sub-query statement and the query results of the second sub-query statement is computed. In addition, in some embodiments, if there is duplicate data after the first query structure and the second query structure are merged, the data may also need to be deduplicated. For example, assuming that the join operation is an external join operation, the external join operation needs to fill and output the unmatched rows in the two tables. Therefore, the query result of the first sub-query statement and the query result of the second sub-query statement may contain repeated NULL-fill data (i.e., data in NULL-fill rows that takes a value of NULL). In this case, repeated null data in the query result of the first sub-query statement and the query result of the second sub-query statement needs to be removed. There are many ways to perform deduplication, for example, deduplication can be performed by adding a window function and a predicate condition (detailed description can be referred to as an example below).

In some embodiments, if the query results of the first sub-query statement and/or the query results of the second sub-query statement lack a non-empty/unique column, a respective output column may be added to the query results of the first sub-query statement and/or the query results of the second sub-query statement.

Hereinafter, the embodiments of the present disclosure will be described in more detail by taking the external connection operation, the half connection operation, and the internal connection operation as examples, respectively. It should be noted that the following examples are merely intended to assist those skilled in the art in understanding the disclosed embodiments and are not intended to limit the disclosed embodiments to the specific values or specific contexts illustrated. It will be apparent to those skilled in the art from the examples set forth below that various equivalent modifications or variations can be made, and such modifications or variations fall within the scope of the embodiments of the present disclosure.

Example 1: operation of external connection

The database query statement is as follows:

for this statement, the inline view V may first be created using the outer connections containing the OR connection conditions:

SELECT V.C1 FROM (SELECT T1.C1 FROM T1 LEFT JOIN T2 ON (T1.C1 = T2.C1 OR T1.C2 = T2.C2)) V LEFT JOIN T3 ON V.C1 = T3.C1.

the linkage view V may then be OR expanded to form two sub-query statements:

SUB_SELECT_1:

SELECT T1.C1, T1.PK PK1, T2.PK PK2 FROM T1 LEFT JOIN T2 ON T1.C1 = T2.C1;

SUB_SELECT_2:

SELECT T1.C1, T1.PK PK1, T2.PK PK2 FROM T1 LEFT JOIN T2 ON T1.C2 = T2.C2 AND LNNVL (T1.C1 = T2.C1).

as can be seen from SUB _ SELECT _1, the join operation of SUB _ SELECT _1 is the same as the join operation in the original database query statement, and is the external join operation of T1 table and T2 table, except that the join condition of the external join operation in SUB _ SELECT _1 is changed into a simple join condition: t1.c1 = t2.c 1.

As can be seen from SUB _ SELECT _2, the join operation of SUB _ SELECT _2 is the same as the join operation in the original database query statement, and is the external join operation of T1 table and T2 table, except that the join condition of the external join operation in SUB _ SELECT _2 is changed into a simple join condition: t1.c2 = t2.c 2. In addition, in order to avoid that SUB _ SELECT _2 repeatedly SELECTs a matching row that has been selected by SUB _ SELECT _1, a condition AND LNNVL (t1.c1 = t2.c1) is added to SUB _ SELECT _2, that is, the query result of SUB _ SELECT _2 includes a matching row that does not include a matching row satisfying "t 1.c1 = t2.c 1".

In addition, in order to make the query of SUB _ SELECT _1 and SUB _ SELECT _2 equivalent to the query of the original database query statement, the query results of SUB _ SELECT _1 and SUB _ SELECT _2 need to be merged. To facilitate the subsequent merging process, the SELECT statements of SUB _ SELECT _1 and SUB _ SELECT _2 are added with the primary keys of T1 table and T2 table, i.e., T1.PK 1 and T2.PK 2.

And processing the two expanded sub-query sentences. The left external connection needs to fill in and output the ROW of the right table which IS not matched, and the two expanded sub query statements may repeatedly output the filled-in ROW, so after the query results of the two expanded sub query statements are merged by using UNION ALL, a ROW _ NUMBER window function and a predicate RN = 1 OR PK2 IS not are added to remove the redundant filled-in ROW:

SELECT C1

FROM (SELECT C1, PK2, ROW_NUMBER() OVER (PARTITION BY PK1 ORDER BY PK2 NULLS LAST) AS RN

FROM (SUB_SELECT_1 UNION ALL SUB_SELECT_2) V1) V2

WHERE RN = 1 OR PK2 IS NOT NULL.

finally, the query optimizer generates an execution plan as follows:

=========================================

|ID|OPERATOR |NAME |

-----------------------------------------

|0 |HASH RIGHT OUTER JOIN | |

|1 | TABLE SCAN |T3 |

|2 | SUBPLAN SCAN |VIEW3 |

|3 | WINDOW FUNCTION | |

|4 | SORT | |

|5 | SUBPLAN SCAN |VIEW2 |

|6 | UNION ALL | |

|7 | MERGE OUTER JOIN| |

|8 | TABLE SCAN |T1(IDX_T1_C1)|

|9 | TABLE SCAN |T2(IDX_T2_C1)|

|10| HASH OUTER JOIN | |

|11| TABLE SCAN |T1 |

|12| TABLE SCAN |T2 |

=========================================

accordingly, the output result corresponding to each ID in the execution plan is as follows:

-------------------------------------

0 - output([VIEW3.VIEW2.T1.C1]), filter(nil),

equal_conds([VIEW3.VIEW2.T1.C1 = T3.C1]), other_conds(nil)

1 - output([T3.C1]), filter(nil),

access([T3.C1]), partitions(p0)

2 - output([VIEW3.VIEW2.T1.C1]), filter([VIEW3.RN = 1 OR (T_OP_IS_NOT, VIEW3.VIEW2.T2.PK, NULL, 0)]),

access([VIEW3.VIEW2.T1.C1], [VIEW3.VIEW2.T2.PK], [VIEW3.RN])

3 - output([VIEW2.T1.C1], [VIEW2.T2.PK], [T_WIN_FUN_ROW_NUMBER()]), filter(nil),

win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([VIEW2.T1.PK]), order_by([VIEW2.T2.PK, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)

4 - output([VIEW2.T1.C1], [VIEW2.T2.PK], [VIEW2.T1.PK]), filter(nil), sort_keys([VIEW2.T1.PK, ASC], [VIEW2.T2.PK, ASC])

5 - output([VIEW2.T1.C1], [VIEW2.T2.PK], [VIEW2.T1.PK]), filter(nil),

access([VIEW2.T1.C1], [VIEW2.T2.PK], [VIEW2.T1.PK])

6 - output([UNION([1])], [UNION([2])], [UNION([3])]), filter(nil)

7 - output([T1.C1], [T2.PK], [T1.PK]), filter(nil),

equal_conds([T1.C1 = T2.C1]), other_conds(nil)

8- output([T1.C1], [T1.PK]), filter(nil),

access([T1.C1], [T1.PK]), partitions(p0)

9- output([T2.C1], [T2.PK]), filter(nil),

access([T2.C1], [T2.PK]), partitions(p0)

10 - output([T1.C1], [T2.PK], [T1.PK]), filter(nil),

equal_conds([T1.C2 = T2.C2]), other_conds([lnnvl(cast(T1.C1 = T2.C1, TINYINT(-1, 0)))])

11- output([T1.C1], [T1.C2], [T1.PK]), filter(nil),

access([T1.C1], [T1.C2], [T1.PK]), partitions(p0)

12 - output([T2.C1], [T2.C2], [T2.PK]), filter(nil),

access([T2.C1], [T2.C2], [T2.PK]), partitions(p0)

and executing the execution plan to obtain a query result of the database query statement.

Example 2:semi-joining operation

The database query statement is as follows:

for the SQL statement, it can be directly split into the following two sub-query statements:

SUB_SELECT_1:

SELECT PK, C1, C2 FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T1.C1 = T2.C1).

SUB_SELECT_2:

SELECT PK, C1, C2 FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T1.C2 = T2.C2).

as can be seen from SUB _ SELECT _1, the join operation of SUB _ SELECT _1 is the same as the join operation in the original database query statement, and is the half join operation of the T1 table and the T2 table, except that the join condition of the half join operation in SUB _ SELECT _1 is changed into a simple join condition: t1.c1 = t2.c 1.

As can be seen from SUB _ SELECT _2, the join operation of SUB _ SELECT _2 is the same as the join operation in the original database query statement, and is the half join operation of the T1 table and the T2 table, except that the join condition of the half join operation in SUB _ SELECT _2 is changed into a simple join condition: t1.c2 = t2.c 2.

In addition, in order to make the query results of SUB _ SELECT _1 and SUB _ SELECT _2 equivalent to the query results of the original database query statement, the query results of SUB _ SELECT _1 and SUB _ SELECT _2 need to be merged. To facilitate the subsequent merge process, the primary key PK of the T1 table is added to the SELECT statements of SUB _ SELECT _1 and SUB _ SELECT _2 (since the half join operation returns only the row matching the T1 table, the primary key of the T2 table does not need to be added).

The half join operation returns only the row that the left table matches, and the two sub-query statements may return duplicate data, so the data returned by the two sub-query statements are de-duplicated using UNION and:

SELECT C1, C2FROM (SUB_SELECT_1 UNION SUB_SELECT_2) V.

finally, the query optimizer generates an execution plan as follows:

=========================================

|ID|OPERATOR |NAME |

-----------------------------------------

|0 |SUBPLAN SCAN |VIEW2 |

|1 | HASH UNION DISTINCT | |

|2 | HASH RIGHT SEMI JOIN| |

|3 | TABLE SCAN |T2(IDX_T2_C1)|

|4 | TABLE SCAN |T1 |

|5 | HASH RIGHT SEMI JOIN| |

|6 | TABLE SCAN |T2(IDX_T2_C2)|

|7 | TABLE SCAN |T1 |

=========================================

-------------------------------------

0 - output([VIEW2.C1], [VIEW2.C2]), filter(nil),

access([VIEW2.C1], [VIEW2.C2])

1 - output([UNION([1])], [UNION([2])], [UNION([3])]), filter(nil)

2 - output([T1.C1], [T1.C2], [T1.PK]), filter(nil),

equal_conds([T1.C1 = T2.C1]), other_conds(nil)

3 - output([T2.C1]), filter(nil),

access([T2.C1]), partitions(p0)

4 - output([T1.C1], [T1.C2], [T1.PK]), filter(nil),

access([T1.C1], [T1.C2], [T1.PK]), partitions(p0)

5 - output([T1.C1], [T1.C2], [T1.PK]), filter(nil),

equal_conds([T1.C2 = T2.C2]), other_conds(nil)

6 - output([T2.C2]), filter(nil),

access([T2.C2]), partitions(p0)

7 - output([T1.C1], [T1.C2], [T1.PK]), filter(nil),

access([T1.C1], [T1.C2], [T1.PK]), partitions(p0)

Example 3: reverse connect operation

The database query statement is as follows:

the expansion modes of the anti-join operation and the semi-join operation are similar, except that the anti-join operation returns rows with unmatched left tables, so that the query results of each expanded sub-query statement can be merged and deduplicated by using INTERSECT. The specific deployment process can be seen in example 2 and will not be described in detail here.

Finally, the query optimizer generates an execution plan as follows:

===========================================

|ID|OPERATOR |NAME |

-------------------------------------------

|0 |SUBPLAN SCAN |VIEW2 |

|1 | HASH INTERSECT DISTINCT| |

|2 | HASH RIGHT ANTI JOIN | |

|3 | TABLE SCAN |T2(IDX_T2_C1)|

|4 | TABLE SCAN |T1 |

|5 | HASH RIGHT ANTI JOIN | |

|6 | TABLE SCAN |T2(IDX_T2_C2)|

|7 | TABLE SCAN |T1 |

===========================================

Method embodiments of the present disclosure are described in detail above in conjunction with fig. 1, and apparatus embodiments of the present disclosure are described in detail below in conjunction with fig. 2. It is to be understood that the description of the method embodiments corresponds to the description of the apparatus embodiments, and therefore reference may be made to the preceding method embodiments for parts not described in detail.

Fig. 2 is a schematic structural diagram of an apparatus for querying data according to an embodiment of the present disclosure. The apparatus 200 of fig. 2 includes a receiving module 210, a splitting module 220, and a merging module 230. The receiving module 210 is configured to receive a database query statement, where the database query statement includes a connection operation of a first data set and a second data set, a connection condition corresponding to the connection operation includes a first condition and a second condition, and a logical relationship between the first condition and the second condition is "or". The splitting module 220 is configured to split the database query statement into a first sub-query statement and a second sub-query statement, where the first sub-query statement includes the join operation, a join condition of the join operation in the first sub-query statement is the first condition, the second sub-query statement includes the join operation, and a join condition of the join operation in the second sub-query statement is the second condition. The merging module 230 is configured to merge the query result of the first sub-query statement and the query result of the second sub-query statement to determine the query result of the database query statement.

Optionally, in some embodiments, the connection operation is an external connection operation, a semi-connection operation, or an inverse connection operation.

Optionally, in some embodiments, the join operation is an outer join operation, the query result of the first sub-query statement includes first null data generated by performing the outer join operation according to the first condition, the query result of the second sub-query statement includes second null data generated by performing the outer join operation according to the second condition, and the merging module is configured to remove duplicate data in the first null data and the second null data.

Optionally, in some embodiments, the connection operation is an external connection operation, and the splitting module is configured to create an inline view for the external connection operation; splitting the inline view into the first sub-query statement and the second sub-query statement.

Optionally, in some embodiments, a selection statement in the first sub-query statement and the second sub-query statement each include a primary key of the first data set and/or the second data set.

Optionally, in some embodiments, the join operation of the first sub-query statement and/or the second sub-query statement is performed based on a target join algorithm, the target join algorithm including other join algorithms other than a nested loop join algorithm.

Optionally, in some embodiments, the other join algorithm comprises a hash join algorithm and/or a merge join algorithm.

Fig. 3 is a schematic structural diagram of an apparatus for querying data according to another embodiment of the present disclosure. The apparatus 300 for querying data depicted in fig. 3 may include a memory 310 and a processor 320, where the memory 310 may be used to store instructions. The processor 320 may be configured to execute instructions stored in the memory 310 to implement the steps of the various methods described previously. In some embodiments, the apparatus 300 may further include a network interface 330, and the data exchange between the processor 320 and the external device may be implemented through the network interface 330.

In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware or any other combination. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the disclosure are, in whole or in part, generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Video Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. A method of querying data, comprising:

receiving a database query statement, wherein the database query statement comprises a connection operation of a first data set and a second data set, the connection condition corresponding to the connection operation comprises a first condition and a second condition, and a logic relationship between the first condition and the second condition is OR;

splitting the database query statement into a first sub-query statement and a second sub-query statement, wherein the first sub-query statement comprises the join operation, a join condition of the join operation in the first sub-query statement is the first condition, the second sub-query statement comprises the join operation, and a join condition of the join operation in the second sub-query statement is the second condition;

and combining the query result of the first sub-query statement and the query result of the second sub-query statement to determine the query result of the database query statement.

2. The method of claim 1, wherein the join operation is an external join operation, a semi-join operation, or an inverse join operation.

3. The method of claim 1, wherein the join operation is an outerjoin operation, wherein the query result of the first sub-query statement comprises first null-fill data generated by performing the outerjoin operation under the first condition, wherein the query result of the second sub-query statement comprises second null-fill data generated by performing the outerjoin operation under the second condition,

the merging the query result of the first sub-query statement and the query result of the second sub-query statement includes:

and removing repeated data in the first gap filling data and the second gap filling data.

4. The method of claim 1, wherein the join operation is an outerjoin operation, and wherein splitting the database query statement into a first sub-query statement and a second sub-query statement comprises:

creating an inline view for the out-connect operation;

splitting the inline view into the first sub-query statement and the second sub-query statement.

5. The method of claim 1, wherein a select statement in the first sub-query statement and the second sub-query statement comprises a primary key of the first data set and/or the second data set.

6. The method of claim 1, wherein the join operation of the first sub-query statement and/or the second sub-query statement is performed based on a target join algorithm, the target join algorithm comprising a join algorithm other than a nested loop join algorithm.

7. The method of claim 6, wherein the other join algorithms comprise hash join algorithms and/or merge join algorithms.

8. An apparatus for querying data, comprising:

the system comprises a receiving module, a judging module and a sending module, wherein the receiving module is used for receiving a database query statement, the database query statement comprises a connection operation of a first data set and a second data set, the connection condition corresponding to the connection operation comprises a first condition and a second condition, and the logic relationship between the first condition and the second condition is OR;

a splitting module, configured to split the database query statement into a first sub-query statement and a second sub-query statement, where the first sub-query statement includes the join operation, a join condition of the join operation in the first sub-query statement is the first condition, the second sub-query statement includes the join operation, and a join condition of the join operation in the second sub-query statement is the second condition;

and the merging module is used for merging the query result of the first sub-query statement and the query result of the second sub-query statement to determine the query result of the database query statement.

9. The apparatus of claim 8, wherein the connection operation is an external connection operation, a semi-connection operation, or an inverse connection operation.

10. The apparatus of claim 8, wherein the join operation is an outerjoin operation, wherein the query result of the first sub-query statement comprises first null-fill data generated by performing the outerjoin operation according to the first condition, wherein the query result of the second sub-query statement comprises second null-fill data generated by performing the outerjoin operation according to the second condition, and wherein the merge module is configured to remove duplicate data from the first null-fill data and the second null-fill data.

11. The apparatus of claim 8, wherein the connection operation is an out-connection operation, and wherein the splitting module is configured to create an inline view for the out-connection operation; splitting the inline view into the first sub-query statement and the second sub-query statement.

12. The apparatus of claim 8, wherein a select statement in the first sub-query statement and the second sub-query statement each comprise a primary key of the first data set and/or the second data set.

13. The apparatus of claim 8, wherein the join operation of the first sub-query statement and/or the second sub-query statement is performed based on a target join algorithm, the target join algorithm comprising a join algorithm other than a nested loop join algorithm.

14. The apparatus of claim 13, wherein the other join algorithms comprise a hash join algorithm and/or a merge join algorithm.

15. An apparatus for querying data, comprising:

a memory to store instructions;

a processor for executing instructions stored in the memory to perform the method of any one of claims 1-7.