CN117762974A

CN117762974A - Data query method, device, electronic equipment and storage medium

Info

Publication number: CN117762974A
Application number: CN202311723580.2A
Authority: CN
Inventors: 原显智
Original assignee: Jinzhuan Xinke Co Ltd
Current assignee: Jinzhuan Xinke Co Ltd
Priority date: 2023-12-14
Filing date: 2023-12-14
Publication date: 2024-03-26

Abstract

The embodiments of the present application relate to a data query method, a data query device, electronic equipment and a storage medium. The method comprises: obtaining a target statement, wherein the target statement is a Structured Query Language (SQL) statement; determining a query operator tree of the target statement to obtain a first query operator tree, wherein nodes of the first query operator tree represent operators or data tables; in the case that the operators represented by the nodes of the first query operator tree comprise an association join operator and a single join operator, pushing the association join operator down below the single join operator to obtain a second query operator tree; and performing data query based on the second query operator tree. Thus, the efficiency of data query can be improved.

Description

Data query method, device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a data query method, a data query device, an electronic device, and a storage medium.

Background

In the related art, in the logical optimization part of a database optimizer, converting an associated sub-query into a non-associated sub-query can reduce the time complexity of the query join operator (join operator) and shorten the execution time. Among the current methods for decorrelating associated sub-queries, the association join operator push-down technique has relatively strong universality. However, the current technique does not cover the push-down of all operators in application, which results in inefficient data querying.

It can be seen that how to improve the efficiency of data query is a technical problem of concern.

Disclosure of Invention

In view of this, in order to solve some or all of the above technical problems, embodiments of the present application provide a data query method, a device, an electronic apparatus, and a storage medium.

In a first aspect, an embodiment of the present application provides a data query method, where the method includes:

obtaining a target statement, wherein the target statement is a Structured Query Language (SQL) statement;

determining a query operator tree of the target statement to obtain a first query operator tree, wherein nodes of the first query operator tree represent operators or data tables;

in the case that the operators represented by the nodes of the first query operator tree comprise an association join operator and a single join operator, pushing the association join operator down below the single join operator to obtain a second query operator tree;

and performing data query based on the second query operator tree.

In a possible implementation manner, the pushing down the association join operator under the single join operator includes:

determining a de-duplication association column of a target data table, wherein the target data table is the data table queried by the outer query statement in the target statement, and the de-duplication association column represents the result of de-duplicating the data of the association column of the target data table;

pushing down the association join operator under the single join operator based on the de-duplication association column.

In a possible implementation manner, the pushing down the association join operator under the single join operator based on the de-duplication association column includes:

determining the predicate in the association join operator to obtain a first predicate;

determining whether the first predicate relates to a target column to obtain discrimination information, wherein the target column is a column in the right table of the single join operator;

pushing down the association join operator under the single join operator based on the discrimination information and the de-duplication association column.

In one possible implementation manner, the pushing down the association join operator under the single join operator based on the discrimination information and the de-duplication association column includes:

in the case that the discrimination information indicates that the first predicate does not relate to the target column, determining that the root node of the second query operator tree to be generated represents a single join operator, the first left child node of the root node represents a first association join operator, and the first right child node of the root node represents a second association join operator.

In one possible implementation manner, the left child node of the first left child node represents the data table corresponding to the de-duplication association column, the right child node of the first left child node represents the left table of the single join operator, the left child node of the first right child node represents the data table corresponding to the de-duplication association column, and the right child node of the first right child node represents the right table of the single join operator.

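In one possible implementation manner, the pushing down the association join operator under the single join operator based on the discrimination information and the de-duplication association column includes:

in the case that the discrimination information indicates that the first predicate relates to the target column, determining a target processing result corresponding to the first predicate based on the first predicate;

pushing down the association join operator under the single join operator based on the target processing result and the de-duplication association column.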

In one possible implementation manner, the pushing down the association join operator under the single join operator based on the target processing result and the de-duplication association column includes:

in the case that the target processing result is necessarily a false value, determining that the root node of the second query operator tree to be generated represents the first predicate, the unique child node of the root node represents an inner join operator with a single-result constraint, the second left child node of the unique child node represents a third association join operator, the second right child node of the unique child node represents a fourth association join operator, the left child node of the second left child node represents the data table corresponding to the de-duplication association column, the right child node of the second left child node represents the left table of the single join operator in the first query operator tree, the left child node of the second right child node represents the data table corresponding to the de-duplication association column, and the right child node of the second right child node represents the right table of the single join operator in the first query operator tree;

in the case that the target processing result is not necessarily a false value, determining that the root node of the second query operator tree to be generated represents the first predicate, the unique child node of the root node represents a single join operator, the third left child node of the unique child node represents a fifth association join operator, the third right child node of the unique child node represents a sixth association join operator, the left child node of the third left child node represents the data table corresponding to the de-duplication association column, the right child node of the third left child node represents the left table of the single join operator in the first query operator tree, the left child node of the third right child node represents the data table corresponding to the de-duplication association column, and the right child node of the third right child node represents the right table of the single join operator in the first query operator tree.

In a second aspect, an embodiment of the present application provides a data query apparatus, including:

an acquisition unit, configured to obtain a target statement, wherein the target statement is a Structured Query Language (SQL) statement;

a determining unit, configured to determine a query operator tree of the target statement to obtain a first query operator tree, wherein nodes of the first query operator tree represent operators or data tables;

a pushing unit, configured to push, in the case that the operators represented by the nodes of the first query operator tree comprise an association join operator and a single join operator, the association join operator down below the single join operator, so as to obtain a second query operator tree;

and a query unit, configured to perform data query based on the second query operator tree.

In a third aspect, an embodiment of the present application provides an electronic device, including:

a memory for storing a computer program;

and a processor, configured to execute a computer program stored in the memory, where the computer program is executed to implement a method according to any embodiment of the data query method of the first aspect of the present application.

In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as in any of the embodiments of the data query method of the first aspect described above.

In a fifth aspect, embodiments of the present application provide a computer program comprising computer readable code which, when run on a device, causes a processor in the device to implement a method as in any of the embodiments of the data query method of the first aspect described above.

According to the data query method provided by the embodiments of the present application, a target statement can be obtained, wherein the target statement is a Structured Query Language (SQL) statement. A query operator tree of the target statement is then determined to obtain a first query operator tree, wherein nodes of the first query operator tree represent operators or data tables. Next, in the case that the operators represented by the nodes of the first query operator tree comprise an association join operator and a single join operator, the association join operator is pushed down below the single join operator to obtain a second query operator tree, and data query is then performed based on the second query operator tree. Therefore, the logical optimization of the database optimizer can be performed by pushing the association join operator down below the single join operator, so that the efficiency of data query is improved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.

One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures of the drawings are not to be taken in a limiting sense, unless otherwise indicated.

Fig. 1 is a schematic flow chart of a data query method provided in an embodiment of the present application;

Fig. 2 is a flow chart of another data query method according to an embodiment of the present application;

Fig. 3A is a schematic diagram of a first query operator tree related to a data query method according to an embodiment of the present application;

Fig. 3B is a schematic diagram of a query operator tree after a de-duplication association column is generated in a data query method according to an embodiment of the present application;

Fig. 3C is a schematic diagram of a second query operator tree related to a data query method according to an embodiment of the present application;

Fig. 3D is a schematic diagram of another second query operator tree related to a data query method according to an embodiment of the present application;

Fig. 3E is a schematic diagram of a second query operator tree according to another embodiment of the present disclosure;

Fig. 4 is a schematic structural diagram of a data query device according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings, it being apparent that the described embodiments are some, but not all embodiments of the present application. It should be noted that: the relative arrangement of the parts and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present application unless it is specifically stated otherwise.

It will be appreciated by those skilled in the art that terms such as "first," "second," and the like in the embodiments of the present application are used merely to distinguish between different steps, devices, or modules, and do not represent any particular technical meaning or logical sequence therebetween.

It should also be understood that in this embodiment, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.

It should also be appreciated that any component, data, or structure referred to in the embodiments of the present application may be generally understood as one or more without explicit limitation or the contrary in the context.

In addition, the term "and/or" in this application merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate: A exists alone, A and B exist together, and B exists alone. In this application, the character "/" generally indicates that the associated objects are in an "or" relationship.

It should also be understood that the description of the embodiments herein emphasizes the differences between the embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.

The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the application, its application, or uses.

Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.

It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. For an understanding of the embodiments of the present application, the present application will be described in detail below with reference to the drawings in conjunction with the embodiments. It will be apparent that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

In order to solve the technical problem of how to improve the efficiency of data query in the prior art, the application provides a data query method which can improve the efficiency of data query.

Fig. 1 is a flow chart of a data query method according to an embodiment of the present application. The method can be applied to one or more electronic devices such as smart phones, notebook computers, desktop computers, portable computers, servers and the like. The main execution body of the method may be hardware or software. When the execution body is hardware, the execution body may be one or more of the electronic devices. For example, a single electronic device may perform the method, or a plurality of electronic devices may cooperate with one another to perform the method. When the execution subject is software, the method may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module. The present invention is not particularly limited herein.

As shown in fig. 1, the method specifically includes:

Step 101, obtaining a target statement, wherein the target statement is a Structured Query Language (SQL) statement.

In this embodiment, the Structured Query Language (SQL) is a database query and programming language for accessing data and for querying, updating and managing relational database systems.

In some cases, an SQL statement may include sub-queries. A sub-query (i.e., a sub-query statement) is a query statement nested within another query statement. For example, if a select statement is capable of returning a single value or a list of values and the select statement is nested in another SQL statement (e.g., a select statement, insert statement, update statement, or delete statement), then the select statement may be referred to as a sub-query (also called an inner-layer query or inner-layer query statement), and the SQL statement containing the sub-query is referred to as a main query (also called an outer-layer query or outer-layer query statement). To mark the relationship between the sub-query and the main query, the sub-query is typically written in parentheses. Sub-queries are typically used in a where clause or a having clause of a main query, together with comparison operators or logical operators, to construct where screening conditions or having screening conditions.

Sub-queries are divided into associated sub-queries (Dependent Subquery) and non-associated sub-queries. A non-associated sub-query is a sub-query that can be run independently without relying on the main query: if only the sub-query's own data sources are used in a sub-query, it is a non-associated sub-query. A non-associated sub-query is independent of the outer query, is executed only once in total, and passes its value to the main query after execution. If a data source of the main query is used in a sub-query, it is an associated sub-query, and the execution of the main query and the execution of the associated sub-query depend on each other.
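The distinction between the two kinds of sub-query can be illustrated with a minimal sketch using Python's standard sqlite3 module. The tables student and score, their columns, and the sample rows are hypothetical and are not taken from the present application.

```python
import sqlite3

# Hypothetical sample data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (name TEXT, age INTEGER, region TEXT);
    CREATE TABLE score (name TEXT, points INTEGER);
    INSERT INTO student VALUES
        ('ann', 20, 'north'), ('bob', 21, 'south'), ('cat', 20, 'north');
    INSERT INTO score VALUES ('ann', 90), ('bob', 60);
""")

# Non-associated sub-query: the inner select uses only its own data source,
# so it can run once on its own and pass its value to the main query.
non_associated = conn.execute(
    "SELECT name FROM student "
    "WHERE age = (SELECT MIN(age) FROM student) ORDER BY name"
).fetchall()

# Associated sub-query: the inner select references s.name, a column of the
# main query's data source, so it depends on each outer-layer record.
associated = conn.execute(
    "SELECT name FROM student s "
    "WHERE EXISTS (SELECT 1 FROM score c "
    "              WHERE c.name = s.name AND c.points > 80) "
    "ORDER BY name"
).fetchall()

print(non_associated)
print(associated)
```

In this sketch the second statement is the kind of query for which decorrelation may be useful, since its inner select cannot be evaluated without a record of the outer query.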

Here, the associated sub-query may be, for example, a sub-query used with exists, = any, or the like.

Step 102, determining a query operator tree of the target statement to obtain a first query operator tree, wherein nodes of the first query operator tree represent operators or data tables.

In this embodiment, each SQL statement (including the target statement) may be abstracted into a query operator tree. The first query operator tree may be the query operator tree of the target statement. An operator may represent an operation on a data table.
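A query operator tree of this kind can be sketched as a small data structure. The following is an illustrative representation only; the class name Node, the function render, and the example operator and table names are assumptions made for this sketch and do not come from the present application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of a query operator tree: either an operator or a data table."""
    kind: str                      # "operator" or "table"
    name: str                      # e.g. "single_join" or a table name
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Return one indented text line per node, parents before children."""
    lines = ["  " * depth + f"{node.kind}:{node.name}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical tree: an association join whose right input is a single join.
tree = Node("operator", "association_join", [
    Node("table", "outer_table"),
    Node("operator", "single_join", [
        Node("table", "left_table"),
        Node("table", "right_table"),
    ]),
])

print("\n".join(render(tree)))
```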

Step 103, in the case that the operators represented by the nodes of the first query operator tree comprise an association join operator and a single join operator, pushing the association join operator down below the single join operator to obtain a second query operator tree.

In this embodiment, the association join (correlated join) operator may, for each record in the outer-layer query, generate the result set of the sub-query represented by the inner-layer query; that result set then participates, together with the outer-layer query record, in the predicate judgment of the association join operator to determine whether the record is finally output.
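The per-record behaviour described above can be sketched as follows. This is a simplified illustration under the assumption that records are Python dictionaries; the function name association_join and the sample data are hypothetical.

```python
def association_join(outer_rows, inner_query, predicate):
    """For each outer record, evaluate the inner query for that record and
    let the predicate decide whether the outer record is output."""
    output = []
    for outer in outer_rows:
        result_set = inner_query(outer)   # re-evaluated for every outer record
        if predicate(outer, result_set):
            output.append(outer)
    return output


# Hypothetical sample data.
students = [{"name": "ann", "age": 20}, {"name": "bob", "age": 21}]
scores = [{"name": "ann", "points": 90}]

# Inner query: scores belonging to the current outer record.
scores_of = lambda outer: [s for s in scores if s["name"] == outer["name"]]
# Predicate: keep the outer record if the inner result set is not empty.
has_score = lambda outer, result_set: len(result_set) > 0

kept = association_join(students, scores_of, has_score)
print(kept)
```

Because the inner query is re-evaluated for every outer record, the cost grows with the product of the two inputs, which is the motivation for pushing the operator down and decorrelating.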

The single join (single join) operator adds a runtime constraint to the left join (left join) operator. In the absence of runtime errors, the result of a single join is the same as that of a left join. The difference from the left join is that, for a given record on the left, an error is reported if more than one record in the right table meets the join condition.

Left join: for each left record, if the right table has records meeting the join condition, those records are spliced with the left record and output after the output columns are retained; if no record meets the condition, the columns of the right table in the output record are filled with null, and one filled record is output.
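The difference between the two operators can be sketched in a few lines. This is a minimal illustration that assumes records are dictionaries and represents null as None; the function names and sample data are hypothetical.

```python
def left_join(left_rows, right_rows, condition, right_columns):
    """Left join: keep every left record, padding with None when no match."""
    output = []
    for left in left_rows:
        matches = [right for right in right_rows if condition(left, right)]
        if matches:
            output.extend({**left, **right} for right in matches)
        else:
            output.append({**left, **{col: None for col in right_columns}})
    return output


def single_join(left_rows, right_rows, condition, right_columns):
    """Single join: a left join that reports an error on more than one match."""
    for left in left_rows:
        match_count = sum(1 for right in right_rows if condition(left, right))
        if match_count > 1:
            raise RuntimeError(f"more than one matching record for {left}")
    return left_join(left_rows, right_rows, condition, right_columns)


# Hypothetical sample data.
students = [{"sid": 1, "name": "ann"}, {"sid": 2, "name": "bob"}]
scores = [{"owner": 1, "points": 90}]
same_student = lambda left, right: left["sid"] == right["owner"]

joined = single_join(students, scores, same_student, ["owner", "points"])
print(joined)
```

If a second score record with owner 1 were added, single_join would raise an error while left_join would output two records for that student.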

Here, the second query operator tree may be obtained by pushing the node representing the association join operator in the first query operator tree down below the node representing the single join operator. In the obtained second query operator tree, the node representing the single join operator may be a parent node of the node representing the association join operator.

Step 104, performing data query based on the second query operator tree.

In this embodiment, since each query operator tree may correspond to one SQL statement, the SQL statement corresponding to the second query operator tree may be determined, and then the data query is performed using the SQL statement corresponding to the second query operator tree.

Fig. 2 is a flow chart of another data query method according to an embodiment of the present application.

As shown in fig. 2, the method specifically includes:

Step 201, obtaining a target statement, wherein the target statement is a Structured Query Language (SQL) statement.

In this embodiment, step 201 is substantially identical to step 101 in the corresponding embodiment of fig. 1, and will not be described herein.

Step 202, determining a query operator tree of the target statement to obtain a first query operator tree, wherein nodes of the first query operator tree represent operators or data tables.

In this embodiment, step 202 is substantially identical to step 102 in the corresponding embodiment of fig. 1, and will not be described here again.

Step 203, in the case that the operators represented by the nodes of the first query operator tree comprise an association join operator and a single join operator, determining a de-duplication association column of a target data table, wherein the target data table is the data table queried by the outer query statement in the target statement, and the de-duplication association column represents the result of de-duplicating the data of the association column of the target data table.

In this embodiment, the outer query statement is the outer-layer query described above.

The association column of the data table can be used for establishing the association between the inner-layer query and the outer-layer query.

As an example, for a target statement in which the inner-layer query is associated with the outer-layer query through the two columns age and region, it may be determined that the association columns are age and region.

Further, for every two records, it may be determined whether the field values of the age column and of the region column are respectively the same in both records; if so, one of the repeated records is removed. After all records repeated in this way have been removed, the de-duplication association column is obtained. The de-duplication association column in the above example may include only the two columns age and region.
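The de-duplication described above can be sketched as follows. This is a minimal illustration in which records are dictionaries; the function name and the sample rows are hypothetical, and only the column names age and region come from the example in the text.

```python
def deduplicate_association_columns(rows, columns):
    """Keep one record per distinct combination of the association columns,
    and keep only those columns in the result."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[col] for col in columns)
        if key not in seen:          # first record with this combination
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


# Hypothetical outer-query table.
outer_table = [
    {"name": "ann", "age": 20, "region": "north"},
    {"name": "bob", "age": 21, "region": "south"},
    {"name": "cat", "age": 20, "region": "north"},
]

dedup = deduplicate_association_columns(outer_table, ["age", "region"])
print(dedup)
```

The result plays the role of the data table corresponding to the de-duplication association column: it is typically smaller than the outer table, so operators evaluated against it process fewer records.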

Step 204, pushing down the association join operator under the single join operator based on the de-duplication association column to obtain a second query operator tree.

In this embodiment, after obtaining the deduplication association column, the association join operator may be pushed down under the single join operator based on the deduplication association column.

In some optional implementations of this embodiment, the association join operator may be pushed down under the single join operator based on the deduplication association column in the following manner:

In a first step, the predicate in the association join operator is determined to obtain a first predicate.

The first predicate may be a predicate in the association join operator.

In a second step, it is determined whether the first predicate relates to a target column, to obtain discrimination information.

Wherein the target column is a column in the right table of the single join operator.

The discrimination information may indicate whether the first predicate relates to the target column. In particular, in the case that the first predicate represents an operation on a column in the right table of the single join operator, it may be determined that the first predicate relates to the target column; in the case that the first predicate does not represent an operation on a column in the right table of the single join operator, i.e., the first predicate represents an operation on columns other than those in the right table of the single join operator (e.g., a column in a data table other than the right table), it may be determined that the first predicate does not relate to the target column.

The right-hand table of the single join operator may be a data table represented by a right child node of a node representing the single join operator. In other words, the right-hand table of the single join operator may be determined as follows: first, a node representing a single join operator (hereinafter referred to as node 1) is determined, then, a right child node of node 1 (hereinafter referred to as node 2) is determined, and then, the data table represented by node 2 is determined as a right-side table of the single join operator.
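The discrimination step can be sketched as a simple membership check. This sketch assumes that the columns referenced by the first predicate have already been extracted as qualified names; the function name and the column names are hypothetical.

```python
def relates_to_target_column(predicate_columns, right_table_columns):
    """Discrimination information: True if the predicate references at least
    one column of the right table of the single join operator."""
    return any(col in right_table_columns for col in predicate_columns)


# Hypothetical qualified column names of the right table.
right_table_columns = {"score.owner", "score.points"}

# A predicate such as "student.age > 18" references no right-table column.
info_a = relates_to_target_column({"student.age"}, right_table_columns)
# A predicate such as "score.points > student.age" references one.
info_b = relates_to_target_column({"score.points", "student.age"},
                                  right_table_columns)
print(info_a, info_b)
```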

In a third step, the association join operator is pushed down below the single join operator based on the discrimination information and the de-duplication association column.

In some application scenarios in the above alternative implementations, the association join operator may be pushed down under the single join operator based on the discrimination information and the de-duplication association column in the following manner:

In the case that the discrimination information indicates that the first predicate does not relate to the target column, it is determined that the root node of the second query operator tree to be generated represents a single join operator, the first left child node of the root node represents a first association join operator, and the first right child node of the root node represents a second association join operator.

The first left child node may be the left child node of the root node of the second query operator tree to be generated. The first right child node may be the right child node of the root node of the second query operator tree to be generated.

The first association join operator may be the association join operator represented by the first left child node.

The second association join operator may be the association join operator represented by the first right child node.

Further, in some cases in the application scenario, the left child node of the first left child node represents the data table corresponding to the de-duplication association column, the right child node of the first left child node represents the left table of the single join operator, the left child node of the first right child node represents the data table corresponding to the de-duplication association column, and the right child node of the first right child node represents the right table of the single join operator.

The left table of the single join operator may be the data table represented by the left child node of the node representing the single join operator. In other words, the left table of the single join operator may be determined as follows: first, the node representing the single join operator (hereinafter referred to as node A) is determined; then, the left child node of node A (hereinafter referred to as node B) is determined; and then, the data table represented by node B is determined as the left table of the single join operator.
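The tree shape for this scenario (a single join operator at the root, with an association join operator as each of its two children) can be sketched with nested tuples. The tuple encoding, the function name, and the table names are assumptions made for illustration only.

```python
def push_down_without_target_column(dedup_table, left_table, right_table):
    """Build the second query operator tree for the case in which the first
    predicate does not relate to the target column.

    Each operator node is encoded as (operator_name, left_child, right_child);
    data tables are encoded as plain strings.
    """
    first_left_child = ("association_join", dedup_table, left_table)
    first_right_child = ("association_join", dedup_table, right_table)
    return ("single_join", first_left_child, first_right_child)


# Hypothetical table names.
second_tree = push_down_without_target_column(
    "dedup_association_table", "left_table", "right_table")
print(second_tree)
```

In this encoding, the left child of each association join is the data table corresponding to the de-duplication association column, matching the arrangement described above.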

In other application scenarios in the above alternative implementations, the association join operator may be pushed down under the single join operator based on the discrimination information and the de-duplication association column in the following manner:

First, in the case that the discrimination information indicates that the first predicate relates to the target column, a target processing result corresponding to the first predicate is determined based on the first predicate.

The target processing result corresponding to the first predicate may be a determination about the processing result of the first predicate, obtained from the first predicate itself. The target processing result corresponding to the first predicate may represent one of the following: the processing result of the first predicate is necessarily a false value (FALSE); or the processing result of the first predicate is not necessarily FALSE, i.e., the processing result of the first predicate may be TRUE.

It should be appreciated that, in practice, when the records in the data table are unknown, the predicate alone (including the first predicate described above) can be used to determine whether its processing result is necessarily a false value (i.e., whether there is any possibility of it being true); when a record in the data table is determined, the predicate (including the first predicate described above) and that record can together be used to determine whether the processing result of the predicate for that record is a false value or a true value.

Second, the association join operator is pushed down below the single join operator based on the target processing result and the de-duplication association column.

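In some cases in the above application scenario, the association join operator may be pushed down under the single join operator based on the target processing result and the de-duplication association column in the following manner:

In the case that the target processing result is necessarily a false value, it is determined that the root node of the second query operator tree to be generated represents the first predicate, the unique child node of the root node represents an inner join operator with a single-result constraint, the second left child node of the unique child node represents a third association join operator, and the second right child node of the unique child node represents a fourth association join operator. The left child node of the second left child node represents the data table corresponding to the de-duplication association column, and the right child node of the second left child node represents the left table of the single join operator in the first query operator tree; the left child node of the second right child node represents the data table corresponding to the de-duplication association column, and the right child node of the second right child node represents the right table of the single join operator in the first query operator tree.

Wherein the predicate (including the first predicate described above) can be considered an operator.

The second left child node may be the left child node of the unique child node of the root node of the second query operator tree.

The third association join operator may be the operator represented by the second left child node.

The second right child node may be the right child node of the unique child node of the root node of the second query operator tree.

The fourth association join operator may be the operator represented by the second right child node.

Inner join operator with a single-result constraint: on the basis of the inner join (i.e., the inner join operator), for each record on the left side there is at most one record in the right table that meets the join predicate; otherwise, an error is reported at runtime.

Inner join (inner join) operator: only combinations of records from the two data tables whose column values match according to the join condition are output.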
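An inner join with a single-result constraint, as defined above, can be sketched as follows. This is a minimal illustration with records as dictionaries; the function name and sample data are hypothetical.

```python
def inner_join_single_result(left_rows, right_rows, condition):
    """Inner join that reports an error if any left record has more than one
    matching right record; unmatched left records are dropped."""
    output = []
    for left in left_rows:
        matches = [right for right in right_rows if condition(left, right)]
        if len(matches) > 1:
            raise RuntimeError(f"more than one matching record for {left}")
        output.extend({**left, **right} for right in matches)
    return output


# Hypothetical sample data.
students = [{"sid": 1, "name": "ann"}, {"sid": 2, "name": "bob"}]
scores = [{"owner": 1, "points": 90}]
same_student = lambda left, right: left["sid"] == right["owner"]

matched = inner_join_single_result(students, scores, same_student)
print(matched)
```

Unlike a single join, this operator outputs nothing for a left record without a match instead of a null-padded record.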

Wherein the single join operator increases the runtime constraints over the left join operator. In the absence of run-time errors, the result of a single connection is the same as the left connection. The difference from the left connection is that for a certain record on the left, if the number of records in the right table that meet the connection condition is greater than one, an error is reported.

The determination manner of the association column and the de-duplication association column may refer to the above description, and will not be repeated here.

The left table of the single join operator may be the data table represented by the left child node of the node representing the single join operator. In other words, the left table of the single join operator may be determined as follows: first, the node representing the single join operator (hereinafter referred to as node A) is determined; then, the left child node of node A (hereinafter referred to as node B) is determined; and then, the data table represented by node B is determined as the left table of the single join operator.

The right table of the single join operator may be the data table represented by the right child node of the node representing the single join operator. In other words, the right table of the single join operator may be determined as follows: first, the node representing the single join operator (hereinafter referred to as node A) is determined; then, the right child node of node A (hereinafter referred to as node C) is determined; and then, the data table represented by node C is determined as the right table of the single join operator.
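The lookup of node A, node B, and node C described above can be sketched as follows. The `Node` class, its `kind` labels, and the helper names are hypothetical; the embodiment does not prescribe a concrete tree representation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    kind: str                       # assumed labels: "table", "single_join", "assoc_join"
    name: str = ""                  # table name when kind == "table"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def find_first(root: Optional[Node], kind: str) -> Optional[Node]:
    """Pre-order search for the first node of the given kind."""
    if root is None:
        return None
    if root.kind == kind:
        return root
    found = find_first(root.left, kind)
    return found if found is not None else find_first(root.right, kind)


def single_join_tables(root: Node) -> Tuple[Optional[Node], Optional[Node]]:
    """Return (node B, node C): the left and right children of node A."""
    node_a = find_first(root, "single_join")
    if node_a is None:
        raise ValueError("the tree contains no single join operator")
    return node_a.left, node_a.right


# A tree in which the right child of the association connection operator
# is a single join operator.
tree = Node(
    "assoc_join",
    left=Node("table", "O"),
    right=Node(
        "single_join",
        left=Node("table", "Professor"),
        right=Node("table", "Assistant"),
    ),
)
node_b, node_c = single_join_tables(tree)
print(node_b.name, node_c.name)
```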

The third left child node may be the left child node of the unique child node of the root node of the second query operator tree.

The fifth association connection operator may be the operator represented by the third left child node.

The third right child node may be the right child node of the unique child node of the root node of the second query operator tree.

The sixth association connection operator may be the operator represented by the third right child node.

A single join adds a run-time constraint to a left join. When no run-time error occurs, the result of a single join is the same as that of a left join. The difference from the left join is that, for any record on the left, if more than one record in the right table satisfies the join condition, an error is reported.

The determination manners of the association column, the de-duplication association column, the right table of the single join operator, and the left table of the single join operator may refer to the above descriptions, and are not repeated herein.

Step 205: perform a data query based on the second query operator tree.

In this embodiment, step 205 is substantially identical to step 104 in the corresponding embodiment of fig. 1, and will not be described herein.

It should be noted that, in addition to the above descriptions, the present embodiment may further include the corresponding technical features described in the embodiment corresponding to fig. 1, so as to further achieve the technical effects of the data query method shown in fig. 1, and the detailed description with reference to fig. 1 is omitted herein for brevity.

According to the data query method provided by the embodiment of the application, the association connection operator is pushed down below the single connection operator through the de-duplication association column, so that the equivalence of the second query operator tree and the first query operator tree can be ensured, and the accuracy of data query can be ensured.

An exemplary description of the embodiments of the present application follows. It should be noted that the embodiments of the present application may have the features described below, but the following description should not be construed as limiting the scope of the embodiments of the present application.

At present, big data technology continues to evolve, and SQL (Structured Query Language) remains an indispensable part of mainstream data processing software. An associated (correlated) sub-query in SQL means that, logically, for each record of the outer-layer query the inner-layer query generates a result set and a predicate is then evaluated on that record, which greatly simplifies service development. The simplicity of the business expression, however, poses a challenge to the implementation of the database computing engine: if the engine follows the original semantics, the inner result set is generated as many times as the outer-layer query has records, which is very costly when the outer-layer query returns a large amount of data. Therefore, if the operator tree corresponding to the associated sub-query is equivalently transformed, without changing the final result set, so that all association connection operators are eventually removed, the computing efficiency may be greatly improved. The association connection operator push-down technique is a solution framework whose application relies on implementing the push-down of the association connection operator past each individual operator. However, as far as the current state of the technique is concerned, no scheme for pushing the association connection operator down below the single connection operator has appeared.

The method is an optimization method for associated sub-queries in a database. In the logic optimization part of the database optimizer, converting associated sub-queries into non-associated sub-queries can reduce the time complexity of joins and shorten the execution time. Among current methods for decorrelating associated sub-queries, the association connection operator push-down technique has strong generality, but it does not cover the case of pushing the association connection operator down below a single connection operator. The method fills this gap.

The method is used to correctly push the association connection operator down below the single connection operator in the logic optimization stage of the database optimizer; by giving a correct scheme for this push-down, it ensures the correctness of the execution result.

The specific scheme for pushing the association connection operator down below the single connection operator is as follows:

Fig. 3A is a schematic diagram of a first query operator tree involved in a data query method according to an embodiment of the present application. At this point, no de-duplication association column has been generated, and the right child (i.e., right child node) of the association connection operator represents a single connection operator.

On this basis, the query operator tree can be optimized in the following manner:

First, a de-duplication association list is generated. Fig. 3B is a schematic diagram of the query operator tree after the de-duplication association list is generated in the data query method according to the embodiment of the present application. Here D denotes the result of de-duplicating the association columns, hereinafter the de-duplication association list. In this section, the predicate in the association connection operator (i.e., the first predicate) is denoted p1, and the predicate in the single join operator in the query operator tree is denoted p2.

Second, on the basis of the generated de-duplication association list, the association connection operator is pushed down below the single connection operator. The specific scheme is divided into the following two cases:
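Generating D can be pictured as a distinct projection of the outer table onto its association columns. The sketch below assumes rows are dictionaries with hashable values; the function name and the sample column names are illustrative and are not taken from the embodiment.

```python
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def dedup_association_list(outer_rows: List[Row],
                           assoc_cols: Sequence[str]) -> List[Row]:
    """Project outer_rows onto assoc_cols and drop duplicate combinations.

    The order of first appearance is kept so the result is deterministic.
    """
    seen = set()
    d: List[Row] = []
    for row in outer_rows:
        key = tuple(row[c] for c in assoc_cols)
        if key not in seen:
            seen.add(key)
            d.append(dict(zip(assoc_cols, key)))
    return d


# Hypothetical outer table with two association columns and one other column.
outer = [
    {"region": "north", "climate": "dry", "other": 1},
    {"region": "north", "climate": "dry", "other": 2},
    {"region": "south", "climate": "wet", "other": 3},
]
D = dedup_association_list(outer, ["region", "climate"])
print(D)
```

Because D has one row per distinct combination of association values, an operator joined against D is evaluated once per combination rather than once per outer record.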

Case one: when p1 (i.e., the first predicate) does not involve any column (i.e., the target column) in the right table of the original single join operator, the root node of the conversion result represents the single join operator, whose predicates are p2 and p3. Predicate p2 is the predicate of the single join operator from the first step. Predicate p3 requires that the left and right de-duplication association columns be identical (hereinafter Dmatch). The left and right children of the node representing the single join operator are both association connection nodes (i.e., nodes representing association connection operators). The predicate in the left association connection node is p1 (i.e., the first predicate), and its left and right tables are, respectively, the de-duplication association list and the left table of the original single join; the right association connection node has no predicate, and its left and right tables are, respectively, the de-duplication association list and the right table of the original single join. See fig. 3C.
That is, when the discrimination information indicates that the first predicate does not involve the target column, it is determined that the root node of the second query operator tree to be generated represents the single join operator, a first left child node of the root node represents a first association connection operator, a first right child node of the root node represents a second association connection operator, a left child node of the first left child node represents the data table corresponding to the de-duplication association column, a right child node of the first left child node represents the left table of the single join operator, a left child node of the first right child node represents the data table corresponding to the de-duplication association column, and a right child node of the first right child node represents the right table of the single join operator.
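The shape of the tree produced in case one can be sketched as a small builder. The `Node` class, the operator labels, and the use of plain strings for predicates are assumptions made for illustration; only the arrangement of the nodes follows the description above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    op: str                                  # assumed labels, e.g. "table", "assoc_join"
    name: str = ""                           # table name when op == "table"
    predicates: Tuple[str, ...] = ()
    children: Tuple["Node", ...] = ()


def table(name: str) -> Node:
    return Node("table", name=name)


def push_down_case_one(d: Node, left_tbl: Node, right_tbl: Node) -> Node:
    """Build the case-one tree: p1 does not involve the right table."""
    # Left association connection node keeps p1 and joins D with the left table.
    left_assoc = Node("assoc_join", predicates=("p1",), children=(d, left_tbl))
    # Right association connection node has no predicate and joins D with the right table.
    right_assoc = Node("assoc_join", children=(d, right_tbl))
    # The single join becomes the root; Dmatch equates the two copies of D.
    return Node("single_join", predicates=("p2", "Dmatch"),
                children=(left_assoc, right_assoc))


root = push_down_case_one(table("D"), table("Professor"), table("Assistant"))
print(root.op, root.predicates)
```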

Case two: when p1 (i.e., the first predicate) involves a column in the right table of the original single join operator (i.e., the target column), the following two sub-cases are distinguished.

If p1 (d, l, null) (i.e., the target processing result, where d comes from the data table corresponding to the de-duplication association column and l comes from the left table of the original single join operator) is necessarily FALSE, the root node of the conversion result (i.e., the second query operator tree) is predicate p1, the child node of the root node represents the inner join operator with a single-result constraint, and the predicates of that join operator are p2 and Dmatch. The left and right child nodes of the node representing that join operator are both association connection nodes (i.e., nodes representing association connection operators). The child nodes of the left association connection node represent, respectively, the de-duplication association list and the left table of the original single join; the child nodes of the right association connection node represent, respectively, the de-duplication association list and the right table of the original single join. See fig. 3D.
Namely: in the case that the target processing result is necessarily a false value, it is determined that the root node of the second query operator tree to be generated represents the first predicate, the unique child node of the root node represents an inner join operator with a single-result constraint, a second left child node of the unique child node represents a third association connection operator, a second right child node of the unique child node represents a fourth association connection operator, a left child node of the second left child node represents the data table corresponding to the de-duplication association column, a right child node of the second left child node represents the left table of the single join operator in the first query operator tree, a left child node of the second right child node represents the data table corresponding to the de-duplication association column, and a right child node of the second right child node represents the right table of the single join operator in the first query operator tree.

If p1 (d, l, null) (i.e., the target processing result, where d comes from the data table corresponding to the de-duplication association column and l comes from the left table of the original single join operator) may be TRUE, that is, is not necessarily FALSE, the root node of the conversion result (i.e., the second query operator tree) is p1, the child of the root node is a single join node (i.e., the node representing the single join operator), and the predicates of the single join operator are p2 and Dmatch. The left and right children of that node are both association connection nodes (i.e., nodes representing association connection operators). The child nodes of the left association connection node represent, respectively, the de-duplication association list and the left table of the original single join; the child nodes of the right association connection node represent, respectively, the de-duplication association list and the right table of the original single join. See fig. 3E.
Namely: in the case that the target processing result is not necessarily a false value, it is determined that the root node of the second query operator tree to be generated represents the first predicate, the unique child node of the root node represents a single join operator, a third left child node of the unique child node represents a fifth association connection operator, a third right child node of the unique child node represents a sixth association connection operator, a left child node of the third left child node represents the data table corresponding to the de-duplication association column, a right child node of the third left child node represents the left table of the single join operator in the first query operator tree, a left child node of the third right child node represents the data table corresponding to the de-duplication association column, and a right child node of the third right child node represents the right table of the single join operator in the first query operator tree.
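The two sub-cases of case two differ only in the join placed under the root. One way to read the distinction is that, when p1 (d, l, null) is necessarily FALSE, any left record padded with NULLs would be discarded by p1 anyway, so the join can be the stricter inner join with a single-result constraint. The sketch below builds both shapes; the `Node` class, the operator labels, the "filter" label for the root that represents p1, and the boolean flag standing in for the target processing result are all assumptions. How that flag is derived is not shown.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    op: str                                  # assumed operator labels
    name: str = ""                           # table name when op == "table"
    predicates: Tuple[str, ...] = ()
    children: Tuple["Node", ...] = ()


def table(name: str) -> Node:
    return Node("table", name=name)


def push_down_case_two(d: Node, left_tbl: Node, right_tbl: Node,
                       p1_null_always_false: bool) -> Node:
    """Build the case-two tree: p1 involves a column of the right table."""
    # The description names no predicate for these two nodes in case two.
    left_assoc = Node("assoc_join", children=(d, left_tbl))
    right_assoc = Node("assoc_join", children=(d, right_tbl))
    # Necessarily FALSE -> inner join with single-result constraint;
    # otherwise -> single join.
    join_op = "single_result_inner_join" if p1_null_always_false else "single_join"
    join = Node(join_op, predicates=("p2", "Dmatch"),
                children=(left_assoc, right_assoc))
    # The root represents predicate p1 applied on top of the join.
    return Node("filter", predicates=("p1",), children=(join,))


tree_false = push_down_case_two(table("D"), table("Professor"),
                                table("Assistant"), True)
tree_maybe = push_down_case_two(table("D"), table("Professor"),
                                table("Assistant"), False)
print(tree_false.children[0].op, tree_maybe.children[0].op)
```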

In the figures, an operator shown without a predicate has no predicate.

The following SQL statement is taken as an example of a case in which the method can be used, to illustrate the meaning of the two operators and their predicates before the association connection operator is pushed down (covering two states: the initial state and the state after the de-duplication association list is generated).

select p.personId, p.name, (
    select a.name from Assistant a where a.Boss = p.personId and a.climate = O.climate
) from Professor p where p.region = O.region

Table O in the statement is the result set corresponding to the outer-layer query. Predicate p1 in the association connection operator is len(pa) + len(O.region) < 30; predicate p2 in the single join operator is a.Boss = p.personId. In the inner-layer query, the predicate that associates table Professor (alias p) with the outer-layer query is p.region = O.region, and the predicate that associates table Assistant (alias a) with the outer-layer query is a.climate = O.climate. Thus, the association columns of the outer-layer query table O comprise the two columns O.climate and O.region, and the de-duplication association list generated in the first conversion step is the new table obtained by de-duplicating these two columns.

Association connection operator: for each record of the outer-layer query, the result set of the sub-query represented by the inner-layer query is generated; it then takes part, together with the outer-layer query record, in the predicate evaluation of the association connection operator, which determines whether the record is finally output.

Single join: the single join adds a run-time constraint to the left join. When no run-time error occurs, the result of a single join is the same as that of a left join. The difference from the left join is that, for any record on the left, if more than one record in the right table satisfies the join condition, an error is reported (Neumann, T., Leis, V., & Kemper, A., 2017, March).

Inner join with a single-result constraint: on the basis of the inner join, for each record on the left side there is at most one record in the right table that satisfies the join predicate. Otherwise, an error is reported at run time.
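The per-record evaluation described for the association connection operator can be sketched as a nested loop in which the inner query is a function of the outer record. The function name, the row representation, and the sample data are hypothetical and serve only to show why the cost grows with the number of outer records.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def association_join(outer_rows: List[Row],
                     subquery: Callable[[Row], List[Row]],
                     predicate: Callable[[Row, Row], bool]) -> List[Row]:
    """Evaluate the inner query once per outer record, then apply the predicate."""
    out: List[Row] = []
    for o in outer_rows:
        for inner in subquery(o):        # result set regenerated for every outer record
            if predicate(o, inner):
                out.append({**o, **inner})
    return out


# Hypothetical data: the inner query filters professors by the outer region.
professors = [
    {"personId": 1, "region": "north"},
    {"personId": 2, "region": "south"},
]
outer = [{"o_region": "north"}, {"o_region": "north"}, {"o_region": "south"}]
calls: List[Row] = []


def professors_in_region(o: Row) -> List[Row]:
    calls.append(o)                      # count how often the inner query runs
    return [p for p in professors if p["region"] == o["o_region"]]


result = association_join(outer, professors_in_region, lambda o, p: True)
print(len(calls), len(result))
```

Here the inner query runs three times although the outer table holds only two distinct regions, which is the redundancy that the de-duplication association list is meant to remove.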

It should be noted that, in addition to the above descriptions, the present embodiment may further include the technical features described in the above embodiments, so as to achieve the technical effects of the data query method shown above, and the detailed description is referred to above, and is omitted herein for brevity.

The data query method provided by the embodiment of the application enables associated sub-queries in a database to be executed more efficiently on the basis of the method, while ensuring the accuracy of the query result.

Fig. 4 is a schematic structural diagram of a data query device according to an embodiment of the present application. The device specifically comprises:

an obtaining unit 401, configured to obtain a target sentence, where the target sentence is a structured query language;

a determining unit 402, configured to determine a query operator tree of the target sentence, to obtain a first query operator tree, where a node of the first query operator tree represents an operator or a data table;

a pushing unit 403, configured to, in a case where the operators represented by the nodes of the first query operator tree include an association connection operator and a single connection operator, push the association connection operator down below the single connection operator, to obtain a second query operator tree;

and a query unit 404, configured to perform a data query based on the second query operator tree.

The data query device provided in this embodiment may be a data query device as shown in fig. 4, and may perform all the steps of each data query method described above, so as to achieve the technical effects of each data query method described above, and specific reference is made to the above related description, which is omitted herein for brevity.

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 500 shown in fig. 5 includes: at least one processor 501, memory 502, at least one network interface 504, and other user interfaces 503. The various components in the electronic device 500 are coupled together by a bus system 505. It is understood that the bus system 505 is used to enable connection and communication between these components. The bus system 505 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are all labeled as bus system 505 in fig. 5.

The user interface 503 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, a trackball, a touch pad, or a touch screen, etc.).

It is to be appreciated that the memory 502 in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 502 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.

In some implementations, the memory 502 stores the following elements, executable units or data structures, or a subset thereof, or an extended set thereof: an operating system 5021 and application programs 5022.

The operating system 5021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application 5022 includes various application programs such as a Media Player (Media Player), a Browser (Browser), and the like for realizing various application services. A program for implementing the method of the embodiment of the present application may be included in the application 5022.

In this embodiment, the processor 501 is configured to execute the method steps provided in the method embodiments by calling a program or an instruction stored in the memory 502, specifically, a program or an instruction stored in the application 5022, for example, including:

and performing a data query based on the second query operator tree.

The method disclosed in the embodiments of the present application may be applied to the processor 501 or implemented by the processor 501. The processor 501 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or by instructions in software in the processor 501. The processor 501 may be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or by a combination of hardware and software elements in a decoding processor. The software elements may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 502, and the processor 501 reads information in the memory 502 and, in combination with its hardware, performs the steps of the method described above.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processor, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field programmable gate arrays (Field-Programmable Gate Array, FPGA), general purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the above-described functions of the application, or a combination thereof.

For a software implementation, the techniques described herein may be implemented by means of units that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

The electronic device provided in this embodiment may be an electronic device as shown in fig. 5, and may perform all the steps of each data query method described above, so as to achieve the technical effects of each data query method described above, and specific reference is made to the above related description, which is not repeated herein for brevity.

The embodiment of the application also provides a storage medium (computer readable storage medium). The storage medium stores one or more programs. The storage medium may comprise volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; it may also comprise a combination of the above types of memory.

The one or more programs in the storage medium are executable by one or more processors to implement the data query method executed on the electronic device side.

The processor is configured to execute an association join operator push-down program stored in the memory, so as to implement the following steps of a data query method executed on the electronic device side:

and performing a data query based on the second query operator tree.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

It is to be understood that the terminology used herein is for the purpose of describing particular example embodiments only, and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," "includes," "including," and "having" are inclusive and therefore specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order described or illustrated, unless an order of performance is explicitly stated. It should also be appreciated that additional or alternative steps may be used.

The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of querying, the method comprising:

and performing a data query based on the second query operator tree.

2. The method of claim 1, wherein pushing the join operator down under the single join operator comprises:

3. The method of claim 2, wherein pushing down the join operator under the single join operator based on the deduplication association column comprises:

4. A method according to claim 3, wherein said pushing down the join operator under the single join operator based on the discrimination information and the deduplication association column comprises:

and in the case that the discrimination information indicates that the first predicate does not involve the target column, determining that a root node of the second query operator tree to be generated represents the single connection operator, a first left child node of the root node represents a first association connection operator, a first right child node of the root node represents a second association connection operator, a left child node of the first left child node represents the data table corresponding to the de-duplication association column, a right child node of the first left child node represents the left table of the single connection operator, a left child node of the first right child node represents the data table corresponding to the de-duplication association column, and a right child node of the first right child node represents the right table of the single connection operator.

5. A method according to claim 3, wherein said pushing down the join operator under the single join operator based on the discrimination information and the deduplication association column comprises:

6. The method of claim 5, wherein pushing down the join operator under the single join operator based on the target processing result and the deduplication association column comprises:

7. The method of claim 5, wherein pushing down the join operator under the single join operator based on the target processing result and the deduplication association column comprises:

8. A query device, the device comprising:

9. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing a computer program stored in said memory, and which, when executed, implements the method of any of the preceding claims 1-7.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of the preceding claims 1-7.