CN111949686B

CN111949686B - Data processing method, device and equipment

Info

Publication number: CN111949686B
Application number: CN201910400008.XA
Authority: CN
Inventors: Li Tao (李韬)
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2019-05-14
Filing date: 2019-05-14
Publication date: 2024-05-14
Anticipated expiration: 2039-05-14
Also published as: CN111949686A

Abstract

The application provides a data processing method, apparatus, and device. The method comprises: acquiring an original execution plan comprising a plurality of nodes; selecting, from the nodes of the original execution plan, a target node that cannot be processed by a data source; performing an equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan; determining a target execution plan according to the original execution plan and the equivalent execution plan; and sending the target execution plan to the data source so that the data source executes it. This technical solution makes reasonable use of the computing capability of the data source, reduces the amount of data transferred, and yields higher query performance.

Description

Data processing method, device and equipment

Technical Field

The present application relates to the field of internet technologies, and in particular, to a data processing method, apparatus, and device.

Background

At present, data is typically stored across multiple data sources (i.e., databases). For example, part of an enterprise's data is stored in data source 1 and another part in data source 2. Because the data is spread over disparate data sources, a query system is needed to connect to each data source and read data from it, so that data processing can span data sources. For example, the query system may read data from data source 1 and from data source 2 and then process the data it has read.

However, if the query system reads all the data in data source 1 and all the data in data source 2, reading is costly and ultimately degrades the overall processing efficiency of the query system. For this reason, the query system typically pushes part of the processing down to the data sources for execution. This uses the computing power of the data sources themselves, and it also reduces the amount of data the data sources return to the query system.

However, there is currently no effective way to determine which parts of a user's processing request should be pushed down to the data source for execution. As a result, the computing power of the data source is not used reasonably, and the user experience suffers.

Disclosure of Invention

The application provides a data processing method, which comprises the following steps:

acquiring an original execution plan, wherein the original execution plan comprises a plurality of nodes;

selecting, from the nodes of the original execution plan, a target node that cannot be processed by a data source;

performing an equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan;

determining a target execution plan according to the original execution plan and the equivalent execution plan;

and sending the target execution plan to the data source so that the data source executes the target execution plan.

In one implementation, performing the equivalent transformation on the target node comprises: splitting the target node in the original execution plan to obtain a first child node that can be processed by the data source and a second child node that cannot be processed by the data source;

and replacing the target node in the original execution plan with the first child node and the second child node to obtain the equivalent execution plan, wherein the execution result of the equivalent execution plan is the same as that of the original execution plan.

In another implementation, performing the equivalent transformation on the target node comprises: determining, from the nodes of the original execution plan, a parent node corresponding to the target node;

and pulling the target node up to become an upper node of the parent node to obtain the equivalent execution plan, wherein the execution result of the equivalent execution plan is the same as that of the original execution plan.

Alternatively, after the equivalent execution plan is obtained, data processing is carried out according to the equivalent execution plan.

The present application provides a data processing apparatus comprising the following modules. An acquisition module is configured to acquire an original execution plan comprising a plurality of nodes. A selection module is configured to select, from the nodes of the original execution plan, a target node that cannot be processed by a data source. A processing module is configured to perform an equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan. A determining module is configured to determine a target execution plan according to the original execution plan and the equivalent execution plan. A sending module is configured to send the target execution plan to the data source so that the data source executes the target execution plan.

The present application provides a data processing apparatus comprising:

a processor and a machine-readable storage medium storing computer instructions which, when executed by the processor, perform the data processing method described above.

Based on the above technical solution, in the embodiments of the present application the target node in the original execution plan may be equivalently transformed to obtain an equivalent execution plan. This increases the number of candidate execution plans and effectively expands the solution space from which an optimal execution plan is selected as the target execution plan. The original execution plan remains one of the candidates, so enlarging the candidate set can only keep or lower the cost of the best plan found. As a result, the processing pushed down to the data source is more reasonable, the computing capability of the data source is used reasonably, the amount of data transferred is reduced, query performance and overall query processing efficiency are improved, and the user experience is better.

Drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for that description are briefly introduced below. The drawings described below show only some embodiments of the present application. A person of ordinary skill in the art may derive other drawings from them.

FIG. 1 is a flow chart of a data processing method in one embodiment of the application;

FIG. 2 is a schematic diagram of an application scenario in one embodiment of the present application;

FIG. 3 is a flow chart of a data processing method in another embodiment of the application;

FIGS. 4A-4F are schematic illustrations of an execution plan in one embodiment of the application;

FIG. 5 is a schematic diagram of a data processing apparatus in one embodiment of the application;

FIG. 6 is a schematic diagram of a data processing device in one embodiment of the present application.

Detailed Description

The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. Furthermore, depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

An embodiment of the present application provides a data processing method that can be applied to any device, for example any device of a query system. FIG. 1 is a flowchart of the method, which may include the following steps:

Step 101, obtaining an original execution plan, the original execution plan comprising a plurality of nodes.

Step 102, selecting a target node which cannot be processed by the data source from the nodes of the original execution plan.

Specifically, the nodes of the original execution plan may be arranged in a tree structure. The nodes of the tree are then traversed bottom-up, starting from the lowest node, until a node that cannot be processed by the data source is reached. That node is determined as the target node.
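For illustration only, the bottom-up search of step 102 can be sketched in Python as follows. The `PlanNode` class, the `post_order` helper, and the `can_process` predicate are hypothetical names introduced for this sketch. How a real query system decides whether a data source can process a node is not specified here. For a tree with several branches (for example, under a join), this sketch visits one subtree completely before the next, which is only one possible reading of "bottom-up".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PlanNode:
    """Hypothetical plan node: an operator name, its detail text and its inputs."""
    op: str
    detail: str = ""
    children: List["PlanNode"] = field(default_factory=list)


def post_order(node: PlanNode) -> List[PlanNode]:
    """List the nodes bottom-up: every child appears before its parent."""
    ordered: List[PlanNode] = []
    for child in node.children:
        ordered.extend(post_order(child))
    ordered.append(node)
    return ordered


def find_target_node(root: PlanNode,
                     can_process: Callable[[PlanNode], bool]) -> Optional[PlanNode]:
    """Return the first node, walking up from the lowest node, that the data
    source cannot process; return None if every node can be pushed down."""
    for node in post_order(root):
        if not can_process(node):
            return node
    return None


# Original execution plan A: Scan at the bottom, then Filter, then Aggregate.
scan = PlanNode("Scan", "datasource1.table1")
filter_node = PlanNode("Filter", "c2 > 10 AND udf(c2) > 20", [scan])
plan_a = PlanNode("Aggregate", "sum(c3) GROUP BY c2", [filter_node])

# Assumed capability rule for this example: the data source cannot run udf(...).
target = find_target_node(plan_a, lambda n: "udf(" not in n.detail)
```

Take plan A from the application scenario (Scan, then Filter, then Aggregate) with a predicate that rejects UDF calls. The search stops at the filter node and never examines the aggregation node.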

Step 103, performing equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan; wherein the execution result of the equivalent execution plan is the same as the execution result of the original execution plan.

Specifically, the target node may be equivalently transformed to obtain an equivalent execution plan in ways including, but not limited to, the following:

Mode 1: the target node in the original execution plan is split to obtain a first child node that can be processed by the data source and a second child node that cannot be processed by the data source. The target node in the original execution plan is then replaced with the first child node and the second child node to obtain an equivalent execution plan.

Replacing the target node in the original execution plan with the first child node and the second child node may include, but is not limited to: arranging the first child node and the second child node of the equivalent execution plan in a tree structure, with the first child node located below the second child node.

Mode 2: a parent node corresponding to the target node is determined from the nodes of the original execution plan. The target node is then pulled up to become an upper node of the parent node, yielding an equivalent execution plan.

Determining the parent node corresponding to the target node from the nodes of the original execution plan may include, but is not limited to: if the nodes of the original execution plan are arranged in a tree structure, determining the upper node connected to the target node as the parent node corresponding to the target node.

Step 104, determining a target execution plan according to the original execution plan and the equivalent execution plan.

Specifically, a cost value may be determined for the original execution plan and for the equivalent execution plan. The execution plan with the minimum cost value is then selected as the target execution plan.
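The selection in step 104 can be sketched as taking a minimum over estimated costs. The cost model below weights rows shipped from the data source more heavily than rows processed locally. It is an assumption made for illustration only, because the document does not specify how cost values are computed. The plan names and row estimates are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CandidatePlan:
    name: str
    rows_returned: int   # hypothetical estimate: rows the data source sends back
    local_rows: int      # hypothetical estimate: rows the query system still processes


def cost(plan: CandidatePlan, transfer_weight: float = 2.0) -> float:
    """Toy cost model (an assumption): shipping a row costs more than processing it."""
    return transfer_weight * plan.rows_returned + plan.local_rows


def choose_target_plan(candidates: Sequence[CandidatePlan]) -> CandidatePlan:
    """Pick the candidate with the minimum cost; on a tie, min() keeps the first one."""
    if not candidates:
        raise ValueError("at least one candidate execution plan is required")
    return min(candidates, key=cost)


plans = [
    CandidatePlan("plan_1", rows_returned=1000, local_rows=1000),
    CandidatePlan("plan_2", rows_returned=300, local_rows=300),
    CandidatePlan("plan_3", rows_returned=50, local_rows=50),
]
target_plan = choose_target_plan(plans)
```

Keeping the cost function separate from the selection means that a different cost model can be swapped in without changing how the target execution plan is chosen.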

Step 105, sending the target execution plan to the data source so that the data source executes it. After executing the target execution plan, the data source may return the execution result to the query system instead of all of its original data, which reduces the amount of data transferred.

In the above embodiment, obtaining the original execution plan may include, but is not limited to: acquiring a data processing request and obtaining the original execution plan according to that request. Alternatively, after an equivalent execution plan has been obtained (i.e., in step 103), that equivalent execution plan may itself be taken as the original execution plan, so that the transformation can be applied repeatedly.

In one example, the above execution order is given only for convenience of description. In practical applications the order of the steps may be changed, without limitation. Moreover, in other embodiments, the steps of the corresponding method need not be performed in the order shown and described herein, and the method may include more or fewer steps than described. Furthermore, a single step described in this specification may be split into multiple steps in other embodiments, and multiple steps described in this specification may be combined into a single step in other embodiments.
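An equivalent execution plan may itself be taken as an original execution plan, so transformations can be applied repeatedly to grow the set of candidate plans. The sketch below assumes a breadth-first worklist with a cap on the number of plans. The tuple-of-labels plan encoding and the toy `split` and `pull_up` rules are hypothetical simplifications, not the document's representation.

```python
from collections import deque
from typing import Callable, List, Sequence, Set, Tuple

Plan = Tuple[str, ...]                 # toy plan: operator labels from the lowest node upward
Transform = Callable[[Plan], List[Plan]]


def expand_plan_space(original: Plan, transforms: Sequence[Transform],
                      max_plans: int = 100) -> List[Plan]:
    """Enumerate equivalent plans breadth-first. Each newly produced plan is
    queued and later treated as an 'original' plan in its own right."""
    seen: Set[Plan] = {original}
    plans: List[Plan] = [original]
    queue = deque([original])
    while queue and len(plans) < max_plans:
        plan = queue.popleft()
        for transform in transforms:
            for candidate in transform(plan):
                if candidate not in seen and len(plans) < max_plans:
                    seen.add(candidate)
                    plans.append(candidate)
                    queue.append(candidate)
    return plans


COMBINED = "Filter[c2>10 AND udf(c2)>20]"


def split(plan: Plan) -> List[Plan]:
    # Toy split rule: break the combined filter into two stacked filters.
    return [plan[:i] + ("Filter[c2>10]", "Filter[udf(c2)>20]") + plan[i + 1:]
            for i, op in enumerate(plan) if op == COMBINED]


def pull_up(plan: Plan) -> List[Plan]:
    # Toy pull-up rule: move a udf filter above the Aggregate directly over it.
    return [plan[:i] + (plan[i + 1], plan[i]) + plan[i + 2:]
            for i in range(len(plan) - 1)
            if plan[i].startswith("Filter") and "udf" in plan[i]
            and plan[i + 1] == "Aggregate"]


plan_a = ("Scan", COMBINED, "Aggregate")
all_plans = expand_plan_space(plan_a, [split, pull_up])
```

Starting from plan A, the two toy rules yield five distinct plans. Among them are the split plan and a plan in which the split is followed by a pull-up, a combination that a single transformation step cannot reach. The `max_plans` cap bounds the search when the rules could keep producing new plans.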

Based on the same application concept as the above method, an embodiment of the present application further provides another data processing method, which may be applied to any device of the query system. The method may include:

acquiring an original execution plan, which may include a plurality of nodes; selecting, from the nodes of the original execution plan, a target node that cannot be processed by a data source; splitting the target node in the original execution plan to obtain a first child node that can be processed by the data source and a second child node that cannot be processed by the data source; and replacing the target node in the original execution plan with the first child node and the second child node to obtain an equivalent execution plan, wherein the execution result of the equivalent execution plan is identical to that of the original execution plan. A target execution plan is then determined based on the original execution plan and the equivalent execution plan and sent to the data source so that the data source executes it.

In one example, replacing the target node in the original execution plan with the first child node and the second child node to obtain the equivalent execution plan includes: arranging the first child node and the second child node of the equivalent execution plan in a tree structure, wherein the first child node may be located below the second child node.

In another embodiment, the method may include: acquiring an original execution plan, which may include a plurality of nodes; selecting, from the nodes of the original execution plan, a target node that cannot be processed by a data source; determining, from the nodes of the original execution plan, a parent node corresponding to the target node; and pulling the target node up to become an upper node of the parent node to obtain an equivalent execution plan, wherein the execution result of the equivalent execution plan is the same as that of the original execution plan. A target execution plan is then determined based on the original execution plan and the equivalent execution plan and sent to the data source so that the data source executes it.

In one example, determining the parent node corresponding to the target node from the nodes of the original execution plan may include, but is not limited to: if the nodes of the original execution plan are arranged in a tree structure, determining the upper node connected to the target node as the parent node corresponding to the target node.

In yet another embodiment, the method may include: acquiring an original execution plan comprising a plurality of nodes; selecting, from the nodes of the original execution plan, a target node that cannot be processed by a data source; performing an equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan; and performing data processing according to the equivalent execution plan.

In one example, performing data processing according to the equivalent execution plan may include, but is not limited to: determining a target execution plan according to the original execution plan and the equivalent execution plan, and sending the target execution plan to the data source so that the data source executes it.

For the implementation of each step, refer to the above embodiments; the details are not repeated here.

The above data processing method is further described below with reference to specific application scenarios.

FIG. 2 is a schematic diagram of an application scenario in an embodiment of the present application. The client may be an APP (application) on a terminal device (such as a PC (personal computer), a notebook computer, or a mobile terminal), or a browser on the terminal device, without limitation.

The query system implements the data processing function in the embodiments of the present application. User data is stored in multiple data sources (i.e., databases), so the query system connects to each data source and reads data from it to support data processing across data sources. When using the query system, a user typically describes a query task in a query language (e.g., SQL) and submits it to the query system for execution. The query system has a query optimization function and can automatically select a reasonable execution plan to process the user's query.

The data sources may be databases. The embodiments of the present application may be applied to scenarios with heterogeneous data sources. The data sources may be of the same type or of different types, and they may be relational or non-relational databases.

Further, the type of each data source may include, but is not limited to: OSS (Object Storage Service), TableStore (table storage), HBase (Hadoop Database), HDFS (Hadoop Distributed File System), MySQL (a relational database), RDS (Relational Database Service), DRDS (Distributed Relational Database Service), RDBMS (Relational Database Management System), SQL Server (a relational database), PostgreSQL (an object-relational database), and MongoDB (a database based on distributed file storage). These are only a few examples of data source types, and the type of data source is not limited.

The data stored in a data source may be of various types, such as user data, commodity data, map data, video data, image data, and audio data; the data type is not limited.

In the above application scenario, FIG. 3 is a flowchart of a data processing method according to an embodiment of the present application. The method may be applied to any device of the query system and may include:

Step 301, obtaining a data processing request, for example an SQL (Structured Query Language) request; the type of the data processing request is not limited.

Specifically, the client may send a data processing request to the query system, and the query system receives it. One example of a data processing request is: SELECT c2, sum(c3) FROM datasource1.table1 WHERE c2 > 10 AND udf(c2) > 20 GROUP BY c2.

In this data processing request, datasource1 is the name of the data source, table1 is the name of the data table, and c2 and c3 are column names in table1. From the request, the query system can determine the data source "datasource1" and its data table "table1". It can also determine that it needs to operate on the data in columns c2 and c3 of table1.

In the above data processing request, c2 > 10 keeps only the rows whose c2 value is greater than 10. udf(c2) > 20 applies a UDF (user-defined function) to the c2 value of each row and keeps only the rows for which the result is greater than 20.

In the above data processing request, sum(c3) GROUP BY c2 groups the rows by column c2 and then sums column c3 within each group.

Of course, the above data processing request is only an example, and its content does not affect the technical solution of the present application. The subsequent embodiments take this data processing request as an example.

Step 302, obtaining at least one original execution plan according to the data processing request.

Specifically, the data processing request is an SQL request written by the user. It may be converted into a machine-executable execution plan that describes the concrete execution steps. The execution plan may be generated by an optimizer of the query system and, for ease of distinction, is referred to as an original execution plan; the process of generating it is not limited.

The query system may obtain at least one original execution plan according to the data processing request. For convenience of description, the generation of one original execution plan is taken as an example; this plan is referred to below as original execution plan A.

The original execution plan A may include a plurality of nodes, each representing one computation step. Examples include scan nodes (e.g., Seq Scan, Index Only Scan, Bitmap Scan); join nodes (e.g., Join, Nested Loop, Hash Join, Merge Join); materialization nodes (e.g., Materialize); sort nodes (e.g., Sort); group nodes (e.g., Group); aggregation nodes (e.g., Aggregate); filter nodes (e.g., Filter); projection nodes (e.g., Projection); and append nodes (e.g., Append). These are only examples; other types of nodes may be included, without limitation.

In one example, the nodes of the original execution plan A may be arranged in a tree structure; the arrangement is not limited. Once the nodes are arranged, executing the original execution plan A means executing each node in turn, starting from the lowest node (i.e., the bottom node). The details of this execution are not repeated here.

For example, the original execution plan A generated from the data processing request "SELECT c2, sum(c3) FROM datasource1.table1 WHERE c2 > 10 AND udf(c2) > 20 GROUP BY c2" may include a scan node (performing the scan step), a filter node (performing the filter step), and an aggregation node (performing the aggregation step). The original execution plan A may also include other types of nodes, depending on the data processing request and the specific algorithm; these are not described in detail here.

FIG. 4A is a schematic diagram of the nodes of the original execution plan A arranged in a tree structure. Execution starts from the lowest node (i.e., the bottom node) of the tree. First the scan node is executed, which scans the data in data table "table1" of data source "datasource1". Then the filter node, "c2 > 10 AND udf(c2) > 20", is executed. It keeps only the rows whose c2 value is greater than 10 and whose udf(c2) value (the UDF applied to column c2) is greater than 20.

Then the aggregation node, "sum(c3) GROUP BY c2", is executed: the rows are grouped by column c2, and column c3 is summed within each group.
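To make the bottom-up execution order concrete, the sketch below runs original execution plan A over a few in-memory rows. The body of `udf` is a hypothetical stand-in, because the document never defines it. The sample rows are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, int]


def udf(x: int) -> int:
    # Hypothetical UDF body: the document never says what udf computes.
    return x - 5


def scan(table: List[Row]) -> List[Row]:
    # Scan node: read every row of datasource1.table1.
    return list(table)


def filter_node(rows: List[Row]) -> List[Row]:
    # Filter node: c2 > 10 AND udf(c2) > 20.
    return [r for r in rows if r["c2"] > 10 and udf(r["c2"]) > 20]


def aggregate(rows: List[Row]) -> List[Row]:
    # Aggregate node: sum(c3) GROUP BY c2, one output row per c2 value.
    sums: Dict[int, int] = defaultdict(int)
    for r in rows:
        sums[r["c2"]] += r["c3"]
    return [{"c2": c2, "sum_c3": total} for c2, total in sorted(sums.items())]


def execute_plan_a(table: List[Row]) -> List[Row]:
    # Bottom-up execution: each node's output is the input of the node above it.
    rows = scan(table)
    for node in (filter_node, aggregate):
        rows = node(rows)
    return rows


table1 = [
    {"c2": 5, "c3": 1},
    {"c2": 12, "c3": 2},
    {"c2": 30, "c3": 3},
    {"c2": 30, "c3": 4},
    {"c2": 40, "c3": 5},
]
result = execute_plan_a(table1)
```

Under this assumed udf, the row with c2 = 12 passes c2 > 10 but fails udf(c2) > 20. Only the groups c2 = 30 and c2 = 40 therefore reach the aggregation.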

Step 303, selecting a target node which cannot be processed by the data source from the nodes of the original execution plan.

Specifically, after the nodes of the original execution plan are arranged in a tree structure, they may be traversed starting from the lowest node (i.e., the tree is searched from bottom to top). The traversal continues until a node that cannot be processed by the data source is reached, and that node is determined as the target node.

For example, referring to FIG. 4A, the traversal of the tree structure of original execution plan A starts from the lowest node and first reaches the scan node. The data source has a scan function and can execute the scan node, so the scan node is a node that can be processed by the data source.

Next, the traversal reaches the second node, the filter node. The filter node uses a UDF (user-defined function). The data source does not know what the UDF is and therefore cannot execute it, so the data source cannot execute the filter node. The filter node is thus determined as the target node (i.e., the first node reached that cannot be processed by the data source), and the traversal ends.

Of course, the user-defined function is only one example. Other criteria may be used to identify a node that cannot be processed by the data source, without limitation, as long as such a node can be found. For example, if a node needs data from data source 2, data source 1 cannot obtain that data and therefore cannot process the node. As another example, if a node requires a function X that the data source does not provide, the data source cannot process the node.
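The examples above suggest a simple capability check. A node can be pushed down only if it reads data from this data source alone and uses only functions that the data source knows. The sketch below encodes that assumption. `DataSourceCapabilities`, `NodeRequirements`, and the function names in the sets are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DataSourceCapabilities:
    """Hypothetical description of what one data source can execute."""
    name: str
    functions: FrozenSet[str]   # operators/functions the data source understands


@dataclass(frozen=True)
class NodeRequirements:
    """Hypothetical summary of what a plan node needs in order to run."""
    functions: FrozenSet[str]   # operators/functions the node uses
    sources: FrozenSet[str]     # data sources whose data the node reads


def can_process(node: NodeRequirements, source: DataSourceCapabilities) -> bool:
    # Pushable only if the node reads this data source alone and every
    # function it uses is known to that data source.
    return node.sources <= {source.name} and node.functions <= source.functions


datasource1 = DataSourceCapabilities("datasource1", frozenset({"scan", ">", "AND", "sum"}))
scan_node = NodeRequirements(frozenset({"scan"}), frozenset({"datasource1"}))
udf_filter = NodeRequirements(frozenset({">", "AND", "udf"}), frozenset({"datasource1"}))
cross_source = NodeRequirements(frozenset({"scan"}), frozenset({"datasource1", "datasource2"}))
```

The scan node passes the check. The filter node fails it because of `udf`, and the cross-source node fails it because it also needs data from datasource2. These are the three situations described above.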

Step 304, performing equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan; wherein the execution result of the equivalent execution plan may be the same as the execution result of the original execution plan.

Specifically, after the target node in the original execution plan is found, the next step depends on whether the target node admits an equivalent transformation. If it does, the equivalent transformation is performed on the target node to obtain the equivalent execution plan. If it does not, no equivalent transformation is performed on the target node.

Whether the target node admits an equivalent transformation is defined as follows. If, after the target node is transformed, the resulting execution plan produces the same execution result as the original execution plan, the target node admits an equivalent transformation and the resulting plan is an equivalent execution plan. If the results differ, the target node does not admit an equivalent transformation.

As a typical example, relational algebra describes a number of equivalence rules that can be applied to produce equivalent execution plans.
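Two standard relational-algebra equivalences correspond to the transformations used in this document. The first splits a conjunctive selection into a cascade of selections. The second moves a selection across a grouping when the predicate refers only to grouping attributes. In the rules below, σ denotes selection, γ denotes grouping with aggregation, and attrs(p) is the set of attributes that predicate p reads. In the running example, R is table1, p is c2 > 10, q is udf(c2) > 20, and G = {c2}. The first rule matches the splitting transformation, and the second, read from right to left, matches the pull-up transformation. Both assume deterministic predicates, which for udf(c2) is an assumption.

```latex
\begin{aligned}
\sigma_{p \land q}(R) &\equiv \sigma_{q}\bigl(\sigma_{p}(R)\bigr) \\
\sigma_{p}\bigl(\gamma_{G;\,\mathrm{sum}(c_3)}(R)\bigr) &\equiv \gamma_{G;\,\mathrm{sum}(c_3)}\bigl(\sigma_{p}(R)\bigr)
  \quad \text{if } \mathrm{attrs}(p) \subseteq G
\end{aligned}
```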

For example, suppose the execution result of execution plan 1, obtained by transforming the target node, is data set A, and the execution result of the original execution plan is data set B. If data set A is identical to data set B, the target node admits an equivalent transformation and execution plan 1 is an equivalent execution plan. If data set A differs from data set B, the target node does not admit an equivalent transformation.
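Comparing data set A with data set B can be sketched as a multiset comparison, because row order is usually irrelevant while duplicate rows are not. Agreement on sample data can only refute an equivalence; it cannot prove one. This is one reason to rely on equivalent transformation policies rather than test runs. The function name `results_match` and the sample plans are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Row = Dict[str, int]
Plan = Callable[[List[Row]], List[Row]]


def as_multiset(rows: List[Row]) -> Counter:
    # Turn each row into a sorted tuple of (column, value) pairs so it is hashable.
    return Counter(tuple(sorted(row.items())) for row in rows)


def results_match(plan_1: Plan, original: Plan, table: List[Row]) -> bool:
    """True if data set A (plan_1's result) equals data set B (original's result)
    on this table, ignoring row order but not duplicates."""
    return as_multiset(plan_1(table)) == as_multiset(original(table))


def original(rows: List[Row]) -> List[Row]:
    # Hypothetical original plan: one filter with two conjuncts.
    return [r for r in rows if r["c2"] > 10 and r["c3"] > 1]


def reordered(rows: List[Row]) -> List[Row]:
    # Same conjuncts applied as two stacked filters, in the other order.
    kept = [r for r in rows if r["c3"] > 1]
    return [r for r in kept if r["c2"] > 10]


def changed(rows: List[Row]) -> List[Row]:
    # Not equivalent: the comparison on c2 was altered from > to >=.
    return [r for r in rows if r["c2"] >= 10 and r["c3"] > 1]


table = [{"c2": 10, "c3": 2}, {"c2": 11, "c3": 2}, {"c2": 12, "c3": 0}]
```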

In one example, an equivalent transformation policy (or rule) may be configured. If the target node matches the policy, the target node admits an equivalent transformation, and the transformation is performed on the target node in the original execution plan to obtain an equivalent execution plan. If the target node does not match the policy, it does not admit an equivalent transformation. The content of the policy may be configured empirically, without limitation, as long as it can determine whether an equivalent transformation can be performed.

For example, the equivalent transformation policy may include a split policy (for splitting the target node) and a pull-up policy (for pulling the target node up). Both policies must guarantee an equivalent transformation, i.e., neither may change the semantics or the result of the original query. This keeps the algorithm general.

If the target node matches the split policy, it admits an equivalent transformation, and the transformation is performed on the target node in the original execution plan to obtain an equivalent execution plan. If the target node does not match the split policy, it does not admit this transformation. The content of the split policy may be configured empirically, as long as it can determine whether an equivalent transformation can be performed.

If the target node matches the pull-up policy, it admits an equivalent transformation, and the transformation is performed on the target node in the original execution plan to obtain an equivalent execution plan. If the target node does not match the pull-up policy, it does not admit this transformation. The content of the pull-up policy may be configured empirically, as long as it can determine whether an equivalent transformation can be performed.

The process of step 304 is described in detail below in connection with two specific implementations.

The method comprises the steps that firstly, target nodes in an original execution plan are subjected to segmentation processing, and a first child node which can be processed by a data source and a second child node which cannot be processed by the data source are obtained; and replacing the target node in the original execution plan by the first child node and the second child node to obtain an equivalent execution plan. Specifically, the first child node and the second child node of the equivalent execution plan may be arranged in a tree structure, where the first child node is located at a lower layer of the second child node, that is, the lower layer is the first child node that can be processed by the data source.

For example, referring to fig. 4A, the target node may be a filter node that executes "c2>10 AND udf(c2)>20". Because this predicate is a conjunction (AND) of two conditions, the filter node in original execution plan A can be split into filter sub-node 1, which executes "c2>10", and filter sub-node 2, which executes "udf(c2)>20". Since the data source can evaluate "c2>10" but cannot evaluate "udf(c2)>20", filter sub-node 1 is the first child node (processable by the data source) and filter sub-node 2 is the second child node (not processable by the data source).

Further, filter sub-node 1 and filter sub-node 2 replace the target node in the original execution plan, giving an equivalent execution plan, referred to below as equivalent execution plan B for ease of distinction. Referring to fig. 4B, the nodes of equivalent execution plan B may be arranged in a tree structure in which filter sub-nodes 1 and 2 take the place of the original filter node, with filter sub-node 1 located at the layer below filter sub-node 2.
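The first mode can be sketched for a conjunctive filter as follows. This is an illustrative sketch, assuming a plan is a Python list of (kind, payload) tuples ordered bottom to top (scan first), that a filter payload is a tuple of AND-ed conditions, and that the data source cannot evaluate conditions calling udf. The table name t and the helper names are hypothetical.

```python
def source_can_eval(conjunct: str) -> bool:
    # Hypothetical capability rule: the data source cannot evaluate user-defined functions.
    return "udf(" not in conjunct


def split_filter(plan, idx):
    """First mode: replace the filter at plan[idx] with two stacked filters.

    `plan` is a list of (kind, payload) tuples ordered bottom to top, so the
    sub-filter the data source can process is placed first (the lower layer).
    Returns a new list, or None if the node does not match the split policy.
    """
    kind, conjuncts = plan[idx]
    if kind != "filter":
        return None
    pushable = tuple(c for c in conjuncts if source_can_eval(c))
    residual = tuple(c for c in conjuncts if not source_can_eval(c))
    if not pushable or not residual:
        return None  # nothing to separate
    return plan[:idx] + [("filter", pushable), ("filter", residual)] + plan[idx + 1:]


# Original execution plan A of fig. 4A (the table name "t" is hypothetical).
plan_a = [
    ("scan", "t"),
    ("filter", ("c2>10", "udf(c2)>20")),
    ("aggregate", "sum(c3) GROUP BY c2"),
]
plan_b = split_filter(plan_a, 1)  # equivalent execution plan B of fig. 4B
```

Stacking the two filters is equivalent to the original because a row passes "p AND q" exactly when it passes p and then q.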

Second mode: a parent node corresponding to the target node is determined from the nodes of the original execution plan, and the target node is pulled up to become the upper node of the parent node, yielding an equivalent execution plan. The upper node directly connected to the target node in the original execution plan may be determined as the parent node corresponding to the target node.

For example, referring to fig. 4A, the target node may be the filter node that executes "c2>10 AND udf(c2)>20". The upper node connected to the filter node is an aggregation node that executes "sum(c3) GROUP BY c2", so the aggregation node is determined as the parent node of the filter node. The filter node is then pulled up along the tree structure to become the upper node of that parent node, yielding an equivalent execution plan, referred to below as equivalent execution plan C for ease of distinction. This is equivalent because the filter references only the grouping column c2: discarding rows by a condition on c2 before grouping removes exactly the same groups as applying that condition to the grouped output.

Referring to fig. 4C, the nodes of equivalent execution plan C may be arranged in a tree structure in which the order of the aggregation node and the filter node is swapped, i.e., the aggregation node is now located at the layer below the filter node.
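Under an assumed bottom-to-top list representation of a single-input plan, the second mode amounts to swapping the target node with the node directly above it. This is a sketch only: it assumes the caller has already checked the pull-up policy (for example, that the filter references only the grouping column c2), and pull_up and the table name t are hypothetical.

```python
def pull_up(plan, idx):
    """Second mode: move plan[idx] above its parent.

    `plan` is a list of (kind, payload) tuples ordered bottom to top, so the
    parent of plan[idx] is the node directly above it, plan[idx + 1].
    The caller is assumed to have checked the pull-up policy already.
    Returns a new list, or None if the target node has no parent.
    """
    parent = idx + 1
    if parent >= len(plan):
        return None  # the target node is the top node
    new_plan = list(plan)  # copy, so the original plan stays available for costing
    new_plan[idx], new_plan[parent] = new_plan[parent], new_plan[idx]
    return new_plan


plan_a = [
    ("scan", "t"),  # hypothetical table name
    ("filter", ("c2>10", "udf(c2)>20")),
    ("aggregate", "sum(c3) GROUP BY c2"),
]
plan_c = pull_up(plan_a, 1)  # equivalent execution plan C of fig. 4C
```

Returning a copy rather than mutating in place keeps the original execution plan intact, since step 305 compares the original plan with its equivalents.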

Step 305, determining a target execution plan according to the original execution plan and the equivalent execution plan.

Specifically, a cost value corresponding to the original execution plan and a cost value corresponding to each equivalent execution plan may be determined, and the target execution plan is determined from the execution plan with the smallest cost value. For example, part of the execution plan with the smallest cost value (the portion that the data source can process) may be determined as the target execution plan.

For example, cost value 1 may be determined for original execution plan A, cost value 2 for equivalent execution plan B, and cost value 3 for equivalent execution plan C. Each cost value may combine the execution cost on the data source, the execution cost in the query system, and the cost of transferring data between the two; the specific calculation can follow conventional methods and is not described here. Assuming cost value 2 is the smallest, the target execution plan is determined from equivalent execution plan B, for example by taking part of equivalent execution plan B as the target execution plan.
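Step 305 can be sketched as a minimum over cost estimates. The sketch assumes, purely for illustration, that the total cost is the plain sum of the three components named above; the numbers are invented so that plan B is cheapest, matching the assumption in the text, and do not come from any real cost model. PlanCost and choose_cheapest are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCost:
    name: str
    source_cost: float    # estimated execution cost inside the data source
    engine_cost: float    # estimated execution cost inside the query system
    transfer_cost: float  # estimated cost of moving data between the two

    @property
    def total(self) -> float:
        # Assumption: the components are simply added; a real cost model may weight them.
        return self.source_cost + self.engine_cost + self.transfer_cost


def choose_cheapest(candidates):
    """Return the candidate execution plan with the smallest total cost."""
    return min(candidates, key=lambda c: c.total)


# Invented figures for illustration only.
candidates = [
    PlanCost("A", source_cost=1.0, engine_cost=5.0, transfer_cost=50.0),
    PlanCost("B", source_cost=2.0, engine_cost=3.0, transfer_cost=20.0),
    PlanCost("C", source_cost=1.0, engine_cost=6.0, transfer_cost=50.0),
]
best = choose_cheapest(candidates)
```

Here plan B wins (total 25 against 56 and 57) mainly through its lower transfer cost, which is the effect pushdown aims for.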

In another example, original execution plan A, once acquired according to the data processing request, may be added to an execution plan set. It is then determined whether the execution plan set contains an unprocessed original execution plan. Because original execution plan A is unprocessed, it is processed by steps 303 and 304 to obtain equivalent execution plans B and C; these are then treated as original execution plans and added to the execution plan set.

Next, the execution plan set is checked again for an unprocessed original execution plan. Equivalent execution plan B (now treated as an original execution plan) is unprocessed, so it is processed by steps 303 and 304. As shown in fig. 4B, the target node that cannot be processed by the data source is filter sub-node 2. Filter sub-node 2 cannot be split further, so the first mode does not apply. Using the second mode, filter sub-node 2 is pulled up to become the upper node of the aggregation node, giving equivalent execution plan D (see fig. 4D), which is treated as an original execution plan and added to the execution plan set.

Next, equivalent execution plan C (now treated as an original execution plan) is found to be unprocessed and is processed by steps 303 and 304. Referring to fig. 4C, the target node that cannot be processed by the data source is the filter node. The filter node has no parent node, so the second mode does not apply. Using the first mode, the filter node is split into filter sub-node 1 and filter sub-node 2, with filter sub-node 1 at the layer below filter sub-node 2, giving equivalent execution plan E (see fig. 4E), which is treated as an original execution plan and added to the execution plan set.

Next, equivalent execution plan D (now treated as an original execution plan) is found to be unprocessed and is processed by steps 303 and 304. Referring to fig. 4D, the target node that cannot be processed by the data source is filter sub-node 2, which can neither be split nor pulled up (it has no parent node), so neither the first mode nor the second mode applies.

Next, equivalent execution plan E (now treated as an original execution plan) is found to be unprocessed and is processed by steps 303 and 304. Referring to fig. 4E, the target node that cannot be processed by the data source is filter sub-node 2, which can neither be split nor pulled up (it has no parent node), so neither the first mode nor the second mode applies.

Next, the execution plan set is checked again; since no unprocessed original execution plan remains, the traversal ends. The target execution plan may then be determined in the manner of step 305, based on the cost value of each execution plan in the set (original execution plan A and equivalent execution plans B to E).
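The traversal of the execution plan set can be sketched as a worklist loop. This is an illustrative sketch, assuming plans are tuples of (kind, payload) nodes ordered bottom to top, that conditions calling udf cannot run in the data source while scans and aggregations can, and a simplified pull-up policy that only moves a node above an aggregation node. It also skips duplicate plans, which the description does not mention. All names are hypothetical. With the plan of fig. 4A as input, it yields plans A to E in the order described above.

```python
from collections import deque

UDF = "udf("  # hypothetical marker of conditions the data source cannot evaluate


def node_pushable(node) -> bool:
    kind, payload = node
    if kind == "filter":
        return all(UDF not in c for c in payload)
    return True  # assumption: scans and aggregations are supported by the data source


def find_target(plan):
    """Step 303: index of the lowest node the data source cannot process, or None."""
    for i, node in enumerate(plan):
        if not node_pushable(node):
            return i
    return None


def split(plan, i):
    """First mode: split an AND-ed filter into a pushable lower part and a residual upper part."""
    kind, payload = plan[i]
    if kind != "filter":
        return None
    ok = tuple(c for c in payload if UDF not in c)
    rest = tuple(c for c in payload if UDF in c)
    if not ok or not rest:
        return None
    return plan[:i] + (("filter", ok), ("filter", rest)) + plan[i + 1:]


def pull_up(plan, i):
    """Second mode, simplified policy: only move a node above an aggregation parent."""
    if i + 1 >= len(plan) or plan[i + 1][0] != "aggregate":
        return None
    return plan[:i] + (plan[i + 1], plan[i]) + plan[i + 2:]


def enumerate_plans(original):
    plan_set = [original]        # the execution plan set, in insertion order
    seen = {original}
    pending = deque([original])  # plans not yet processed by steps 303 and 304
    while pending:
        plan = pending.popleft()
        i = find_target(plan)
        if i is None:
            continue
        for candidate in (split(plan, i), pull_up(plan, i)):
            if candidate is not None and candidate not in seen:
                seen.add(candidate)
                plan_set.append(candidate)  # the equivalent plan becomes a new original plan
                pending.append(candidate)
    return plan_set


PLAN_A = (
    ("scan", "t"),  # hypothetical table name
    ("filter", ("c2>10", "udf(c2)>20")),
    ("aggregate", "sum(c3) GROUP BY c2"),
)
plans = enumerate_plans(PLAN_A)  # A, B, C, D, E
```

The loop terminates here because each plan yields at most two new plans and the duplicate check stops the same plan from being processed twice.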

In one example, cost values 1 to 5 may be determined for original execution plan A and equivalent execution plans B, C, D and E, respectively. Assuming cost value 4 is the smallest, the target execution plan is determined from equivalent execution plan D.

When determining the target execution plan from equivalent execution plan D, the nodes of its tree structure may be traversed from bottom to top, starting from the lowest node, until a node that cannot be processed by the data source is reached; all nodes below that node are determined as the target execution plan. For example, referring to fig. 4D, the scan node, filter sub-node 1 and the aggregation node form the target execution plan, so every node in the target execution plan can be processed by the data source; fig. 4F shows an example of this target execution plan.

The remaining part of equivalent execution plan D (i.e., the nodes outside the target execution plan, such as filter sub-node 2) may be processed by the query system itself.
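The bottom-to-top search that separates the target execution plan from the remaining part can be sketched as follows. This assumes the same hypothetical representation (a tuple of (kind, payload) nodes ordered bottom to top) and the hypothetical rule that only conditions calling udf are unsupported by the data source. Applied to equivalent execution plan D, it yields the scan node, filter sub-node 1 and the aggregation node as the target execution plan, and filter sub-node 2 as the remainder.

```python
def node_pushable(node) -> bool:
    kind, payload = node
    if kind == "filter":
        # Hypothetical capability rule: conditions calling udf cannot run in the data source.
        return all("udf(" not in c for c in payload)
    return True  # assumption: scans and aggregations are supported


def split_for_pushdown(plan):
    """Walk a bottom-to-top plan upward and cut it at the first node the data
    source cannot process. Returns (target_plan, remaining_plan)."""
    for i, node in enumerate(plan):
        if not node_pushable(node):
            return plan[:i], plan[i:]
    return plan, ()  # the whole plan can be pushed down


# Equivalent execution plan D of fig. 4D (the table name "t" is hypothetical).
plan_d = (
    ("scan", "t"),
    ("filter", ("c2>10",)),
    ("aggregate", "sum(c3) GROUP BY c2"),
    ("filter", ("udf(c2)>20",)),
)
target, remaining = split_for_pushdown(plan_d)
```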

Step 306, the target execution plan is sent to the data source to cause the data source to execute the target execution plan.

After the data source executes the target execution plan (the execution process itself is not limited here), it returns only part of the data to the query system, based on the execution result, instead of returning all of its data. After receiving this data, the query system executes the remaining part of the plan (i.e., the nodes outside the target execution plan), such as filter sub-node 2; details are not repeated here.
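The effect on data transfer can be illustrated with toy data. In this sketch, the rows, the body of udf, and the helper names are all hypothetical. It compares shipping every row and running plan A entirely in the query system against running the target execution plan of fig. 4F (scan, "c2>10", "sum(c3) GROUP BY c2") in the data source and applying filter sub-node 2 afterwards. Both paths give the same result, but the pushdown path ships 3 grouped rows instead of 7 raw rows.

```python
from collections import defaultdict


def udf(x: int) -> int:
    # Hypothetical user-defined function that the data source cannot evaluate.
    return (x * 7) % 50


ROWS = [(5, 1), (8, 2), (12, 3), (12, 4), (15, 5), (20, 6), (20, 7)]  # toy (c2, c3) rows


def sum_group_by(rows):
    """sum(c3) GROUP BY c2, returned as a dict {c2: sum}."""
    acc = defaultdict(int)
    for c2, c3 in rows:
        acc[c2] += c3
    return dict(acc)


def run_without_pushdown(rows):
    # Every row is shipped; the query system evaluates the whole of plan A.
    shipped = list(rows)
    kept = [(c2, c3) for c2, c3 in shipped if c2 > 10 and udf(c2) > 20]
    return sum_group_by(kept), len(shipped)


def run_with_pushdown(rows):
    # The data source runs the target execution plan: scan, c2>10, aggregation ...
    groups = sum_group_by([(c2, c3) for c2, c3 in rows if c2 > 10])
    shipped = list(groups.items())
    # ... and the query system applies the remaining node, udf(c2)>20.
    result = {c2: s for c2, s in shipped if udf(c2) > 20}
    return result, len(shipped)
```

For these rows, both functions return the result {12: 7, 20: 13}; the shipped-row count is 7 without pushdown and 3 with it.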

Based on the above technical solution, in the embodiments of the present application, the query system pushes part of the query down to the data source by sending it the target execution plan, which makes use of the data source's computing capacity and reduces the amount of data the data source returns to the query system. In addition, equivalently transforming the target nodes of the original execution plans produces additional equivalent execution plans, which effectively expands the solution space of execution plans; the optimal plan among them is then selected as the target execution plan. Because the candidates still include the original execution plan, enlarging the candidate set can only keep or improve the quality of the selected target execution plan, so the work assigned to the data source is more reasonable and its computing capacity is used reasonably. The method also has good portability: it can be applied to different heterogeneous scenarios and query optimizers, and provides a good user experience.

Based on the same application concept as the above method, an embodiment of the present application further provides a data processing apparatus. Fig. 5 is a structural diagram of the data processing apparatus, which includes:

An obtaining module 51, configured to obtain an original execution plan, where the original execution plan includes a plurality of nodes; a selection module 52, configured to select, from the nodes of the original execution plan, a target node that cannot be processed by the data source; a processing module 53, configured to perform an equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan; a determining module 54, configured to determine a target execution plan according to the original execution plan and the equivalent execution plan; and a sending module 55, configured to send the target execution plan to the data source so that the data source executes the target execution plan.

When performing the equivalent transformation on the target node in the original execution plan, the processing module 53 is specifically configured to: split the target node in the original execution plan to obtain a first child node that can be processed by the data source and a second child node that cannot be processed by the data source; and replace the target node in the original execution plan with the first child node and the second child node to obtain an equivalent execution plan.

Alternatively, when performing the equivalent transformation on the target node in the original execution plan, the processing module 53 is specifically configured to: determine a parent node corresponding to the target node from the nodes of the original execution plan; and pull up the target node to become the upper node of the parent node to obtain an equivalent execution plan.

Based on the same application concept as the method, the embodiment of the application further provides a data processing device, which comprises: a processor and a machine-readable storage medium having stored thereon computer instructions that when executed by the processor perform the following:

Embodiments of the present application also provide a machine-readable storage medium having stored thereon a number of computer instructions; the computer instructions, when executed, perform the following:

Referring to fig. 6, which is a block diagram of a data processing device according to an embodiment of the present application, the data processing device 60 may include a processor 61, a network interface 62, a bus 63 and a memory 64. The memory 64 may be any electronic, magnetic, optical or other physical storage device that can contain or store information such as executable instructions and data. For example, the memory 64 may be RAM (random access memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid-state disk, or any type of storage disc (e.g., an optical disc or DVD).

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.

For convenience of description, the above apparatus is described as being divided into various units by function. Of course, when the present application is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Moreover, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims

1. A method of data processing, the method comprising:

Performing an equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan, wherein the target node is equivalently transformed only if the execution plan obtained by transforming the target node produces the same execution result as the original execution plan, and the execution plan obtained by the equivalent transformation is the equivalent execution plan;

2. The method of claim 1, wherein

Selecting a target node from the nodes of the original execution plan that cannot be processed by a data source, comprising:

Arranging the plurality of nodes of the original execution plan in a tree structure;

traversing the tree structure from bottom to top, starting from its lowest node, until a node that cannot be processed by the data source is reached, and determining the node reached as the target node.

3. The method of claim 1, wherein performing an equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan comprises:

performing segmentation processing on the target node in the original execution plan to obtain a first child node which can be processed by a data source and a second child node which can not be processed by the data source; and replacing a target node in the original execution plan by the first child node and the second child node to obtain an equivalent execution plan.

4. A method according to claim 3, wherein replacing the target node in the original execution plan with the first child node and the second child node results in an equivalent execution plan, comprising:

arranging the first child node and the second child node of the equivalent execution plan in a tree structure, wherein the first child node is located at the layer below the second child node.

5. The method of claim 1, wherein performing an equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan comprises:

and pulling up the target node to be an upper node of the father node to obtain an equivalent execution plan.

6. The method of claim 5, wherein

Determining a parent node corresponding to the target node from the nodes of the original execution plan, including:

if the plurality of nodes of the original execution plan are arranged in a tree structure, determining the upper node connected to the target node as the parent node corresponding to the target node.

7. The method according to any one of claims 1 to 6, wherein,

The execution result of the equivalent execution plan is the same as the execution result of the original execution plan.

8. The method of claim 1, wherein

Determining a target execution plan according to the original execution plan and the equivalent execution plan, including:

determining a cost value corresponding to the original execution plan;

determining a cost value corresponding to the equivalent execution plan;

and determining the target execution plan according to the execution plan with the minimum cost value.

9. The method of claim 1, wherein obtaining the original execution plan comprises:

acquiring a data processing request, and acquiring an original execution plan according to the data processing request; or alternatively

after the equivalent execution plan is obtained, determining the equivalent execution plan as the original execution plan.

10. A method of data processing, the method comprising:

11. The method of claim 10, wherein replacing the target node in the original execution plan with the first child node and the second child node results in an equivalent execution plan, comprising:

12. A method of data processing, the method comprising:

13. The method of claim 12, wherein

14. A method of data processing, the method comprising:

and carrying out data processing according to the equivalent execution plan.

15. A data processing apparatus, the apparatus comprising:

The acquisition module is used for acquiring an original execution plan, wherein the original execution plan comprises a plurality of nodes;

The selecting module is used for selecting a target node which cannot be processed by a data source from the nodes of the original execution plan;

The processing module is used for performing an equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan, wherein the target node is equivalently transformed only if the execution plan obtained by transforming the target node produces the same execution result as the original execution plan, and the execution plan obtained by the equivalent transformation is the equivalent execution plan;

the determining module is used for determining a target execution plan according to the original execution plan and the equivalent execution plan;

And the sending module is used for sending the target execution plan to the data source so as to enable the data source to execute the target execution plan.

16. The apparatus of claim 15, wherein, when performing the equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan, the processing module is specifically configured to:

17. The apparatus of claim 15, wherein, when performing the equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan, the processing module is specifically configured to:

18. A data processing apparatus, comprising: