WO2024221908A1 - Data query method and device - Google Patents
- Publication number
- WO2024221908A1 (PCT/CN2023/135018)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- node
- nodes
- execution
- branch
- scheduling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the embodiments of the present specification belong to the field of computer technology, and more particularly, to a data query method and device.
- the purpose of the present invention is to provide a data query solution that can improve the overall execution efficiency of data query.
- the first aspect of the present specification provides a data query method, including: obtaining an execution plan of a target query statement to be executed, wherein the target query statement is used for a joint query of multiple data tables, the execution plan includes multiple nodes and connecting edges between the nodes, each leaf node among the multiple nodes is a query call node for a data table in the multiple data tables, and the multiple nodes include several branch nodes, each of which has multiple child nodes and a corresponding dependency relationship between those child nodes; dividing the execution plan into multiple subgraphs along the several branch nodes, and converting the multiple subgraphs into multiple scheduling nodes, wherein a single subgraph includes at least one node; constructing a relationship graph between the multiple scheduling nodes based on the dependency relationships; and concurrently scheduling the multiple scheduling nodes based on the relationship graph.
- the second aspect of the present specification provides a data query device, including: an acquisition unit, configured to acquire an execution plan of a target query statement to be executed, wherein the target query statement is used for a joint query of multiple data tables, the execution plan includes multiple nodes and connecting edges between the nodes, each leaf node among the multiple nodes is a query call node for a data table in the multiple data tables, and the multiple nodes include several branch nodes, each of which has multiple child nodes and a corresponding dependency relationship between those child nodes; a segmentation unit, configured to divide the execution plan into multiple subgraphs along the several branch nodes and convert the multiple subgraphs into multiple scheduling nodes, wherein a single subgraph includes at least one node; a generation unit, configured to construct a relationship graph between the multiple scheduling nodes based on the dependency relationships; and a scheduling unit, configured to concurrently schedule the multiple scheduling nodes based on the relationship graph.
- a third aspect of the present specification provides a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to execute the method described in any implementation manner in the first aspect.
- a fourth aspect of the present specification provides a computing device, including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the method described in any implementation manner in the first aspect is implemented.
- a fifth aspect of the present specification provides a computer program product, which, when executed in a computer, causes the computer to execute the method described in any implementation manner in the first aspect.
- an execution plan of a target query statement to be executed can be obtained, wherein the target query statement is used for a joint query of multiple data tables, and the execution plan includes multiple nodes and connecting edges between nodes, and the leaf nodes among the multiple nodes are query call nodes for the data tables in the multiple data tables, and there are several branch nodes among the multiple nodes, and the several branch nodes all have multiple child nodes and the dependencies between the corresponding child nodes.
- the execution plan can be divided into multiple subgraphs along the several branch nodes, and the multiple subgraphs can be converted into multiple scheduling nodes. Then, based on the dependency, a relationship graph between the multiple scheduling nodes can be constructed. Then, based on the relationship graph, the multiple scheduling nodes can be concurrently scheduled.
- the parallelization of the multiple data table queries can be achieved, thereby improving the overall execution efficiency of the data query.
- FIG. 1 is a schematic diagram of an application scenario to which the embodiments of this specification can be applied.
- FIG. 2 is a flow chart of a data query method according to an embodiment of the present specification.
- FIG. 3 is a flow chart of the conversion process from a logical execution plan to a physical execution plan.
- FIG. 4 is a schematic diagram of the conversion process from a logical execution plan to a physical execution plan.
- FIG. 5 is a flow chart of the subgraph processing process.
- FIG. 6 is a flow chart of the subgraph processing process.
- FIG. 7 is a flow chart of the subgraph processing process.
- FIG. 8 is a schematic diagram of the subgraph processing process.
- FIG. 9 is a schematic diagram of a relationship graph between multiple scheduling nodes.
- FIG. 10 is a schematic diagram of a concurrent scheduling process based on the relationship graph.
- FIG. 11 is a schematic diagram of the structure of a data query device in an embodiment of the present specification.
- the embodiments of this specification provide a data query solution.
- FIG. 1 is a schematic diagram of an application scenario to which the embodiments of the present specification can be applied.
- the application scenario can include a user device 101 used by user A and a target platform 102 that supports joint queries of multiple data tables.
- the target platform 102 can be, for example, any one of a data flow hub platform, a data development platform, a privacy collaboration application platform, and a digital matrix service platform.
- the above-mentioned multiple data tables may be associated with multiple institutions (such as enterprises, etc.).
- User A may be one of the multiple institutions, or another user with the authority to query the above-mentioned multiple data tables.
- the above-mentioned multiple data tables may be ordinary data tables located on the institution side; or, the above-mentioned multiple data tables may be multiple virtual data tables located in the target platform 102, and the multiple virtual data tables may be formed by abstracting the data resources of the multiple institutions.
- any virtual data table may correspond to at least one data resource, and the target platform 102 may perform data query on the data resource corresponding to the virtual data table based on the query operation on the virtual data table.
- a target query statement for the joint query of the above-mentioned multiple data tables can be submitted to the target platform 102 through the user device 101.
- the target platform 102 can obtain the execution plan PlanA of the target query statement.
- the execution plan PlanA can be a tree-like execution plan.
- the execution plan PlanA may include multiple nodes and connecting edges between nodes, and the leaf nodes among the multiple nodes are query call (QueryCall) nodes for the data tables in the above-mentioned multiple data tables.
- the multiple nodes correspond to multiple operations; in particular, the leaf nodes among the multiple nodes correspond to query call operations for the data tables in the above-mentioned multiple data tables.
- there are several branch nodes in the multiple nodes and the several branch nodes all have multiple child nodes and dependencies between the corresponding child nodes.
- Any of the several branch nodes may include any one of a Union node, a Join node, and a Correlate node.
- the Union node may be used to obtain the union of the result data of its child nodes.
- the dependency relationship between the child nodes corresponding to the Union node may be, for example, that the child nodes of the Union node are not mutually dependent.
- the Join node may be used to perform a join operation on the result data of its child nodes.
- the dependency relationship between the child nodes corresponding to the Join node may be, for example, that the child nodes of the Join node are not mutually dependent.
- the Correlate node may be used to use the data in the result data of one of its child nodes as an input parameter of another child node, so that the other child node performs a data query based on the input parameter.
- the dependency relationship between the child nodes corresponding to the Correlate node may be, for example, that the query of one child node of the Correlate node depends on another child node.
- the target platform 102 can divide the execution plan PlanA into multiple subgraphs along the above-mentioned branch nodes, and convert the multiple subgraphs into multiple scheduling nodes, where a single subgraph includes at least one node. Afterwards, the target platform 102 can construct a relationship graph between the multiple scheduling nodes based on the dependency relationships between the child nodes of the above-mentioned branch nodes, and concurrently schedule the multiple scheduling nodes based on the relationship graph. In this way, the queries on the above-mentioned multiple data tables can be parallelized, thereby improving the overall execution efficiency of data queries.
- FIG. 2 is a flow chart of a data query method in an embodiment of the present specification.
- the method can be executed by any device, platform, or device cluster with data storage, computing, and processing capabilities, for example, by the target platform 102 described above.
- in step S201, an execution plan of the target query statement to be executed is obtained.
- the target query statement is used for a joint query of multiple data tables.
- the execution plan includes multiple nodes and connecting edges between the nodes.
- the leaf nodes among the multiple nodes are query call nodes for a data table in the multiple data tables, and there are several branch nodes among the multiple nodes.
- the several branch nodes all have multiple child nodes, and the corresponding child nodes have dependencies.
- the target query statement may specify the above-mentioned multiple data tables and include query conditions.
- the query conditions may include filtering conditions for data tables in the above-mentioned multiple data tables and association conditions between data tables.
- the target query statement may include a SQL (Structured Query Language) statement.
- the SQL statement may include a from clause and a where clause, wherein the from clause specifies the above-mentioned multiple data tables, and the where clause includes the query condition.
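As a hedged illustration of such a target query statement, the sketch below runs a two-table joint query with Python's built-in `sqlite3` module. The tables `customers` and `orders`, their columns, and the sample rows are hypothetical and do not come from the specification; they only show a from clause naming the data tables and a where clause carrying an association condition and a filtering condition.

```python
import sqlite3

# Hypothetical tables and rows, used only to illustrate the statement shape.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west');
    INSERT INTO orders VALUES (1, 10.0), (2, 25.0), (1, 40.0);
""")

target_query = """
    SELECT c.id, o.amount
    FROM customers c, orders o          -- from clause: the data tables
    WHERE c.id = o.customer_id          -- association condition between tables
      AND o.amount > 20                 -- filtering condition on one table
    ORDER BY c.id
"""
rows = conn.execute(target_query).fetchall()
conn.close()
```

Here a single engine answers the whole statement; the method described in this specification instead splits such a query into per-table query calls that can be scheduled concurrently.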
- the branch node in the embodiment of this specification may refer to a node with multiple child nodes.
- Any branch node may be any one of a union (Union) node, a join (Join) node, and a correlated query (Correlate) node.
- the Union node can be used to find the union of the result data of each of its child nodes.
- the dependency relationship between the child nodes corresponding to the Union node may be, for example, that the child nodes of the Union node are not mutually dependent.
- the Join node can be used to perform a join operation on the result data of each of its child nodes.
- the dependency relationship between the child nodes corresponding to the Join node may be, for example, that the child nodes of the Join node are not mutually dependent.
- the Correlate node can be used to use the data in the result data of one of its child nodes as the input parameter of another child node, so that the other child node performs data query based on the input parameter.
- the dependency relationship between the child nodes corresponding to the Correlate node may be, for example, that the query of a child node (e.g., the right child node) of the Correlate node depends on another child node (e.g., the left child node).
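The branch node types and their child dependencies can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the `PlanNode` class, the `is_branch` and `child_dependencies` helpers, and the string values of `kind` are hypothetical names introduced for illustration, and the Correlate rule follows the example in which the right child depends on the left child.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    name: str
    kind: str  # e.g. "QueryCall", "Union", "Join", "Correlate" (assumed labels)
    children: list["PlanNode"] = field(default_factory=list)


def is_branch(node: PlanNode) -> bool:
    """A branch node is a node with multiple child nodes."""
    return len(node.children) > 1


def child_dependencies(node: PlanNode) -> list[tuple[str, str]]:
    """Return (dependent, dependency) pairs among the children of a branch node."""
    if node.kind == "Correlate" and len(node.children) == 2:
        left, right = node.children
        # The right child's query takes its input parameter from the left child.
        return [(right.name, left.name)]
    # Children of Union and Join nodes are not mutually dependent.
    return []
```

For example, a Correlate node with children `PCall2` and `PCall3` yields the single pair `("PCall3", "PCall2")`, while a Join node yields an empty list.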
- the execution plan of the target query statement may be a logical execution plan.
- the logical execution plan may be understood as an execution plan composed of logical operations, which mainly describes the content to be executed to implement the data query.
- the logical execution plan of the target query statement may include multiple logical nodes and connecting edges between the nodes, wherein the leaf nodes in the multiple logical nodes are logical query call nodes for the data tables in the above-mentioned multiple data tables, and there are several logical branch nodes in the multiple logical nodes, each of which has multiple child nodes, and the several logical branch nodes respectively correspond to the dependency relationships between the child nodes. It should be pointed out that the logical node can be understood as a logical operation.
- the above-mentioned multiple data tables can all support a read method, and the read method can use some fields of the corresponding data table for filtering query.
- the logical QueryCall node may, for example, include the read method supported by the corresponding data table and the filtering condition for the data table, wherein the filtering condition satisfies the read method.
- the filtering condition may be a filtering condition included in the target query statement, or may be determined based on the association condition between the data table and other data tables in the target query statement.
- the logical Join node may include an association condition, which is determined based on the association condition between the data tables corresponding to each child node of the logical Join node.
- the execution plan of the target query statement may be a physical execution plan.
- the physical execution plan may be understood as an execution plan composed of physical operations, which mainly describes the implementation scheme of the data query execution content.
- the physical execution plan of the target query statement may include multiple physical nodes and connecting edges between nodes, wherein the leaf nodes among the multiple physical nodes are physical query call nodes for the data tables in the above-mentioned multiple data tables, and there are several physical branch nodes among the multiple physical nodes, each of which has multiple child nodes, and the several physical branch nodes respectively correspond to the dependency relationships between the child nodes.
- the physical node may be understood as a physical operation.
- in step S201, the logical execution plan of the target query statement can be obtained and then converted into a physical execution plan.
- the execution plan conversion process shown in FIG. 3 can be performed.
- FIG. 3 is a flow chart of the conversion process from a logical execution plan to a physical execution plan.
- each logical QueryCall node in the logical execution plan is converted into a corresponding physical QueryCall node.
- in step S303, the several logical branch nodes in the logical execution plan are converted into corresponding physical branch nodes.
- in step S307, a physical execution plan is constructed based on the nodes obtained through the conversion and the relationships between those nodes.
- the logical execution plan may also include a logical projection (Project) node and several logical filter (Filter) nodes.
- the logical Project node and the logical Filter node may also be converted.
- the merged result of adjacent logical Filter nodes and logical Project nodes may be converted into a corresponding physical calculation (Calculate) node.
- the several logical Filter nodes may be one or more logical Filter nodes.
- the logical Project node may be used to return the data required for the query.
- the logical filter (Filter) node may be used to filter the result data of its child nodes.
- step S305 may also be executed.
- in step S305, for any logical Filter node in the logical execution plan (hereinafter referred to as the first Filter node), if there is a connecting edge between the first Filter node and the logical Project node, the first Filter node and the logical Project node are merged, and the merged result is converted into a corresponding physical Calculate node; if there is no connecting edge between the first Filter node and the logical Project node, the first Filter node is converted into a corresponding physical Calculate node.
- FIG. 4 is a schematic diagram of the conversion process from the logical execution plan to the physical execution plan.
- the logical execution plan shown in FIG. 4 includes logical query call nodes LCall1, LCall2, and LCall3, a logical correlated query node LCorr, logical filter nodes LFilt1 and LFilt2, a logical join node LJoin, and a logical projection node LProj.
- the logical query call nodes LCall1, LCall2, and LCall3 can be converted into the corresponding physical query call nodes PCall1, PCall2, and PCall3; the logical correlated query node LCorr can be converted into the corresponding physical correlated query node PCorr; the logical filter node LFilt2 can be converted into the corresponding physical calculation node PCalc2; the logical join node LJoin can be converted into the corresponding physical join node PJoin; and the merged result of the adjacent logical filter node LFilt1 and logical projection node LProj can be converted into the corresponding physical calculation node PCalc1. Then, based on the relationships between the nodes obtained by the conversion, the physical execution plan shown in the right part of FIG. 4 can be generated.
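The conversion can be sketched as a recursive tree rewrite. This is only an illustrative sketch: the tuple layout `(kind, name, children)` and the `to_physical` function are hypothetical, the parent/child order of LProj and LFilt1 is an assumption (the text only states that they are adjacent), and converting a stand-alone Project node into a Calculate node is likewise an assumption. Logical names are kept in the output so that the mapping stays visible.

```python
# Logical plan from the FIG. 4 walk-through as (kind, name, children) tuples.
# Assumption: LProj sits directly above LFilt1.
logical_plan = ("Project", "LProj", [
    ("Filter", "LFilt1", [
        ("Join", "LJoin", [
            ("Filter", "LFilt2", [("QueryCall", "LCall1", [])]),
            ("Correlate", "LCorr", [
                ("QueryCall", "LCall2", []),
                ("QueryCall", "LCall3", []),
            ]),
        ]),
    ]),
])


def to_physical(node):
    kind, name, children = node
    # Merge an adjacent Project/Filter pair into one physical Calculate node.
    if kind == "Project" and len(children) == 1 and children[0][0] == "Filter":
        filter_node = children[0]
        merged_name = f"{name}+{filter_node[1]}"
        return ("Calculate", merged_name, [to_physical(c) for c in filter_node[2]])
    # A Filter (or, by assumption, Project) node on its own becomes a Calculate node.
    if kind in ("Filter", "Project"):
        return ("Calculate", name, [to_physical(c) for c in children])
    # QueryCall and branch nodes are converted one-to-one.
    return (kind, name, [to_physical(c) for c in children])


physical_plan = to_physical(logical_plan)
```

The root of `physical_plan` is a single Calculate node standing for PCalc1, with the Join node below it and the converted LFilt2 (PCalc2) on the Join node's left branch.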
- the logical nodes can be converted into corresponding physical nodes based on the preset conversion logic.
- a target identifier can be added to the physical node with multiple child nodes in the physical execution plan, and the target identifier is used to indicate that the corresponding physical node is a physical branch node.
- in step S203, the execution plan is divided into multiple subgraphs along the several branch nodes, and the multiple subgraphs are converted into multiple scheduling nodes, wherein a single subgraph includes at least one node.
- for any branch node among the above-mentioned several branch nodes (hereinafter referred to as the first branch node), if the number of child nodes of each node on a branch path of the first branch node (hereinafter referred to as the first branch path) is not greater than 1, and none of those nodes depends on nodes on other branch paths of the first branch node, the subgraph processing process shown in FIG. 5 can be executed.
- FIG. 5 is a flow chart of the subgraph processing process.
- in step S501, if the number of child nodes of each node on the first branch path of the first branch node is not greater than 1, and none of those nodes depends on nodes on other branch paths of the first branch node, those nodes are divided into the same first subgraph.
- in step S503, the execution statement list of each node on the first branch path is obtained, and a first execution function is generated based on the execution statement list.
- in step S505, a first scheduling node that corresponds to the first subgraph and includes the first execution function is generated.
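Steps S501 to S505 can be illustrated with a small sketch in which an execution statement list is modelled as an ordered list of Python callables. This modelling is an assumption made for illustration; the specification does not state how execution statements are represented. The helper `make_execution_function`, the sample statements, and the dictionary layout of the scheduling node are all hypothetical.

```python
from functools import reduce


def make_execution_function(statements):
    """Compose an ordered statement list (leaf first) into one callable."""
    def execution_function(value=None):
        return reduce(lambda acc, statement: statement(acc), statements, value)
    return execution_function


# Hypothetical statements for a branch path with a query call node and one
# calculation node above it.
query_call = lambda _: [3, 8, 15]                      # fetch rows from a table
calculate = lambda rows: [r for r in rows if r > 5]    # filter the fetched rows

first_execution_function = make_execution_function([query_call, calculate])
first_scheduling_node = {
    "subgraph": ["query_call", "calculate"],
    "func": first_execution_function,
}
```

Because no statement on the path needs input from another branch path, the resulting function can be called on its own, which is what allows the corresponding scheduling node to run independently.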
- FIG. 6 is a flow chart of the subgraph processing process. As shown in FIG. 6, in step S601, if the number of child nodes of each node on the first branch path of the first branch node is not greater than 1, and those nodes depend on nodes on other branch paths of the first branch node, those nodes and the first branch node are divided into the same second subgraph.
- in step S603, the execution statement list of each node on the first branch path is obtained, a second execution function is generated based on the execution statement list, and a sub-process node including the second execution function is generated.
- in step S605, the execution statement list of the first branch node is obtained, a third execution function is generated based on the execution statement list, and a main process node including the third execution function is generated, wherein the third execution function calls the second execution function.
- in step S607, the combination of the main process node and the sub-process node is used as the second scheduling node corresponding to the second subgraph.
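The main process and sub-process relationship of steps S601 to S607 can be sketched as one function that calls another. The lookup table, the function names, and the dictionary layout below are hypothetical and only illustrate the calling relationship, not the actual generated code.

```python
def second_execution_function(param):
    # Sub-process: a hypothetical dependent query, parameterised by one input value.
    table = {1: "alice", 2: "bob"}
    return table.get(param)


def make_third_execution_function(sub_function):
    # Main process: feeds each row produced by the other branch path into the
    # sub-process and collects the paired results.
    def third_execution_function(input_rows):
        return [(row, sub_function(row)) for row in input_rows]
    return third_execution_function


sub_process_node = {"func": second_execution_function}
main_process_node = {"func": make_third_execution_function(sub_process_node["func"])}

# The combination of both nodes acts as one scheduling node.
second_scheduling_node = {"main": main_process_node, "sub": sub_process_node}
result = second_scheduling_node["main"]["func"]([1, 2])
```

Keeping the dependent path inside the same scheduling node as its branch node means the scheduler never has to run the sub-process before its input parameter exists.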
- FIG. 7 is a flow chart of the subgraph processing process.
- in step S701, if the first branch node is connected to the root node by a connecting edge and no scheduling node associated with the first branch node has been generated, the first branch node and the root node are divided into the same third subgraph.
- in step S703, the execution statement lists of the first branch node and the root node are obtained, and a fourth execution function is generated based on the execution statement lists.
- in step S705, a third scheduling node that corresponds to the third subgraph and includes the fourth execution function is generated.
- the logic of obtaining the execution statement list of the node and generating the execution function based on the execution statement list may be pre-compiled, and corresponding processing may be performed according to the pre-compiled logic.
- step S203 is further described below in conjunction with FIG. 8.
- FIG. 8 is a schematic diagram of the subgraph processing process.
- in this example, the query of the right child node of the PCorr node depends on its left child node.
- take the left subtree of the PJoin node as an example.
- There is only one branch path in the left subtree and there are two nodes, PCall1 and PCalc2, on the branch path. Since the number of child nodes of the PCall1 node is 0 and the number of child nodes of the PCalc2 node is 1, it can be known that these two nodes are not branch nodes with multiple child nodes. In addition, these two nodes do not depend on the nodes on other branch paths of the PJoin node. Therefore, the PCall1 node and the PCalc2 node can be divided into the subgraph pointed to by the number 801 in Figure 8.
- since the query of the PCall1 node does not depend on other nodes, and the PCalc2 node is only used to perform calculation on the result data of the PCall1 node, the subgraph can be called independently.
- the execution statement list of the PCall1 node and the PCalc2 node can be obtained, the execution function Func1 can be generated based on the execution statement list, and then the scheduling node A that corresponds to the subgraph and includes the execution function Func1, as shown in FIG. 8, can be generated.
- the PCall2 node is a leaf node with 0 child nodes. Based on the dependency relationship between the child nodes corresponding to the PCorr node, it can be known that the query of the PCall2 node does not depend on the PCall3 node, so the PCall2 node can be executed independently, and the PCall2 node can be divided into a subgraph pointed to by the label 802.
- the execution statement list of the PCall2 node can be obtained, and the execution function Func2 is generated based on the execution statement list, and then the scheduling node B corresponding to the subgraph and including the execution function Func2 as shown in Figure 8 is generated.
- there is only the PCall3 node on the right branch path of the PCorr node, and the PCall3 node is a leaf node with 0 child nodes. Based on the dependency relationship between the child nodes corresponding to the PCorr node, the query of the PCall3 node depends on the PCall2 node: the PCorr node needs to use data in the result data of the PCall2 node as the input parameter of the PCall3 node, so that the PCall3 node performs a data query based on that input parameter. The PCall3 node and the PCorr node can therefore be divided into the subgraph pointed to by label 803.
- the execution statement list of the PCall3 node can be obtained, the execution function Func3 can be generated based on the execution statement list, and then the sub-process node D including the execution function Func3, as shown in FIG. 8, can be generated.
- the execution statement list of the PCorr node can be obtained, the execution function Func4 can be generated based on the execution statement list, and then the main process node C including the execution function Func4, as shown in FIG. 8, can be generated.
- the execution function Func4 can call the execution function Func3; for example, the data in the execution result of the execution function Func2 is used as the input parameter of the execution function Func3, so that the execution function Func3 performs a data query based on the input parameter.
- the combination of the main process node C and the sub-process node D can be used as the scheduling node corresponding to the subgraph pointed to by the label 803.
- the PJoin node and the PCalc1 node may be divided into a subgraph as indicated by reference numeral 804.
- an execution statement list of the PJoin node and the PCalc1 node may be obtained, an execution function Func5 may be generated based on the execution statement list, and a scheduling node E corresponding to the subgraph and including the execution function Func5 may be generated as shown in FIG. 8 .
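The division into the subgraphs labelled 801 to 804 can be reproduced with a short recursive sketch. This is one possible reading of the rules above, not the specification's own algorithm: the `(name, kind, children)` tuple layout and the `split` function are hypothetical, and the rule that only the non-first children of a Correlate node depend on a sibling path is an assumption taken from the example.

```python
# Physical plan of the FIG. 8 walk-through as (name, kind, children) tuples.
plan = ("PCalc1", "Calculate", [
    ("PJoin", "Join", [
        ("PCalc2", "Calculate", [("PCall1", "QueryCall", [])]),
        ("PCorr", "Correlate", [
            ("PCall2", "QueryCall", []),
            ("PCall3", "QueryCall", []),
        ]),
    ]),
])


def split(root):
    subgraphs = []

    def visit(node):
        """Return the still-open group of node names that `node` belongs to."""
        name, kind, children = node
        if len(children) <= 1:
            group = visit(children[0]) if children else []
            group.append(name)
            return group
        # Branch node: close independent child paths, absorb dependent ones.
        group = []
        for index, child in enumerate(children):
            child_group = visit(child)
            depends_on_sibling = kind == "Correlate" and index > 0
            if depends_on_sibling:
                group.extend(child_group)
            else:
                subgraphs.append(child_group)
        group.append(name)
        return group

    subgraphs.append(visit(root))
    return subgraphs


subgraphs = split(plan)
```

For this plan the four resulting groups correspond to scheduling node A, scheduling node B, the combination of nodes C and D, and scheduling node E.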
- the execution plan of the target query statement can be traversed in a depth-first manner, thereby dividing the execution plan into multiple subgraphs along the above-mentioned branch nodes, and converting the multiple subgraphs into multiple scheduling nodes.
- depth-first can be: first, take a point that has not been visited as the starting vertex, and try to visit vertices that have not been visited along the edge of the current vertex; when there are no unvisited vertices, return to the previous vertex, and continue to try to visit other vertices until all vertices have been visited.
- depth-first traversal is to traverse along a branch of the graph to the end, then backtrack, and then traverse along another edge until all vertices have been visited.
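A minimal depth-first traversal of the example plan is sketched below. The adjacency dictionary `plan_edges` and the function `depth_first` are hypothetical names; the sketch records both the order in which vertices are first visited and the order in which the traversal backtracks out of them.

```python
# Parent -> children edges of the physical plan used in the FIG. 8 example.
plan_edges = {
    "PCalc1": ["PJoin"],
    "PJoin": ["PCalc2", "PCorr"],
    "PCalc2": ["PCall1"],
    "PCorr": ["PCall2", "PCall3"],
    "PCall1": [],
    "PCall2": [],
    "PCall3": [],
}


def depth_first(graph, start):
    visited, entered, finished = set(), [], []

    def visit(vertex):
        visited.add(vertex)
        entered.append(vertex)
        for neighbour in graph[vertex]:
            if neighbour not in visited:
                visit(neighbour)
        finished.append(vertex)  # all reachable vertices handled: backtrack

    visit(start)
    return entered, finished


entered, finished = depth_first(plan_edges, "PCalc1")
```

The `finished` order is the one that matters for generating execution statement lists, since a node's list can only be completed after the lists of its child nodes are available.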
- the PCall1 node and the PCalc2 node can be merged. Then, in response to the PJoin node being a branch node, the PJoin node can be added to the previous node stack, which is used to maintain branch nodes.
- the PCalc2 node and the PCall1 node can be divided into the same subgraph, and the execution statement list of the PCalc2 node is obtained, and an execution function is generated based on the execution statement list, and then a scheduling node corresponding to the subgraph including the execution function is generated.
- the node traversal can be performed in the direction from the PCorr node to the PCall2 node.
- in response to the PCall2 node having no child nodes, the execution statement list of the PCall2 node can be generated, and the traversal can then backtrack to the PCorr node.
- in response to the PCorr node being a branch node, the PCorr node can be added to the previous node stack; and in response to the PCall2 node not depending on the child node PCall3 of the PCorr node, the PCall2 node can be divided into a subgraph, the execution statement list of the PCall2 node can be obtained, an execution function can be generated based on the execution statement list, and then a scheduling node corresponding to the subgraph and including the execution function can be generated.
- the child node PCall3 of the PCorr node can then be traversed; in response to the PCall3 node having no child nodes, the execution statement list of the PCall3 node can be generated, and the traversal can then backtrack to the PCorr node.
- the PCorr node and the PCall3 node can be divided into the same subgraph.
- the PCall3 node's execution statement list can be obtained, and an execution function can be generated based on the execution statement list, and then a sub-process node including the execution function can be generated.
- the PCorr node's execution statement list can be generated, and an execution function can be generated based on the execution statement list, and then a main process node including the execution function can be generated.
- the sub-process node and the main process node can be used as a scheduling node corresponding to the subgraph.
- the PCorr node can be popped from the previous node stack, and the traversal can then backtrack to the PJoin node.
- an execution statement list of the PJoin node can be generated, the PJoin node can be popped from the previous node stack, and the traversal can then backtrack to the PCalc1 node.
- the PJoin node and the PCalc1 node can be divided into the same subgraph, and the execution statement list of the child node PJoin of the PCalc1 node can be obtained, and the execution statement list of the PCalc1 node can be generated based on the execution statement list, thereby realizing the merging of the PJoin node and the PCalc1 node.
- an execution function can be generated based on the execution statement list of the PCalc1 node, and then a scheduling node corresponding to the subgraph and including the execution function is generated.
- in step S205, based on the dependency relationships between the child nodes corresponding to the several branch nodes, a relationship graph between the multiple scheduling nodes is constructed, wherein the relationship graph may be a directed acyclic graph (DAG).
- scheduling node A and scheduling node B can be executed in parallel; scheduling node B and the scheduling node formed by combining sub-process node D and main process node C (hereinafter referred to as the combined scheduling node) need to be executed in series; the combined scheduling node and scheduling node E need to be executed in series; and scheduling node A and scheduling node E need to be executed in series.
- a relationship graph between scheduling nodes A, B, and E and the combined scheduling node can be constructed, as shown in FIG. 9.
- Figure 9 is a schematic diagram of a relationship diagram between multiple scheduling nodes.
- scheduling node A and scheduling node B are leaf nodes
- scheduling node E is a root node.
- the construction logic of the relationship graph can be pre-compiled, and the relationship graph between the multiple scheduling nodes can be constructed according to that logic, based on the dependency relationships between the child nodes corresponding to each of the several branch nodes.
- in step S207, based on the relationship graph, the multiple scheduling nodes are concurrently scheduled.
- the above-mentioned multiple scheduling nodes may be concurrently scheduled based on the relationship graph in the direction from the leaf node to the root node.
- a first execution task corresponding to each leaf node in the relationship graph can be generated.
- the first execution task may, for example, include an execution function included in its corresponding leaf node, or an execution code obtained by compiling the execution function.
- each first execution task can be executed in parallel.
- an execution scheduler can be used to generate the first execution tasks corresponding to each leaf node, and each first execution task can be added to a task queue, and multiple execution threads can be used to obtain and execute the first execution task from the task queue.
- an execution scheduling trigger statement for the relationship graph can be generated, and the execution scheduler can be enabled to generate execution tasks by executing the trigger statement.
- taking the relationship graph shown in Figure 9 as an example, the execution tasks corresponding to scheduling nodes A and B can be generated first and executed in parallel.
- after the execution task corresponding to scheduling node B is completed, the execution task corresponding to the scheduling node formed by the combination of sub-process node D and main process node C can be generated and executed.
- after that execution task and the execution task corresponding to scheduling node A are both completed, the execution task corresponding to scheduling node E can be generated and executed.
- the execution scheduler can add the generated execution task to the task queue, and the execution thread in the execution thread pool can obtain the execution task from the task queue, execute the obtained execution task, and notify the execution scheduler of the execution result after the execution of the execution task is completed.
- the execution scheduler can determine whether the execution is completed based on the received execution result. If the execution is determined to be completed, the execution scheduler can return the final execution result. If it is determined that the execution is not completed, the execution scheduler can trigger the subsequent execution task according to the relationship between the nodes and add it to the task queue.
- Figure 10 is a schematic diagram of the concurrent scheduling process based on the relationship graph.
- the relationship graph may be compiled to generate execution code, and the multiple scheduling nodes may be concurrently scheduled based on the execution code, thereby achieving concurrent execution based on compilation.
- the solution provided by the embodiment corresponding to FIG. 2 can obtain the execution plan of the target query statement to be executed, wherein the target query statement is used for the joint query of multiple data tables, and the execution plan includes multiple nodes and connecting edges between the nodes, and the leaf nodes among the multiple nodes are query call nodes for the data tables in the multiple data tables, and there are several branch nodes among the multiple nodes, and the several branch nodes all have multiple child nodes and the dependency relationship between the corresponding child nodes.
- the execution plan can be divided into multiple subgraphs along the several branch nodes, and the multiple subgraphs can be converted into multiple scheduling nodes. Then, based on the dependency relationship, a relationship graph between the multiple scheduling nodes can be constructed. Then, based on the relationship graph, the multiple scheduling nodes can be concurrently scheduled.
- the parallelization of the multiple data table queries can be achieved, thereby improving the overall execution efficiency of the data query.
- Figure 11 is a schematic diagram of the structure of a data query device in an embodiment of the present specification.
- the device can be applied to any device, platform or device cluster with data storage, computing and processing capabilities; for example, it can be applied to the target platform 102 shown in Figure 1.
- the data query device 1100 in the embodiment of the present specification may include: an acquisition unit 1101, a segmentation unit 1102, a generation unit 1103 and a scheduling unit 1104.
- the acquisition unit 1101 is configured to acquire an execution plan of a target query statement to be executed, the target query statement is used for a joint query of multiple data tables, the execution plan includes multiple nodes and connecting edges between the nodes, the leaf nodes among the multiple nodes are query call nodes for a data table in the multiple data tables, and there are several branch nodes among the multiple nodes, each of the several branch nodes has multiple child nodes and dependencies between the corresponding child nodes;
- the segmentation unit 1102 is configured to segment the execution plan into multiple subgraphs along the several branch nodes, and convert the multiple subgraphs into multiple scheduling nodes; wherein a single subgraph includes at least one node;
- the generation unit 1103 is configured to construct a relationship graph between the multiple scheduling nodes based on the dependency relationship; the scheduling unit 1104 is configured to concurrently schedule the multiple scheduling nodes based on the relationship graph.
- a single branch node includes any one of the following: a union node, a join node, and a correlated query node; wherein the union node is used to calculate the union of the result data of its child nodes, the join node is used to perform a join operation on the result data of its child nodes, and the correlated query node is used to take data from the result data of one of its child nodes as an input parameter of another child node, so that the other child node performs its data query based on the input parameter.
- the execution plan is a physical execution plan
- the multiple nodes are multiple physical nodes
- the query calling node is a physical query scheduling node
- the multiple branch nodes are multiple physical branch nodes.
- the acquisition unit 1101 can be further configured to: obtain a logical execution plan for the target query statement, the logical execution plan including multiple logical nodes and connecting edges between the nodes, a leaf node among the multiple logical nodes is a logical query call node for a data table in the above-mentioned multiple data tables, and there are several logical branch nodes in the multiple logical nodes, each of which has multiple child nodes; convert the logical execution plan into a physical execution plan.
- the acquisition unit 1101 can be further configured to: convert each logical query call node in the above-mentioned multiple logical nodes into a corresponding physical query call node; convert several logical branch nodes into corresponding physical branch nodes; and construct a physical execution plan based on the relationship between the converted nodes.
- the multiple logical nodes further include a logical projection node and several logical filtering nodes; and the acquisition unit 1101 may also be configured to: before building a physical execution plan based on the relationship between the nodes obtained by the conversion, for any first filtering node among the several logical filtering nodes, if the first filtering node is connected to the logical projection node by a connection edge, merge the first filtering node and the logical projection node, and convert the merged result into a corresponding physical computing node; if the first filtering node is not connected to the logical projection node by a connection edge, convert the first filtering node into a corresponding physical computing node.
- the segmentation unit 1102 can be further configured to: for any first branch node among the above-mentioned branch nodes, if the number of child nodes of each node on the first branch path of the first branch node is not greater than 1, and each node does not depend on the nodes on other branch paths of the first branch node, then divide the nodes into the same first subgraph; obtain the execution statement list of each node, and generate a first execution function based on the execution statement list; generate a first scheduling node corresponding to the first subgraph including the first execution function.
- the segmentation unit 1102 can be further configured to: if the number of child nodes of each of the above-mentioned nodes is not greater than 1, and each of the above-mentioned nodes depends on the nodes on the above-mentioned other branch paths, then the above-mentioned nodes and the first branch node are divided into the same second subgraph; obtain the execution statement list of each of the above-mentioned nodes, generate a second execution function based on the execution statement list, and generate a sub-process node including the second execution function; obtain the execution statement list of the first branch node, generate a third execution function based on the execution statement list, and generate a main process node including the third execution function; wherein the third execution function calls the second execution function; and use the combination of the main process node and the sub-process node as the second scheduling node corresponding to the second subgraph.
- the segmentation unit 1102 can be further configured to: if the first branch node is connected to the root node by a connecting edge and a scheduling node associated with the first branch node is not generated, then the first branch node and the root node are divided into the same third subgraph; obtain an execution statement list of the first branch node and the root node, and generate a fourth execution function based on the execution statement list; generate a third scheduling node corresponding to the third subgraph including the fourth execution function.
- the above relationship graph is a directed acyclic graph.
- the scheduling unit 1104 may be further configured to: concurrently schedule the multiple scheduling nodes based on the relationship graph in a direction from the leaf node to the root node.
- the scheduling unit 1104 may be further configured to: generate a first execution task corresponding to each leaf node in the above relationship graph; and execute each first execution task in parallel.
- the scheduling unit 1104 may be further configured to: add each first execution task to a task queue; and obtain and execute the first execution task from the task queue using multiple execution threads.
- the scheduling unit 1104 can be further configured to: in response to the completion of the execution of the first execution task among each first execution task, determine whether the parent node of the leaf node corresponding to the first execution task can be executed based on the relationship between the parent node and other scheduling nodes; if the determination result is yes, generate the second execution task corresponding to the parent node, and execute the second execution task.
- the embodiments of the present specification also provide a computer-readable storage medium on which a computer program is stored.
- when the computer program is executed in a computer, the computer is caused to execute the data query method described in the method embodiment above.
- An embodiment of the present specification also provides a computing device, including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the data query method described in the method embodiment in the foregoing text is implemented.
- the embodiments of this specification also provide a computer program product, wherein when the computer program product is executed in a computer, the computer is caused to execute the data query method described in the method embodiment above.
- a programmable logic device such as a field programmable gate array (FPGA)
- hardware description language (HDL): there is not only one HDL, but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, RHDL (Ruby Hardware Description Language), etc.
- VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog
- the controller may be implemented in any suitable manner, for example, the controller may take the form of a microprocessor or processor and a computer readable medium storing a computer readable program code (e.g., software or firmware) executable by the (micro)processor, a logic gate, a switch, an application specific integrated circuit (ASIC), a programmable logic controller, and an embedded microcontroller, examples of which include but are not limited to the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320, and the memory controller may also be implemented as part of the control logic of the memory.
- the controller may be implemented in the form of a logic gate, a switch, an application specific integrated circuit, a programmable logic controller, and an embedded microcontroller by logically programming the method steps. Therefore, such a controller may be considered as a hardware component, and the means for implementing various functions included therein may also be considered as a structure within the hardware component. Or even, the means for implementing various functions may be considered as both a software module for implementing the method and a structure within the hardware component.
- the systems, devices, modules or units described in the above embodiments may be implemented by computer chips or entities, or by products with certain functions.
- a typical implementation device is a server system.
- the computer that implements the functions of the above embodiments may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
- the functions of each module can be implemented in one or more pieces of software and/or hardware, or a module implementing a given function can be implemented by a combination of multiple sub-modules or sub-units, etc.
- the device embodiments described above are only schematic.
- the division of the units is only a logical function division. There may be other division methods in actual implementation.
- multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
- In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections between devices or units through some interfaces, and may be electrical, mechanical or in other forms.
- each process and/or box in the flowchart and/or block diagram, as well as the combination of the process and/or box in the flowchart and/or block diagram can be implemented by computer program instructions.
- These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
- These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
- These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
- a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
- Memory may include non-permanent storage in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
- Computer-readable media include permanent and non-permanent, removable and non-removable media that can be implemented by any method or technology to store information.
- the information can be computer-readable instructions, data structures, program modules or other data.
- Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, graphene storage or other magnetic storage devices or any other non-transmission media that can be used to store information that can be accessed by a computing device.
- computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
- one or more embodiments of the present specification may be provided as a method, system or computer program product. Therefore, one or more embodiments of the present specification may take the form of a complete hardware embodiment, a complete software embodiment or an embodiment combining software and hardware. Moreover, one or more embodiments of the present specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
- One or more embodiments of the present specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules.
- program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
- One or more embodiments of the present specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communication network.
- program modules may be located in local and remote computer storage media, including storage devices.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
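Embodiments of this specification provide a data query method and apparatus. The method includes: obtaining an execution plan of a target query statement to be executed, where the target query statement is used for a joint query over multiple data tables, the execution plan includes multiple nodes and connecting edges between the nodes, the leaf nodes among the multiple nodes are query call nodes for data tables among the multiple data tables, and the multiple nodes include several branch nodes, each of which has multiple child nodes and a corresponding dependency relationship between those child nodes; splitting the execution plan into multiple subgraphs along the several branch nodes, and converting the multiple subgraphs into multiple scheduling nodes, where a single subgraph includes at least one node; constructing a relationship graph between the multiple scheduling nodes based on the dependency relationship; and concurrently scheduling the multiple scheduling nodes based on the relationship graph.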
Description
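This application claims priority to Chinese patent application No. 202310458769.7, entitled "Data Query Method and Apparatus", filed with the China National Intellectual Property Administration on April 25, 2023, the entire contents of which are incorporated herein by reference.
The embodiments of this specification belong to the field of computer technology, and in particular relate to a data query method and apparatus.
Due to data security and privacy protection considerations, many data resources are available only in query-restricted form, typically exposed through data service interfaces. With data now circulating as a factor of production, scenarios for integrating data across institutions are increasingly common, and so is the demand for integrating query-restricted data resources. Because queries against such data resources run over large-scale data and the resources are accessed remotely, executing them generally takes a long time.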
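Summary of the Invention
The purpose of the present invention is to provide a data query solution that can improve the overall execution efficiency of data queries.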
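A first aspect of this specification provides a data query method, including: obtaining an execution plan of a target query statement to be executed, wherein the target query statement is used for a joint query over multiple data tables, the execution plan includes multiple nodes and connecting edges between the nodes, the leaf nodes among the multiple nodes are query call nodes for data tables among the multiple data tables, and the multiple nodes include several branch nodes, each of which has multiple child nodes and a corresponding dependency relationship between those child nodes; splitting the execution plan into multiple subgraphs along the several branch nodes, and converting the multiple subgraphs into multiple scheduling nodes, wherein a single subgraph includes at least one node; constructing a relationship graph between the multiple scheduling nodes based on the dependency relationship; and concurrently scheduling the multiple scheduling nodes based on the relationship graph.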
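A second aspect of this specification provides a data query apparatus, including: an acquisition unit configured to obtain an execution plan of a target query statement to be executed, wherein the target query statement is used for a joint query over multiple data tables, the execution plan includes multiple nodes and connecting edges between the nodes, the leaf nodes among the multiple nodes are query call nodes for data tables among the multiple data tables, and the multiple nodes include several branch nodes, each of which has multiple child nodes and a corresponding dependency relationship between those child nodes; a segmentation unit configured to split the execution plan into multiple subgraphs along the several branch nodes and convert the multiple subgraphs into multiple scheduling nodes, wherein a single subgraph includes at least one node; a generation unit configured to construct a relationship graph between the multiple scheduling nodes based on the dependency relationship; and a scheduling unit configured to concurrently schedule the multiple scheduling nodes based on the relationship graph.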
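A third aspect of this specification provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described in any implementation of the first aspect.
A fourth aspect of this specification provides a computing device, including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the method described in any implementation of the first aspect is implemented.
A fifth aspect of this specification provides a computer program product; when the computer program product is executed in a computer, the computer is caused to perform the method described in any implementation of the first aspect.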
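In the solution provided by the embodiments of this specification, an execution plan of a target query statement to be executed can be obtained, where the target query statement is used for a joint query over multiple data tables, the execution plan includes multiple nodes and connecting edges between the nodes, the leaf nodes among the multiple nodes are query call nodes for data tables among the multiple data tables, and the multiple nodes include several branch nodes, each of which has multiple child nodes and a corresponding dependency relationship between those child nodes. The execution plan can then be split into multiple subgraphs along the several branch nodes, and the multiple subgraphs converted into multiple scheduling nodes. Next, a relationship graph between the multiple scheduling nodes can be constructed based on the dependency relationship. The multiple scheduling nodes can then be concurrently scheduled based on the relationship graph. In this way, the queries against the multiple data tables can be parallelized, which improves the overall execution efficiency of the data query.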
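In order to explain the technical solutions of the embodiments of this specification more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in this specification, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.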
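Figure 1 is a schematic diagram of an application scenario to which the embodiments of this specification can be applied;
Figure 2 is a flowchart of the data query method in an embodiment of this specification;
Figure 3 is a flowchart of the conversion process from a logical execution plan to a physical execution plan;
Figure 4 is a schematic diagram of the conversion process from a logical execution plan to a physical execution plan;
Figure 5 is a flowchart of a subgraph processing procedure;
Figure 6 is a flowchart of a subgraph processing procedure;
Figure 7 is a flowchart of a subgraph processing procedure;
Figure 8 is a schematic diagram of a subgraph processing procedure;
Figure 9 is a schematic diagram of a relationship graph between multiple scheduling nodes;
Figure 10 is a schematic diagram of the concurrent scheduling process based on the relationship graph;
Figure 11 is a schematic structural diagram of the data query apparatus in an embodiment of this specification.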
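In order to enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative effort shall fall within the scope of protection of this specification. In addition, words such as "first", "second" and "third" in the embodiments of this specification are used only to distinguish information and have no limiting effect.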
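As mentioned above, due to data security and privacy protection considerations, many data resources are available only in query-restricted form, typically exposed through data service interfaces. With data now circulating as a factor of production, scenarios for integrating data across institutions are increasingly common, and so is the demand for integrating query-restricted data resources. Because queries against such data resources run over large-scale data and the resources are accessed remotely, executing them generally takes a long time.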
To improve the overall execution efficiency of data queries, the embodiments of this specification provide a data query solution.
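Figure 1 is a schematic diagram of an application scenario to which the embodiments of this specification can be applied. The application scenario shown in Figure 1 may include a user device 101 used by user A, and a target platform 102 that supports joint queries over multiple data tables. The target platform 102 may be, for example, any of a data circulation hub platform, a data development platform, a privacy collaboration application platform, and a Shuju (数矩) service platform.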
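The multiple data tables may be associated with multiple institutions (such as enterprises). User A may be one of these institutions, or another user authorized to query the multiple data tables. The multiple data tables may be ordinary data tables located on the institution side; alternatively, they may be multiple virtual data tables located in the target platform 102, formed by abstracting the data resources of the multiple institutions. Any virtual data table may correspond to at least one data resource, and the target platform 102 may, based on a query operation on a virtual data table, query the data resource corresponding to that virtual data table.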
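When user A wants to perform a joint query over the multiple data tables, user A can submit, through the user device 101, a target query statement for that joint query to the target platform 102. The target platform 102 can obtain an execution plan PlanA of the target query statement. The execution plan PlanA may be a tree-shaped execution plan. It may include multiple nodes and connecting edges between the nodes, and the leaf nodes among the multiple nodes are query call (QueryCall) nodes for data tables among the multiple data tables. The multiple nodes correspond to multiple operations, and the leaf nodes correspond to query call operations on data tables among the multiple data tables. In addition, the multiple nodes include several branch nodes, each of which has multiple child nodes and a corresponding dependency relationship between those child nodes.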
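Any of the several branch nodes may be any one of a union (Union) node, a join (Join) node, a correlated query (Correlate) node, and the like. A Union node can be used to compute the union of the result data of its child nodes. The dependency relationship between the child nodes of a Union node may be, for example, that the child nodes do not depend on one another. A Join node can be used to perform a join operation on the result data of its child nodes. The dependency relationship between the child nodes of a Join node may be, for example, that the child nodes do not depend on one another. A Correlate node can be used to take data from the result data of one of its child nodes as an input parameter of another child node, so that the other child node performs its data query based on that input parameter. The dependency relationship between the child nodes of a Correlate node may be, for example, that the query of one child node depends on another child node.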
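Next, the target platform 102 can split the execution plan PlanA into multiple subgraphs along the several branch nodes, and convert the multiple subgraphs into multiple scheduling nodes, where a single subgraph includes at least one node. The target platform 102 can then construct a relationship graph between the multiple scheduling nodes based on the dependency relationships between the child nodes corresponding to each of the several branch nodes, and concurrently schedule the multiple scheduling nodes based on the relationship graph. In this way, the queries against the multiple data tables can be parallelized, which improves the overall execution efficiency of the data query.
The data query solution provided by the embodiments of this specification is described below with reference to specific embodiments.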
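Refer to Figure 2, which is a flowchart of the data query method in an embodiment of this specification. The method can be executed by any device, platform or device cluster with data storage, computing and processing capabilities, for example by the target platform 102 shown in Figure 1.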
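As shown in Figure 2, first, in step S201, an execution plan of a target query statement to be executed is obtained. The target query statement is used for a joint query over multiple data tables. The execution plan includes multiple nodes and connecting edges between the nodes, the leaf nodes among the multiple nodes are query call nodes for data tables among the multiple data tables, and the multiple nodes include several branch nodes, each of which has multiple child nodes and a corresponding dependency relationship between those child nodes.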
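The target query statement may specify the multiple data tables and include a query condition. The query condition may include filter conditions on data tables among the multiple data tables and association conditions between data tables. In one example, the target query statement may include an SQL (Structured Query Language) statement. The SQL statement may include a from clause that specifies the multiple data tables and a where clause that contains the query condition.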
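A branch node in the embodiments of this specification refers to a node with multiple child nodes. Any branch node may be any one of a union (Union) node, a join (Join) node, a correlated query (Correlate) node, and the like. A Union node can be used to compute the union of the result data of its child nodes. The dependency relationship between the child nodes of a Union node may be, for example, that the child nodes do not depend on one another. A Join node can be used to perform a join operation on the result data of its child nodes. The dependency relationship between the child nodes of a Join node may be, for example, that the child nodes do not depend on one another. A Correlate node can be used to take data from the result data of one of its child nodes as an input parameter of another child node, so that the other child node performs its data query based on that input parameter. The dependency relationship between the child nodes of a Correlate node may be, for example, that the query of one child node (for example, the right child node) depends on another child node (for example, the left child node).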
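In one example, the execution plan of the target query statement may be a logical execution plan. A logical execution plan can be understood as an execution plan composed of logical operations; it mainly describes what must be executed to carry out the data query. The logical execution plan of the target query statement may include multiple logical nodes and connecting edges between the nodes. The leaf nodes among the multiple logical nodes are logical query call nodes for data tables among the multiple data tables, and the multiple logical nodes include several logical branch nodes, each of which has multiple child nodes and a corresponding dependency relationship between those child nodes. Note that a logical node can be understood as a logical operation.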
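In practice, each of the multiple data tables may support a read method, and the read method can perform a filtered query using some of the fields of its corresponding data table. A logical QueryCall node may include, for example, the read method supported by its corresponding data table and a filter condition on that data table, where the filter condition is one that the read method can accept. The filter condition may be a filter condition included in the target query statement, or may be determined from the association condition between that data table and other data tables in the target query statement. A logical Join node may include an association condition, which is determined from the association conditions between the data tables corresponding to the child nodes of the logical Join node.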
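In another example, the execution plan of the target query statement may be a physical execution plan. A physical execution plan can be understood as an execution plan composed of physical operations; it mainly describes how the content of the data query is actually carried out. The physical execution plan of the target query statement may include multiple physical nodes and connecting edges between the nodes. The leaf nodes among the multiple physical nodes are physical query call nodes for data tables among the multiple data tables, and the multiple physical nodes include several physical branch nodes, each of which has multiple child nodes and a corresponding dependency relationship between those child nodes. Note that a physical node can be understood as a physical operation.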
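Based on this, in step S201, the logical execution plan of the target query statement can be obtained and converted into a physical execution plan. Specifically, when converting the logical execution plan of the target query statement into a physical execution plan, the execution plan conversion process shown in Figure 3 can be performed. Figure 3 is a flowchart of the conversion process from a logical execution plan to a physical execution plan.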
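As shown in Figure 3, in step S301, each logical QueryCall node in the logical execution plan is converted into a corresponding physical QueryCall node.
In step S303, the several logical branch nodes in the logical execution plan are each converted into corresponding physical branch nodes.
In step S307, the physical execution plan is constructed based on the converted nodes and the relationships between the nodes.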
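In one implementation, the logical execution plan may further include a logical projection (Project) node and several logical filter (Filter) nodes. In the process of converting the logical execution plan into the physical execution plan, the logical Project node and the logical Filter nodes can also be converted; for example, the result of merging an adjacent logical Filter node and logical Project node can be converted into a corresponding physical calculation (Calculate) node. The several logical Filter nodes may be one or more logical Filter nodes. The logical Project node can be used to return the data required by the query. A logical Filter node can be used to filter the result data of its child node.
Specifically, step S305 can also be performed before step S307.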
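In step S305, for any first Filter node in the logical execution plan, if the first Filter node is connected to the logical Project node by a connecting edge, the first Filter node and the logical Project node are merged, and the merged result is converted into a corresponding physical Calculate node; if the first Filter node is not connected to the logical Project node by a connecting edge, the first Filter node is converted into a corresponding physical Calculate node.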
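The conversion from a logical execution plan to a physical execution plan is illustrated below, taking as an example the logical execution plan shown on the left side of Figure 4. Figure 4 is a schematic diagram of the conversion process from a logical execution plan to a physical execution plan. The logical execution plan shown in Figure 4 includes logical query call nodes LCall1, LCall2 and LCall3, a logical correlated query node LCorr, logical filter nodes LFilt1 and LFilt2, a logical join node LJoin, and a logical projection node LProj. In converting this logical execution plan into a physical execution plan, the logical query call nodes LCall1, LCall2 and LCall3 can be converted into the corresponding physical query call nodes PCall1, PCall2 and PCall3; the logical correlated query node LCorr can be converted into the corresponding physical correlated query node PCorr; the logical filter node LFilt2 can be converted into the corresponding physical calculation node PCalc2; the logical join node LJoin can be converted into the corresponding physical join node PJoin; and the result of merging the adjacent logical filter node LFilt1 and logical projection node LProj can be converted into the corresponding physical calculation node PCalc1. The physical execution plan shown on the right side of Figure 4 can then be generated based on the converted nodes and the relationships between the nodes.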
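The conversion in steps S301 to S307 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `Node` class, the `KIND_MAP` table, the renaming scheme and the tree shape (inferred from the traversal described for Figure 4) are all assumptions, and only the case where the Project node sits directly above a Filter node is merged. How a Project node with no adjacent Filter node is converted is not stated in the text; the sketch simply turns it into a Calculate node as well.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                      # e.g. "LCall", "LFilter", "LProject", "LJoin"
    name: str
    children: List["Node"] = field(default_factory=list)
    is_branch: bool = False        # "target identifier" marking physical branch nodes


# Hypothetical mapping from logical node kinds to physical node kinds.
KIND_MAP = {"LCall": "PCall", "LJoin": "PJoin",
            "LCorrelate": "PCorrelate", "LUnion": "PUnion"}


def to_physical(node: Node) -> Node:
    # S305, merge case: a Project directly above a Filter becomes one Calculate node.
    if node.kind == "LProject" and node.children and node.children[0].kind == "LFilter":
        filt = node.children[0]
        kids = [to_physical(c) for c in filt.children]
        return Node("PCalculate", filt.name.replace("LFilt", "PCalc"), kids)
    kids = [to_physical(c) for c in node.children]
    if node.kind in ("LFilter", "LProject"):
        # S305, no adjacent Project: the Filter alone becomes a Calculate node.
        # Treating a lone Project the same way is an assumption of this sketch.
        name = node.name.replace("LFilt", "PCalc").replace("LProj", "PCalc")
        return Node("PCalculate", name, kids)
    # S301 / S303: query call nodes and branch nodes map one to one.
    return Node(KIND_MAP[node.kind], "P" + node.name[1:], kids,
                is_branch=len(kids) > 1)


def outline(node: Node) -> str:
    """Render a plan as nested text, e.g. Root(Child1, Child2)."""
    if not node.children:
        return node.name
    return f"{node.name}({', '.join(outline(c) for c in node.children)})"


# Logical plan of Figure 4, with the shape inferred from the description.
logical = Node("LProject", "LProj", [
    Node("LFilter", "LFilt1", [
        Node("LJoin", "LJoin", [
            Node("LFilter", "LFilt2", [Node("LCall", "LCall1")]),
            Node("LCorrelate", "LCorr",
                 [Node("LCall", "LCall2"), Node("LCall", "LCall3")]),
        ])
    ])
])

physical = to_physical(logical)
print(outline(physical))  # PCalc1(PJoin(PCalc2(PCall1), PCorr(PCall2, PCall3)))
```

Setting `is_branch` during conversion mirrors the idea of tagging physical nodes that have multiple child nodes so that later subgraph splitting can recognize them cheaply.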
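Note that, in converting the logical execution plan into the physical execution plan, logical nodes can be converted into corresponding physical nodes based on preset conversion logic. In addition, to make it easy to identify which physical nodes are physical branch nodes with multiple child nodes when the physical execution plan is later divided into subgraphs, a target identifier can be added in the physical execution plan to each physical node that has multiple child nodes. The target identifier indicates that its corresponding physical node is a physical branch node.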
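In step S203, the execution plan is split into multiple subgraphs along the several branch nodes, and the multiple subgraphs are converted into multiple scheduling nodes, where a single subgraph includes at least one node.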
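In one implementation, for any branch node among the several branch nodes (hereinafter the first branch node), if each node on one branch path of the first branch node (hereinafter the first branch path) has no more than one child node, and none of those nodes depends on a node on another branch path of the first branch node, the subgraph processing procedure shown in Figure 5 can be performed. Figure 5 is a flowchart of a subgraph processing procedure. As shown in Figure 5, in step S501, if each node on the first branch path of the first branch node has no more than one child node, and none of those nodes depends on a node on another branch path of the first branch node, those nodes are placed in the same first subgraph. In step S503, the execution statement list of the nodes on the first branch path is obtained, and a first execution function is generated based on that execution statement list. In step S505, a first scheduling node that corresponds to the first subgraph and includes the first execution function is generated.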
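If each node on the first branch path of the first branch node has no more than one child node, and those nodes depend on a node on another branch path of the first branch node, the subgraph processing procedure shown in Figure 6 can be performed. Figure 6 is a flowchart of a subgraph processing procedure. As shown in Figure 6, in step S601, if each node on the first branch path of the first branch node has no more than one child node, and those nodes depend on a node on another branch path of the first branch node, those nodes and the first branch node are placed in the same second subgraph. In step S603, the execution statement list of the nodes on the first branch path is obtained, a second execution function is generated based on that execution statement list, and a sub-process node including the second execution function is generated. In step S605, the execution statement list of the first branch node is obtained, a third execution function is generated based on that execution statement list, and a main process node including the third execution function is generated, where the third execution function calls the second execution function. In step S607, the combination of the main process node and the sub-process node is used as the second scheduling node corresponding to the second subgraph.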
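If the first branch node is connected to the root node by a connecting edge and no scheduling node associated with the first branch node has been generated, the subgraph processing procedure shown in Figure 7 can be performed. Figure 7 is a flowchart of a subgraph processing procedure. As shown in Figure 7, in step S701, if the first branch node is connected to the root node by a connecting edge and no scheduling node associated with the first branch node has been generated, the first branch node and the root node are placed in the same third subgraph. In step S703, the execution statement list of the first branch node and the root node is obtained, and a fourth execution function is generated based on that execution statement list. In step S705, a third scheduling node that corresponds to the third subgraph and includes the fourth execution function is generated.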
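Note that the logic for obtaining a node's execution statement list and generating an execution function from it can be pre-compiled, and the corresponding processing can be carried out according to that pre-compiled logic.
Taking the physical execution plan shown in Figure 4 as an example, the implementation of step S203 is described with reference to Figure 8. In the physical execution plan shown in Figure 4, the query of the right child node of the PCorr node depends on the left child node. Figure 8 is a schematic diagram of a subgraph processing procedure.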
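Specifically, take the left subtree of the PJoin node first. This left subtree has only one branch path, which contains two nodes, PCall1 and PCalc2. Since the PCall1 node has 0 child nodes and the PCalc2 node has 1 child node, neither of them is a branch node with multiple child nodes. In addition, neither of them depends on a node on another branch path of the PJoin node. Therefore, the PCall1 node and the PCalc2 node can be placed in the subgraph indicated by reference numeral 801 in Figure 8. Since the query of the PCall1 node does not depend on other nodes, and the PCalc2 node is only used to perform calculation on the result data of the PCall1 node, this subgraph can be invoked independently. When converting this subgraph into a scheduling node, the execution statement lists of the PCall1 node and the PCalc2 node can be obtained, an execution function Func1 can be generated based on them, and scheduling node A, which corresponds to this subgraph and includes the execution function Func1, can then be generated as shown in Figure 8.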
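Next, take the right subtree of the PJoin node. In this right subtree, the left branch path of the PCorr node contains only the PCall2 node, which is a leaf node with 0 child nodes. From the dependency relationship between the child nodes of the PCorr node, it is known that the query of the PCall2 node does not depend on the PCall3 node, so the PCall2 node can be executed independently and can be placed in the subgraph indicated by reference numeral 802. When converting this subgraph into a scheduling node, the execution statement list of the PCall2 node can be obtained, an execution function Func2 can be generated based on it, and scheduling node B, which corresponds to this subgraph and includes the execution function Func2, can then be generated as shown in Figure 8.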
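The right branch path of the PCorr node contains only the PCall3 node, which is a leaf node with 0 child nodes. From the dependency relationship between the child nodes of the PCorr node, it is known that the query of the PCall3 node depends on the PCall2 node: the PCorr node needs to take data from the result data of the PCall2 node as the input parameter of the PCall3 node, so that the PCall3 node performs its data query based on that input parameter. The PCall3 node and the PCorr node can therefore be placed in the subgraph indicated by reference numeral 803. When converting this subgraph into a scheduling node, the execution statement list of the PCall3 node can be obtained, an execution function Func3 can be generated based on it, and sub-process node D, which includes the execution function Func3, can then be generated as shown in Figure 8. In addition, the execution statement list of the PCorr node can be obtained, an execution function Func4 can be generated based on it, and main process node C, which includes the execution function Func4, can then be generated as shown in Figure 8. The execution function Func4 can call the execution function Func3, for example by taking data from the execution result of the execution function Func2 as the input parameter of the execution function Func3, so that the execution function Func3 performs its data query based on that input parameter. Then, as shown in Figure 8, the combination of main process node C and sub-process node D can be used as the scheduling node corresponding to the subgraph indicated by reference numeral 803.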
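The relationship between the main process function and the sub-process function can be illustrated with a small Python sketch. Everything concrete here is hypothetical: the in-memory `ORDERS` table stands in for a remote data resource, and the row layout and function bodies are invented for illustration; only the calling pattern (Func4 feeds data from Func2's result into Func3) follows the text.

```python
# Hypothetical in-memory stand-in for the remote table queried by PCall3.
ORDERS = {1: ["o-10", "o-11"], 2: ["o-20"]}


def func2():
    """Execution function of scheduling node B (PCall2): an independent query."""
    return [{"user_id": 1}, {"user_id": 2}]


def func3(user_id):
    """Sub-process node D (PCall3): a query that needs an input parameter."""
    return [{"user_id": user_id, "order": o} for o in ORDERS.get(user_id, [])]


def func4(left_rows):
    """Main process node C (PCorr): calls func3 once per row of the left result."""
    out = []
    for row in left_rows:
        out.extend(func3(row["user_id"]))
    return out


result = func4(func2())
print([r["order"] for r in result])  # ['o-10', 'o-11', 'o-20']
```

Because `func3` cannot start until `func2` has produced rows, the pair C and D has to run after scheduling node B, which is the serial dependency that later appears in the relationship graph.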
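After the left and right subtrees of the PJoin node have been processed, the PJoin node and the PCalc1 node can be placed in the subgraph indicated by reference numeral 804, in response to the PJoin node being connected to the root node PCalc1 by a connecting edge. When converting this subgraph into a scheduling node, the execution statement lists of the PJoin node and the PCalc1 node can be obtained, an execution function Func5 can be generated based on them, and scheduling node E, which corresponds to this subgraph and includes the execution function Func5, can be generated as shown in Figure 8.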
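In one implementation, the execution plan of the target query statement can be traversed in a depth-first manner, so that the execution plan is split into multiple subgraphs along the several branch nodes and the multiple subgraphs are converted into multiple scheduling nodes. The idea of depth-first traversal is as follows: start from a vertex that has not yet been visited, and follow the edges of the current vertex to try to visit vertices not yet reached; when no unvisited vertex is reachable, return to the previous vertex and continue trying to visit other vertices, until all vertices have been visited. In other words, depth-first traversal follows one branch of the graph to its end, backtracks, and then follows another edge, until all vertices have been visited.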
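Taking the physical execution plan shown in Figure 4 as an example, the following describes how this physical execution plan is traversed in a depth-first manner so that it is split into multiple subgraphs along the several physical branch nodes and the multiple subgraphs are converted into multiple scheduling nodes.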
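First, starting from the PCalc1 node, the nodes can be traversed in the direction from the PCalc1 node to the PCall1 node. When the PCall1 node is reached, the execution statement list of the PCall1 node can be generated in response to it having no child nodes, and the traversal then backtracks to the PCalc2 node. Next, in response to the PCalc2 node having only the one child node PCall1, the execution statement list of the PCall1 node can be obtained and the execution statement list of the PCalc2 node generated based on it, and the traversal then backtracks to the PJoin node. Note that generating the execution statement list of the PCalc2 node from the execution statement list of the PCall1 node achieves the merging of the PCall1 node and the PCalc2 node. Next, in response to the PJoin node being a branch node, the PJoin node can be added to the predecessor node stack, which is used to keep track of branch nodes. Then, in response to the PCalc2 node and the PCall1 node not depending on a node on another branch path of the PJoin node, the PCalc2 node and the PCall1 node can be placed in the same subgraph, the execution statement list of the PCalc2 node can be obtained, an execution function can be generated based on it, and a scheduling node that corresponds to this subgraph and includes the execution function can then be generated.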
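Next, the nodes can be traversed in the direction from the PCorr node to the PCall2 node. When the PCall2 node is reached, the execution statement list of the PCall2 node can be generated in response to it having no child nodes, and the traversal then backtracks to the PCorr node. Next, in response to the PCorr node being a branch node, the PCorr node can be added to the predecessor node stack; and in response to the PCall2 node not depending on PCall3, the other child node of the PCorr node, the PCall2 node can be placed in a subgraph of its own, the execution statement list of the PCall2 node can be obtained, an execution function can be generated based on it, and a scheduling node that corresponds to this subgraph and includes the execution function can then be generated.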
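Next, PCall3, the child node of the PCorr node, can be traversed; in response to the PCall3 node having no child nodes, the execution statement list of the PCall3 node is generated, and the traversal then backtracks to the PCorr node. Then, in response to the query of the PCall3 node depending on PCall2, the other child node of the PCorr node, the PCorr node and the PCall3 node can be placed in the same subgraph. After that, the execution statement list of the PCall3 node can be obtained, an execution function can be generated based on it, and a sub-process node including that execution function can be generated. Next, the execution statement list of the PCorr node can be generated, an execution function can be generated based on it, and a main process node including that execution function can be generated. The combination of the sub-process node and the main process node can then be used as the scheduling node corresponding to this subgraph. In addition, the PCorr node can be popped from the predecessor node stack, and the traversal then backtracks to the PJoin node.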
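Next, in response to all child nodes of the PJoin node having been processed, the execution statement list of the PJoin node can be generated, the PJoin node can be popped from the predecessor node stack, and the traversal then backtracks to the PCalc1 node. Then, in response to the PCalc1 node being the root node, the PJoin node and the PCalc1 node can be placed in the same subgraph, the execution statement list of PJoin, the child node of the PCalc1 node, can be obtained, and the execution statement list of the PCalc1 node can be generated based on it, which achieves the merging of the PJoin node and the PCalc1 node. An execution function can then be generated based on the execution statement list of the PCalc1 node, and a scheduling node that corresponds to this subgraph and includes the execution function can be generated.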
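The depth-first splitting walked through above can be approximated with the following Python sketch. It is a simplification under stated assumptions rather than the described implementation: the `PNode` class and `split` function are hypothetical, recursion stands in for the explicit predecessor node stack, a list of node names stands in for an execution statement list, and the only dependency rule modelled is that the non-first children of a Correlate node depend on its first child.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PNode:
    name: str
    kind: str                      # "call", "calc", "join", "union" or "correlate"
    children: List["PNode"] = field(default_factory=list)


def split(root: PNode) -> List[List[str]]:
    """Split a plan tree into subgraphs, each given as a list of node names."""
    subgraphs: List[List[str]] = []

    def visit(node: PNode) -> List[str]:
        # Post-order DFS; returns the nodes of the subgraph that is still open.
        if len(node.children) <= 1:
            chain = visit(node.children[0]) if node.children else []
            return chain + [node.name]           # merge the child into its parent
        dependent: List[str] = []
        for i, child in enumerate(node.children):
            chain = visit(child)
            if node.kind == "correlate" and i > 0:
                dependent += chain               # this path needs the left result
            elif chain:
                subgraphs.append(chain)          # independent path: own subgraph
        if dependent:
            # Sub-process path plus the branch node (main process) in one subgraph.
            subgraphs.append(dependent + [node.name])
            return []                            # branch node is already scheduled
        return [node.name]                       # stays open, merged with ancestors

    rest = visit(root)
    if rest:
        subgraphs.append(rest)
    return subgraphs


# Physical plan of Figure 4, with the shape inferred from the description.
plan = PNode("PCalc1", "calc", [
    PNode("PJoin", "join", [
        PNode("PCalc2", "calc", [PNode("PCall1", "call")]),
        PNode("PCorr", "correlate",
              [PNode("PCall2", "call"), PNode("PCall3", "call")]),
    ]),
])

subgraphs = split(plan)
for sg in subgraphs:
    print(sg)
```

For the Figure 4 plan this yields four subgraphs that line up with scheduling node A, scheduling node B, the combination of C and D, and scheduling node E. Plans with more deeply nested branch nodes may need rules that the text does not spell out, so the sketch should not be read as covering them.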
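In step S205, a relationship graph between the multiple scheduling nodes is constructed based on the dependency relationships between the child nodes corresponding to each of the several branch nodes. The relationship graph may be a directed acyclic graph (DAG).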
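Taking the scheduling nodes shown in Figure 8 as an example, based on the dependency relationships between the child nodes of the PJoin node and of the PCorr node, it can be determined that scheduling node A and scheduling node B can be executed in parallel, that scheduling node B and the scheduling node formed by the combination of sub-process node D and main process node C need to be executed serially, that the formed scheduling node and scheduling node E need to be executed serially, and that scheduling node A and scheduling node E need to be executed serially. Based on this analysis, a relationship graph between scheduling nodes A, B, E and the formed scheduling node can be constructed. The relationship graph can be as shown in Figure 9. Figure 9 is a schematic diagram of a relationship graph between multiple scheduling nodes. In Figure 9, scheduling node A and scheduling node B are leaf nodes, and scheduling node E is the root node.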
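As a rough illustration, the relationship graph of Figure 9 can be written down as a mapping from each scheduling node to the scheduling nodes it must wait for, and checked with the standard-library `graphlib` module (Python 3.9 or later). The label `"CD"` for the combined main process and sub-process node is a name chosen for this sketch, and the edges are transcribed by hand from the analysis above rather than derived automatically.

```python
from graphlib import TopologicalSorter

# Each scheduling node maps to the set of scheduling nodes it must wait for.
deps = {
    "A": set(),
    "B": set(),
    "CD": {"B"},        # main process C + sub-process D run after B
    "E": {"A", "CD"},   # E joins the results of A and of C + D
}

# Leaf nodes have no prerequisites; the root is the node nothing else waits for.
leaves = sorted(n for n, d in deps.items() if not d)
prerequisites = {n for d in deps.values() for n in d}
roots = sorted(n for n in deps if n not in prerequisites)

# static_order() raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(deps).static_order())

print(leaves, roots)
print(order)
```

Any order produced by `static_order()` places B before CD and both A and CD before E, which is the leaf-to-root direction used for scheduling in the next step.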
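Note that the logic for constructing the relationship graph may be compiled in advance; the relationship graph among the scheduling nodes may then be constructed by this logic from the dependencies among the child nodes of each branch node.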
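One way to derive such a relationship graph is sketched below in Python. It is an illustration under stated assumptions: the assignment of plan nodes to the names A, B, CD (the combined main-process and sub-process node), and E is inferred from the ordering constraints in the text, and `build_relationship_graph` is a hypothetical helper. The standard-library `graphlib.TopologicalSorter` (Python 3.9+) is used only to confirm that the result is acyclic.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

CHILDREN: Dict[str, List[str]] = {
    "PCalc1": ["PJoin"],
    "PJoin": ["PCalc2", "PCorr"],
    "PCalc2": ["PCall1"],
    "PCorr": ["PCall2", "PCall3"],
}
DEPENDS_ON: Dict[str, Set[str]] = {"PCall3": {"PCall2"}}
# Assumed mapping of scheduling nodes to plan nodes.
SUBGRAPHS: Dict[str, List[str]] = {
    "A": ["PCalc2", "PCall1"],
    "B": ["PCall2"],
    "CD": ["PCorr", "PCall3"],
    "E": ["PCalc1", "PJoin"],
}


def build_relationship_graph(
    subgraphs: Dict[str, List[str]],
    children: Dict[str, List[str]],
    depends_on: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Map each scheduling node to the scheduling nodes that must finish first."""
    owner = {node: name for name, nodes in subgraphs.items() for node in nodes}
    prerequisites: Dict[str, Set[str]] = {name: set() for name in subgraphs}
    for name, nodes in subgraphs.items():
        for node in nodes:
            # A node needs its children's results and any sibling result it depends on.
            needed = list(children.get(node, [])) + sorted(depends_on.get(node, set()))
            for other in needed:
                if owner[other] != name:
                    prerequisites[name].add(owner[other])
    return prerequisites


GRAPH = build_relationship_graph(SUBGRAPHS, CHILDREN, DEPENDS_ON)
# static_order() raises graphlib.CycleError if the graph contains a cycle.
ORDER = list(TopologicalSorter(GRAPH).static_order())
```

Here `GRAPH` records that CD waits for B and that E waits for A and CD, while A and B have no prerequisites and can therefore start in parallel.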
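In step S207, the scheduling nodes are scheduled concurrently based on the relationship graph.
Specifically, the scheduling nodes may be scheduled concurrently, based on the relationship graph, in the direction from the leaf nodes to the root node.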
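Further, a first execution task may be generated for each leaf node in the relationship graph. A first execution task may include, for example, the execution function of its leaf node, or execution code obtained by compiling that function. The first execution tasks may then be executed in parallel. As an example, an execution scheduler may generate the first execution task for each leaf node and add these tasks to a task queue, and multiple execution threads may fetch first execution tasks from the queue and execute them. Note that an execution-scheduling trigger statement may be generated for the relationship graph; executing this trigger statement causes the execution scheduler to generate execution tasks.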
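In addition, in response to completion of any one of the first execution tasks, whether the parent node of the leaf node corresponding to that task can be executed may be determined from the relationships between that parent node and the other scheduling nodes. If so, a second execution task may be generated for the parent node and executed.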
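Taking the relationship graph in FIG. 9 as an example, execution tasks for scheduling nodes A and B may be generated first and executed in parallel. After the task for scheduling node B completes, an execution task may be generated and executed for the scheduling node formed by combining sub-process node D with main-process node C. After both that task and the task for scheduling node A complete, an execution task for scheduling node E may be generated and executed. Once the task for scheduling node E completes, execution can be judged to have finished.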
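In practice, as shown in FIG. 10, the execution scheduler may add the execution tasks it generates to the task queue; execution threads in an execution thread pool may fetch tasks from the queue, execute them, and report the execution result to the scheduler when a task completes. The scheduler may determine from the results it receives whether execution has finished. If so, it may return the final execution result; if not, it may trigger the subsequent execution tasks according to the relationships between nodes and add them to the task queue. FIG. 10 is a schematic diagram of a concurrent scheduling process based on the relationship graph.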
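The scheduling loop described above can be approximated with Python's standard `concurrent.futures.ThreadPoolExecutor`, whose internal work queue stands in for the task queue. The following is a minimal sketch, not the specification's scheduler: `run_graph` and the names in `GRAPH` (with CD denoting the combined main-process and sub-process node) are assumptions, and a graph containing a cycle would simply leave some nodes unexecuted.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Set

# Scheduling node -> scheduling nodes that must finish first (assumed names).
GRAPH: Dict[str, Set[str]] = {"A": set(), "B": set(), "CD": {"B"}, "E": {"A", "CD"}}


def run_graph(
    prerequisites: Dict[str, Set[str]],
    functions: Dict[str, Callable[[], None]],
    workers: int = 4,
) -> List[str]:
    """Run each node's function once all of its prerequisites have completed."""
    waiting = {name: set(pre) for name, pre in prerequisites.items()}
    running: Dict[Future, str] = {}
    finished: List[str] = []

    with ThreadPoolExecutor(max_workers=workers) as pool:

        def submit_ready() -> None:
            # Leaf nodes are ready at once; others become ready as prerequisites finish.
            ready = [name for name, pre in waiting.items() if not pre]
            for name in ready:
                del waiting[name]
                running[pool.submit(functions[name])] = name

        submit_ready()
        while running:
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise an error from the task, if any
                finished.append(name)
                for pre in waiting.values():
                    pre.discard(name)
            submit_ready()  # trigger the tasks that have just become runnable
    return finished
```

With the graph above, A and B are submitted together, CD is submitted only after B has finished, and E is submitted last, once both A and CD are done.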
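In one implementation, the relationship graph may be compiled into execution code, and the scheduling nodes may be scheduled concurrently based on that code. This enables compilation-based concurrent execution.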
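In the solution provided by the embodiment corresponding to FIG. 2, the execution plan of a target query statement to be executed may be obtained, where the target query statement is used for a joint query over multiple data tables; the execution plan includes multiple nodes and connecting edges between nodes; the leaf nodes among these nodes are query call nodes for data tables among the multiple data tables; and the nodes include several branch nodes, each of which has multiple child nodes and corresponds to dependencies among its child nodes. The execution plan may then be split along the branch nodes into multiple subgraphs, and the subgraphs converted into multiple scheduling nodes. Next, a relationship graph among the scheduling nodes may be constructed based on the dependencies, and the scheduling nodes may be scheduled concurrently based on the relationship graph. In this way, the queries on the multiple data tables can be parallelized, which can improve the overall execution efficiency of the data query.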
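FIG. 11 is a schematic structural diagram of a data query apparatus in an embodiment of this specification. The apparatus may be applied to any device, platform, or device cluster with data storage, computing, and processing capabilities, for example the target platform 102 shown in FIG. 1.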
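As shown in FIG. 11, the data query apparatus 1100 in an embodiment of this specification may include an acquisition unit 1101, a splitting unit 1102, a generation unit 1103, and a scheduling unit 1104. The acquisition unit 1101 is configured to obtain the execution plan of a target query statement to be executed, where the target query statement is used for a joint query over multiple data tables, the execution plan includes multiple nodes and connecting edges between nodes, the leaf nodes among the multiple nodes are query call nodes for data tables among the multiple data tables, and the multiple nodes include several branch nodes, each of which has multiple child nodes and corresponds to dependencies among its child nodes. The splitting unit 1102 is configured to split the execution plan along the branch nodes into multiple subgraphs and to convert the subgraphs into multiple scheduling nodes, where each subgraph includes at least one node. The generation unit 1103 is configured to construct a relationship graph among the scheduling nodes based on the dependencies. The scheduling unit 1104 is configured to schedule the scheduling nodes concurrently based on the relationship graph.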
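In one implementation, each branch node is any one of the following: a union node, a join node, or a correlated query node. A union node computes the union of the result data of its child nodes; a join node performs a join operation on the result data of its child nodes; and a correlated query node passes data from the result data of one of its child nodes as an input parameter to another child node, so that the other child node performs its data query based on that parameter.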
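The three kinds of branch node can be illustrated with toy Python functions over lists of row dictionaries. These are sketches under stated assumptions: the function names are hypothetical, the union is assumed to drop duplicate rows, and the join is assumed to be an inner equi-join on a single key. The correlated query node shows why one child must finish before the other can run.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def union_node(left: List[Row], right: List[Row]) -> List[Row]:
    """Union of the children's results; dropping duplicates is an assumption."""
    seen = set()
    out: List[Row] = []
    for row in left + right:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def join_node(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner equi-join of the children's results on `key` (assumed join type)."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def correlated_query_node(
    outer: List[Row],
    inner_query: Callable[[object], List[Row]],
    param: str,
) -> List[Row]:
    """Pass data from one child's result to the other child as an input parameter."""
    out: List[Row] = []
    for row in outer:
        out.extend(inner_query(row[param]))  # the inner child cannot run before the outer one
    return out
```

The children of `union_node` and `join_node` are independent of each other and can be queried in parallel, whereas `inner_query` in `correlated_query_node` needs the outer child's rows first.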
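In one implementation, the execution plan is a physical execution plan, the nodes are physical nodes, the query call nodes are physical query call nodes, and the branch nodes are physical branch nodes.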
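In one implementation, the acquisition unit 1101 may be further configured to: obtain a logical execution plan of the target query statement, where the logical execution plan includes multiple logical nodes and connecting edges between nodes, the leaf nodes among the logical nodes are logical query call nodes for data tables among the multiple data tables, and the logical nodes include several logical branch nodes each having multiple child nodes; and convert the logical execution plan into the physical execution plan.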
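In one implementation, the acquisition unit 1101 may be further configured to: convert each logical query call node among the logical nodes into a corresponding physical query call node; convert each logical branch node into a corresponding physical branch node; and construct the physical execution plan based on the converted nodes and the relationships between nodes.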
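In one implementation, the logical nodes further include a logical projection node and several logical filter nodes, and the acquisition unit 1101 may be further configured to, before constructing the physical execution plan based on the converted nodes and the relationships between nodes: for any first filter node among the logical filter nodes, if the first filter node is connected to the logical projection node by an edge, merge the first filter node with the logical projection node and convert the merged result into a corresponding physical calculation node; and if the first filter node is not connected to the logical projection node by an edge, convert the first filter node into a corresponding physical calculation node.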
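The filter and projection merging rule can be sketched in Python as follows. The classes `LogicalNode` and `PhysicalCalc` and the function `to_physical_calc` are hypothetical, and the sketch assumes that "connected by an edge" means the projection node is the direct parent of the filter node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogicalNode:
    kind: str  # for example "filter" or "project"
    expr: str
    child: Optional["LogicalNode"] = None


@dataclass
class PhysicalCalc:
    predicate: Optional[str]
    projection: Optional[str]


def to_physical_calc(filter_node: LogicalNode, parent: Optional[LogicalNode]) -> PhysicalCalc:
    """Convert a logical filter node, merging it with an adjacent projection node."""
    connected = (
        parent is not None
        and parent.kind == "project"
        and parent.child is filter_node
    )
    if connected:
        # Filter and projection share an edge: one physical calculation node does both.
        return PhysicalCalc(predicate=filter_node.expr, projection=parent.expr)
    # No adjacent projection: the filter becomes a calculation node on its own.
    return PhysicalCalc(predicate=filter_node.expr, projection=None)
```

Merging the two logical nodes into a single physical calculation node means the filter and the projection are evaluated in one pass over the rows.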
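In one implementation, the splitting unit 1102 may be further configured to: for any first branch node among the branch nodes, if each node on a first branch path of the first branch node has no more than one child node and these nodes do not depend on nodes on the other branch paths of the first branch node, place these nodes in the same first subgraph; obtain the execution statement lists of these nodes and generate a first execution function based on them; and generate a first scheduling node that corresponds to the first subgraph and includes the first execution function.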
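In one implementation, the splitting unit 1102 may be further configured to: if each of the above nodes has no more than one child node and these nodes depend on a node on the other branch paths, place these nodes and the first branch node in the same second subgraph; obtain the execution statement lists of these nodes, generate a second execution function based on them, and generate a sub-process node that includes the second execution function; obtain the execution statement list of the first branch node, generate a third execution function based on it, and generate a main-process node that includes the third execution function, where the third execution function calls the second execution function; and use the combination of the main-process node and the sub-process node as a second scheduling node corresponding to the second subgraph.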
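In one implementation, the splitting unit 1102 may be further configured to: if the first branch node is connected to the root node by an edge and no scheduling node associated with the first branch node has been generated, place the first branch node and the root node in the same third subgraph; obtain the execution statement lists of the first branch node and the root node and generate a fourth execution function based on them; and generate a third scheduling node that corresponds to the third subgraph and includes the fourth execution function.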
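In one implementation, the relationship graph is a directed acyclic graph.
In one implementation, the scheduling unit 1104 may be further configured to schedule the scheduling nodes concurrently, based on the relationship graph, in the direction from the leaf nodes to the root node.
In one implementation, the scheduling unit 1104 may be further configured to: generate a first execution task for each leaf node in the relationship graph; and execute the first execution tasks in parallel.
In one implementation, the scheduling unit 1104 may be further configured to: add the first execution tasks to a task queue; and use multiple execution threads to fetch first execution tasks from the task queue and execute them.
In one implementation, the scheduling unit 1104 may be further configured to: in response to completion of any one of the first execution tasks, determine, from the relationships between the parent node of the leaf node corresponding to that task and the other scheduling nodes, whether the parent node can be executed; and if so, generate a second execution task for the parent node and execute it.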
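For further explanation of the units in the apparatus embodiment corresponding to FIG. 11, refer to the related description in the method embodiments above; details are not repeated here.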
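An embodiment of this specification further provides a computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to perform the data query method described in the method embodiments above.
An embodiment of this specification further provides a computing device including a memory and a processor, where the memory stores executable code and the processor, when executing the executable code, implements the data query method described in the method embodiments above.
An embodiment of this specification further provides a computer program product which, when executed in a computer, causes the computer to perform the data query method described in the method embodiments above.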
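In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can readily be obtained simply by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.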
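A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for implementing various functions may be regarded as structures within the hardware component. Indeed, the means for implementing various functions may be regarded both as software modules implementing a method and as structures within a hardware component.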
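The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a server system. Of course, this application does not exclude that, as computer technology develops, the computer implementing the functions of the above embodiments may be, for example, a personal computer, a laptop computer, an in-vehicle human-machine interaction device, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.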
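Although one or more embodiments of this specification provide method steps as described in the embodiments or flowcharts, more or fewer steps may be included based on conventional or non-inventive means. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual apparatus or end product executes, the steps may be performed sequentially or in parallel according to the methods shown in the embodiments or drawings (for example, in an environment with parallel processors or multithreaded processing, or even in a distributed data processing environment). The terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, product, or device. Without further limitation, the presence of additional identical or equivalent elements in the process, method, product, or device that includes the stated elements is not excluded. Words such as "first" and "second", where used, denote names and do not indicate any particular order.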
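For convenience of description, the above apparatus is described in terms of modules divided by function. Of course, when one or more embodiments of this specification are implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware, and a module implementing a given function may also be implemented by a combination of multiple submodules or subunits. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.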
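The present invention is described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.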
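In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent storage in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.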
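Computer-readable media include persistent and non-persistent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage, graphene storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.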
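Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.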
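One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.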
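The embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for related details, refer to the corresponding description of the method embodiments. In this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this specification. Such illustrative expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided that they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.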
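The above are merely examples of one or more embodiments of this specification and are not intended to limit them. For those skilled in the art, various modifications and changes may be made to one or more embodiments of this specification. Any modification, equivalent replacement, or improvement made within the spirit and principles of this specification shall fall within the scope of the claims.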
Claims (17)
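- A data query method, comprising: obtaining an execution plan of a target query statement to be executed, wherein the target query statement is used for a joint query over multiple data tables, the execution plan comprises multiple nodes and connecting edges between nodes, the leaf nodes among the multiple nodes are query call nodes for data tables among the multiple data tables, and the multiple nodes include several branch nodes, each of which has multiple child nodes and corresponds to dependencies among its child nodes; splitting the execution plan along the branch nodes into multiple subgraphs, and converting the multiple subgraphs into multiple scheduling nodes, wherein each subgraph comprises at least one node; constructing a relationship graph among the multiple scheduling nodes based on the dependencies; and scheduling the multiple scheduling nodes concurrently based on the relationship graph.
- The method according to claim 1, wherein each branch node is any one of the following: a union node, a join node, or a correlated query node; wherein the union node computes the union of the result data of its child nodes, the join node performs a join operation on the result data of its child nodes, and the correlated query node passes data from the result data of one of its child nodes as an input parameter to another child node, so that the other child node performs a data query based on the input parameter.
- The method according to claim 1, wherein the execution plan is a physical execution plan, the multiple nodes are multiple physical nodes, the query call nodes are physical query call nodes, and the branch nodes are physical branch nodes.
- The method according to claim 3, wherein obtaining the execution plan of the target query statement to be executed comprises: obtaining a logical execution plan of the target query statement, wherein the logical execution plan comprises multiple logical nodes and connecting edges between nodes, the leaf nodes among the multiple logical nodes are logical query call nodes for data tables among the multiple data tables, and the multiple logical nodes include several logical branch nodes each having multiple child nodes; and converting the logical execution plan into the physical execution plan.
- The method according to claim 4, wherein converting the logical execution plan into the physical execution plan comprises: converting each logical query call node among the multiple logical nodes into a corresponding physical query call node; converting each of the logical branch nodes into a corresponding physical branch node; and constructing the physical execution plan based on the converted nodes and the relationships between nodes.
- The method according to claim 5, wherein the multiple logical nodes further comprise a logical projection node and several logical filter nodes; and the method further comprises, before constructing the physical execution plan based on the converted nodes and the relationships between nodes: for any first filter node among the logical filter nodes, if the first filter node is connected to the logical projection node by an edge, merging the first filter node with the logical projection node and converting the merged result into a corresponding physical calculation node; and if the first filter node is not connected to the logical projection node by an edge, converting the first filter node into a corresponding physical calculation node.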
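- The method according to claim 1, wherein splitting the execution plan along the branch nodes into multiple subgraphs and converting the multiple subgraphs into multiple scheduling nodes comprises: for any first branch node among the branch nodes, if each node on a first branch path of the first branch node has no more than one child node and the nodes do not depend on nodes on the other branch paths of the first branch node, placing the nodes in the same first subgraph; obtaining the execution statement lists of the nodes and generating a first execution function based on the execution statement lists; and generating a first scheduling node that corresponds to the first subgraph and includes the first execution function.
- The method according to claim 7, wherein splitting the execution plan along the branch nodes into multiple subgraphs and converting the multiple subgraphs into multiple scheduling nodes further comprises: if each of the nodes has no more than one child node and the nodes depend on a node on the other branch paths, placing the nodes and the first branch node in the same second subgraph; obtaining the execution statement lists of the nodes, generating a second execution function based on the execution statement lists, and generating a sub-process node that includes the second execution function; obtaining the execution statement list of the first branch node, generating a third execution function based on the execution statement list, and generating a main-process node that includes the third execution function, wherein the third execution function calls the second execution function; and using the combination of the main-process node and the sub-process node as a second scheduling node corresponding to the second subgraph.
- The method according to claim 7 or 8, wherein splitting the execution plan along the branch nodes into multiple subgraphs and converting the multiple subgraphs into multiple scheduling nodes further comprises: if the first branch node is connected to the root node by an edge and no scheduling node associated with the first branch node has been generated, placing the first branch node and the root node in the same third subgraph; obtaining the execution statement lists of the first branch node and the root node and generating a fourth execution function based on the execution statement lists; and generating a third scheduling node that corresponds to the third subgraph and includes the fourth execution function.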
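- The method according to claim 1, wherein the relationship graph is a directed acyclic graph.
- The method according to claim 1, wherein scheduling the multiple scheduling nodes concurrently based on the relationship graph comprises: scheduling the multiple scheduling nodes concurrently, based on the relationship graph, in the direction from the leaf nodes to the root node.
- The method according to claim 11, wherein scheduling the multiple scheduling nodes concurrently, based on the relationship graph, in the direction from the leaf nodes to the root node comprises: generating a first execution task for each leaf node in the relationship graph; and executing the first execution tasks in parallel.
- The method according to claim 12, further comprising: adding the first execution tasks to a task queue; wherein executing the first execution tasks in parallel comprises: using multiple execution threads to fetch first execution tasks from the task queue and execute them.
- The method according to claim 12, wherein scheduling the multiple scheduling nodes concurrently, based on the relationship graph, in the direction from the leaf nodes to the root node further comprises: in response to completion of any one of the first execution tasks, determining, from the relationships between the parent node of the leaf node corresponding to that first execution task and the other scheduling nodes, whether the parent node can be executed; and if so, generating a second execution task for the parent node and executing the second execution task.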
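- A data query apparatus, comprising: an acquisition unit configured to obtain an execution plan of a target query statement to be executed, wherein the target query statement is used for a joint query over multiple data tables, the execution plan comprises multiple nodes and connecting edges between nodes, the leaf nodes among the multiple nodes are query call nodes for data tables among the multiple data tables, and the multiple nodes include several branch nodes, each of which has multiple child nodes and corresponds to dependencies among its child nodes; a splitting unit configured to split the execution plan along the branch nodes into multiple subgraphs and to convert the multiple subgraphs into multiple scheduling nodes, wherein each subgraph comprises at least one node; a generation unit configured to construct a relationship graph among the multiple scheduling nodes based on the dependencies; and a scheduling unit configured to schedule the multiple scheduling nodes concurrently based on the relationship graph.
- A computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to perform the method according to any one of claims 1 to 14.
- A computing device, comprising a memory and a processor, wherein the memory stores executable code and the processor, when executing the executable code, implements the method according to any one of claims 1 to 14.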
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310458769.7A CN116483858A (zh) | 2023-04-25 | 2023-04-25 | Data query method and apparatus |
| CN202310458769.7 | 2023-04-25 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024221908A1 (zh) | 2024-10-31 |
Family
ID=87213459
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/135018 Pending WO2024221908A1 (zh) | 2023-04-25 | 2023-11-29 | Data query method and apparatus |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN116483858A (zh) |
| WO (1) | WO2024221908A1 (zh) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116483858A (zh) * | 2023-04-25 | 2023-07-25 | 蚂蚁区块链科技(上海)有限公司 | Data query method and apparatus |
| CN117271580B (zh) * | 2023-09-04 | 2026-01-27 | 阿里云计算有限公司 | Query plan processing method and apparatus |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060253422A1 (en) * | 2005-05-06 | 2006-11-09 | Microsoft Corporation | Efficient computation of multiple group by queries |
| CN114969261A (zh) * | 2022-05-30 | 2022-08-30 | 平安科技(深圳)有限公司 | Artificial intelligence-based data query method and apparatus, electronic device, and medium |
| CN116483858A (zh) * | 2023-04-25 | 2023-07-25 | 蚂蚁区块链科技(上海)有限公司 | Data query method and apparatus |
| CN116483859A (zh) * | 2023-04-25 | 2023-07-25 | 蚂蚁区块链科技(上海)有限公司 | Data query method and apparatus |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105786808B (zh) * | 2014-12-15 | 2019-06-18 | 阿里巴巴集团控股有限公司 | Method and device for distributed execution of relational computing instructions |
| US11704316B2 (en) * | 2019-05-31 | 2023-07-18 | Qubole, Inc. | Systems and methods for determining peak memory requirements in SQL processing engines with concurrent subtasks |
| CN111209302A (zh) * | 2020-01-02 | 2020-05-29 | 中科驭数(北京)科技有限公司 | Parallel execution method and apparatus for database access |
| CN113449856B (zh) * | 2020-03-27 | 2025-03-11 | 华为技术有限公司 | Control flow graph processing method and related device |
| CN115408427A (zh) * | 2021-05-28 | 2022-11-29 | 华为技术有限公司 | Method, apparatus, and device for data search |
-
2023
- 2023-04-25 CN CN202310458769.7A patent/CN116483858A/zh active Pending
- 2023-11-29 WO PCT/CN2023/135018 patent/WO2024221908A1/zh active Pending
Non-Patent Citations (2)
| Title |
|---|
| ANONYMOUS: "A comprehensive guide to the PolarDB-X parallel computing framework", ALIBABA CLOUD DATABASE, 13 March 2021 (2021-03-13), XP093229208 * |
| ANONYMOUS: "Distributed execution plan scheduling", 11 January 2021 (2021-01-11), XP093229182, Retrieved from the Internet <URL:https://www.bookstack.cn/read/oceanbase-1.4-zh/f98b443179caddad.md> * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116483858A (zh) | 2023-07-25 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23935075; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |