CN116126901A - Data processing method, device, electronic equipment and computer readable storage medium - Google Patents

Data processing method, device, electronic equipment and computer readable storage medium

Publication number
CN116126901A
Authority
CN
China
Prior art keywords
node
sub
data
query plan
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310165032.6A
Other languages
Chinese (zh)
Inventor
吕亚宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Yaxin Anhui Technology Co ltd
Original Assignee
Hunan Yaxin Anhui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Yaxin Anhui Technology Co ltd filed Critical Hunan Yaxin Anhui Technology Co ltd
Priority to CN202310165032.6A priority Critical patent/CN116126901A/en
Publication of CN116126901A publication Critical patent/CN116126901A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a data processing method, a data processing device, electronic equipment and a computer readable storage medium, and relate to the technical field of database management. The method comprises the following steps: traversing each node of a query plan tree starting from the leaf nodes of the query plan tree, and performing an optimization operation on each node during the traversal until the root node of the query plan tree is reached, so as to generate a target query plan tree; dividing the target query plan tree into a plurality of sub-plans based on the expansion nodes; distributing the plurality of sub-plans to the data nodes for execution to obtain an execution result of each sub-plan; and obtaining and collecting the execution results of the sub-plans to obtain the target execution result of the query plan tree. In the present application, the nodes of the query plan tree are mapped to different data nodes through their feature lists, and data scheduling is carried out according to the expansion nodes, so that a distributed query plan comprising a plurality of sub-plans is constructed and query efficiency is effectively improved.

Description

Data processing method, device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of database management technologies, and in particular, to a data processing method, a data processing device, an electronic device, and a computer readable storage medium.
Background
A database management system is large-scale software for manipulating and managing databases; it is used to build, use, and maintain them. Users access the data in a database through the database management system, and database administrators also perform maintenance through it. A query plan is the set of steps that the database management system performs to complete a query, and the optimization of the query plan determines the efficiency of the query operation.
In the prior art, a database management system generally stores the data involved in a query plan on storage nodes and transmits that data to computing nodes, where the query plan is optimized and executed. The volume of data transmitted between the computing nodes and the storage nodes is large, which increases the resources consumed by the computing nodes and results in low query efficiency.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device, electronic equipment and a computer readable storage medium, which can solve the problem of low query efficiency. The technical scheme is as follows:
According to an aspect of the embodiments of the present application, there is provided a data processing method, applied to a coordination node of a database management system, including:
traversing each node of a query plan tree starting from the leaf nodes of the query plan tree, and performing the following optimization operation on each node during the traversal until the root node of the query plan tree is reached, so as to generate a target query plan tree;
wherein the optimization operation includes:
acquiring a feature list of the node; calculating, according to the feature list and through a preset mapping function, the data node corresponding to the node; and determining, based on the feature list, whether to insert an expansion node at the next level of the node; the feature list is used for providing data for the query task corresponding to the node; the expansion node is used for scheduling the query task corresponding to the node;
dividing the target query plan tree into a plurality of sub-plans based on the expansion nodes;
distributing the multiple sub-plans to each data node for execution, and obtaining an execution result of each sub-plan;
and obtaining and collecting the execution results of each sub-plan to obtain the target execution result of the query plan tree.
In one possible implementation manner, the determining, based on the feature list, whether to insert the expansion node at the next level of the node includes:
when the feature list and the original table corresponding to the node do not meet the preset matching condition, inserting an expansion node at the next level of the node;
and when the feature list and the original table corresponding to the node meet the preset matching condition, traversing to the next node.
In one possible implementation manner, the determining, based on the feature list, whether to insert the expansion node at the next level of the node further includes:
when the feature list and the original table corresponding to the node do not meet the preset matching condition and the query task corresponding to the node is an aggregation task, decomposing the aggregation task into an intermediate aggregation task and a terminal aggregation task, and inserting an expansion node between the intermediate aggregation task and the terminal aggregation task; the aggregation task is a non-grouped aggregation task, or a grouped aggregation task in which the first field of the grouping key is not a distribution field of the table.
In yet another possible implementation manner, the matching condition includes:
the feature list comprises the distribution key corresponding to the original table, and the distribution mode of the original table corresponding to the node is any one of hash, modulo, and replication.
In yet another possible implementation, the above feature list is constructed based on the following:
acquiring the table list of the original table corresponding to the node and the binding parameters corresponding to the node;
and constructing the feature list of the node based on the table list and the binding parameters.
According to another aspect of the embodiments of the present application, there is provided a data processing method, applied to a data node of a database management system, including:
acquiring a sub-plan corresponding to the data node; wherein, the sub-plan is obtained by dividing the query plan tree by the coordination node of the database based on the expansion node;
executing the sub-plan to obtain an execution result of the sub-plan, and storing the execution result in a preset storage address.
According to another aspect of an embodiment of the present application, there is provided a data processing apparatus, the apparatus including:
the optimizing module is used for traversing each node of the query plan tree from the leaf node of the query plan tree, and in the traversing process, carrying out the following optimizing operation on each node until traversing to the root node of the query plan tree to generate a target query plan tree;
wherein the optimizing operation includes:
acquiring a feature list of the node; calculating, according to the feature list and through a preset mapping function, the data node corresponding to the node; and determining, based on the feature list, whether to insert an expansion node at the next level of the node; the feature list is used for providing data for the query task corresponding to the node; the expansion node is used for scheduling the query task corresponding to the node;
the dividing module is used for dividing the target query plan tree into a plurality of sub-plans based on the expansion nodes;
the distribution module is used for distributing the multiple sub-plans to each data node for execution to obtain an execution result of each sub-plan;
and the collecting module is used for obtaining and collecting the execution results of each sub-plan to obtain the target execution result of the query plan tree.
In one possible implementation manner, the optimization module is configured to, when determining whether to insert an expansion node at the next level of the node based on the feature list:
when the feature list and the original table corresponding to the node do not meet the preset matching condition, insert an expansion node at the next level of the node;
and when the feature list and the original table corresponding to the node meet the preset matching condition, traverse to the next node.
In one possible implementation manner, the optimization module is further configured to, when determining whether to insert an expansion node at the next level of the node based on the feature list:
when the feature list and the original table corresponding to the node do not meet the preset matching condition and the query task corresponding to the node is an aggregation task, decompose the aggregation task into an intermediate aggregation task and a terminal aggregation task, and insert an expansion node between the intermediate aggregation task and the terminal aggregation task; the aggregation task is a non-grouped aggregation task, or a grouped aggregation task in which the first field of the grouping key is not a distribution field of the table.
In yet another possible implementation manner, the matching condition includes:
the feature list comprises the distribution key corresponding to the original table, and the distribution mode of the original table corresponding to the node is any one of hash, modulo, and replication.
In yet another possible implementation, the above feature list is constructed based on the following:
acquiring the table list of the original table corresponding to the node and the binding parameters corresponding to the node;
and constructing the feature list of the node based on the table list and the binding parameters.
According to another aspect of an embodiment of the present application, there is provided a data processing apparatus, the apparatus including:
the acquisition module is used for acquiring the sub-plan corresponding to the data node; wherein, the sub-plan is obtained by dividing the query plan tree by a coordination node of the database;
and the execution module is used for executing the sub-plan, obtaining an execution result of the sub-plan and storing the execution result in a preset storage address.
According to another aspect of the embodiments of the present application, there is provided an electronic device including: a memory, a processor and a computer program stored on the memory, the processor executing the computer program to perform the steps of the method according to the first aspect of the embodiments of the present application.
According to a further aspect of embodiments of the present application, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of the first aspect of embodiments of the present application.
According to an aspect of embodiments of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of the first aspect of embodiments of the present application.
The beneficial effects that technical scheme that this application embodiment provided brought are:
In the process of traversing the nodes of the query plan tree, the data node corresponding to each node is determined from the node's feature list, and whether to insert an expansion node at the next level of the node is also determined, so as to obtain an optimized target query plan tree. The target query plan tree is then divided into a plurality of sub-plans according to the expansion nodes, so that the coordination node can distribute the sub-plans to the corresponding data nodes for execution and obtain the target execution result of the query plan tree.
According to the embodiments of the present application, each node of the query plan tree can be mapped to a data node according to its feature list, and data scheduling is carried out according to the expansion nodes, so that a distributed query plan comprising a plurality of sub-plans is constructed. Compared with the prior art, in which a large volume of data is transmitted between the computing node and the storage node, the present application issues each sub-plan to the corresponding data node for execution, which effectively reduces the volume of data transmitted between the data nodes and the coordination node. Because the data nodes take part in executing the sub-plans, the computing load of the coordination node is also reduced, which improves query efficiency and user experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is an application scenario schematic diagram of a data processing method provided in an embodiment of the present application;
Fig. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application;
Fig. 3 is a matching feature table of a MapReduce operator in the data processing method provided in the embodiment of the present application;
Fig. 4 is a flowchart illustrating a data processing method according to an embodiment of the present disclosure;
Fig. 5-1 is a flow chart of an exemplary data processing method according to an embodiment of the present application;
Fig. 5-2 is a schematic diagram of a target query plan tree in a data processing method according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 7 is a schematic diagram of a data processing apparatus according to another embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of a data processing electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings in the present application. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application, and do not limit those technical solutions.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and "comprising," when used in this application, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof, all of which may be included in the present application. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items defined by the term; for example, "A and/or B" may be implemented as "A", as "B", or as "A and B".
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The database management system is the core of a management information system, and database-based OLTP (On-Line Transaction Processing) and OLAP (Online Analytical Processing) are among the most important computer applications for banks, enterprises, government departments, and the like. In most application systems, query operations account for the largest share of database operations, and the SELECT statement on which query operations are based is the most costly of the SQL (Structured Query Language) statements. When the data volume grows to a certain level, for example when the account table of a bank holds millions or even tens of millions of records, a full table scan often takes tens of minutes or even hours. If a query strategy better than a full table scan is adopted, the query time can often be reduced to a few minutes, which shows the importance of query optimization techniques.
The inventor found that applying a distributed database architecture (such as a proxy/database middleware architecture or a shared-storage distributed database architecture) can improve query efficiency. However, the proxy layer/middleware layer in the proxy/database middleware architecture needs to rewrite SQL statements and distribute them to the database nodes; the rewriting process is not applicable to all SQL statements, and the SQL statements must be parsed repeatedly at the proxy layer/middleware layer and at the database nodes, so data query efficiency is low.
The data processing method, device, electronic equipment and computer readable storage medium provided by the application aim to solve the technical problems in the prior art.
The technical solutions of the embodiments of the present application and technical effects produced by the technical solutions of the present application are described below by describing several exemplary embodiments. It should be noted that the following embodiments may be referred to, or combined with each other, and the description will not be repeated for the same terms, similar features, similar implementation steps, and the like in different embodiments.
The data processing method of the present application may be applied to the scenario shown in Fig. 1. Specifically, the coordination node of the database management system traverses each node of the query plan tree starting from the leaf nodes, and during the traversal performs an optimization operation on each node until the root node of the query plan tree is reached, so as to generate the target query plan tree. The coordination node divides the target query plan tree into a plurality of sub-plans based on the expansion nodes and distributes the sub-plans to the corresponding data nodes. Each data node executes its sub-plan to obtain an execution result and stores the result at a preset address. The coordination node then obtains and collects the execution results of the sub-plans from the preset address to obtain the target execution result of the query plan tree.
In the scenario shown in fig. 1, the database management system may be operated in a server, or in other scenarios, may be operated in a terminal.
As will be appreciated by those skilled in the art, a "terminal" as used herein may be a cell phone, tablet computer, PDA (Personal Digital Assistant), MID (Mobile Internet Device), etc.; the "server" may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
An embodiment of the present application provides a data processing method, as shown in fig. 2, which may be applied to a Coordination Node (CN) of a database management system, where the method includes:
S201, traversing each node of the query plan tree starting from the leaf nodes of the query plan tree, and performing the following optimization operation on each node during the traversal until the root node of the query plan tree is reached, so as to generate a target query plan tree.
The optimization operation comprises: acquiring a feature list of the node; calculating, according to the feature list and through a preset mapping function, the data node (DN, Data Node) corresponding to the node; and determining, based on the feature list, whether to insert an expansion node at the next level of the node.
Further, the feature list is used for providing data for the query task corresponding to the node; the expansion node is used for scheduling the query task corresponding to the node. The expansion node may be a MapReduce node.
MapReduce is a computing model for parallel processing of big data. With this model, a pile of unordered data, here all the nodes of the query plan tree, can be grouped according to their feature lists to obtain the data node corresponding to each node; the sub-plans in the query plan tree are then executed in parallel on a plurality of data nodes to obtain the execution results.
Specifically, the feature list may be the MapKey list of the node, the mapping function may be a Map interface function, and the MapKey list may be constructed based on the table list of the node and the binding parameters. For example, for a Scan node, the corresponding MapKey is the distribution key of the original table to which the node corresponds; for a Join node, the corresponding MapKey is the equi-join condition columns of the original table; for a GroupAgg node, the corresponding MapKey is the group key columns of the original table.
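The MapKey selection per node type and the mapping to a data node can be illustrated with a minimal Python sketch. The `PlanNode` class, its field names, and the CRC32-based `map_to_data_node` function are hypothetical stand-ins chosen for illustration; the application does not specify how the Map interface function computes the data node.

```python
import zlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    # Hypothetical plan node; field names are assumptions, not from the application.
    kind: str                                             # "Scan", "Join", or "GroupAgg"
    dist_key: List[str] = field(default_factory=list)     # distribution key of the original table
    join_cols: List[str] = field(default_factory=list)    # equi-join condition columns
    group_cols: List[str] = field(default_factory=list)   # group key columns


def map_key(node: PlanNode) -> List[str]:
    """Return the MapKey list for a node, following the three cases in the text."""
    if node.kind == "Scan":
        return list(node.dist_key)
    if node.kind == "Join":
        return list(node.join_cols)
    if node.kind == "GroupAgg":
        return list(node.group_cols)
    return []


def map_to_data_node(key_values, num_data_nodes: int) -> int:
    """Assumed Map function: hash the MapKey values to a data node index."""
    raw = "|".join(str(v) for v in key_values).encode("utf-8")
    return zlib.crc32(raw) % num_data_nodes
```

Any deterministic hash would serve here; CRC32 is used only because it is stable across runs, unlike Python's built-in `hash` on strings.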
In the embodiment of the application, the coordination node can traverse the query plan tree in post-order, acquire the MapKey list of each node, and calculate the data node corresponding to the node from the MapKey list through the Map interface function; meanwhile, whether to insert a MapReduce node at the next level of the node can be determined based on how well the feature list of the node matches the original table corresponding to the node.
S202, dividing the target query plan tree into a plurality of sub plans based on the expansion node, and distributing the plurality of sub plans to each data node for execution to obtain an execution result of each sub plan.
The sub-plan may be a sub-plan corresponding to a MapReduce node; the plurality of sub-plans constitutes a distributed plan of the query plan tree. One MapReduce node may correspond to multiple data nodes such that multiple sub-plans are co-processed and executed in the corresponding data nodes.
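As an illustration of the division step, the following Python sketch cuts a plan tree below every MapReduce node and returns the root of each resulting fragment. The `Node` class and the node names used in the example are hypothetical; the sketch only identifies fragment roots and does not model how sub-plans are serialized or assigned to data nodes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical minimal plan node for illustration.
    name: str
    children: List["Node"] = field(default_factory=list)


def split_into_subplans(root: Node) -> List[Node]:
    """Return the root of each sub-plan, cutting below every MapReduce node.

    The first entry is the top fragment (rooted at the original root);
    each child of a MapReduce node starts a new sub-plan.
    """
    subplans = [root]

    def walk(node: Node) -> None:
        for child in node.children:
            if node.name == "MapReduce":
                subplans.append(child)
            walk(child)

    walk(root)
    return subplans
```

For a tree `FinalizeAgg -> MapReduce -> PartialAgg -> Scan`, this yields two fragments rooted at `FinalizeAgg` and `PartialAgg`.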
Specifically, the coordination node distributes binding parameters of the plurality of sub-plans and the corresponding nodes to the corresponding data nodes; the data node starts and executes the corresponding sub-plan after receiving the sub-plan, and places the execution result of the sub-plan in the tuple slot, from which the coordination node can pull the execution result.
S203, the execution results of all the sub plans are acquired and collected, and the target execution result of the query plan tree is obtained.
Specifically, the coordination node obtains and gathers the execution results of each sub-plan through a reduce interface function to obtain the target execution result of the query plan tree, and stores the target execution result in the table tuple slot.
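The distribute-then-collect flow of S202 and S203 can be sketched in Python as a scatter/gather over in-memory shards. This is only an analogy under stated assumptions: each shard list stands in for a data node, `run_subplan` stands in for sub-plan execution, and a thread pool stands in for the network dispatch; none of these names come from the application.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subplan(local_rows, predicate):
    """Stand-in for a data node executing its sub-plan: filter its local rows."""
    return [row for row in local_rows if predicate(row)]


def execute_distributed(shards, predicate):
    """Dispatch the same sub-plan to every shard, then gather the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # pool.map preserves the order of the input shards.
        partials = list(pool.map(lambda rows: run_subplan(rows, predicate), shards))
    merged = []
    for part in partials:  # the collection ("reduce") step on the coordinator side
        merged.extend(part)
    return merged
```

With shards `[[1, 2, 3], [4, 5], [6]]` and an even-number predicate, each shard filters locally and only the matching rows are gathered, which is the data-volume reduction the text describes.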
In the embodiment of the present application, when broadcasting or data redistribution is required between data nodes, a data node may first obtain, through the reduce interface function, the execution result of the sub-plan at the associated level, and then complete the execution of the current sub-plan based on that result.
In the process of traversing the nodes of the query plan tree, the data node corresponding to each node is determined from the node's feature list, and whether to insert an expansion node at the next level of the node is also determined, so as to obtain an optimized target query plan tree. The target query plan tree is then divided into a plurality of sub-plans according to the expansion nodes, so that the coordination node can distribute the sub-plans to the corresponding data nodes for execution and obtain the target execution result of the query plan tree.
According to the embodiments of the present application, each node of the query plan tree can be mapped to a data node according to its feature list, and data scheduling is carried out according to the expansion nodes, so that a distributed query plan comprising a plurality of sub-plans is constructed. Compared with the prior art, in which a large volume of data is transmitted between the computing node and the storage node, the present application issues each sub-plan to the corresponding data node for execution, which effectively reduces the volume of data transmitted between the data nodes and the coordination node. Because the data nodes take part in executing the sub-plans, the computing load of the coordination node is also reduced, which improves query efficiency and user experience.
The embodiment of the application provides a possible implementation manner, in which determining whether to insert an expansion node at the next level of the node based on the feature list includes:
when the feature list and the original table corresponding to the node do not meet the preset matching condition, inserting an expansion node at the next level of the node;
and when the feature list and the original table corresponding to the node meet the preset matching condition, traversing to the next node.
The embodiment of the application provides a possible implementation manner, and the matching conditions include:
the feature list comprises the distribution key corresponding to the original table, and the distribution mode of the original table corresponding to the node is any one of hash, modulo, and replication.
In the embodiment of the application, when the MapKey list corresponding to a plan node and the original table meet the matching condition, the mapping from the plan node's MapKey list to the data nodes is consistent, and the traversal continues upward according to the post-order traversal rule; when the MapKey list corresponding to the plan node and the original table do not meet the matching condition, the mapping from the plan node's MapKey list to the data nodes is inconsistent, and a MapReduce node is inserted under the plan node.
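The post-order traversal with the matching check can be sketched as follows. This is a minimal Python illustration, not the application's implementation: the `Node` fields, the `matches` predicate, and the mode names in `LOCAL_MODES` are assumptions, and the sketch only inserts a MapReduce node when the plan node has children, which is a simplification the text does not state.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed names for the three distribution modes listed in the matching condition.
LOCAL_MODES = {"hash", "modulo", "replication"}


@dataclass
class Node:
    # Hypothetical plan node; field names are illustrative assumptions.
    name: str
    map_key: List[str]
    table_dist_key: List[str]
    table_dist_mode: str
    children: List["Node"] = field(default_factory=list)


def matches(node: Node) -> bool:
    """Matching condition: MapKey list contains the table's distribution key
    and the table is distributed by hash, modulo, or replication."""
    has_key = bool(node.table_dist_key) and all(
        col in node.map_key for col in node.table_dist_key
    )
    return has_key and node.table_dist_mode in LOCAL_MODES


def optimize(node: Node) -> Node:
    """Post-order traversal: children first, then the node itself."""
    node.children = [optimize(child) for child in node.children]
    if not matches(node) and node.children:
        # Mismatch: place a MapReduce (expansion) node between the node and its inputs.
        exchange = Node("MapReduce", list(node.map_key), [], "", node.children)
        node.children = [exchange]
    return node
```

For example, a GroupAgg on `city` over a table hash-distributed on `id` fails the check and receives a MapReduce node beneath it, while a GroupAgg on `id` over the same table is left unchanged.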
The expansion node may match the corresponding MapReduce operator, and the matching feature of the MapReduce operator is shown in fig. 3.
The embodiment of the application provides a possible implementation manner, in which determining whether to insert an expansion node at the next level of the node based on the feature list further includes:
when the feature list and the original table corresponding to the node do not meet the preset matching condition and the query task corresponding to the node is an aggregation task, decomposing the aggregation task into an intermediate aggregation task and a terminal aggregation task, and inserting an expansion node between the intermediate aggregation task and the terminal aggregation task.
The aggregation task is a non-grouped aggregation task, or a grouped aggregation task in which the first field of the grouping key is not a distribution field of the table.
In the embodiment of the application, the aggregation task corresponding to the node is split into a Partial Aggregate (intermediate aggregation) below the MapReduce node and a Finalize Aggregate (terminal aggregation) above the MapReduce node. The original aggregation task is thus divided into two execution stages, intermediate aggregation and terminal aggregation, wherein the intermediate aggregation stage can be executed at a coordination node of the database management system and the terminal aggregation stage can be executed at a data node of the database management system. By utilizing a plurality of data nodes, the computing load of the coordination node is effectively reduced, which further improves query efficiency.
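The two-stage split can be illustrated with a grouped average, where the partial stage keeps a (sum, count) state per group and the final stage merges those states. This is a generic Python sketch of partial/final aggregation; the function names are hypothetical, and the sketch does not model which node runs each stage.

```python
def partial_agg(rows):
    """Partial aggregate over one partition: per-group (sum, count) state."""
    state = {}
    for key, value in rows:
        total, count = state.get(key, (0, 0))
        state[key] = (total + value, count + 1)
    return state


def final_agg(partials):
    """Final aggregate: merge partial states, then compute the average per group."""
    merged = {}
    for state in partials:
        for key, (total, count) in state.items():
            m_total, m_count = merged.get(key, (0, 0))
            merged[key] = (m_total + total, m_count + count)
    return {key: total / count for key, (total, count) in merged.items()}
```

Only one small state per group crosses the MapReduce boundary instead of every input row, which is why the split reduces data movement. Averages cannot be merged directly, so the partial stage must carry sums and counts rather than per-partition averages.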
The embodiment of the application provides a possible implementation manner, and the feature list is constructed based on the following manner:
S301, acquiring the table list of the original table corresponding to the node and the binding parameters corresponding to the node.
S302, constructing the feature list of the node based on the table list and the binding parameters.
In the embodiment of the application, the coordination node acquires the table list (rte-array, range table entry array) of the nodes in the query plan tree and the binding parameters (bind-params), and constructs the MapKey list of each node based on rte-array and bind-params.
An embodiment of the present application provides a data processing method, as shown in fig. 4, which may be applied to a data node of a database management system, where the method includes:
S401, obtaining a sub-plan corresponding to the data node.
The sub-plan is obtained by the coordination node of the database dividing the query plan tree based on the expansion nodes. The expansion nodes are obtained when the coordination node optimizes the query plan tree using the data processing method of the first aspect of the embodiments.
Specifically, the coordination node optimizes the query plan tree and determines the expansion nodes of the target query plan tree, namely the MapReduce nodes; based on the MapReduce nodes, it then obtains the sub-plans and the data node to which each expansion node is mapped. The coordination node then distributes each sub-plan and the binding parameters to the corresponding data node, and the data node obtains its sub-plan.
S402, executing the sub-plan to obtain an execution result of the sub-plan, and storing the execution result in a preset storage address.
Specifically, the data node executes the sub-plan and stores the resulting execution result in a table tuple slot, from which the data node or coordination node at the upper level of the sub-plan can pull it.
In the embodiment of the application, when a data node needs to broadcast or redistribute data, that is, when one data node references the execution result of a sub-plan on another data node, the data node first runs the reduce interface function of ClusterReduce or ClusterMergeReduce, pulls the result of the preceding sub-plan, and feeds it into the execution of its own sub-plan. This ensures the timeliness of data transmission among the data nodes, further reduces the load on the coordination node, and effectively improves the execution efficiency of the query plan.
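The pull-based hand-off between sub-plans can be sketched as follows. This is a simplified single-process illustration: the dictionary `tuple_slots` stands in for the table tuple slots, and `cluster_reduce` and `run_sub_plan` are hypothetical helper names that only mimic the role of the reduce interface function, not its actual signature.

```python
# Hypothetical stand-in for table tuple slots: sub-plan id -> stored rows.
tuple_slots = {}

def cluster_reduce(sub_plan_id):
    """Pull the stored result of a preceding sub-plan."""
    return tuple_slots[sub_plan_id]

def run_sub_plan(sub_plan_id, operator, depends_on=()):
    """Pull any preceding results, run the operator, and store the output."""
    inputs = [cluster_reduce(dep) for dep in depends_on]
    tuple_slots[sub_plan_id] = operator(*inputs)
    return tuple_slots[sub_plan_id]

# The upstream sub-plan produces rows; the downstream one consumes them.
run_sub_plan("upstream", lambda: [1, 2, 3])
run_sub_plan("downstream", lambda rows: [r * 10 for r in rows],
             depends_on=["upstream"])
```

In this sketch the downstream sub-plan obtains its input directly from the slot of the upstream one, without any intermediate party relaying the rows.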
In the embodiment of the application, while the nodes of the query plan tree are traversed, the data node corresponding to each node is determined from the node's feature list, and it is determined whether to insert an expansion node at the next level of the node, so as to obtain an optimized target query plan tree. The target query plan tree is then divided into a plurality of sub-plans according to the expansion nodes, so that the coordination node can distribute the sub-plans to the corresponding data nodes for execution and obtain the target execution result of the query plan tree.
According to the embodiment of the application, each node of the query plan tree can be mapped to a data node according to its feature list, and data scheduling is carried out according to the expansion nodes, so that a distributed query plan comprising a plurality of sub-plans is constructed. Compared with the prior art, in which large volumes of data are transmitted between computing nodes and storage nodes, the application issues each sub-plan to the corresponding data node for execution, which effectively reduces the amount of data transmitted between the data nodes and the coordination node. Because the data nodes participate in executing the sub-plans, the computational load of the coordination node is also reduced, which improves query efficiency and user experience.
For a better understanding of the above data processing method, an example of the data processing method of the application is described in detail below in conjunction with fig. 5-1. The method is applied to a database management system that includes a coordination node and data nodes, and includes the following steps:
S501, the coordination node traverses each node of the query plan tree starting from the leaf nodes of the query plan tree and, during the traversal, performs the following optimization operation on each node until the root node of the query plan tree is reached, generating the target query plan tree.
The optimization operation comprises: obtaining the MapKey list of the node; calculating the data node corresponding to the node through a preset mapping function according to the MapKey list; and determining whether to insert a MapReduce node at the next level of the node based on the MapKey list.
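The application does not spell out the preset mapping function. As a minimal sketch, assuming the MapKey values are reduced to a data node index either by hashing or by a modulo on an integer key, it could look like the following. The node names in `DATA_NODES`, the function name `map_to_data_node` and the use of CRC32 as the hash are all illustrative assumptions.

```python
import zlib

DATA_NODES = ["DN1", "DN2", "DN3"]  # hypothetical cluster layout

def map_to_data_node(map_key_values, data_nodes=DATA_NODES, mode="hash"):
    """Map a list of MapKey values to one data node (illustrative only)."""
    if mode == "modulo":
        # Modulo distribution: the integer key value modulo the node count.
        index = int(map_key_values[0]) % len(data_nodes)
    else:
        # Hash distribution: a deterministic hash of the joined key values.
        joined = "|".join(str(value) for value in map_key_values)
        index = zlib.crc32(joined.encode("utf-8")) % len(data_nodes)
    return data_nodes[index]

node_for_key = map_to_data_node([7], mode="modulo")
```

A deterministic hash is used rather than Python's built-in `hash`, because the latter is salted per process for strings and would not give every participant the same mapping.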
In the embodiment of the application, when the MapKey list corresponding to a plan node and the node's original table meet the matching condition, the mapping from the plan node's MapKey list to the data nodes is consistent, and the upward iteration continues according to the post-order traversal rule. When the MapKey list corresponding to the plan node and the original table do not meet the matching condition, the mapping from the plan node's MapKey list to the data nodes is inconsistent, and a MapReduce node is inserted under the plan node.
When the MapKey list and the original table corresponding to the node do not meet the preset matching condition and the query task corresponding to the node is an aggregation task, the aggregation task is decomposed into an intermediate aggregation task and a terminal aggregation task, and a MapReduce node is inserted between the intermediate aggregation task and the terminal aggregation task.
The aggregation task is either a non-grouped aggregation task, or a grouped aggregation task in which the first field of the grouping key is not the distribution field of the table.
Specifically, the matching condition includes: the MapKey list contains the distribution key of the original table, and the distribution mode of the original table corresponding to the node is any one of hash, modulo and replication.
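Step S501 can be sketched as a bottom-up pass over a small plan tree. The `PlanNode` fields, the helper names, and the choice to let one inserted MapReduce node adopt all of the original children are assumptions made for illustration; the application itself only states the matching condition and that the node is inserted under the mismatching plan node.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Replication here means a full copy of the table on every data node.
SUPPORTED_MODES = {"hash", "modulo", "replication"}

@dataclass
class PlanNode:
    name: str
    map_keys: List[str] = field(default_factory=list)  # the node's MapKey list
    dist_key: Optional[str] = None   # distribution key of the original table
    dist_mode: str = "hash"          # distribution mode of the original table
    children: List["PlanNode"] = field(default_factory=list)

def meets_matching_condition(node):
    """The MapKey list contains the distribution key and the mode is supported."""
    return node.dist_key in node.map_keys and node.dist_mode in SUPPORTED_MODES

def optimize(node):
    """Visit children first, then insert a MapReduce node under a mismatch."""
    for child in node.children:
        optimize(child)
    if not meets_matching_condition(node):
        node.children = [PlanNode("MapReduce", children=node.children)]
    return node

scan = PlanNode("Scan orders", map_keys=["customer_id"], dist_key="customer_id")
agg = PlanNode("Aggregate", map_keys=["region"], dist_key="customer_id",
               children=[scan])
optimize(agg)
```

In this example the scan satisfies the condition and is left alone, while the aggregate does not, so a MapReduce node ends up between the aggregate and the scan.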
S502, the coordination node divides the target query plan tree into a plurality of sub plans based on the MapReduce node, and distributes the plurality of sub plans to each data node.
In this embodiment of the application, as shown in fig. 5-2, the target query plan tree includes a sub-query plan SubPlan1 and a sub-query plan SubPlan2. When the distribution key values corresponding to SubPlan1 and SubPlan2 differ and are mapped to different data nodes, SubPlan1 is scheduled for execution on data node DN1, the execution result of DN1 is collected by the ClusterReduce node of SubPlan2, and data node DN2 completes the execution of SubPlan2 after obtaining the execution result of DN1.
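The division in S502 can be sketched by cutting the tree at each MapReduce node. In this illustration the MapReduce node stays in the consuming sub-plan as a leaf and records which sub-plans it pulls from, mirroring how the ClusterReduce node of SubPlan2 collects the result of SubPlan1. The `Node` class, the `sources` field and `split_into_sub_plans` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    sources: List[int] = field(default_factory=list)  # sub-plans this node pulls

def split_into_sub_plans(root):
    """Cut the tree below every MapReduce node; producers precede consumers."""
    sub_plans = []

    def visit(node):
        for child in node.children:
            visit(child)
        if node.name == "MapReduce":
            # Each subtree under a MapReduce node becomes its own sub-plan.
            for child in node.children:
                sub_plans.append(child)
                node.sources.append(len(sub_plans) - 1)
            node.children = []

    visit(root)
    sub_plans.append(root)  # what remains at the top is the last sub-plan
    return sub_plans

tree = Node("Join", [Node("MapReduce", [Node("Scan t1")]), Node("Scan t2")])
plans = split_into_sub_plans(tree)
```

Because the cut is made bottom-up, a sub-plan always appears in the returned list before the sub-plan that consumes its result, which gives a valid execution order.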
S503, the data node acquires the corresponding sub-plan and executes the sub-plan, and stores the execution result of the obtained sub-plan in the table tuple slot.
S504, the coordination node obtains and gathers the execution results of each sub-plan from the table tuple slots through the reduce interface function to obtain the target execution result of the query plan tree, and stores the target execution result in a table tuple slot.
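Steps S502 to S504 taken together amount to a scatter and gather pattern. The sketch below simulates it inside one process, with a thread pool standing in for the data nodes; `execute_on_data_node`, `coordinate` and the row-count workload are illustrative assumptions rather than the interfaces of the application.

```python
from concurrent.futures import ThreadPoolExecutor

def execute_on_data_node(node_name, rows):
    """Stand-in for a data node running its sub-plan: a local row count."""
    return node_name, len(rows)

def coordinate(shards):
    """Distribute one sub-plan per data node, then gather and merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(execute_on_data_node, name, rows)
                   for name, rows in shards.items()]
        per_node = dict(future.result() for future in futures)
    # The coordinator only merges small per-node results.
    return per_node, sum(per_node.values())

shards = {"DN1": [1, 2, 3], "DN2": [4, 5]}
per_node, total = coordinate(shards)
```

The raw rows never reach the coordinator in this sketch; it receives one number per data node and adds them up, which is the reduction in data transfer that the description attributes to pushing sub-plans down.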
An embodiment of the present application provides a data processing apparatus, as shown in fig. 6, the data processing apparatus 60 may include: an optimization module 601, a division module 602, a distribution module 603 and a collection module 604;
the optimizing module is used for traversing each node of the query plan tree from the leaf node of the query plan tree, and in the traversing process, carrying out the following optimizing operation on each node until traversing to the root node of the query plan tree to generate a target query plan tree;
wherein the optimizing operation includes:
acquiring a characteristic list of the node; calculating data nodes corresponding to the nodes through a preset mapping function according to the feature list; determining whether to insert an extended node at a next level of the node based on the feature list; the feature list is used for providing data for the query task corresponding to the node; the expansion node is used for scheduling the query task corresponding to the node;
the dividing module is used for dividing the target query plan tree into a plurality of sub plans based on the expansion node;
the distribution module is used for distributing the multiple sub-plans to each data node for execution to obtain an execution result of each sub-plan;
and the collecting module is used for obtaining and collecting the execution results of each sub-plan to obtain the target execution result of the query plan tree.
The embodiment of the application provides a possible implementation in which, when determining whether to insert an expansion node at the next level of a node based on the feature list, the optimization module 601 is configured to perform the following:
when the feature list and the original table corresponding to the node do not meet the preset matching condition, inserting an expansion node in the next level of the node;
and traversing to the next node when the feature list and the original table corresponding to the node meet the preset matching condition.
The embodiment of the application provides a possible implementation in which, when determining whether to insert an expansion node at the next level of a node based on the feature list, the optimization module 601 is further configured to perform the following:
when the feature list and the original table corresponding to the node do not meet the preset matching condition and the query task corresponding to the node is an aggregation task, decomposing the aggregation task into an intermediate aggregation task and a terminal aggregation task, and inserting an expansion node between the intermediate aggregation task and the terminal aggregation task; the aggregation task is either a non-grouped aggregation task, or a grouped aggregation task in which the first field of the grouping key is not the distribution field of the table.
The embodiment of the application provides a possible implementation in which the matching condition includes:
the feature list contains the distribution key of the original table, and the distribution mode of the original table corresponding to the node is any one of hash, modulo and replication.
The embodiment of the application provides a possible implementation in which the feature list is constructed as follows:
obtaining a table list of the original tables corresponding to the node and the binding parameters corresponding to the node;
and constructing the feature list of the node based on the table list and the binding parameters.
An embodiment of the present application provides a data processing apparatus, as shown in fig. 7, the data processing apparatus 70 may include: an acquisition module 701 and an execution module 702;
the acquiring module 701 is configured to acquire a sub-plan corresponding to the data node; wherein, the sub-plan is obtained by dividing the query plan tree by a coordination node of the database;
the execution module 702 is configured to execute the sub-plan, obtain an execution result of the sub-plan, and store the execution result in a preset storage address.
The apparatus of the embodiments of the application can perform the method provided by the embodiments of the application, and its implementation principle is similar. The actions performed by each module of the apparatus correspond to the steps of the method in the embodiments of the application; for a detailed functional description of each module, refer to the description of the corresponding method above, which is not repeated here.
The embodiment of the application provides an electronic device comprising a memory, a processor and a computer program stored on the memory, where the processor executes the computer program to implement the steps of the data processing method. Compared with the related art, this achieves the following: while the nodes of the query plan tree are traversed, the data node corresponding to each node is determined from the node's feature list, and it is determined whether to insert an expansion node at the next level of the node, so as to obtain an optimized target query plan tree; the target query plan tree is divided into a plurality of sub-plans according to the expansion nodes, so that the coordination node can distribute the sub-plans to the corresponding data nodes for execution and obtain the target execution result of the query plan tree. Each node of the query plan tree can be mapped to a data node according to its feature list, and data scheduling is carried out according to the expansion nodes, so that a distributed query plan comprising a plurality of sub-plans is constructed. Compared with the prior art, in which large volumes of data are transmitted between computing nodes and storage nodes, the application issues each sub-plan to the corresponding data node for execution, which effectively reduces the amount of data transmitted between the data nodes and the coordination node. Because the data nodes participate in executing the sub-plans, the computational load of the coordination node is also reduced, which improves query efficiency and user experience.
In an alternative embodiment, an electronic device is provided. As shown in fig. 8, the electronic device 80 includes a processor 801 and a memory 803. The processor 801 is coupled to the memory 803, for example via a bus 802. Optionally, the electronic device 80 may further include a transceiver 804, which may be used for data interaction between the electronic device and other electronic devices, such as transmitting and/or receiving data. It should be noted that, in practical applications, the number of transceivers 804 is not limited to one, and the structure of the electronic device 80 does not constitute a limitation on the embodiments of the application.
The processor 801 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 801 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 802 may include a path for transferring information between the aforementioned components. Bus 802 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Bus 802 may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 8, but this does not mean that there is only one bus or only one type of bus.
The memory 803 may be, without limitation, a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be read by a computer.
The memory 803 is used for storing the computer program that implements the embodiments of the application, and its execution is controlled by the processor 801. The processor 801 is configured to execute the computer program stored in the memory 803 to implement the steps shown in the foregoing method embodiments.
The electronic device includes, but is not limited to: mobile terminals such as mobile phones, notebook computers and tablet computers (PADs), and stationary terminals such as digital TVs and desktop computers.
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, may implement the steps and corresponding content of the foregoing method embodiments.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions such that the computer device performs:
traversing each node of the query plan tree from the leaf node of the query plan tree, and performing the following optimization operation on each node in the traversing process until traversing to the root node of the query plan tree to generate a target query plan tree;
Wherein the optimizing operation includes:
acquiring a characteristic list of the node; calculating data nodes corresponding to the nodes through a preset mapping function according to the feature list; determining whether to insert an extended node at a next level of the node based on the feature list; the feature list is used for providing data for the query task corresponding to the node; the expansion node is used for scheduling the query task corresponding to the node;
dividing the target query plan tree into a plurality of sub-plans based on the extended nodes;
distributing the multiple sub-plans to each data node for execution, and obtaining an execution result of each sub-plan;
and obtaining and collecting the execution results of each sub-plan to obtain the target execution result of the query plan tree.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the present application described herein may be implemented in other sequences than those illustrated or otherwise described.
It should be understood that, although the flowcharts of the embodiments of the present application indicate the respective operation steps by arrows, the order of implementation of these steps is not limited to the order indicated by the arrows. In some implementations of embodiments of the present application, the implementation steps in the flowcharts may be performed in other orders as desired, unless explicitly stated herein. Furthermore, some or all of the steps in the flowcharts may include multiple sub-steps or multiple stages based on the actual implementation scenario. Some or all of these sub-steps or phases may be performed at the same time, or each of these sub-steps or phases may be performed at different times, respectively. In the case of different execution time, the execution sequence of the sub-steps or stages may be flexibly configured according to the requirement, which is not limited in the embodiment of the present application.
The foregoing describes only optional implementations of some implementation scenarios of the application. It should be noted that, for those skilled in the art, other similar implementations based on the technical ideas of the application that do not depart from those ideas also fall within the protection scope of the embodiments of the application.

Claims (10)

1. A data processing method, applied to a coordinator node of a database management system, the method comprising:
traversing each node of the query plan tree from a leaf node of the query plan tree, and performing the following optimization operation on each node in the traversing process until traversing to a root node of the query plan tree to generate a target query plan tree;
wherein the optimizing operation includes:
acquiring a feature list of the node; calculating the data node corresponding to the node through a preset mapping function according to the feature list; determining whether to insert an expansion node at a next level of the node based on the feature list; the feature list is used for providing data for the query task corresponding to the node; the expansion node is used for scheduling the query task corresponding to the node;
dividing the target query plan tree into a plurality of sub-plans based on the expansion node;
distributing the multiple sub-plans to each data node for execution, and obtaining an execution result of each sub-plan;
and acquiring and collecting the execution results of each sub-plan to obtain a target execution result of the query plan tree.
2. The method of claim 1, wherein the determining whether to insert an expansion node at a next level of the node based on the feature list comprises:
when the original table corresponding to the node and the feature list do not meet the preset matching condition, inserting the expansion node in the next level of the node;
and traversing to the next node when the feature list and the original table corresponding to the node meet the preset matching condition.
3. The method of claim 2, wherein the determining whether to insert an expansion node at a next level of the node based on the feature list further comprises:
when the feature list and the original table corresponding to the node do not meet the preset matching condition and the query task corresponding to the node is an aggregation task, decomposing the aggregation task into an intermediate aggregation task and a terminal aggregation task, and inserting an expansion node between the intermediate aggregation task and the terminal aggregation task; the aggregation task is either a non-grouped aggregation task, or a grouped aggregation task in which the first field of the grouping key is not the distribution field of the table.
4. The method of claim 2, wherein the matching condition comprises:
the feature list comprises a distribution key corresponding to the original table, and the distribution mode of the original table corresponding to the node is any one of hash, modulo and replication.
5. The method of claim 1, wherein the feature list is constructed based on:
acquiring a table list of the original tables corresponding to the node and binding parameters corresponding to the node;
and constructing the feature list of the node based on the table list and the binding parameters.
6. A data processing method for a data node of a database management system, the method comprising:
acquiring a sub-plan corresponding to the data node; wherein, the sub-plan is obtained by dividing a query plan tree by a coordination node of the database based on an expansion node;
executing the sub-plan to obtain an execution result of the sub-plan, and storing the execution result in a preset storage address.
7. A data processing apparatus, the apparatus comprising:
the optimizing module is used for traversing each node of the query plan tree from the leaf node of the query plan tree, and in the traversing process, carrying out the following optimizing operation on each node until traversing to the root node of the query plan tree to generate a target query plan tree;
wherein the optimizing operation includes:
acquiring a feature list of the node; calculating the data node corresponding to the node through a preset mapping function according to the feature list; determining whether to insert an expansion node at a next level of the node based on the feature list; the feature list is used for providing data for the query task corresponding to the node; the expansion node is used for scheduling the query task corresponding to the node;
a partitioning module configured to partition the target query plan tree into a plurality of sub-plans based on the expansion node;
the distribution module is used for distributing the plurality of sub-plans to each data node for execution, and obtaining an execution result of each sub-plan;
and the collecting module is used for obtaining and collecting the execution results of each sub-plan to obtain the target execution result of the query plan tree.
8. A data processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring the sub-plan corresponding to the data node; wherein, the sub-plan is obtained by dividing a query plan tree by a coordination node of the database;
and the execution module is used for executing the sub-plan, obtaining an execution result of the sub-plan and storing the execution result in a preset storage address.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of the method according to any one of claims 1 to 6.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202310165032.6A 2023-02-08 2023-02-08 Data processing method, device, electronic equipment and computer readable storage medium Pending CN116126901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310165032.6A CN116126901A (en) 2023-02-08 2023-02-08 Data processing method, device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116126901A true CN116126901A (en) 2023-05-16

Family

ID=86301031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310165032.6A Pending CN116126901A (en) 2023-02-08 2023-02-08 Data processing method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116126901A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117435594A (en) * 2023-12-18 2024-01-23 天津南大通用数据技术股份有限公司 Optimization method for distributed database distribution key
CN117435594B (en) * 2023-12-18 2024-04-16 天津南大通用数据技术股份有限公司 Optimization method for distributed database distribution key
CN118210837A (en) * 2024-05-17 2024-06-18 北京力控元通科技有限公司 Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination