CN114625761A - Optimization method, optimization device, electronic equipment and medium - Google Patents

Optimization method, optimization device, electronic equipment and medium Download PDF

Info

Publication number
CN114625761A
CN114625761A CN202210270656.XA CN202210270656A CN114625761A CN 114625761 A CN114625761 A CN 114625761A CN 202210270656 A CN202210270656 A CN 202210270656A CN 114625761 A CN114625761 A CN 114625761A
Authority
CN
China
Prior art keywords
node
query
child node
cost
deduplication
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210270656.XA
Other languages
Chinese (zh)
Inventor
朱仲颖
万伟
韩朱忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dameng Database Co Ltd
Original Assignee
Shanghai Dameng Database Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dameng Database Co Ltd filed Critical Shanghai Dameng Database Co Ltd
Priority to CN202210270656.XA priority Critical patent/CN114625761A/en
Publication of CN114625761A publication Critical patent/CN114625761A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses an optimization method, an optimization device, electronic equipment and a medium. The method comprises the following steps: when the intermediate plan tree is generated based on the target structured query statement, generating a corresponding target level query node for the query statement meeting a preset condition, and setting a preset mark for the target level query node; when an optimal execution plan is generated based on the middle plan tree, determining a left child node and a right child node corresponding to a target level query node for the target level query node; and respectively optimizing the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node. Through the cost comparison results of the left child node and the right child node of the target level query node, the corresponding child nodes can be optimized, so that the resource waste caused by cleaning of repeated data generated by the bottom level query is avoided, and the level query performance and efficiency are improved.

Description

Optimization method, optimization device, electronic equipment and medium
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to an optimization method, an optimization device, electronic equipment and a medium.
Background
The hierarchical query is a database query function, and a data set of a tree structure is obtained by using a tree traversal mode to obtain a hierarchical relation report of the tree.
At present, when hierarchical query operation is performed in a database, when data having a hierarchical relationship with a selected row of data is queried, calculation needs to be performed on all data according to conditions on the right side of a query condition equal sign. When storing the calculated right-side data in the memory according to the hierarchical query condition, the right-side data is usually stored in a HASH (i.e., HASH) table because a lookup is required later. If the column repetition value specified by the hierarchical query condition is large, the in-slot conflict linked list of the HASH table is long. In the process, more repeated data can be cached in the HASH table, so that more repeated data also exists in a corresponding level query output result, redundant repeated data is removed by an upper layer deduplication node (i.e., the DISTINCT), and finally, only one unique value needs to be reserved.
In the process that a large amount of repeated data are generated by the whole inquiry of the bottom layer, and then the redundant repeated data are cleaned by the DISTINCT of the upper layer, unnecessary resource waste is caused for the repeated data cleaning generated by the inquiry of the bottom layer. Resulting in poor performance of the hierarchical query.
Disclosure of Invention
The embodiment of the invention provides an optimization method, an optimization device, electronic equipment and a medium, which are used for improving the performance of hierarchical query.
According to an aspect of the embodiments of the present invention, there is provided an optimization method, including:
when generating an intermediate plan tree based on a target structured query statement, generating a corresponding target level query node for the query statement meeting a preset condition, and setting a preset mark for the target level query node; the target structured query statement comprises a query statement, and the preset mark is used for representing that the target level query node supports deduplication optimization;
when an optimal execution plan is generated based on the middle plan tree, determining a left child node and a right child node corresponding to the target level query node for the target level query node;
optimizing the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node respectively; the cost comparison result is a comparison result of a first cost and a second cost, and the first cost is a cost obtained by calculation under the condition that the deduplication node is not added; the second cost is a cost calculated with the addition of a deduplication node.
According to another aspect of the embodiments of the present invention, there is provided an optimization apparatus, including:
the generating module is used for generating corresponding target level query nodes for the query sentences meeting preset conditions and setting preset marks for the target level query nodes when generating the intermediate plan tree based on the target structured query sentences; the target structured query statement comprises a query statement, and the preset mark is used for representing that the target level query node supports deduplication optimization;
the determining module is used for determining a left child node and a right child node corresponding to the target level query node for the target level query node when generating an optimal execution plan based on the middle plan tree;
the optimization module is used for optimizing the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node; the cost comparison result is a comparison result of a first cost and a second cost, and the first cost is a cost obtained by calculation under the condition that the deduplication node is not added; the second cost is a cost calculated with the addition of a deduplication node.
According to another aspect of the embodiments of the present invention, there is provided an electronic apparatus, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the optimization method according to any of the embodiments of the present invention.
According to another aspect of the embodiments of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement the optimization method according to any one of the embodiments of the present invention when the computer instructions are executed.
According to the technical scheme of the embodiment of the invention, when the intermediate plan tree is generated based on the target structured query statement, the corresponding target level query node is generated for the query statement meeting the preset condition, and the preset mark is set for the target level query node; then, when an optimal execution plan is generated based on the middle plan tree, for a target level query node, determining a left child node and a right child node corresponding to the target level query node; finally, optimizing the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node; the cost comparison result is a comparison result of a first cost and a second cost, and the first cost is a cost obtained by calculation under the condition that the duplicate removal node is not added; the second cost is the cost calculated with the addition of a deduplication node. The method can further optimize the target level query node containing the preset mark when generating the optimal execution plan by setting the preset mark for the determined target level query node; in the process, the corresponding child nodes can be optimized by respectively comparing the cost of the left child node and the cost of the right child node of the target level query node, so that the resource waste caused by cleaning the repeated data generated by the bottom level query is avoided, and the performance and the efficiency of the level query are improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of an optimization method according to an embodiment of the present invention;
FIG. 2 is a flow chart of an optimization method according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an optimization apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device implementing the optimization method according to the embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," "target," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of an optimization method, which is applicable to a case of optimizing a hierarchical query of a child node according to an embodiment of the present invention, and the method may be performed by an optimization apparatus, which may be implemented in a form of hardware and/or software, and the optimization apparatus may be configured in an electronic device. As shown in fig. 1, the method includes:
s110, when the intermediate plan tree is generated based on the target structured query statement, generating a corresponding target level query node for the query statement meeting the preset condition, and setting a preset mark for the target level query node.
In this embodiment, a Structured Query Language (SQL), which may also be referred to as SQL statement, is a database Query and programming Language, and may also refer to a relational database operation Language; it can be used for accessing data and querying, updating and managing a relational database system; and is also an extension of the database script file. The target structured query statement may be understood as a structured query statement containing a query statement input by a user, and the target structured query statement may include one query statement; the query statement may be understood as a statement beginning with a SELECT keyword in a target SQL statement. What is to be done with the corresponding database is described in the target structured query statement, but not specifically. Therefore, after receiving the target structured query statement, the database can generate an execution plan containing specific how to operate based on the target structured query statement. The execution plan may be implemented in the database as a "binary tree" of operators, i.e., a tree of execution plans. The intermediate plan tree may refer to an execution plan tree that is preliminarily generated based on the target structured query statement, so that the intermediate plan tree is subsequently optimized to obtain a final execution plan tree.
The preset condition may refer to a clause that includes a deduplication clause and a hierarchical query clause, but does not include a clause that requires the complete table data to obtain a correct result set. Where a query statement may include one or more clauses. The deduplication clause may refer to a clause with deduplication function, that is, data in the table is listed as a keyword, and only one redundant duplicate row data is deleted, which may be denoted as a DISTINCT clause. The hierarchical query clause may refer to a clause for querying a hierarchical relationship between data, and specifically, the hierarchical relationship between data may be obtained in the form of a CONNECT BY keyword. The clause which needs to rely on the complete table data to obtain the correct result set can be understood as the clause which needs to rely on the complete table data (i.e. the table data which is not deduplicated or deleted) to correctly execute the corresponding operation instruction to obtain the correct result set; for example, the clause that needs to rely on the complete table data to obtain the correct result set may be a clause including a set function, a paging function (i.e., rownum), an analysis function, or an uncertainty function, which is not limited herein.
Optionally, the optimization method further includes: determining the query statement meeting the preset condition according to the following modes: judging whether the query statement of the target structured query statement contains a deduplication clause and a hierarchical query clause; if the query sentence comprises the duplicate removal clause and the hierarchical query clause, judging whether the query sentence comprises a preset clause, wherein the preset clause is a clause which depends on the data of the complete table to obtain a correct result set; and if the query statement does not contain the preset clause, determining the query statement as the query statement meeting the preset condition.
The process of determining the query statement meeting the preset condition may be: judging whether the query statement of the target structured query statement contains a deduplication clause and a hierarchical query clause; then, if the query statement contains a duplicate removal clause and a hierarchical query clause, whether the query statement contains a preset clause can be judged, and if not, the query statement is not a sub-query statement meeting preset conditions; and finally, if the query statement does not contain the preset clause, wherein the preset clause can be understood as a clause which depends on the complete table data to obtain a correct result set, determining that the query statement is the query statement which meets the preset condition, otherwise, determining that the query statement is not the query statement which meets the preset condition.
Illustratively, the query statement A may be represented as: SELECT DISTINCT C1 FROM T1CONNECT BY PRIOR C1 ═ C2; where DISTINCT C1 may represent a deduplication clause and CONNECT BY may represent a hierarchical query clause. The query statement A does not contain a sub-query statement; if DISTINCT C1 is converted into DISTINCT (statement B beginning with SELECT keyword) on the basis of query statement A, then statement B is the sub-query statement of query statement A.
It is understood that the query statement may not include a sub-query statement, and may include one or more sub-query statements. And when judging whether the query statement meets the preset condition, similarly judging the sub-query statements contained in the query statement. If a certain sub-query statement in the query statement meets the preset condition, the query statement also meets the preset condition, but the sub-query statement meets the preset condition when the target level query node is generated.
In this embodiment, the target hierarchical query node may refer to a plan node for a hierarchical query generated for a query statement that meets a preset condition when an intermediate plan tree is generated based on a target structured query statement.
However, it should be noted that, if there are a plurality of sub-query statements all meeting the preset condition, each sub-query statement meeting the preset condition may generate a target hierarchical query node, which is not limited herein.
The preset mark can refer to a preset mark and can be used for representing that the target level query node supports deduplication optimization. Support for deduplication optimization may be understood as that deduplication of a deduplication clause may be supported during execution of the target hierarchical query node to implement corresponding optimization.
When the intermediate plan tree is generated based on the target structured query statement, for the query statement meeting the preset condition, a corresponding target level query node can be generated, and a corresponding preset mark is set for the target level query node. For example, a preset flag may be set only for the target hierarchical query node; on the basis of setting a preset mark for a target level query node, a mark can be also set for a non-target level query node (i.e. a level query node corresponding to a query statement or a sub-query statement which does not meet a preset condition) so as to represent that the non-target level query node does not support deduplication optimization; and it is understood that the label is different from the preset label of the target hierarchical query node.
S120, when an optimal execution plan is generated based on the middle plan tree, for the target level query node, determining a left child node and a right child node corresponding to the target level query node.
In this embodiment, the optimal execution plan may be understood as a final execution plan obtained by performing optimization based on the intermediate plan tree. In the process of generating the optimal execution plan based on the intermediate plan tree, whether the level query node in the intermediate plan tree is the target level query node can be judged according to whether the level query node contains the preset mark. If the preset mark is included, the level query node can be determined to be the target level query node, otherwise, the level query node is not the target level query node.
After the target hierarchical query node is determined, the next level of plan nodes connected with the target hierarchical query node can be determined as child nodes of the target hierarchical query node according to the binary tree structure relationship of the intermediate plan tree. In this process, since the target hierarchical query node has a function of querying the hierarchical relationship of the data, the child nodes of the target hierarchical query node are two table data, and here, the left and right sides of the middle plan tree can be used to distinguish the two child nodes, that is, the child nodes can be called as a left child node and a right child node, and one of the child nodes can correspond to one table data.
When the hierarchical query is carried out, one row of data is taken from the table data of the left child node to the table data of the right child node for the hierarchical query detection. The table data of the left child node and the table data of the right child node are determined according to the query statement, for example, the table data of the left child node and the table data of the right child node may be the same or different, and how to determine the table data of the left child node and the table data of the right child node according to the query statement is not specifically limited here.
Optionally, the intermediate planning tree includes one or more hierarchical query nodes, and the hierarchical query nodes are planning nodes generated based on query statements including hierarchical query clauses; for a target level query node, determining a left child node and a right child node corresponding to the target level query node, including: judging whether the hierarchical query node contains a preset mark or not aiming at each hierarchical query node; if yes, determining the level query node as a target level query node, and determining a left child node and a right child node corresponding to the target level query node.
Wherein, the intermediate plan tree may include one or more hierarchical query nodes, which may be understood as plan nodes generated based on query sentences containing hierarchical query clauses; it is understood that if the query statement includes a plurality of sub-query statements, the sub-query statement including the hierarchical query clause also generates a hierarchical query node, so that the intermediate plan tree may have a plurality of hierarchical query nodes. For a target hierarchical query node, the process of determining the left child node and the right child node corresponding to the target hierarchical query node may be: for each level query node, firstly, whether the level query node contains a preset mark can be judged; if the preset mark is included, the level query node can be determined to be a target level query node, and on the basis, a left child node and a right child node corresponding to the target level query node are determined based on the connection relation of the plan nodes in the middle plan tree.
S130, optimizing the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node.
In the present embodiment, the cost comparison result may be understood as a comparison result between the first cost and the second cost. The first cost can be understood as the cost calculated without adding a deduplication node; the second cost can be understood as the cost calculated in the case of adding a deduplication node. A deduplication node may be understood as a plan node generated based on a query statement containing a deduplication clause. The left child node corresponds to a first cost and a second cost, and the corresponding first cost and second cost can be determined based on the output column corresponding to the left child node. Similarly, the right child node corresponds to a first cost and a second cost, and the corresponding first cost and second cost can be determined based on the output column corresponding to the right child node.
Optionally, before optimizing the corresponding child node according to the cost comparison result corresponding to the left child node and the right child node, the method further includes: collecting a first output column of a left child node and a second output column of a right child node; a first cost and a second cost corresponding to the left child node are determined based on the first output column, and a first cost and a second cost corresponding to the right child node are determined based on the second output column.
Before optimizing the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node, a first cost and a second cost corresponding to the left child node and the right child node can be respectively determined. Specifically, output columns corresponding to the left child node (i.e., a first output column) and the right child node (i.e., a second output column) may be collected, respectively.
It can be understood that the left child node and the right child node correspond to a table data, the table data may include multiple rows of data, and each row of data may include multiple columns; the output columns of the child nodes may be determined based on the columns specified in the query condition of the hierarchical query clause and the columns specified in the deduplication condition of the deduplication clause, on the basis of which the output columns of the child nodes may be understood as table data composed of these determined columns. Since the query condition of the hierarchical query clause and the deduplication condition of the deduplication clause may be complex, as may not be for the same overall table data, the table data corresponding to the left child node and the right child node may be different.
A first cost and a second cost corresponding to the left child node may be determined based on the first output column, and a first cost and a second cost corresponding to the right child node may be determined based on the second output column.
Optionally, determining the first cost and the second cost corresponding to the left child node based on the first output column includes: determining a first duplication removal rate corresponding to the first output column based on the first output column; under the condition that a first duplicate removal node is not added, determining a first Central Processing Unit (CPU) and a memory utilization rate when executing the hierarchical query operation, and determining a first cost corresponding to a left child node according to the first duplicate removal rate and the first CPU and memory utilization rate; and under the condition of adding the first duplicate removal node, determining the second CPU and the memory utilization rate when the hierarchical query operation is executed, and determining a second cost corresponding to the left child node according to the first duplicate removal rate and the second CPU and memory utilization rate.
The first deduplication ratio may refer to a ratio of the number of repeated rows of data in the first output column to the total number of rows of data in the first output column. If the first output column includes 100 row data in total, and the number of repeated row data is 30, the first deduplication rate may be 30% at this time. The first deduplication node may refer to a plan node for deduplication generated based on the duplicate row data of the first output column. The first CPU and memory usage rate may be understood as a resource usage rate of the CPU and the memory occupied when the hierarchical query operation is performed without increasing the first deduplication node. Accordingly, the second CPU and memory usage rate may be understood as the resource usage rate of the CPU and memory occupied when executing the hierarchical query operation with the addition of the first deduplication node. Performing a hierarchical query operation may be understood as an overall hierarchical query operation based on the deduplication clause and the hierarchical query clause in the sub-query sentences that meet the preset condition.
The process of determining the first cost and the second cost corresponding to the left child node based on the first output column may be: based on the first output column, a first deduplication rate corresponding to the first output column is determined. On the basis, under the condition that a first duplicate removal node is not added, determining a first CPU and a memory utilization rate when the hierarchical query operation is executed, and determining a first cost corresponding to a left child node according to the first duplicate removal rate and the first CPU and memory utilization rate; and under the condition of adding the first duplicate removal node, determining the second CPU and the memory utilization rate when the hierarchical query operation is executed, and determining a second cost corresponding to the left child node according to the first duplicate removal rate and the second CPU and memory utilization rate. There is no specific limitation on how to determine the first cost corresponding to the left child node according to the first deduplication rate and the first CPU and memory usage rate, and how to determine the second cost corresponding to the left child node according to the first deduplication rate and the second CPU and memory usage rate.
Optionally, determining the first cost and the second cost corresponding to the right child node based on the second output column includes: determining a second duplication removal rate corresponding to the second output column based on the second output column; under the condition that a second duplicate removal node is not added, determining a third CPU and a memory utilization rate when the hierarchical query operation is executed, and determining a first cost corresponding to a right child node according to the second duplicate removal rate and the third CPU and memory utilization rate; and under the condition of adding a second duplicate removal node, determining the fourth CPU and the memory utilization rate when executing the hierarchical query operation, and determining a second cost corresponding to the right child node according to the second duplicate removal rate and the fourth CPU and memory utilization rate.
The second deduplication ratio may be a ratio of the number of repeated rows of data in the second output column to the total number of rows of data in the second output column. The second deduplication node may refer to a plan node for deduplication generated based on the repeated row data of the second output column. The third CPU and memory usage rate may be understood as a resource usage rate of the CPU and the memory occupied when the hierarchical query operation is performed without adding the second deduplication node. Accordingly, the fourth CPU and memory usage rate may be understood as the resource usage rate of the CPU and the memory occupied when the hierarchical query operation is performed with the addition of two deduplication nodes.
The process of determining the first cost and the second cost corresponding to the right child node based on the second output column may be: based on the second output column, a second deduplication rate corresponding to the second output column may be determined. On the basis, under the condition that a second duplicate removal node is not added, determining the utilization rate of a third CPU and a memory when the hierarchical query operation is executed, and determining a first cost corresponding to the right child node according to the second duplicate removal rate and the utilization rate of the third CPU and the memory; and under the condition of adding a second duplicate removal node, determining the fourth CPU and the memory utilization rate when executing the hierarchical query operation, and determining a second cost corresponding to the right child node according to the second duplicate removal rate and the fourth CPU and memory utilization rate. How to determine the first cost corresponding to the right child node according to the second deduplication rate and the third CPU and memory usage rate, and how to determine the second cost corresponding to the right child node according to the second deduplication rate and the fourth CPU and memory usage rate are not specifically limited herein.
In this embodiment, after determining the first cost and the second cost corresponding to the left child node, the left child node may be optimized according to the corresponding cost comparison result. For example, if the first price corresponding to the left child node is greater than the second price corresponding to the left child node, it may be indicated that adding the first deduplication node optimizes the execution of the corresponding execution plan, and at this time, the first deduplication node may be added between the target hierarchical query node and the left child node; otherwise, exiting the optimization.
Correspondingly, after the first cost and the second cost corresponding to the right child node are determined, the right child node can be optimized according to the corresponding cost comparison result. For example, if the first price corresponding to the right child node is greater than the second price corresponding to the right child node, it may be indicated that adding the second deduplication node optimizes the execution of the corresponding execution plan, and at this time, the second deduplication node may be added between the target hierarchical query node and the right child node; otherwise, exiting the optimization.
The first embodiment provides an optimization method, which includes the steps of firstly, when an intermediate plan tree is generated based on a target structured query statement, generating corresponding target level query nodes for the query statement meeting preset conditions, and setting preset marks for the target level query nodes; then, when an optimal execution plan is generated based on the middle plan tree, for a target level query node, determining a left child node and a right child node corresponding to the target level query node; finally, optimizing the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node; the cost comparison result is a comparison result of a first cost and a second cost, and the first cost is a cost obtained by calculation under the condition that the duplicate removal node is not added; the second cost is the cost calculated with the addition of a deduplication node. The method can further optimize the target level query node containing the preset mark when generating the optimal execution plan by setting the preset mark for the determined target level query node; in the process, the corresponding child nodes can be optimized by respectively comparing the cost of the left child node and the cost of the right child node of the target level query node, so that the resource waste caused by cleaning the repeated data generated by the bottom level query is avoided, and the performance and the efficiency of the level query are improved.
Example two
Fig. 2 is a flowchart of an optimization method according to a second embodiment of the present invention, which is further detailed based on the above-mentioned embodiment. In this embodiment, the process of determining the first cost and the second cost corresponding to the left child node and the right child node, respectively, and optimizing the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node is specifically described. As shown in fig. 2, the method includes:
s210, when the intermediate plan tree is generated based on the target structured query statement, generating a corresponding target level query node for the query statement meeting the preset condition, and setting a preset mark for the target level query node.
In this embodiment, the target structured query statement may include a query statement, and the preset flag may be used to characterize the target hierarchical query node to support deduplication optimization.
S220, when the optimal execution plan is generated based on the middle plan tree, for the target level query node, determining a left child node and a right child node corresponding to the target level query node.
In this embodiment, the intermediate plan tree may include one or more hierarchical query nodes, which may be understood as plan nodes generated based on a query statement containing a hierarchical query clause. When the optimal execution plan is generated based on the middle plan tree, the level query node containing the preset mark is determined as a target level query node aiming at each level query node, and a left child node and a right child node corresponding to the target level query node are determined on the basis.
And S230, collecting a first output column of the left child node and a second output column of the right child node.
S240, determining a first cost and a second cost corresponding to the left child node based on the first output column, and determining a first cost and a second cost corresponding to the right child node based on the second output column.
S250, judging whether the first price corresponding to the left child node is larger than the second price corresponding to the left child node, if so, executing S260; if not, executing S290.
In this embodiment, based on the first cost and the second cost corresponding to the left child node, it is determined whether the first cost corresponding to the left child node is greater than the second cost corresponding to the left child node, if yes, S260 may be executed, and a first deduplication node is added between the target hierarchical query node and the left child node; if the first cost corresponding to the left child node is less than or equal to the second cost corresponding to the left child node, it can be shown that adding the first deduplication node does not optimize the execution plan, but rather makes the execution cost of the execution plan larger, and at this time, the optimization can be exited, that is, the original execution plan of the left child node is maintained.
And S260, adding a first deduplication node between the target hierarchical query node and the left child node for optimizing the left child node, and finishing the optimization.
In this embodiment, the left child node is optimized by adding the first deduplication node between the target hierarchical query node and the left child node, and after the addition is completed, the optimization of the left child node is completed.
Optionally, adding a first deduplication node between the target hierarchical query node and the left child node includes: newly building a first deduplication node, and taking the first output column as a deduplication column corresponding to the first deduplication node; and adding the first deduplication node between the target level query node and the left child node, setting the first deduplication node as a child node of the target level query node, and setting the left child node as a child node of the first deduplication node.
Specifically, a new first deduplication node is created, where the first output rank can be used as a deduplication rank corresponding to the first deduplication node, and the deduplication rank can be understood as a rank targeted when deduplication operation is performed. For example, if the columns included in the first output column are c1 and c2, the deduplication columns are c1 and c2, and the deduplication processing is performed on the data of the c1 and c2 columns in each row of data. On the basis, adding the first deduplication node between the target level query node and the left child node can be understood as inserting the first deduplication node as a layer between two layers, namely the target level query node and the left child node, in the middle plan tree; the first deduplication node is set as a child node of the target hierarchical query node (i.e., the first deduplication node is at a next level of the target hierarchical query node), and the left child node is set as a child node of the first deduplication node (i.e., the left child node is at a next level of the first deduplication node).
S270, judging whether the first price corresponding to the right child node is larger than the second price corresponding to the right child node, and if so, executing S280; if not, executing S290.
In this embodiment, based on the first cost and the second cost corresponding to the right child node, it is determined whether the first cost corresponding to the right child node is greater than the second cost corresponding to the right child node, if yes, S280 may be executed, and a second deduplication node is added between the target hierarchical query node and the right child node; if the first cost corresponding to the right child node is less than or equal to the second cost corresponding to the right child node, it can be shown that adding the second deduplication node does not optimize the execution plan, but rather makes the execution cost of the execution plan larger, and at this time, the optimization can be exited, that is, the original right child node execution plan is maintained.
S280, adding a second deduplication node between the target hierarchical query node and the right child node for optimizing the right child node, and finishing the optimization.
In this embodiment, the right child node is optimized by adding a second deduplication node between the target hierarchical query node and the right child node, and after the addition is completed, the optimization of the right child node is completed.
Optionally, adding a second deduplication node between the target hierarchical query node and the right child node includes: newly building a second deduplication node, and taking the second output column as a deduplication column corresponding to the second deduplication node; and adding a second deduplication node between the target level query node and the right child node, setting the second deduplication node as the child node of the target level query node, and setting the right child node as the child node of the second deduplication node.
Accordingly, adding a second deduplication node between the target hierarchical query node and the right child node may be: and newly building a second deduplication node, and taking the second output column as a deduplication column corresponding to the second deduplication node. On the basis, the second deduplication node is added between the target hierarchical query node and the right child node, namely that the second deduplication node is inserted as a layer between two layers, namely the target hierarchical query node and the right child node, in the middle plan tree. And sets the second deduplication node as a child node of the target hierarchical query node (i.e., the second deduplication node is at a next level of the target hierarchical query node), and sets the right child node as a child node of the second deduplication node (i.e., the right child node is at a next level of the second deduplication node).
And S290, exiting the optimization.
In this embodiment, exiting optimization may be understood as not changing the execution plan corresponding to the original child node, and continuing to maintain the original execution plan to perform corresponding plan execution.
The second embodiment provides an optimization method, which embodies the process of determining the first cost and the second cost corresponding to the left child node and the right child node respectively, and optimizing the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node respectively. The method can determine whether corresponding deduplication nodes need to be added or not to optimize execution plans of the left child node and the right child node by comparing cost results before and after the deduplication nodes are added aiming at the left child node and the right child node respectively, so that execution performance and efficiency of hierarchical query are improved.
The invention is illustrated below:
the hierarchical query clause is a way to obtain the hierarchical relationship between data through the CONNECT BY keyword. Keywords may be understood as query conditions.
The DISTINCT clause is a deduplication function clause, and columns contained in the query items are used as keywords (i.e. deduplication conditions) to perform deduplication processing and output.
For statements that both DISTINCT and CONNECT BY exist, the following:
CREATE TABLE T1(C1 INT, C2 INT); (i.e., creating Table T1, Table T1 includes columns C1 and C2)
INSERT INTO T1 VALUES(3,4);
INSERT INTO T1 VALUES(2,3);
INSERT INTO T1 VALUES(2,3);
INSERT INTO T1 VALUES (1, 2); (i.e. insert 4 line data into table T1)
COMMIT;
SELECT DISTINCT C1 FROM T1 CONNECT BY PRIOR C1=C2;
Where PRIOR is used to designate the parent node, and C1 is the parent node. For example, if C1 is PRIOR C2, then C2 is the parent node.
An execution plan is derived that resembles the following:
node 1# NSET2: [1,1,8 ]; (i.e., the set of results output)
Node 2# PRJT2: [1,1,8 ]; exp _ num (1), is _ atom (false);
node 3# DISTINCT: [1,1,8] KEY (C1); (KEY (C1) stands for deduplication with C1 column as deduplication column)
Node 4# HIERARCHICAL QUERY (UNIQUE) [1,1,8 ]; KEY _ NUM (1); (i.e., hierarchical query plan nodes)
Node 5# CSCN2: [1,1,8 ]; INDEX33559824 (T1); (i.e., the table data of the left child node is the table T1)
Node 6# CSCN2: [1,1,8 ]; INDEX33559824 (T1); (i.e., the table data of the right child node is the table T1)
The hierarchical query detects the data of the left child node (the 5 th node) to the right child node (the 6 th node), and outputs a record meeting the PRIOR condition, and the execution flow is as follows:
1. taking a row of data from the left child node as driving row data to the right child node for detection, obtaining records meeting PRIOR conditions, and detecting a second row (2,3) and a third row (2,3) when processing the first row of data (3, 4); specifically, based on the hierarchical query condition of PRIOR C1 — C2, C2 column of row data that is the same as C1 column (i.e., value 3) of the first row of data (3,4), i.e., the second row (2,3) and the third row (2,3), are probed from the right child node, and C2 column of the row data is both value 3. And skipping to the step 2 for caching.
2. The records satisfying the condition (i.e., (2,3)) are cached, and the result is output to the parent layer (i.e., (3,4), (2,3)) which may be referred to as node 1.
3. Continuously detecting two lines of data (2,3) and (2,3) of the cache line by line to the right child node; specifically, the row data (2,3) is continuously detected to the right child node, so that records meeting the PRIOR condition, namely the row data (1,2), are obtained, and the steps 2 and 3 are continuously repeated; continuing to detect the next row data (2,3) to the right child node to obtain records meeting the PRIOR condition, namely row data (1,2), and continuing to repeat the steps 2 and 3; until the buffered data is processed. And (3) stopping the detection by taking the first row data (3,4) as the driving row data at this time because no matched data exists from (1,2) to the right child node.
4. And (3) taking the next row data (namely the second row data (2,3)) of the first row data (3,4) from the left child node to the right child node for detection, obtaining a record meeting the PRIOR condition, and repeating the steps 2,3 and 4 until the row data of the left child node is processed. When the left child node takes the second row of data (2,3), the processing flow is actually repeated with the previous first row of data (3, 4).
However, the above planning process has a problem, when the data of the right child node is stored in the memory according to the hierarchical query condition, because the data needs to be searched later, the obtained record meeting the PRIOR condition is generally put into a HASH (i.e., HASH) table, where PRIOR C1 is C2, that is, the hierarchical query key is C2 column. Given the large number of repeated values of C2, the in-slot collision linked list of the HASH table is long. When the process is executed, a large amount of repeated data can be cached, more repeated data exist in an output result, redundant repeated data are removed by the upper-layer DISTINCT node, and finally, only one value is reserved. In the whole process, the bottom layer is inquired to generate a plurality of repeated data, and then the upper layer cleans the repeated data, so that the performance is low.
Aiming at the problems, the embodiment of the invention provides a solution that DISTINCT is used in the upper layer and is also put down to the two sides (namely a left child node and a right child node) of the bottom layer level query, so that the row data on the two sides of the level query are removed in duplicate first, and when the data are saved and the matched data are detected, the scale of the middle data can be reduced, the driving number of the driving row and the detection number can be reduced, thereby avoiding the unnecessary waste of a memory and a CPU, and improving the performance.
The query plans may become similar after the above statements are optimized using the method of embodiments of the present invention:
node 1# NSET2: [1,1,8 ];
node 2# PRJT2: [1,1,8 ]; exp _ num (1), is _ atom (false);
node 3# DISTINCT [1,1,8] KEY (C1);
node 4# HIERARCHICAL QUERY (UNIQUE) [1,1,8 ]; KEY _ NUM (1);
node 5# DISTINCT [1,1,8] KEY (C1, C2);
node 6# CSCN2: [1,1,8 ]; INDEX33559824 (T1);
node 7# DISTINCT [1,1,8] KEY (C1, C2);
node 8# CSCN2: [1,1,8 ]; INDEX33559824 (T1);
the KEY of the newly added DISTINCT operator (node 5, node 7) should be set to all output columns of the original left (node 6) and right (node 8) nodes of the hierarchical query, i.e. the union of the hierarchical query condition PRIOR C1 ═ C2 and the query term C1, i.e. C1, C2 in the given example. In addition, before and after adding the new DISTINCT, the corresponding execution cost can be determined by estimating the duplication removal rate of the DISTINCT and the utilization rate of a CPU and a memory, and whether new DISTINCT nodes are added to the left side and the right side (namely, left child nodes and right child nodes) of the bottom-layer level query or not is considered according to performance saving and cost brought by the cost, so that the method can adapt to more data scenes and achieve a better performance optimization effect. The final result of adding DISTINCT nodes may be two-sided, one-sided, or 0-sided, depending on the actual situation.
Furthermore, the above example is a simple statement, and the above optimization can be applied to clauses with multi-table join or other complex forms on both sides of the hierarchical query.
The embodiment of the invention is based on the prior relational database, and further optimizes the SQL sentences input by the user after lexical, grammatical and semantic analysis is carried out on the SQL sentences and analyzed results (namely semantic analysis results) are successfully obtained. The method may comprise the steps of:
and (3) an intermediate planning tree generation stage:
1) based on the semantic analysis result, the sentence must contain a DISTINCT clause (namely a deduplication clause) and a hierarchical query clause, otherwise, the optimization is finished, and the optimization is quitted. Continuing the step 2;
2) the statement can not contain all clauses and keywords which can depend on complete data to obtain a correct result set, such as a set function, a rownum, an analysis function, an uncertain function and the like, otherwise, the optimization is finished.
Continuing to the step 3;
3) a hierarchical query plan node (i.e., a target hierarchical query node) is generated, and a flag F1 is set to TRUE (i.e., a preset flag) for the target hierarchical query node, indicating that the current hierarchical query supports DISTINCT optimization. And generating the intermediate plan tree according to the original logic. Continuing to step 4;
and (3) generating an execution plan stage:
4) when the intermediate plan tree generated based on the above conditions is tried to obtain the optimum plan in various tree forms, if the flag F1 is TRUE when the hierarchical query node is processed, the hierarchical query nodes are sequentially processed to query the left and right child nodes. And collecting output columns of the child nodes, and judging whether the child node on the side needs to be added with new DISTINCT processing or not according to the result that the cost (namely the first cost) calculated without DISTINCT and the cost (namely the second cost) calculated with DISTINCT are larger and smaller. The addition of the new DISTINCT may make the cost less, and it is considered necessary to add DISTINCT, and vice versa. If no new DISTINCT is needed, the optimization is exited, and if new DISTINCT is needed, the step 5 is continued;
5) newly building a DISTINCT node, taking the output columns of the left child node and the right child node of the hierarchical query as duplication eliminating columns, and executing a step 6;
6) adding the DISTINCT node between the target level query node and the corresponding child node, and setting the parent-child relationship between the nodes: the target level query child nodes are DISTINCT nodes, and the child nodes of DISTINCT nodes are the original child nodes (i.e., the original left child and right child nodes) of the target level query. And (6) finishing the optimization.
EXAMPLE III
Fig. 3 is a schematic structural diagram of an optimization apparatus according to a third embodiment of the present invention. As shown in fig. 3, the apparatus includes: a generation module 310, a determination module 320, and an optimization module 330;
a generating module 310, configured to, when generating an intermediate plan tree based on a target structured query statement, generate a corresponding target level query node for a query statement meeting a preset condition, and set a preset flag for the target level query node; the target structured query statement comprises a query statement, and the preset mark is used for representing that the target level query node supports deduplication optimization;
a determining module 320, configured to determine, for the target hierarchical query node, a left child node and a right child node corresponding to the target hierarchical query node when generating an optimal execution plan based on the middle plan tree;
the optimization module 330 is configured to optimize the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node; the cost comparison result is a comparison result of a first cost and a second cost, and the first cost is a cost obtained by calculation under the condition that the deduplication node is not added; the second cost is a cost calculated with the addition of a deduplication node.
The device firstly generates a corresponding target level query node for a query statement meeting a preset condition and sets a preset mark for the target level query node through a generating module 310 when generating an intermediate plan tree based on the target structured query statement; then, through the determining module 320, when the optimal execution plan is generated based on the middle plan tree, for the target level query node, determining a left child node and a right child node corresponding to the target level query node; finally, the optimization module 330 optimizes the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node; the cost comparison result is a comparison result of a first cost and a second cost, and the first cost is calculated under the condition that the duplicate removal node is not added; the second cost is the cost calculated with the addition of a deduplication node. The device can further optimize the target level query node containing the preset mark when generating the optimal execution plan by setting the preset mark for the determined target level query node; in the process, the corresponding child nodes can be optimized by respectively comparing the cost of the left child node and the cost of the right child node of the target level query node, so that the resource waste caused by cleaning of repeated data generated by the bottom level query is avoided, and the level query performance and efficiency are improved.
Optionally, the apparatus further comprises:
determining the query statement meeting the preset condition according to the following modes:
the first judgment module is used for judging whether the query sentence of the target structured query sentence contains the deduplication clause and the hierarchical query clause;
a second judging module, configured to judge whether the query statement includes a preset clause if the query statement includes a deduplication clause and a hierarchical query clause, where the preset clause is a clause that depends on complete table data to obtain a correct result set;
and the statement determining module is used for determining the query statement as a query statement meeting preset conditions if the query statement does not contain the preset clause.
Optionally, the intermediate planning tree includes one or more hierarchical query nodes, where the hierarchical query nodes are planning nodes generated based on a query statement containing a hierarchical query clause;
the determining module 320 specifically includes:
the system comprises a mark judging unit, a judging unit and a judging unit, wherein the mark judging unit is used for judging whether each level query node contains a preset mark or not;
and if so, determining the hierarchical query node as a target hierarchical query node, and determining a left child node and a right child node corresponding to the target hierarchical query node.
Optionally, the apparatus further includes:
a collecting module, configured to collect a first output column of the left child node and a second output column of the right child node before optimizing the corresponding child nodes according to cost comparison results corresponding to the left child node and the right child node, respectively;
and the cost determining module is used for determining a first cost and a second cost corresponding to the left child node based on the first output column, and determining a first cost and a second cost corresponding to the right child node based on the second output column.
Optionally, the optimizing module 330 is specifically configured to:
if the first price corresponding to the left child node is less than or equal to the second price corresponding to the left child node, exiting the optimization; otherwise, adding a first deduplication node between the target hierarchical query node and the left child node for optimizing the left child node;
if the first price corresponding to the right child node is less than or equal to the second price corresponding to the right child node, exiting the optimization; otherwise, adding a second deduplication node between the target hierarchical query node and the right child node for optimizing the right child node.
Optionally, a first deduplication node is added between the target hierarchical query node and the left child node, and is specifically configured to:
newly building a first deduplication node, and taking the first output column as a deduplication column corresponding to the first deduplication node;
adding the first deduplication node between the target hierarchical query node and the left child node, setting the first deduplication node as a child node of the target hierarchical query node, and setting the left child node as a child node of the first deduplication node;
and adding a second deduplication node between the target hierarchical query node and the right child node, comprising:
newly building a second deduplication node, and taking the second output column as a deduplication column corresponding to the second deduplication node;
and adding the second deduplication node between the target hierarchical query node and the right child node, setting the second deduplication node as the child node of the target hierarchical query node, and setting the right child node as the child node of the second deduplication node.
Optionally, the cost determining module is specifically configured to:
determining a first duplicate removal rate corresponding to the first output column based on the first output column;
under the condition that a first duplicate removal node is not added, determining the utilization rate of a first Central Processing Unit (CPU) and a memory when the hierarchical query operation is executed, and determining a first cost corresponding to the left child node according to the first duplicate removal rate and the utilization rate of the first CPU and the memory;
and under the condition of adding a first duplicate removal node, determining the utilization rate of a second CPU and a memory when executing the hierarchical query operation, and determining a second cost corresponding to the left child node according to the first duplicate removal node and the utilization rate of the second CPU and the memory.
Optionally, the cost determining module is further specifically configured to:
determining a second duplication removal rate corresponding to the second output column based on the second output column;
under the condition that a second duplicate removal node is not added, determining a third CPU and a memory utilization rate when the hierarchical query operation is executed, and determining a first cost corresponding to the right child node according to the second duplicate removal rate and the third CPU and memory utilization rate;
and under the condition of adding a second duplicate removal node, determining the fourth CPU and the memory utilization rate when executing the hierarchical query operation, and determining a second cost corresponding to the right child node according to the second duplicate removal rate and the fourth CPU and memory utilization rate.
The optimization device provided by the embodiment of the invention can execute the optimization method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
FIG. 4 shows a schematic block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device 10 may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM)12, a Random Access Memory (RAM)13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM)12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The processor 11 performs the various methods and processes described above, such as the optimization method.
In some embodiments, the optimization method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the optimization method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the optimization method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Computer programs for implementing the methods of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method of optimization, the method comprising:
when generating an intermediate plan tree based on a target structured query statement, generating a corresponding target level query node for the query statement meeting a preset condition, and setting a preset mark for the target level query node; the target structured query statement comprises a query statement, and the preset mark is used for representing that the target level query node supports deduplication optimization;
when an optimal execution plan is generated based on the middle plan tree, determining a left child node and a right child node corresponding to the target level query node for the target level query node;
optimizing the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node respectively; the cost comparison result is a comparison result of a first cost and a second cost, and the first cost is a cost obtained by calculation under the condition that the deduplication node is not added; the second cost is a cost calculated with the addition of a deduplication node.
2. The method of claim 1, further comprising:
determining the query statement meeting the preset condition according to the following modes:
judging whether the query sentence of the target structured query sentence contains a de-duplication clause and a hierarchical query clause;
if the query statement comprises a duplicate removal clause and a hierarchical query clause, judging whether the query statement comprises a preset clause, wherein the preset clause is a clause which depends on complete table data to obtain a correct result set;
and if the query statement does not contain the preset clause, determining the query statement as a query statement meeting preset conditions.
3. The method of claim 1, wherein the intermediate plan tree comprises one or more hierarchical query nodes, the hierarchical query nodes being plan nodes generated based on a query statement containing a hierarchical query clause;
for the target hierarchical query node, determining a left child node and a right child node corresponding to the target hierarchical query node includes:
judging whether each level inquiry node contains a preset mark or not;
if yes, determining the level query node as a target level query node, and determining a left child node and a right child node corresponding to the target level query node.
4. The method of claim 1, further comprising, before optimizing the corresponding child node according to the cost comparison results corresponding to the left child node and the right child node, respectively:
collecting a first output column of the left child node and a second output column of the right child node;
and determining a first cost and a second cost corresponding to the left child node based on the first output column, and determining a first cost and a second cost corresponding to the right child node based on the second output column.
5. The method of claim 4, wherein optimizing the corresponding child node according to the cost comparison results corresponding to the left child node and the right child node respectively comprises:
if the first price corresponding to the left child node is less than or equal to the second price corresponding to the left child node, exiting the optimization; otherwise, adding a first deduplication node between the target hierarchical query node and the left child node for optimizing the left child node;
if the first price corresponding to the right child node is less than or equal to the second price corresponding to the right child node, the optimization is exited; otherwise, adding a second deduplication node between the target hierarchical query node and the right child node for optimizing the right child node.
6. The method of claim 5, wherein adding a first deduplication node between the target hierarchical query node and the left child node comprises:
newly building a first deduplication node, and taking the first output column as a deduplication column corresponding to the first deduplication node;
adding the first deduplication node between the target hierarchical query node and the left child node, setting the first deduplication node as a child node of the target hierarchical query node, and setting the left child node as a child node of the first deduplication node;
and adding a second deduplication node between the target hierarchical query node and the right child node, comprising:
newly building a second deduplication node, and taking the second output column as a deduplication column corresponding to the second deduplication node;
and adding the second deduplication node between the target hierarchical query node and the right child node, setting the second deduplication node as the child node of the target hierarchical query node, and setting the right child node as the child node of the second deduplication node.
7. The method of claim 4, wherein determining the first cost and the second cost corresponding to the left child node based on the first output column comprises:
determining a first duplicate removal rate corresponding to the first output column based on the first output column;
under the condition that a first duplicate removal node is not added, determining the utilization rate of a first Central Processing Unit (CPU) and a memory when the hierarchical query operation is executed, and determining a first cost corresponding to the left child node according to the first duplicate removal rate and the utilization rate of the first CPU and the memory;
and under the condition of adding a first duplicate removal node, determining the utilization rate of a second CPU and a memory when executing the hierarchical query operation, and determining a second cost corresponding to the left child node according to the first duplicate removal node and the utilization rate of the second CPU and the memory.
8. The method of claim 4, wherein determining the first cost and the second cost corresponding to the right child node based on the second output column comprises:
determining a second duplication removal rate corresponding to the second output column based on the second output column;
under the condition that a second duplicate removal node is not added, determining a third CPU and a memory utilization rate when a hierarchical query operation is executed, and determining a first cost corresponding to the right child node according to the second duplicate removal rate and the third CPU and memory utilization rate;
and under the condition of adding a second duplicate removal node, determining the fourth CPU and the memory utilization rate when executing the hierarchical query operation, and determining a second cost corresponding to the right child node according to the second duplicate removal rate and the fourth CPU and memory utilization rate.
9. An optimization device, comprising:
the generating module is used for generating corresponding target level query nodes for the query sentences meeting preset conditions and setting preset marks for the target level query nodes when generating the intermediate plan tree based on the target structured query sentences; the target structured query statement comprises a query statement, and the preset mark is used for representing that the target level query node supports deduplication optimization;
the determining module is used for determining a left child node and a right child node corresponding to the target level query node for the target level query node when generating an optimal execution plan based on the middle plan tree;
the optimization module is used for optimizing the corresponding child nodes according to the cost comparison results corresponding to the left child node and the right child node; the cost comparison result is a comparison result of a first cost and a second cost, and the first cost is a cost obtained by calculation under the condition that the deduplication node is not added; the second cost is a cost calculated with the addition of a deduplication node.
10. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the optimization method of any one of claims 1-8.
11. A computer-readable storage medium storing computer instructions for causing a processor to perform the optimization method of any one of claims 1-8 when executed.
CN202210270656.XA 2022-03-18 2022-03-18 Optimization method, optimization device, electronic equipment and medium Pending CN114625761A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210270656.XA CN114625761A (en) 2022-03-18 2022-03-18 Optimization method, optimization device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210270656.XA CN114625761A (en) 2022-03-18 2022-03-18 Optimization method, optimization device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN114625761A true CN114625761A (en) 2022-06-14

Family

ID=81902954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210270656.XA Pending CN114625761A (en) 2022-03-18 2022-03-18 Optimization method, optimization device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN114625761A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115033595A (en) * 2022-08-10 2022-09-09 杭州悦数科技有限公司 Query statement processing method, system, device and medium based on super node

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115033595A (en) * 2022-08-10 2022-09-09 杭州悦数科技有限公司 Query statement processing method, system, device and medium based on super node

Similar Documents

Publication Publication Date Title
CN114579104A (en) Data analysis scene generation method, device, equipment and storage medium
JPH11134366A (en) Method and device for estimating number of singulra values of data base
CN114625761A (en) Optimization method, optimization device, electronic equipment and medium
CN113722600B (en) Data query method, device, equipment and product applied to big data
CN115576983A (en) Statement generation method and device, electronic equipment and medium
CN108959454B (en) Prompting clause specifying method, device, equipment and storage medium
CN111984625B (en) Database load characteristic processing method and device, medium and electronic equipment
CN117807121A (en) Knowledge graph-based power grid knowledge correlation method, device, equipment and medium
CN116578646A (en) Time sequence data synchronization method, device, equipment and storage medium
CN116401281A (en) SQL query time prediction method, device, equipment and medium
CN115080607A (en) Method, device, equipment and storage medium for optimizing structured query statement
CN115658869A (en) Entity retrieval method, entity retrieval device, electronic equipment and storage medium
CN112307050A (en) Identification method and device for repeated correlation calculation and computer system
CN116303578A (en) Query expression processing method, device, equipment and storage medium
CN116383454B (en) Data query method of graph database, electronic equipment and storage medium
CN115017132B (en) Database migration method, equipment and storage medium
CN116414856A (en) Data query method and device, electronic equipment and storage medium
CN118467575A (en) Target query statement determination method and device, electronic equipment and storage medium
CN115712645A (en) Data processing method, device, equipment and storage medium
CN116361323A (en) Relation tree adjustment method, device, equipment and storage medium
CN117827840A (en) Index creation method and device, electronic equipment and storage medium
CN115964387A (en) Data query method and device, distributed database system and medium
CN116701442A (en) SQL sentence processing method, device, equipment and storage medium
CN116401269A (en) Data query method and device, electronic equipment and storage medium
CN115495368A (en) Data testing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination