CN113656437A - Method and device for determining optimal query plan - Google Patents

Method and device for determining optimal query plan Download PDF

Info

Publication number
CN113656437A
CN113656437A CN202110751189.8A CN202110751189A CN113656437A CN 113656437 A CN113656437 A CN 113656437A CN 202110751189 A CN202110751189 A CN 202110751189A CN 113656437 A CN113656437 A CN 113656437A
Authority
CN
China
Prior art keywords
query step
query
execution cost
stability
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110751189.8A
Other languages
Chinese (zh)
Other versions
CN113656437B (en
Inventor
常剑
董宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Innovation Co
Original Assignee
Alibaba Singapore Holdings Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Singapore Holdings Pte Ltd filed Critical Alibaba Singapore Holdings Pte Ltd
Priority to CN202110751189.8A priority Critical patent/CN113656437B/en
Publication of CN113656437A publication Critical patent/CN113656437A/en
Application granted granted Critical
Publication of CN113656437B publication Critical patent/CN113656437B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation
    • G06F16/24545Selectivity estimation or determination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

One or more embodiments of the present specification provide an updating method for a reference execution cost of a query step feature, and basic information of the same or similar query steps has the same feature, so that a corresponding relationship between the query step feature and the reference execution cost is constructed by using the feature of the query step as an identifier. And updating the reference execution cost of the query step features: the method comprises the steps of obtaining execution result information after a database execution engine finishes any query step, determining the characteristics of the query step according to basic information of the query step, determining the actual execution cost of the query step information according to the execution result information, and taking the actual execution cost as the reference execution cost corresponding to the characteristics of the query step. Thus, when the optimizer engine needs to evaluate the execution cost of a certain query step, the corresponding reference execution cost can be queried through the characteristics of the query step.

Description

Method and device for determining optimal query plan
Technical Field
One or more embodiments of the present disclosure relate to the field of database technologies, and in particular, to a method and an apparatus for determining an optimal query plan.
Background
The database generally comprises a parser engine, an optimizer engine and an executor engine, wherein after receiving a query request aiming at certain data, the database is parsed into a query plan by the parser engine and is sent to the optimizer engine, the optimizer engine optimizes the query plan and then sends the optimized query plan to the execution engine, and the executor engine completes query on the data according to the received query plan to obtain a query result.
After receiving the query plan, the optimizer engine decomposes the query plan to determine whether each query step can be optimized, and specifically, when it is determined that another query step can be substituted for a certain query step, determines which query step is more suitable to select by evaluating the execution cost of the query steps, for example, if a certain query relates to three tables, the connection order of the three tables is different, and the actual execution cost is also different, the connection order of the tables is determined by evaluating the execution cost of each order.
In the prior art, an optimizer engine generally evaluates execution cost information of each query step through basic statistical information of tables, where the basic statistical information is stored in a system table of a database and includes the number of tables, the number of rows of each table, and other basic statistical information, however, the basic statistical information needs to traverse each table for statistics, and therefore, in order to not waste query resources, the basic statistical information is generally updated when the database is idle or a large amount of data is changed, which causes that in some cases, although actual contents of the database are changed, the basic statistical information is not updated in time, and thus accuracy of an optimization result is affected.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide a reference execution cost determination method.
To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
according to a first aspect of one or more embodiments of the present specification, there is provided a method for optimizing a query plan, where features of a query step are used as identifiers, and a corresponding relationship between the features of the query step and a reference execution cost is constructed, the method including:
acquiring execution result information of the database execution engine after completing any query step;
determining the characteristics of the query step according to the basic information of the query step, and determining the actual execution cost of the query step information according to the execution result information;
updating the reference execution cost corresponding to the characteristics of the query step according to the actual execution cost of the query step; and the corresponding relation between the query step characteristics and the reference execution cost is used for evaluating the cost of any query step when the query plan is optimized.
According to a second aspect of one or more embodiments of the present specification, there is provided a method for determining an optimal query plan, the method being applied to a database optimization engine, the method including:
analyzing each query step in the query plan;
for any query step, determining an alternative query step to the query step;
determining the characteristics of any alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation;
and selecting the alternative query step with the minimum execution cost of each query step to obtain the optimal query plan.
According to a third aspect of one or more embodiments of the present specification, there is provided a model construction method for predicting a stability of a reference execution cost, including:
according to the preset duration T1, acquiring a set of query step data samples finished in the historical time period of the duration T1, wherein any query step data sample comprises: basic information of the query step and actual execution cost corresponding to the query step;
for any query step data sample, converting the query step data sample into a training sample by:
determining the characteristics and the completion timestamp of the query step according to the basic information of the query step, and obtaining the reference execution cost when the query step is evaluated by querying the corresponding relation;
calculating to obtain the stability of the reference execution cost corresponding to the query step according to the actual execution cost and the reference execution cost;
obtaining a training sample corresponding to the query step by taking the characteristic of the query step as a characteristic value and taking the reference execution cost stability of the query step as a label;
traversing the data sample set and executing the conversion step to obtain a training sample set;
training by using the training sample set to obtain a model, wherein the model is used for: and predicting the stability of the reference execution cost of any query step according to the characteristics of the query step.
According to a fourth aspect of one or more embodiments of the present specification, there is provided a method for predicting a stability of a reference execution cost, for predicting a stability of a reference execution cost corresponding to any one query step, including:
receiving a query step and determining characteristics of the query step;
inputting the characteristics of the query step into a model for predicting the stability of the execution cost of the reference; the model is constructed by using the construction method;
and predicting the stability of the reference execution cost corresponding to the query step according to the output of the model.
According to a fifth aspect of one or more embodiments of the present specification, there is provided a method for determining an optimal query plan, the method being applied to an optimizer execution engine, the method including:
analyzing each query step in the query plan;
for any query step, determining an alternative query step to the query step;
for any one alternative query step, predicting the reference execution cost stability corresponding to the alternative query step by using the method for predicting the reference execution cost stability; if the stability of the reference execution cost corresponding to the alternative query step is greater than a preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation;
and selecting the alternative query step with the minimum execution cost of each query step to obtain the optimal query plan.
In one or more embodiments of the present specification, since the basic information of the same or similar query steps has the same characteristics, the correspondence between the characteristics of the query steps and the reference execution cost is constructed by using the characteristics of the query steps as identifiers. Updating the reference execution cost of the query step features: the method comprises the steps of obtaining execution result information after a database execution engine finishes any query step, determining the characteristics of the query step according to basic information of the query step, determining the actual execution cost of the query step information according to the execution result information, and taking the actual execution cost as the reference execution cost corresponding to the characteristics of the query step. Thus, when the database optimizer engine needs to evaluate the execution cost of a certain query step, the corresponding reference execution cost can be queried through the characteristics of the query step.
In one or more embodiments of the present disclosure, when performing the execution cost evaluation for one query step, the optimizer engine has a more specific reference execution cost than the basic statistical information in the prior art, and has a more reference meaning, and when updating the reference execution cost, the optimizer engine does not need to occupy the query resource of the database, thereby reducing the query load.
Drawings
FIG. 1 is a flowchart illustrating a method for optimizing a query plan in accordance with an exemplary embodiment.
FIG. 2 is a flow diagram illustrating the relationship between query steps of a query plan, according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating a method for determining a reference execution cost according to an exemplary embodiment.
FIG. 4 is a flowchart illustrating a model building method for predicting the stability of a reference execution cost according to an exemplary embodiment.
FIG. 5 is a flowchart illustrating a predictive model training method according to an exemplary embodiment.
Fig. 6 is a flowchart illustrating a method for predicting stability of execution cost with reference to an exemplary embodiment.
Fig. 7 is a flowchart illustrating another method for determining a cost with reference to execution according to an exemplary embodiment.
Fig. 8 is a schematic diagram of an embodiment framework of a method for determining a cost with reference to execution according to an exemplary embodiment.
FIG. 9 is a block diagram of an apparatus for optimizing a query plan in accordance with an exemplary embodiment.
Fig. 10 is a block diagram of a reference execution cost determination apparatus according to an exemplary embodiment.
Fig. 11 is a block diagram of a model building apparatus for predicting stability of a reference execution cost according to an exemplary embodiment.
Fig. 12 is a block diagram of a device for predicting stability of reference execution cost according to an exemplary embodiment.
Fig. 13 is a block diagram of another reference execution cost determination apparatus according to an exemplary embodiment.
Fig. 14 is a schematic structural diagram of an apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
The basic statistical information mainly refers to a type of information describing tables, sizes and scales of indexes, distribution conditions of the database, and the like, for example, the number of rows of the tables, the number of blocks, the size of each row on average, the number of rows of the index field, the sizes of different values, and the like. The basic statistical information can be obtained only by traversing each table, and the tables are necessarily accessed by frequently calling an I/O process during traversal. However, in practical applications, when data is queried, related calculation is not the most time-consuming, and it is only a time-consuming part when an I/O process is called to access each table for I/O, and it is obvious that resources of the I/O process are precious, so that in order to ensure efficient external access to the database, resources are not wasted for statistics of basic statistical information when the database is frequently accessed externally (when the database is busy).
On one hand, the optimizer engine carries out cost evaluation on the query plan by means of basic statistical information, the accuracy of the basic statistical information influences the accuracy of optimization, but when the database is busy, the basic statistical information is difficult to update, and the accuracy of the basic statistical information is difficult to ensure; on the other hand, the basic statistical information is the macroscopic statistical information belonging to the database, and the estimated execution cost is the macroscopic execution cost.
The inventor finds in practice that in some scenarios where there are a large number of access requests, the query data and the query type have temporal locality and spatial locality, for example, in an analytic database application scenario, a large number of select statements (query statements) are encountered, and in a period of time, frequent accesses may be made to some banks or tables of the database, in another period of time, frequent accesses may be made to other banks or tables, and the queries may be the same or similar. The query step is composed of a plurality of query steps, for example, the table a and the table B are connected in the first step, the row with a keyword not being empty is selected in the second step, and the executor engine completes the query step by step when executing the query plan, and accordingly, after executing a step, the relevant execution process information of the step can be obtained, such as how many rows are read, how many rows are output, how long the time is spent, and the like. Therefore, if the database optimizer engine evaluates the execution cost of a certain query step and the same or similar query step is executed and completed in the recent period of time, the optimizer engine can evaluate the execution cost of completing the query step by directly using the actual execution cost of the same or similar query step in the recent period of time after the query step is executed and completed, which is obviously more accurate.
Based on this, the present specification provides a method for updating the reference execution cost of the features of the query step, and the basic information of the same or similar query steps has the same features, so that the corresponding relationship between the features of the query step and the reference execution cost is constructed by using the features of the query step as identifiers. And updating the reference execution cost of the query step features: the method comprises the steps of obtaining execution result information after a database execution engine finishes any query step, determining the characteristics of the query step according to basic information of the query step, determining the actual execution cost of the query step information according to the execution result information, and taking the actual execution cost as the reference execution cost corresponding to the characteristics of the query step. Therefore, when the execution cost of a certain query step needs to be evaluated, the corresponding reference execution cost can be queried through the characteristics of the query step.
In one or more embodiments of the present disclosure, when performing the execution cost evaluation for one query step, the optimizer engine has a more specific reference execution cost than the basic statistical information in the prior art, and has a more reference meaning, and when updating the reference execution cost, the optimizer engine does not need to occupy the query resource of the database, thereby reducing the query load.
Based on the above thought, the present specification provides a method for optimizing a query plan, a method for determining an optimal query plan, a model construction method for predicting the stability of the reference execution cost, and a method for predicting the stability of the reference execution cost.
The method for optimizing the query plan utilizes the corresponding relation between the constructed query step characteristics and the reference execution cost, and the corresponding relation referred by other methods for evaluating the cost of any query step is updated according to the method.
The reference execution cost determination method is to determine the reference execution cost of the query step by querying the corresponding relationship updated by the updating method according to the characteristics of the query step.
In the model construction method, when constructing the training sample, the correspondence as described above needs to be queried according to the characteristics of the sample query step, and then the model executing the cost stability from the prediction reference is trained by using the constructed training sample set.
The method for predicting the stability of the reference execution cost uses the model constructed by the model construction method, before the reference execution cost of the query step is determined, the stability of the current reference execution cost can be predicted, and the higher the stability is, the more accurate the current reference execution cost is.
These methods will be described in detail below.
First, an optimized query plan method is explained, as shown in fig. 1, which is a flow diagram of the optimized query plan method illustrated in this description.
Before the method is executed, the characteristics of the query step are used as identifiers, and the corresponding relation between the characteristics of the query step and the reference execution cost is constructed. And evaluating the cost of any query step when the corresponding relation between the query step characteristics and the reference execution cost is used for optimizing the query plan.
Each query step has its basic information, including the query plan to which the query step belongs, the query type, the expression of the query step (including the involved operators, databases, tables, columns), the submission time of the query step, etc., for example, the query step is "table a inner JOIN table B on a. key ═ b.key", the query plan to which the query step belongs has ID 1 (each query plan has its unique identifier), the query type is JOIN type, the involved operators are "═ c", the key column to which a table and a table are involved, and the key column to which B table and B table are involved.
In which the same or similar query steps will have the same or similar query statements, i.e. the same references to databases, tables and columns, and the same or similar expressions, so that in practical applications, the database, tables, columns and expressions, which are referred to in each query step, can be used as the characteristics of the query step, such as List < expressontype, List < Symbol > >.
The reference execution cost is used to evaluate the execution cost for executing the query step, such as how many rows need to be input, how many rows need to be output, how much memory is consumed, and the like.
And 102, acquiring the execution result information of any query step.
After receiving a query plan (also called as an execution plan), the executor engine executes the query plan according to steps in the query plan, wherein each query step has a plurality of corresponding query steps, each query step is executed and completed by an operator in the executor engine (one operator is responsible for completing a basic data processing logic, one group of operators completes a group of query steps of data according to the query plan, in other words, one query step is executed and completed by at least one operator to obtain a result), after each operator is executed, how many rows (rows) are read by itself, how many rows are output, how many data (KB) of how many sizes are read in total, how many data of how many sizes are output in total, how many times of total consumption are consumed, and how many total memory usage sums are reported. The execution engine generally encapsulates each query step belonging to one query plan as the execution result information of one query plan, that is, the execution result information of one query plan includes the execution result information of a plurality of query steps. Therefore, in practical applications, the execution result information may be acquired from the execution engine in units of one query plan.
For example, a database has three tables, nation, region, customer, and a query statement:
SELECT count(*)
FROM
nation,region,customer
WHERE
c_nationkey=n_nationkey
AND n_regionkey=r_regionkey
AND r_name='ASIA';
after the optimization engine completes optimization, the query plan sent to the execution engine is as follows:
1. scan the table nation to find the column n _ nationkey, n _ regionkey.
2. Scan the table region, find the column r _ region key, and screen out the r _ name ═ ASIA' row.
3. And internally connecting the tables obtained in the first step and the second step by c _ nationkey ═ n _ nationkey (inner join).
4. The table customer is scanned for column c _ nationkey.
5. And internally connecting the tables obtained in the fourth step and the fifth step by n _ regionkey-r _ regionkey (inner join).
6. And calculating the number of rows in the table obtained in the fifth step.
As shown in fig. 2, the execution relationship of each query step executed for the above query plan. For example, steps 1 and 2 may be performed simultaneously, and the two operators perform the operation to obtain results respectively, and then perform the interconnection according to the results obtained in steps 1 and 2, and the interconnection step is performed by the other operator, and so on.
In practical applications, the order of magnitude of query plans completed by the database executor engine may be large, and therefore, in one or more embodiments of the present description, the obtaining condition may be set, for example, the sampling ratio may be set, and the database executor engine obtains the execution result information of one query plan every time5 query plans are completed; or setting an acquisition period, acquiring the execution result information of a query plan and the like every 1 ms; or receiving a specified command, and starting to acquire the execution result information of each query plan.
And 104, determining the characteristics and the actual execution cost of the query step.
As described above, each query plan has a plurality of query steps, and after each query step is executed, an execution result message is obtained, including the execution process of the query step, including how many rows (rows) are read during execution, how many rows are output, how many data sizes (KB) are read in total, how many data sizes are output in total, how many total time consumption is, how many total memory usage is, and the like.
The execution process of the query step can reflect how much cost is consumed for executing the query step, and therefore, one or more execution process data are selected as the actual execution cost for executing the query step by acquiring the execution result information of the query step.
And step 106, updating the reference execution cost corresponding to the characteristics of the query step.
Wherein, according to the actual execution cost determined in step 104, the reference execution cost corresponding to the query step feature is updated.
In practical application, the corresponding relationship between the query step features and the reference execution cost may be updated only, and the following table is the corresponding relationship before updating:
characteristics of query steps Reference execution cost
Characteristic A Cost A1
Characteristic B Cost B1
TABLE 1
When a query step is received, the query step being characterized as feature a, and the actual execution cost of the query step is cost a2, then after updating, the correspondence becomes:
characteristics of query steps Reference execution cost
Characteristic A Cost A2
Characteristic B Cost B1
TABLE 2
The corresponding relationship may also be constructed by constructing an actual execution cost and a corresponding relationship between the timestamp and the query step feature, at this time, the actual execution cost of the query step feature obtained each time may be sequentially saved according to the time sequence of the timestamp (i.e., the actual execution cost and the corresponding relationship between the timestamp and the query step feature are also updated), and the corresponding actual execution cost with the latest time is used as the reference actual execution cost of the query step, which is the corresponding relationship before updating as shown in the following table:
Figure BDA0003146311250000071
TABLE 3
After receiving a query step with a query step characteristic of feature a, the actual execution cost of the query step is actual cost a3, the timestamp is TimeA3, and then the actual execution cost and the completion timestamp are added to the identifier corresponding to the query step characteristic, then after updating, the correspondence relationship becomes:
Figure BDA0003146311250000072
TABLE 4
The reference execution cost value corresponding to each query step feature is a value of an actual execution cost corresponding to a latest timestamp of the timestamp, as in table 3, the actual execution cost corresponding to the latest timestamp of the feature a is a2, so the reference execution cost corresponding to the feature a is a reference cost a2 (that is, the reference cost a2 is an actual cost a2), and the actual execution cost corresponding to a latest timestamp of the feature B is 2, so the reference execution cost corresponding to the feature B is a reference cost B2, as in table 4, the actual reference cost corresponding to the latest timestamp of the feature a is a reference cost A3, so the reference execution cost corresponding to the feature a is a reference cost A3.
In practical application, if the actual execution cost and the timestamp after the execution of each query step is completed are stored without limitation, and the data amount is accumulated for a long time, therefore, in one or more embodiments of the present specification, a storage condition may be set, for example, if the actual execution cost corresponding to any feature exceeds 20, the previous 10 are deleted and then stored; or if the time length of the time stamp from the current time stamp exceeds 4h, deleting the corresponding actual execution cost and the corresponding time stamp.
The present specification shows that the query step features, the reference execution cost, the actual execution cost, and the timestamp are stored in one table, and in fact, a corresponding relationship table may be established between the query step features and the reference execution cost, and a corresponding relationship table may be established between the query step features and the actual execution cost, and the timestamp.
In practical applications, in some cases, the query step feature corresponding to a certain query step is determined not to be found in the corresponding relationship (for example, when the corresponding relationship is constructed, no query step feature is available, or similar query steps have not been executed before), at this time, the identifier of the query step feature may be added to the corresponding relationship, and the actual execution cost corresponding to the query step is taken as the reference execution cost.
In practical application, the corresponding relationship may be stored in a table form, or may be stored in an index method, and only the corresponding reference execution cost may be found according to the query step. During storage, the system table may be stored in a database, as may the basic statistical information, may also be stored in a memory, and may also be stored in a hard disk.
The above is a description of the method for optimizing query planning, and the corresponding relationship between the query step features and the reference execution cost is constructed and updated, and the reference meaning is more specific with respect to the basic statistical information, for example, when table join is performed, table a has 20 rows, table B has 10 rows, so that after the table join is performed, there are 10 rows at most, and the optimizer engine will only evaluate as 10 rows or a rough number when evaluating. The reference execution cost is obtained by using an actual execution result (i.e., an actual execution cost) after the table a and the table B are connected in the row, for example, 4 rows are obtained after the table a and the table B are actually connected in the row, and then after the execution cost of the table a and the table B is evaluated, it can be determined that 4 rows are output after the table a and the table B are connected in the row. Moreover, since the execution result information of any query step executed by the executor engine is obtained (that is, the execution result information of the query step that the execution engine originally needs to execute is obtained), the query resource of the database is not occupied, and the query load of the database is not caused (the query irrelevant to the service is not performed).
The following describes a method for determining an optimal query plan, which uses the correspondence between the query step characteristics updated by the above-described updating method and the reference execution costs. The updating method improves the accuracy of the reference execution cost of the characteristics of the query steps (compared with the basic statistical information, the reference meaning is more specific, the updating speed is high, and therefore the updating method is more accurate), therefore, no matter technicians obtain the reference execution cost of each query step to determine the operation state of the database, or an optimizer engine generates an optimal query plan by using the reference execution cost, the reference meaning of the reference execution cost is higher than that of the basic statistical information.
As shown in fig. 3, a flowchart of a method for determining an optimal query plan, applied to an optimizer engine, is shown in this specification, and includes the following steps:
step 302, analyzing each query step in the query plan.
After receiving a query plan, the optimizer engine parses out each query step in the query plan, which is in the form of a tree.
Step 304, for any query step, determining an alternative query step to the query step.
Some query steps are provided with alternative query steps, such as left-hand or right-hand connection when two tables are connected, as described above.
Step 306, for any alternative query step, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the correspondence.
The optimizer engine optimizes each query step based on cost, and therefore, the cost of executing each query step needs to be evaluated to select the alternative query step with the lowest execution cost.
In this specification, the optimizer engine optimizes and evaluates the cost of executing each query step, and obtains the execution cost of each query step by querying the corresponding relationship between the query step characteristics and the reference execution cost.
Wherein, the corresponding relationship between the query step and the reference execution cost is the corresponding relationship obtained by the above updating method.
And 308, selecting the alternative query step with the minimum execution cost of each query step, and determining the optimal query plan.
In practical application, when the reference execution cost of a certain query step needs to be determined, it is found that the query step features of the query step are not in the corresponding relationship (that is, no similar query step has been executed before), at this time, the basic statistical information can be referred to as an evaluation basis, so that after the query step is executed, the actual execution cost can be obtained and used as the reference execution cost of the subsequent same or similar query step.
By using the optimization query planning method and the determination method, the accuracy of the optimizer engine in evaluating each query step can be greatly improved, but in some cases, the data change is fast, so that the difference between the actual execution cost and the actual execution cost is large, and based on the method, the model construction method for predicting the stability of the reference execution cost is provided.
The following describes in detail a model construction method for predicting a reference execution cost stability, as shown in fig. 4, which is a flow diagram of the model construction method for predicting a reference execution cost stability shown in this specification, and includes the following steps:
step 402, obtaining a data sample of the query step.
When acquiring data samples, acquiring a set of query step data samples executed within a history time period of a time length T1 according to a preset time length T1, wherein any query step data sample includes: basic information of the query step and actual execution cost corresponding to the query step.
Since the data changes are temporal locality and spatial locality, the data samples of the query step selected by the building model need to be samples within a near time period, such as within 30 mis. For query data samples long before, the significance of model construction is not great, for example, some data modification operations in the period of 6:00-6:10 cause the reference execution cost of partial query steps to be very unstable, and if the query data samples in the period of 6:00-6:10 are used for constructing the model, unstable prediction results can be obtained when the reference execution cost of some query steps is predicted. If the data changes are not much during 16:00-16:10, i.e., the reference execution cost of each query step is stable, a wrong prediction result may be obtained if the model constructed by 6:00-6:10 is used. Thus, in building the model, it is necessary to obtain a set of data samples of query steps that are performed to completion within a historical period of time of duration T1.
And step 404, constructing a training sample set.
For any query step data sample, converting the query step data sample into a training sample in the following way:
determining the characteristics and the completion timestamp of the query step according to the basic information of the query step, and obtaining the reference execution cost when the query step is evaluated by querying the corresponding relation shown in tables 3 and 4;
calculating to obtain the stability of the reference execution cost corresponding to the query step according to the actual execution cost and the reference execution cost;
obtaining a training sample corresponding to the query step by taking the characteristic of the query step as a characteristic value and taking the reference execution cost stability of the query step as a label;
and traversing the data sample set and executing the conversion step to obtain a training sample set.
As shown in table 4, if the completion timestamp of a certain query step is Time5, the query step uses data set 2 corresponding to Time2 when the execution cost is evaluated.
As shown in fig. 5, which is a schematic flow chart of a predictive model training method shown in this specification, a data sample of any query step includes basic information of the query step and an actual execution cost after the query step is executed, a feature of the query step is extracted through the basic information of the query step, and the actual execution cost for actually executing the query step is determined through execution result information of the query step.
Taking the execution cost as the input line number and the output line number as an example, let statsfiff (place node states-operators)/Max (place node states, operators), where statsfiff is used to represent the difference between the actual execution cost and the reference execution cost, and place node states represents the obtained reference execution cost, and if 1-statsfiff is used as a label, the closer the label is to 1, the more stable the reference execution cost is, the closer to 0, the more unstable the reference execution cost is.
In addition, in one or more embodiments of the present description, the reference execution cost of each query step in the evaluation and the actual execution cost of the query step may be saved as a query step data sample when the query step is completed.
Step 406, training a model for predicting the stability of the reference execution cost of any query step.
And training a model for predicting the stability of the reference execution cost of any query step by using the constructed training sample set. The model for predicting the execution cost stability of the reference can be constructed based on different algorithms, for example, the model can be constructed based on a support vector machine, the model can be constructed based on a neural network, the model can be constructed based on a random forest, and the constructed model can achieve the purpose shown in the specification.
Taking a random forest as an example:
assuming that the number of samples of the constructed training sample set is N, for any tree, N (N < N) training samples are randomly and replaceably extracted from the training sample set as the training sample set of the tree.
If the feature dimension of each sample is M, a constant M < < M is assigned, M feature subsets are randomly selected from the M feature monkeys, and each time the tree is split, the optimal feature subset is selected from the M feature collections.
Each tree grows to the greatest extent possible without pruning.
In practical applications, the model may be constructed under a preset condition, for example, the preset condition may be that the model is constructed at a time point of a preset period, and the prediction model is constructed once every time period passes. The model prediction method can also be used for constructing a model, and when the model is determined not to be updated for a long time, the model needs to be updated, namely reconstructed, in order to ensure the accuracy of the model prediction.
The above is a description of a model construction method for predicting the stability of the reference execution cost, and based on the above model construction method, the present description also provides a corresponding model using method, that is, a method for predicting the stability of the reference execution cost. Next, a method of predicting the execution cost stability of the reference will be described in detail.
In practical applications, the data of some tables and columns of the database may change dramatically, and the reference execution cost of the query steps related to the tables and columns may also change greatly. Due to rapid and sharp change of data, the actual execution cost after the execution is finished is far from the actual execution cost after the last execution is finished.
As shown in fig. 6, a flowchart of a method for predicting the execution cost stability of a reference is shown in the present specification, which includes the following steps:
step 602, determine the characteristics of the query step.
The received query steps can be sent by the optimizer engine, or can be input by a tester for testing, and are determined according to the actual application scenario.
The characteristics of the inquiry step are determined based on the basic information of the inquiry step, and the same method as the above-mentioned method for determining the characteristics of the inquiry step is used.
Step 604, inputting the characteristics of the query step into the model.
The prediction model uses a model construction method for predicting the stability of the reference execution cost as described above, and will not be described in detail here.
And 606, predicting the reference execution cost stability of the query step according to the output of the model.
Because the data sample for constructing the prediction model is the query step within the preset historical time, if the data in the database in the recent historical time is stable, the reference execution cost of each query step is also stable, and the reference execution cost predicted by using the model in the recent time in the future is also stable. If the data in the database in the past time is not stable, the reference execution cost of each query step is also not stable, and the reference execution cost in the future time predicted by using the model is also not stable.
In one or more embodiments of the present description, when a prediction model is used for prediction, a construction time of the prediction model may be determined, and if the construction time of the prediction model is long, it indicates that the model is not updated for a long time, and at this time, the model is used to predict the execution cost stability, and an obtained result does not have a great reference meaning, so that when it is determined that the construction time of the prediction model exceeds the preset time, the prediction model may be updated.
In practical applications, before the reference execution cost is determined by using the reference execution cost determining method, whether the reference execution cost is stable or not can be predicted by using the method for predicting the reference execution cost stability.
As shown in fig. 7, for another method for determining an optimal query plan, which is applied to an optimizer execution engine, the method includes the following steps:
step 702, analyzing each query step in the query plan.
Step 704, for any query step, determines an alternative query step to the query step.
Step 706, for any alternative query step, predicting the reference execution cost stability corresponding to the alternative query step by using the method for predicting the reference execution cost stability as described above; and if the stability of the reference execution cost corresponding to the alternative query step is greater than a preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation.
Step 708, selecting the alternative query step with the minimum execution cost of each query step, and determining the optimal query plan.
If the stability is greater than the preset value, it indicates that the reference execution cost corresponding to the characteristics of the query step is relatively stable in a recent period of historical time, and therefore, the stored reference execution cost has high reliability (high accuracy) and can be directly queried and obtained.
Wherein, obtaining the reference execution cost of the query step is by using the reference execution cost determination method as described above.
If the stability is smaller than the predicted value, it is indicated that the reference execution cost corresponding to the feature of the query step is relatively unstable in a recent period of historical time, so that the reliability of the stored reference execution cost is not high (the accuracy is low) and the reference meaning is not large.
In practical application, if the stability is smaller than a preset value, the model for predicting the stability of the reference execution cost is updated.
In one or more embodiments of the present specification, a prediction result of a model for predicting the stability of the reference execution cost is directly related to sample data used for constructing the model, and if a training sample for constructing the prediction model is unstable, the obtained prediction result is also unstable, and if the training sample for constructing the prediction model is stable, the obtained prediction result is also stable. Therefore, when the stability is determined to be less than, it is indicated that the data related to the query step is unstable, and the data related to the query step is stable, and then the model is constructed by using the stable data sample of the query step, then the cost of predicting the reference execution of the query step by using the updated model is also stable.
In addition, in practical application, the optimizer engine determines that the order of the reference execution cost and the stability of the predicted reference execution cost may not be fixed for any alternative query step, and may obtain the reference execution cost first, then predict the stability of the reference execution cost, and determine whether the reference execution cost needs to be obtained by the optimizer engine.
Fig. 8 is a schematic block diagram of an embodiment of a method for determining a cost with reference to execution, which is described in detail below.
The data acquisition module is responsible for acquiring data in the database executor engine, and includes execution result information of any query step, which may be organized in the form of [ QueryId, TimeStamp, PlanInfo, and ExecutionInfo ], where QueryId is a unique identifier of each query plan, TimeStamp is a query submission TimeStamp, PlanInfo is plan information of the query plan, which includes multiple PlanNodeInfo (query step information), and ExecutionInfo is execution result information, which includes execution result information after execution of each query step is completed.
Wherein, each PlanNodeInfo uniquely identifies the PlanNodeId of the query step, the type PlanNodeType of the query step, and the reference execution cost PlanNodeStats of the query step, and the expression List List < expression type, List < Symbol > >, of the query step. Taking the reference execution cost as the input line number and the output line number as an example, plant nodestats [ InputRowCount, OutputRowCount ], where the input line number InputRowCount and the output line number OutputRowCount are taken as examples.
The Symbol may include schema id, TableId, column id, which are unique identifiers of the database, the table, and the column, respectively, and the expresstype is an expression operator of each type supported by the database.
The ExecutionInfo includes an actual execution cost after each query step is completed, corresponding to the reference execution cost, and the actual execution cost OperatorStats ═ InputRowCount, OutputRowCount.
The data storage module stores the data collected by the data collection module, the PlanNodeId of each query step is used as a unique identifier, and [ QueryId, Timestamp, PlanNodeType, List < ExpressionType, List < Symbol > >, PlanNodeStats, OperatorStats ] is used as a structure for storage.
The intelligent decision-making module can control data acquisition (whether to acquire and acquisition sampling ratio) of the data acquisition module, and can also perform model training by using data stored in the data storage module (wherein, the model needs to be updated all the time to ensure the accuracy of prediction).
The present specification also provides apparatuses, electronic devices, and computer-readable storage media corresponding to the above methods, and the following are descriptions of the apparatuses, devices, and storage media corresponding to the above methods.
The present specification provides an optimized query plan apparatus, as shown in fig. 9, which uses the features of the query step as identifiers to construct a corresponding relationship between the features of the query step and the reference execution cost, where the method includes:
an execution result information obtaining model 902, configured to obtain execution result information after the database execution engine completes any query step;
a characteristic and actual execution cost determining module 904, configured to determine the characteristic of the query step according to the basic information of the query step, and determine the actual execution cost of the query step information according to the execution result information;
an updating module 906, configured to update the reference execution cost corresponding to the feature of the query step according to the actual execution cost of the query step; and the corresponding relation between the query step characteristics and the reference execution cost is used for evaluating the cost of any query step when the query plan is optimized.
Wherein the execution cost is execution process information which can be obtained from the execution result information.
The correspondence further includes: the optimized query planning apparatus may further include a completion timestamp determining apparatus (not shown) for determining a completion timestamp of the query step according to the basic information of the query step; at this time, the updating module 906 is configured to add the actual execution cost and the completion timestamp to the identifier corresponding to the query step feature, and update the reference execution cost value corresponding to the query step feature to the actual execution cost value.
The present specification provides a reference execution cost determination apparatus, as shown in fig. 10, for determining a reference execution cost of any query step, including:
an analysis module 1002, configured to analyze each query step in the query plan;
a determination module 1004 for determining, for any query step, alternative query steps to the query step;
a reference execution cost determination module 1006, for any alternative query step, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the correspondence relationship of claim 1;
the optimal query plan determining module 1108 selects an alternative query step with the minimum execution cost for each query step to obtain an optimal query plan.
The present specification provides a model building apparatus for predicting a stability of a reference execution cost, as shown in fig. 11, including:
a data sample obtaining module 1102, configured to obtain, according to a preset time duration T1, a set of query step data samples executed within a history time period of a time duration T1, where any query step data sample includes: basic information of the query step and actual execution cost corresponding to the query step;
the training sample determination module 1104 includes three subunits, which is used to convert any query step data sample into a training sample by the following method:
the query subunit 1114 is configured to determine, according to the basic information of the query step, characteristics and a completion timestamp of the query step, and obtain, by querying the correspondence as described above, a reference execution cost when the query step is evaluated;
the stability calculation subunit 1124, configured to calculate, according to the actual execution cost and the reference execution cost, a stability of the reference execution cost corresponding to the query step;
a training sample determination subunit 1134, configured to obtain a training sample corresponding to the query step by using the feature of the query step as a feature value and using the reference execution cost stability of the query step as a label;
a training sample set determining module 1106, configured to perform the conversion step on the data sample set in a traversal manner, so as to obtain a training sample set;
a model training module 1108, configured to train with the training sample set to obtain a model, where the model is configured to: and predicting the stability of the reference execution cost of any query step according to the characteristics of the query step.
The present specification provides an apparatus for predicting a stability of a reference execution cost, as shown in fig. 12, the apparatus for predicting a stability of a reference execution cost corresponding to any query step includes:
a feature determination module 1202 for receiving a query step and determining a feature of the query step;
an input module 1204, configured to input the features of the query step into a model for predicting the stability of the execution cost of the reference; the model is constructed by using the model construction device;
and a prediction result obtaining module 1206, configured to predict, according to the output of the model, a reference execution cost stability corresponding to the query step.
The present specification also provides another reference execution cost determination apparatus, as shown in fig. 13, including:
an analysis module 1302, configured to analyze each query step in the query plan;
a determining module 1304 for determining, for any query step, an alternative query step to the query step;
a reference execution cost determining module 1306, configured to predict, for any one alternative query step, a reference execution cost stability corresponding to the alternative query step by using the above apparatus for predicting a reference execution cost stability; if the stability of the reference execution cost corresponding to the alternative query step is greater than a preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation;
an optimal query plan determining module 1308, configured to select an alternative query step with the smallest execution cost in each query step, to obtain an optimal query plan.
The apparatus may further include an updating module (not shown) configured to update the model for predicting the reference execution cost stability if the reference execution cost stability corresponding to the querying step is smaller than a preset value.
The apparatuses, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
This specification provides an electronic device including:
a processor;
a memory for storing processor-executable instructions;
the processor implements a method for optimizing a query plan by running the executable instructions, and constructs a corresponding relationship between the query step features and a reference execution cost by using the features of the query step as identifiers, wherein the method at least comprises the following steps:
acquiring execution result information of the database execution engine after completing any query step;
determining the characteristics of the query step according to the basic information of the query step, and determining the actual execution cost of the query step information according to the execution result information;
updating the reference execution cost corresponding to the characteristics of the query step according to the actual execution cost of the query step; and the corresponding relation between the query step characteristics and the reference execution cost is used for evaluating the cost of any query step when the query plan is optimized.
This specification also provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements a method of determining an optimal query plan by executing the executable instructions, the method comprising at least:
analyzing each query step in the query plan;
for any query step, determining an alternative query step to the query step;
determining the characteristics of the alternative query steps aiming at any alternative query step, and querying the reference execution cost corresponding to the alternative query steps according to the corresponding relation between the characteristics of the query steps and the reference execution cost;
and selecting the alternative query step with the minimum execution cost of each query step to obtain the optimal query plan.
This specification also provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements a model building method for predicting a stability of a reference execution cost by executing the executable instructions, the method comprising at least:
according to the preset duration T1, acquiring a set of query step data samples finished in the historical time period of the duration T1, wherein any query step data sample comprises: basic information of the query step and actual execution cost corresponding to the query step;
for any query step data sample, converting the query step data sample into a training sample by:
determining the characteristics and the completion timestamp of the query step according to the basic information of the query step, and obtaining the reference execution cost when the query step is evaluated through the corresponding relation between the characteristics of the query step and the reference execution cost;
calculating to obtain the stability of the reference execution cost corresponding to the query step according to the actual execution cost and the reference execution cost;
obtaining a training sample corresponding to the query step by taking the characteristic of the query step as a characteristic value and taking the reference execution cost stability of the query step as a label;
traversing the data sample set and executing the conversion step to obtain a training sample set;
training by using the training sample set to obtain a model, wherein the model is used for: and predicting the stability of the reference execution cost of any query step according to the characteristics of the query step.
This specification also provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
the processor executes the executable instructions to realize a method for predicting the stability of the reference execution cost, which is used for predicting the stability of the reference execution cost corresponding to any query step, and the method at least comprises the following steps:
receiving a query step and determining characteristics of the query step;
inputting the characteristics of the query step into a model for predicting the stability of the execution cost of the reference; the model is constructed by using the construction method;
and predicting the stability of the reference execution cost corresponding to the query step according to the output of the model.
This specification also provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements a method of determining an optimal query plan by executing the executable instructions, the method comprising at least:
analyzing each query step in the query plan;
for any query step, determining an alternative query step to the query step;
for any one alternative query step, predicting the stability of the reference execution cost corresponding to the alternative query step; if the stability of the reference execution cost corresponding to the alternative query step is greater than a preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation between the characteristics of the query step and the reference execution cost;
and selecting the alternative query step with the minimum execution cost of each query step to obtain the optimal query plan.
FIG. 14 is a schematic block diagram of an apparatus provided in an exemplary embodiment. Referring to FIG. 14, at the hardware level, the device includes a processor 1402, an internal bus 1404, a network interface 1406, a memory 1408, and a non-volatile storage 1410, although other hardware required for service may be included. One or more embodiments of the present description can be implemented in software, such as by processor 1402 reading corresponding computer programs from non-volatile storage 1410 into memory 1408 and then running. Of course, besides software implementation, the one or more embodiments in this specification do not exclude other implementations, such as logic devices or combinations of software and hardware, and so on, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or logic devices.
The apparatuses shown in fig. 9-13 can be applied to the device shown in fig. 14 to implement the technical solution of the present specification.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
The present specification provides a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method of optimizing a query plan, identifying characteristics of query steps, and constructing a correspondence between the characteristics of the query steps and a reference execution cost, the method comprising:
acquiring execution result information of the database execution engine after completing any query step;
determining the characteristics of the query step according to the basic information of the query step, and determining the actual execution cost of the query step information according to the execution result information;
updating the reference execution cost corresponding to the characteristics of the query step according to the actual execution cost of the query step; and the corresponding relation between the query step characteristics and the reference execution cost is used for evaluating the cost of any query step when the query plan is optimized.
The present specification also provides a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method of determining an optimal query plan, the method comprising at least:
analyzing each query step in the query plan;
for any query step, determining an alternative query step to the query step;
determining the characteristics of the alternative query steps aiming at any alternative query step, and querying the reference execution cost corresponding to the alternative query steps according to the corresponding relation between the characteristics of the query steps and the reference execution cost;
and selecting the alternative query step with the minimum execution cost of each query step to obtain the optimal query plan.
The present specification also provides a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a model building method for predicting a stability of a reference execution cost, the method comprising at least:
according to the preset duration T1, acquiring a set of query step data samples finished in the historical time period of the duration T1, wherein any query step data sample comprises: basic information of the query step and actual execution cost corresponding to the query step;
for any query step data sample, converting the query step data sample into a training sample by:
determining the characteristics and the completion timestamp of the query step according to the basic information of the query step, and obtaining the reference execution cost when the query step is evaluated through the corresponding relation between the characteristics of the query step and the reference execution cost;
calculating to obtain the stability of the reference execution cost corresponding to the query step according to the actual execution cost and the reference execution cost;
obtaining a training sample corresponding to the query step by taking the characteristic of the query step as a characteristic value and taking the reference execution cost stability of the query step as a label;
traversing the data sample set and executing the conversion step to obtain a training sample set;
training by using the training sample set to obtain a model, wherein the model is used for: and predicting the stability of the reference execution cost of any query step according to the characteristics of the query step.
The present specification also provides a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method of predicting a stability of a reference execution cost for predicting a stability of a reference execution cost corresponding to any one of query steps, the method comprising at least:
receiving a query step and determining characteristics of the query step;
inputting the characteristics of the query step into a model for predicting the stability of the execution cost of the reference; the model is constructed by using the construction method;
and predicting the stability of the reference execution cost corresponding to the query step according to the output of the model.
The present specification also provides a computer-readable storage medium having stored thereon computer instructions that, when executed by a processor, are executable to implement a method of determining an optimal query plan, the method comprising at least:
analyzing each query step in the query plan;
for any query step, determining an alternative query step to the query step;
for any one alternative query step, predicting the stability of the reference execution cost corresponding to the alternative query step; if the stability of the reference execution cost corresponding to the alternative query step is greater than a preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation between the characteristics of the query step and the reference execution cost;
and selecting the alternative query step with the minimum execution cost of each query step to obtain the optimal query plan.
Computer-readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
Furthermore, the present description also provides a computer program for implementing the method as described in any one of the above when executed by one or more processors.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments herein. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.

Claims (14)

1. A method for optimizing a query plan, which takes characteristics of a query step as identification and constructs a corresponding relation between the characteristics of the query step and a reference execution cost, comprises the following steps:
acquiring execution result information of the database execution engine after completing any query step;
determining the characteristics of the query step according to the basic information of the query step, and determining the actual execution cost of the query step information according to the execution result information;
updating the reference execution cost corresponding to the characteristics of the query step according to the actual execution cost of the query step;
and the corresponding relation between the query step characteristics and the reference execution cost is used for evaluating the cost of any query step when the query plan is optimized.
2. The method of claim 1, the correspondence further comprising: the actual execution cost and the corresponding relation between the timestamp and the characteristics of the query step, the method further comprises the following steps:
determining a completion timestamp of the query step according to the basic information of the query step;
updating the reference execution cost corresponding to the characteristics of the query step according to the actual execution cost of the query step, including:
and adding the actual execution cost and the completion timestamp to the identifier corresponding to the query step feature, and updating the reference execution cost value corresponding to the query step feature to the actual execution cost value.
3. A method for determining an optimal query plan, the method being applied to a database optimization engine, the method comprising:
analyzing each query step in the query plan;
for any query step, determining an alternative query step to the query step;
for any alternative query step, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation of claim 1;
and selecting the alternative query step with the minimum execution cost of each query step to determine the optimal query plan.
4. A model construction method for predicting the stability of a reference execution cost comprises the following steps:
according to the preset duration T1, acquiring a set of query step data samples finished in the historical time period of the duration T1, wherein any query step data sample comprises: basic information of the query step and actual execution cost corresponding to the query step;
for any query step data sample, converting the query step data sample into a training sample by:
determining the characteristics and the completion timestamp of the query step according to the basic information of the query step, and obtaining the reference execution cost when the query step is evaluated by querying the corresponding relation as claimed in claim 2;
calculating to obtain the stability of the reference execution cost corresponding to the query step according to the actual execution cost and the reference execution cost;
obtaining a training sample corresponding to the query step by taking the characteristic of the query step as a characteristic value and taking the reference execution cost stability of the query step as a label;
traversing the data sample set and executing the conversion step to obtain a training sample set;
training by using the training sample set to obtain a model, wherein the model is used for: and predicting the stability of the reference execution cost of any query step according to the characteristics of the query step.
5. A method for predicting the stability of reference execution cost is used for predicting the stability of reference execution cost corresponding to any query step, and comprises the following steps:
receiving a query step and determining characteristics of the query step;
inputting the characteristics of the query step into a model for predicting the stability of the execution cost of the reference; the model is constructed by the construction method of claim 4;
and predicting the stability of the reference execution cost corresponding to the query step according to the output of the model.
6. A method for determining an optimal query plan, the method being applied to an optimizer execution engine, the method comprising:
analyzing each query step in the query plan;
for any query step, determining an alternative query step to the query step;
for any alternative query step, predicting the stability of the reference execution cost corresponding to the alternative query step by using the method as claimed in claim 5; if the stability of the reference execution cost corresponding to the alternative query step is greater than a preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relationship of claim 1;
and selecting the alternative query step with the minimum execution cost of each query step to determine the optimal query plan.
7. The method of claim 6, further comprising:
and if the reference execution cost stability corresponding to the query step is smaller than a preset value, updating the model for predicting the reference execution cost stability.
8. An apparatus for optimizing a query plan, using query step features of a query step as an identifier, and constructing a correspondence between the query step features and a reference execution cost, the apparatus comprising:
the execution result information acquisition module is used for acquiring the execution result information of the database execution engine after any query step is finished;
the characteristic and actual execution cost determining module is used for determining the characteristics of the query step according to the basic information of the query step and determining the actual execution cost of the query step information according to the execution result information;
the updating module is used for updating the reference execution cost corresponding to the characteristics of the query step according to the actual execution cost of the query step; and the corresponding relation between the query step characteristics and the reference execution cost is used for evaluating the cost of any query step when the query plan is optimized.
9. A model construction apparatus for predicting a stability of a reference execution cost, comprising:
a data sample obtaining module, configured to obtain, according to a preset duration T1, a set of query step data samples that are executed within a historical time period of a duration T1, where a data sample of any query step includes: basic information of the query step and actual execution cost corresponding to the query step;
the training sample determining module comprises three subunits and is used for converting the data sample of the query step into the training sample by the following modes aiming at the data sample of any query step:
a query subunit, configured to determine, according to the basic information of the query step, a feature and a completion timestamp of the query step, and obtain, by querying the correspondence relationship according to claim 8, a reference execution cost when the query step is evaluated;
the stability degree calculation operator unit is used for calculating and obtaining the stability degree of the reference execution cost corresponding to the query step according to the actual execution cost and the reference execution cost;
a training sample determining subunit, configured to obtain a training sample corresponding to the query step by using the feature of the query step as a feature value and using the reference execution cost stability of the query step as a label;
a training sample set determining module, configured to perform the conversion step on the data sample set in a traversal manner, so as to obtain a training sample set;
a model training module, configured to train with the training sample set to obtain a model, where the model is configured to: and predicting the stability of the reference execution cost of any query step according to the characteristics of the query step.
10. An apparatus for predicting a reference execution cost stability, which is used for predicting the reference execution cost stability corresponding to any query step, includes:
the characteristic determining module is used for receiving the query step and determining the characteristics of the query step;
an input module for inputting the characteristics of the query step to a model for predicting the stability of the execution cost of the reference; the model is constructed by using the model construction device of claim 9;
and the prediction result acquisition module is used for predicting the stability of the reference execution cost corresponding to the query step according to the output of the model.
11. An apparatus for determining an optimal query plan, the apparatus comprising:
the analysis module is used for analyzing each query step in the query plan;
a determination module for determining, for any query step, an alternative query step to the query step;
a reference execution cost determination module, configured to, for any alternative query step, predict a reference execution cost stability corresponding to the alternative query step by using the apparatus according to claim 10; if the stability of the reference execution cost corresponding to the alternative query step is greater than the preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relationship of claim 8;
and the optimal query plan determining module is used for selecting the alternative query step with the minimum execution cost of each query step to determine the optimal query plan.
12. A computer program for implementing the method of any one of claims 1-7 when executed by one or more processors.
13. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1-7 by executing the executable instructions.
14. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 7.
CN202110751189.8A 2021-07-02 2021-07-02 Model construction method for predicting execution cost stability of reference Active CN113656437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110751189.8A CN113656437B (en) 2021-07-02 2021-07-02 Model construction method for predicting execution cost stability of reference

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110751189.8A CN113656437B (en) 2021-07-02 2021-07-02 Model construction method for predicting execution cost stability of reference

Publications (2)

Publication Number Publication Date
CN113656437A true CN113656437A (en) 2021-11-16
CN113656437B CN113656437B (en) 2023-10-03

Family

ID=78489892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110751189.8A Active CN113656437B (en) 2021-07-02 2021-07-02 Model construction method for predicting execution cost stability of reference

Country Status (1)

Country Link
CN (1) CN113656437B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115309777A (en) * 2022-10-10 2022-11-08 北京奥星贝斯科技有限公司 Data query method and device

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060085375A1 (en) * 2004-10-14 2006-04-20 International Business Machines Corporation Method and system for access plan sampling
US20070033159A1 (en) * 2005-08-03 2007-02-08 Cherkauer Kevin J Query plan editor with integrated optimizer
US20100257154A1 (en) * 2009-04-01 2010-10-07 Sybase, Inc. Testing Efficiency and Stability of a Database Query Engine
US20130060753A1 (en) * 2010-02-25 2013-03-07 Maxim Lukichev Optimization Method And Apparatus
CN103324724A (en) * 2013-06-26 2013-09-25 华为技术有限公司 Method and device for processing data
CN103927346A (en) * 2014-03-28 2014-07-16 浙江大学 Query connection method on basis of data volumes
US20170004175A1 (en) * 2015-06-30 2017-01-05 International Business Machines Corporation Enhancements for optimizing query executions
JP2017060355A (en) * 2015-09-18 2017-03-23 三菱電機株式会社 System controller and system stabilization system
CN106815339A (en) * 2017-01-03 2017-06-09 北京华胜信泰数据技术有限公司 The method and apparatus that inquiry plan updates
CN109241093A (en) * 2017-06-30 2019-01-18 华为技术有限公司 A kind of method of data query, relevant apparatus and Database Systems
CN110807041A (en) * 2019-11-01 2020-02-18 广州华多网络科技有限公司 Index recommendation method and device, electronic equipment and storage medium
CN111444220A (en) * 2020-05-09 2020-07-24 南京大学 Cross-platform SQ L query optimization method combining rule driving and data driving
CN111563101A (en) * 2020-07-11 2020-08-21 阿里云计算有限公司 Execution plan optimization method, device, equipment and storage medium
CN111597209A (en) * 2020-04-30 2020-08-28 清华大学 Database materialized view construction system, method and system creation method
US20200356563A1 (en) * 2019-05-08 2020-11-12 Datameer, Inc. Query performance model generation and use in a hybrid multi-cloud database environment
CN112380243A (en) * 2020-11-19 2021-02-19 东北大学 SQL query selectivity estimation method based on machine learning

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060085375A1 (en) * 2004-10-14 2006-04-20 International Business Machines Corporation Method and system for access plan sampling
US20070033159A1 (en) * 2005-08-03 2007-02-08 Cherkauer Kevin J Query plan editor with integrated optimizer
US20100257154A1 (en) * 2009-04-01 2010-10-07 Sybase, Inc. Testing Efficiency and Stability of a Database Query Engine
US20130060753A1 (en) * 2010-02-25 2013-03-07 Maxim Lukichev Optimization Method And Apparatus
CN103324724A (en) * 2013-06-26 2013-09-25 华为技术有限公司 Method and device for processing data
CN103927346A (en) * 2014-03-28 2014-07-16 浙江大学 Query connection method on basis of data volumes
US20170004175A1 (en) * 2015-06-30 2017-01-05 International Business Machines Corporation Enhancements for optimizing query executions
JP2017060355A (en) * 2015-09-18 2017-03-23 三菱電機株式会社 System controller and system stabilization system
CN106815339A (en) * 2017-01-03 2017-06-09 北京华胜信泰数据技术有限公司 The method and apparatus that inquiry plan updates
CN109241093A (en) * 2017-06-30 2019-01-18 华为技术有限公司 A kind of method of data query, relevant apparatus and Database Systems
US20200356563A1 (en) * 2019-05-08 2020-11-12 Datameer, Inc. Query performance model generation and use in a hybrid multi-cloud database environment
CN110807041A (en) * 2019-11-01 2020-02-18 广州华多网络科技有限公司 Index recommendation method and device, electronic equipment and storage medium
CN111597209A (en) * 2020-04-30 2020-08-28 清华大学 Database materialized view construction system, method and system creation method
CN111444220A (en) * 2020-05-09 2020-07-24 南京大学 Cross-platform SQ L query optimization method combining rule driving and data driving
CN111563101A (en) * 2020-07-11 2020-08-21 阿里云计算有限公司 Execution plan optimization method, device, equipment and storage medium
CN112380243A (en) * 2020-11-19 2021-02-19 东北大学 SQL query selectivity estimation method based on machine learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
何信;: "数据库查询执行成本模型研究", 计算机工程与应用, no. 13 *
非我在: "《A Tour of PostgreSQL Internals》学习笔记——查询处理分析", 《HTTPS://WWW.CNBLOGS.COM/FLYING-TIGER/P/5892070.HTML》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115309777A (en) * 2022-10-10 2022-11-08 北京奥星贝斯科技有限公司 Data query method and device
CN115309777B (en) * 2022-10-10 2023-01-24 北京奥星贝斯科技有限公司 Data query method and device

Also Published As

Publication number Publication date
CN113656437B (en) 2023-10-03

Similar Documents

Publication Publication Date Title
US11977541B2 (en) Systems and methods for rapid data analysis
CN112988782B (en) Hive-supported interactive query method and device and storage medium
JP2004518226A (en) Database system and query optimizer
CN110147367B (en) Temperature missing data filling method and system and electronic equipment
CN110210684B (en) Grain processing scheme optimization method, device, equipment and storage medium
CN115617830A (en) Data query optimization processing method and device based on machine learning
CN105512264A (en) Performance prediction method of concurrency working loads in distributed database
CN110659282A (en) Data route construction method and device, computer equipment and storage medium
KR20200010624A (en) Big Data Integrated Diagnosis Prediction System Using Machine Learning
CN111752955A (en) Data processing method, device, equipment and computer readable storage medium
CN117271481B (en) Automatic database optimization method and equipment
WO2021012861A1 (en) Method and apparatus for evaluating data query time consumption, and computer device and storage medium
CN113918605A (en) Data query method, device, equipment and computer storage medium
CN113553341A (en) Multidimensional data analysis method, multidimensional data analysis device, multidimensional data analysis equipment and computer readable storage medium
CN113656437B (en) Model construction method for predicting execution cost stability of reference
CN115730507A (en) Model engine construction method, kernel function processing method, device and storage medium
CN110377805B (en) Sensor resource recommendation method based on rapid branch allocation and sorting algorithm
CN116737511A (en) Graph-based scheduling job monitoring method and device
CN113672496B (en) Cosine similarity-based test method and system
CN115168509A (en) Processing method and device of wind control data, storage medium and computer equipment
CN113360553B (en) Data cold and hot degree evaluation method and server
CN114564501A (en) Database data storage and query methods, devices, equipment and medium
US11386155B2 (en) Filter evaluation in a database system
CN113656292A (en) Multi-dimensional cross-space-time basic software performance bottleneck detection method
CN112506953A (en) Query method, device and storage medium based on Structured Query Language (SQL)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40069613

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20240305

Address after: # 03-06, Lai Zan Da Building 1, 51 Belarusian Road, Singapore

Patentee after: Alibaba Innovation Co.

Country or region after: Singapore

Address before: Room 01, 45th Floor, AXA Building, 8 Shanton Road, Singapore

Patentee before: Alibaba Singapore Holdings Ltd.

Country or region before: Singapore