Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.
It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.
The basic statistical information mainly refers to a type of information describing a table in a database, the size of an index, the scale, the distribution condition of the database, and the like, for example, the number of rows of the table, the number of blocks, the average size of each row, the number of rows of an index field, the sizes of different values, and the like. Where these basic statistics are obtained by traversing the tables, the traversal necessarily requires frequent calls to the I/O process to access the tables. However, in practical application, when data is queried, the related calculation is not the most time-consuming, and the I/O process is called to access each table to perform I/O, which is the time-consuming part, so that the I/O process resources are precious, and therefore, in order to ensure that the database is accessed externally efficiently, when the database is accessed externally frequently (when the database is busy), the resources are not wasted for counting basic statistical information.
On one hand, the optimizer engine carries out cost evaluation on the query plan by means of basic statistical information, the accuracy of the basic statistical information influences the accuracy of optimization, but when the database is busy, the basic statistical information is difficult to update, and the accuracy of the basic statistical information is difficult to ensure; on the other hand, the basic statistical information belongs to the macroscopic statistical information of the database, and the estimated execution cost is macroscopic execution cost.
The inventors have found in practice that in some scenarios where there are a large number of access requests, the query data and query types have temporal locality as well as spatial locality, taking the analytical database application scenario as an example, a large number of select statements (query statements) may be faced, during a period of time, with frequent accesses to some of the libraries or tables of the database, and during another period of time, with frequent accesses to other libraries or tables, and the queries may be the same or similar. The query step is composed of multiple query steps, for example, the first step is to connect the A table with the B table, the second step is to select a row whose key word is not empty, and the executor engine completes the query step by step when executing the query plan, and correspondingly, each time a step is executed, the execution process information related to the step is obtained, such as how many rows are read, how many rows are output, how long time is spent, etc. Thus, if the database optimizer engine evaluates the execution cost of a query step and the execution of the same or similar query step is completed in a short period of time, the optimizer engine evaluates the execution cost of the completed query step by directly using the actual execution cost of the same or similar query step after the completion of the execution of the same or similar query step in a short period of time, which is obviously more accurate.
Based on this, the present specification provides a method for updating the reference execution cost of the query step feature, and the basic information of the same or similar query step has the same feature, so that the corresponding relationship between the query step feature and the reference execution cost is constructed by using the feature of the query step as the identifier. And updating the reference execution cost of the query step characteristics: the method comprises the steps of obtaining execution result information of any query step completed by a database execution engine, determining characteristics of the query step according to basic information of the query step, determining actual execution cost of the query step information according to the execution result information, and taking the actual execution cost as reference execution cost corresponding to the characteristics of the query step. Thus, when the execution cost of a query step needs to be evaluated, the corresponding reference execution cost can be queried through the characteristics of the query step.
In one or more embodiments of the present disclosure, when performing execution cost evaluation for a query step, the optimizer engine refers to the execution cost more specifically than the basic statistical information in the prior art, and has a reference meaning, and when updating the reference execution cost, the optimizer engine does not need to occupy the query resources of the database, thereby reducing the query load.
The specification provides a method for optimizing a query plan, a method for determining the optimal query plan, a model construction method for predicting the stability of the execution cost of a reference, and a method for predicting the stability of the execution cost of the reference based on the thought.
The method for optimizing the query plan utilizes the corresponding relation between the constructed query step characteristics and the reference execution cost, and the corresponding relation referred by other methods for evaluating the cost of any query step is updated according to the method.
The reference execution cost determining method is to determine the reference execution cost of the query step by querying the corresponding relation updated by the updating method according to the characteristics of the query step.
In the model construction method, when a training sample is constructed, the corresponding relation is queried according to the characteristics of a sample query step, and then a model with the stability of the execution cost for prediction reference is trained by utilizing the constructed training sample set.
The method for predicting the stability of the reference execution cost uses a model constructed by a model construction method, and before the reference execution cost of the query step is determined, the stability of the current reference execution cost can be predicted first, and the higher the stability is, the more accurate the current reference execution cost is.
These methods are described in detail below.
First, an optimized query plan method is described, as shown in fig. 1, which is a schematic flow chart of the optimized query plan method shown in this description.
Before the method is executed, the corresponding relation between the characteristics of the query step and the reference execution cost is constructed by taking the characteristics of the query step as the identification. And evaluating the cost of any query step when the corresponding relation between the query step characteristics and the reference execution cost is used for optimizing the query plan.
Each query step has its basic information including a query plan to which the query step belongs, a query type, an expression of the query step (including related operators, databases, tables, columns), a submission time of the query step, for example, the query step is "table A inner JOIN table B on a.key=b.key", the ID of the query plan to which the query step belongs is 1 (each query plan has its unique identifier), the query type is a JOIN type, related operators are "=", related to key columns of tables a and a, and related key columns of tables B and B.
The same or similar query steps may have the same or similar query terms, i.e. the database, table and column are the same, and the related expressions are the same or similar, so in practical application, the database, table, column and expression related to each query step may be used as the features of the query step, such as List < ExpressionType, list < Symbol >.
The reference execution cost is used to evaluate the execution cost of executing the query step, such as how many lines need to be input, how many lines need to be output, how much memory is consumed, etc.
Step 102, obtaining the execution result information of any query step.
After the executor engine receives a query plan (also called as an execution plan), the query plan is executed according to steps in the query plan, wherein each query step has a plurality of corresponding query steps, each query step is completed by an operator in the executor engine (one operator is responsible for completing a basic data processing logic, one operator completes a group of query steps of data according to the query plan, in other words, one query step is completed by at least one operator), after each operator completes execution, each operator has how many rows (rows) are read when executing, how many rows are output, how much data (KB) are read in total, how much data are output in total, how much total consumption is, and how much total memory use is reported. The execution engine generally encapsulates each query step belonging to one query plan as execution result information of one query plan, that is, execution result information of one query plan, including execution result information of a plurality of query steps. Therefore, in practical application, the execution result information may be acquired from the execution engine in units of one query plan.
For example, a database has three tables, nation, region, customer, a query statement is:
SELECT count(*)
FROM
nation, region, customer
WHERE
c_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'ASIA';
after the optimization engine completes the optimization, the query plan sent to the execution engine is:
1. the table orientation is scanned to find the column n_relationship, n_region.
2. The table region is scanned for columns r_region key and r_name= 'ASIA' rows are filtered out.
3. The tables obtained in the first and second steps are internally connected (inner join) with c_relationship=n_relationship.
4. The table customer is scanned for column c_relationship.
5. The table obtained in the fourth step and the fifth step is internally concatenated (inner join) with n_region=r_region.
6. And calculating how many rows of the table obtained in the fifth step are.
As shown in fig. 2, the execution relationship of each query step executed by the query plan is shown. For example, steps 1 and 2 can be performed simultaneously, and then the two operators are submitted to obtain results respectively, then the internal connection is performed according to the results obtained in steps 1 and 2, and the internal connection is performed by the other operator, and so on.
In practical applications, the order of magnitude of the query plans completed by the database executor engine may be large, so in one or more embodiments of the present disclosure, an acquisition condition may be set, for example, a sampling ratio is set, and each time the database executor engine completes 5 query plans, execution result information of one of the query plans is acquired; or setting an acquisition period, and acquiring execution result information of a query plan at intervals of 1 ms; or receiving a specified command, and starting to acquire the execution result information of each query plan.
Step 104, determining the characteristics of the query step and the actual execution cost.
As described above, each query plan has a plurality of query steps, and after each query step is executed, execution result information including the execution process of the query step, including how many rows (rows) are read during execution, how many rows are output, how much data (KB) are read in total, how much data are output in total, how much total consumption time is, how much total memory usage is, and so on.
The execution process of the query step can reflect how much cost is spent executing the query step, so that by acquiring the execution result information of the query step, one or more pieces of execution process data are selected as the actual execution cost of executing the query step.
And 106, updating the reference execution cost corresponding to the query step characteristics.
And updating the reference execution cost corresponding to the query step feature according to the actual execution cost determined in the step 104.
In practical application, only the corresponding relation between the query step feature and the reference execution cost can be updated, and the following table is the corresponding relation before updating:
when a query step characterized by feature a is received, the actual execution cost of the query step is the cost A2, and after updating, the correspondence relationship becomes:
The corresponding relationship may also be constructed by constructing an actual execution cost and a corresponding relationship between a timestamp and the query step feature, where the actual execution cost of the query step feature obtained each time may be sequentially saved according to the time sequence of the timestamp (that is, the actual execution cost and the corresponding relationship between the timestamp and the query step feature are updated), and the corresponding actual execution cost with the latest time is used as the reference actual execution cost of the query step, as shown in the following table, which is the corresponding relationship before updating:
after a query step with the feature of the query step being the feature a is received, the actual execution cost of the query step is the actual cost A3, the timestamp is the timestamp A3, then the actual execution cost and the completion timestamp are added to the identifier corresponding to the feature of the query step, and after updating, the corresponding relationship becomes:
the value of the reference execution cost corresponding to each query step feature is the value of the actual execution cost corresponding to the latest timestamp, as in table 3, the actual execution cost corresponding to the latest timestamp of feature a is A2, so the reference execution cost corresponding to feature a is a reference cost A2 (that is, reference cost a2=actual cost A2), the actual execution cost corresponding to the latest timestamp of feature B is 2, so the reference execution cost corresponding to feature B is a reference cost B2, and the actual reference cost corresponding to the latest timestamp of feature a is a reference cost A3, as in table 4, so the reference execution cost corresponding to feature a is a reference cost A3.
In practical application, if the actual execution cost and the timestamp after the execution of each query step is completed are saved without limitation, the data volume is quite huge after long-term accumulation, so in one or more embodiments of the present disclosure, the save condition may be set, for example, if the actual execution cost corresponding to any feature exceeds 20, the first 10 pieces are deleted and then saved; or if the time length of the time stamp from the current time stamp exceeds 4h, deleting the corresponding actual execution cost and the corresponding time stamp.
The present description illustrates that the query step feature, the reference execution cost, the actual execution cost and the timestamp are stored in a table, and in fact, a corresponding relationship table may be established between the query step feature and the reference execution cost, and a corresponding relationship table may be established between the query step feature, the actual execution cost and the timestamp.
In some cases, in the actual application, the determined query step feature corresponding to a certain query step is not found in the corresponding relationship (for example, when the corresponding relationship is built, no query step feature is found, or no similar query step is executed before), at this time, the identifier of the query step feature may be added to the corresponding relationship, and the actual execution cost corresponding to the query step is taken as the reference execution cost.
In practical application, the corresponding relation can be stored in a table form or can be stored in an index method, and only the corresponding reference execution cost can be found according to the query step. When the data is stored, the data can be stored in a system table of a database, a memory, a hard disk and the like as basic statistical information.
The above is an explanation of the method for optimizing the query plan, and the corresponding relation between the features of the query step and the execution cost of the reference is constructed and updated, and the reference meaning is more specific than the basic statistical information, for example, when the table connection is performed, the table a has 20 rows, the table B has 10 rows, and after the internal connection is performed, at most, 10 rows are provided, and when the optimizer engine performs the evaluation, only the optimizer engine evaluates the table as 10 rows or a rough number. The reference execution cost is obtained by using an actual execution result (i.e., an actual execution cost) after the internal connection of the table a and the table B, for example, the table a and the table B are actually connected and then are in 4 rows, and after the execution cost of the internal connection of the table a and the table B is evaluated, it can be determined that the table a and the table B are connected and then output 4 rows. In addition, because the execution result information of any query step is obtained after the execution of any query step by the executor engine (that is, the execution result information of the query step which is originally needed to be executed by the execution engine is obtained), the query resource of the database is not occupied, and the query load (the query irrelevant to the service is not performed) is not caused by the database.
A method for determining an optimal query plan using the correspondence between the query step characteristics updated by the updating method and the reference execution cost will be described. The updating method improves the accuracy of the reference execution cost of the query step characteristics (compared with the basic statistical information, the reference meaning is more specific and the updating speed is faster and therefore more accurate), so that whether a technician acquires the reference execution cost of each query step to determine the running condition of the database or an optimizer engine generates an optimal query plan by utilizing the reference execution cost, the reference meaning of the reference execution cost is higher than the basic statistical information.
Referring to fig. 3, a flowchart of a method for determining an optimal query plan shown in the specification is shown, and the method is applied to an optimizer engine, and includes the following steps:
step 302, each query step in the query plan is parsed.
After receiving a query plan, the optimizer engine parses out each query step in the query plan, which is in the form of a tree.
Step 304, for any query step, determining an alternative query step to the query step.
Some query steps have alternative query steps, such as the connection of two tables, left or right.
Step 306, determining the characteristics of any alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the correspondence as described above.
The optimizer engine optimizes the individual query steps based on cost, and therefore, the cost of executing each query step needs to be evaluated to select the alternative query step with the smallest execution cost.
In the present specification, the optimizer engine optimizes and evaluates the cost of executing each query step, which is the corresponding relationship between the features of the query step and the reference execution cost, so as to obtain the execution cost of each query step.
The corresponding relation between the query step and the reference execution cost is the corresponding relation obtained by the updating method.
Step 308, selecting an alternative query step with the minimum execution cost of each query step, and determining an optimal query plan.
In practical applications, when the reference execution cost of a certain query step needs to be determined, it is found that the query step features of the query step are not in the corresponding relationship (that is, similar query steps have not been executed before), and at this time, basic statistical information can be referred to as an evaluation basis, so that after the query step is executed, the actual execution cost can be obtained and used as the reference execution cost of the subsequent same or similar query steps.
By using the optimized query planning method and the determining method, the accuracy of the optimizer engine in evaluating each query step can be improved to a great extent, but in some cases, the data change is fast, so that the difference between the actual execution cost and the actual execution cost is also large, and based on the fact, the model building method for predicting the stability of the reference execution cost is provided in the specification.
The following describes in detail a model construction method for predicting the execution cost stability of a reference, as shown in fig. 4, which is a schematic flow chart of the model construction method for predicting the execution cost stability of a reference shown in the present specification, and includes the following steps:
step 402, obtaining a query step data sample.
When the data sample is acquired, acquiring a query step data sample set which is completed in a history period of the time length T1 according to a preset time length T1, wherein any query step data sample comprises: the basic information of the query step and the actual execution cost corresponding to the query step.
Because the data changes are temporally localized and spatially localized, the query step data samples selected by the model are required to be samples within a period of time, such as within about 30 mils. For long-term query data samples, the meaning of model construction is not great, for example, some of the data modification operations have a large amount during the period of 6:00-6:10, so that the reference execution cost of part of query steps is very unstable, and if the model is constructed by using the query step data samples during the period of 6:00-6:10, unstable prediction results are obtained when the reference execution cost of some query steps is predicted. If the data is not changed much during the period 16:00-16:10, that is, the reference execution cost of each query step is stable, erroneous prediction results are obtained if the model constructed by using the 6:00-6:10 is used. Thus, in building the model, a sample set of query step data that is completed during the historical period of time duration T1 needs to be obtained.
Step 404, constructing a training sample set.
Wherein, for any query step data sample, the query step data sample is converted into a training sample by:
determining the characteristics and the completion time stamp of the query step according to the basic information of the query step, and obtaining the reference execution cost when the query step is evaluated by querying the corresponding relation shown in the tables 3 and 4;
according to the actual execution cost and the reference execution cost, calculating to obtain the stability of the reference execution cost corresponding to the query step;
taking the characteristic of the query step as a characteristic value and the stability of the reference execution cost of the query step as a label to obtain a training sample corresponding to the query step;
and traversing the data sample set to execute the conversion step to obtain a training sample set.
The reference execution cost is updated with Time, so when determining the reference execution cost of each query step, the Time of the query step needs to be determined, the Time is different, and the reference execution cost when evaluating the execution cost is different, as shown in table 4, if the completion timestamp of a query step is Time5, then the query step uses the data set 2 corresponding to Time2 when evaluating the execution cost.
As shown in fig. 5, in order to illustrate a flow chart of a predictive model training method shown in the present specification, any query step data sample includes basic information of the query step and actual execution cost after the execution of the query step, features of the query step are extracted through the basic information of the query step, and the actual execution cost of actually executing the query step is determined through the execution result information of the query step.
Taking the execution cost as an input line number and an output line number as an example, let statsdiff= (plannodest-operatostate)/Max (PlanNodeStates, operatorStates), where StatsDiff is used to represent the difference between the actual execution cost and the reference execution cost, plannodest represents the acquired reference execution cost, if 1-StatsDiff is used as a label, the closer the label is to 1, the more stable the reference execution cost is, the closer to 0, and the more unstable the reference execution cost is.
Furthermore, in one or more embodiments of the present description, at the completion of each query step, both the reference execution cost of the query step at the time of evaluation and the actual execution cost of the query step may be saved as one query step data sample.
Step 406, training a model for predicting the reference execution cost stability of any query step.
And training a model for predicting the reference execution cost stability of any query step by using the constructed training sample set. The model for predicting the execution cost stability of the reference can be constructed based on different algorithms, for example, the model can be constructed based on a support vector machine mode, the model can be constructed based on a neural network mode, the model can be constructed based on a random forest mode, and the constructed model can achieve the purposes shown in the specification.
Taking a random forest as an example:
assuming that the number of samples of the constructed training sample set is N, for any tree, N (N < N) training samples are randomly and repeatedly extracted from the training sample set as the training sample set of the tree.
If the feature dimension of each sample is M, a constant M < M is assigned, and M feature subsets are randomly selected from the one of the M feature monkeys, each time the tree is split, the optimal is selected from the M feature summaries.
Each tree grows to the greatest extent, and the pruning process is not carried out on the tree.
In practical applications, the model may be constructed under a preset condition, for example, the preset condition may be that the model is constructed at a time point of a preset period, and the prediction model is constructed once every time a time period passes. And the construction time length can be also be the construction time length, and when the model is determined to be not updated for a long time, the model needs to be updated, namely, reconstructed, in order to ensure the accuracy of model prediction.
The above is a description of a model construction method for predicting the execution cost stability, and based on the model construction method, the present description also provides a corresponding model use method, that is, a method for predicting the execution cost stability of a reference. Next, a method of performing the cost stability of the prediction reference will be described in detail.
In practical applications, the data of some tables and columns of the database may change dramatically, and the cost of performing the reference to the query steps involving those tables and columns may also change significantly. Because of rapid and abrupt changes of data, the actual execution cost after the execution is finished at this time is far different from the actual execution cost after the execution is finished at the last time.
As shown in fig. 6, a flow chart of a method for predicting the execution cost stability of reference shown in the present specification includes the following steps:
step 602, determining characteristics of the querying step.
The received query step may be sent by the optimizer engine, or may be input by a tester for testing, depending on the actual application scenario.
The features of the query step are determined based on the basic information of the query step, and are the same as the method for determining the features of the query step.
Step 604, inputting the features of the query step into the model.
The prediction model uses the model construction method for predicting the execution cost stability as described above, and will not be described in detail here.
And step 606, predicting the reference execution cost stability of the query step according to the output of the model.
Because the data sample for constructing the prediction model is a query step in a preset history time, if the data in the database is stable in a near-period history time, the reference execution cost of each query step is stable, and the reference execution cost in a near-period future predicted by the model is also stable. If the data in the database is unstable in a near-period of history, the cost of reference execution of each query step is also unstable, and the cost of reference execution in a near-period future predicted by using the model is also unstable.
In one or more embodiments of the present disclosure, when the prediction model is used for prediction, the construction duration of the prediction model may be determined, if the construction duration of the prediction model is long, which indicates that the model is not updated for a long time, and at this time, the prediction using the model refers to the execution cost stability, and the obtained result does not have a great reference meaning, so when it is determined that the construction duration of the prediction model exceeds the preset duration, the prediction model may be updated.
In practical application, before the reference execution cost is determined by the reference execution cost determining method, whether the reference execution cost is stable or not can be predicted by using the method for predicting the reference execution cost stability.
As shown in FIG. 7, another method for determining an optimal query plan shown in the present specification is applied to an optimizer execution engine, and includes the following steps:
step 702, analyze each query step in the query plan.
Step 704, for any query step, determining an alternative query step to the query step.
Step 706, for any alternative query step, predicting the reference execution cost stability corresponding to the alternative query step by using the method for predicting the reference execution cost stability as described above; if the stability of the reference execution cost corresponding to the alternative query step is greater than a preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation.
Step 708, selecting an alternative query step with the minimum execution cost of each query step, and determining an optimal query plan.
If the stability is greater than the preset value, it indicates that the reference execution cost corresponding to the characteristic of the query step is relatively stable in a near historical time, so that the stored reference execution cost has high reliability (high accuracy) and can be directly queried and obtained.
The reference execution cost of the query step is obtained by using the reference execution cost determining method.
If the stability is smaller than the predicted value, the reference execution cost corresponding to the characteristic of the query step is unstable in a near historical time, so that the reliability (low accuracy) of the stored reference execution cost is low and the reference meaning is not great.
In practical application, if the stability is smaller than the preset value, updating the model for predicting the stability of the reference execution cost.
In one or more embodiments of the present disclosure, the prediction result of the model for predicting the execution cost stability is directly related to the sample data used to construct the model, and if the training sample to construct the prediction model is unstable, the obtained prediction result is also unstable, and if the training sample to construct the prediction model is stable, the obtained prediction result is also stable. Thus, when the stability is determined to be less than the threshold, the data related to the query step is unstable, and the data related to the query step is stable, and then a model is built by using the stable query step data sample, then the updated model is used to predict that the reference execution cost of the query step is stable.
In addition, in practical application, in any alternative query step, the optimizer engine may determine that the order of the reference execution cost and the stability of the predicted reference execution cost is not necessarily the same, and may first obtain the reference execution cost, then predict the stability of the reference execution cost, and determine whether the obtained reference execution cost is needed by the optimizer engine.
Fig. 8 is a schematic diagram of an embodiment framework of the cost determining method according to the present specification, and is described in detail below.
The data acquisition module is responsible for acquiring data in a database executor engine, including execution result information of any query step, and may be organized in the form of [ QueryId, timeStamp, planInfo, executionInfo ], where QueryId is a unique identifier of each query plan, timeStamp is a TimeStamp of query submission, planInfo is plan information of the query plan, including a plurality of PlanNodeInfo (query step information), and ExecutionInfo is execution result information, including execution result information after execution of each query step is completed.
Wherein each PlanNodeInfo uniquely identifies PlanNodeId of the query step, the type PlanNodeType of the query step, and the reference execution cost PlanNodeStats of the query step, the List of expressions of the query step is < ExpressionType, list < Symbol >. Taking the reference execution cost as the input line number and the output line number as an example, plannodstats= [ InputRowCount, outputRowCount ], taking the input data line number inputroller count as an example, the output data line number outputroller count.
The Symbol may contain SchemaId, tableId, columnId, which is a unique identifier of a database, a table, and a column, and ExpressionType is an operator of various expressions supported by the database.
The ExecutionInfo includes an actual execution cost after completion of each query step, corresponding to the reference execution cost, the actual execution cost operatorstats= [ InputRowCount, outputRowCount ].
The data storage module stores the data acquired by the data acquisition module, takes PlanNodeId of each query step as a unique identifier, and takes [ QueryId, timeStamp, planNodeType, list < ExpressionType, list < Symbol >, planNodeStats, operatorStats ] as a structure.
The intelligent decision module can control data acquisition (whether acquisition and acquisition sampling ratio) of the data acquisition module, and can also utilize data stored by the data storage module to perform model training (wherein, the model needs to be updated all the time to ensure the accuracy of prediction), and when the database optimizer engine queries the reference execution cost of any query step to the intelligent decision module, on one hand, the intelligent decision module finds the reference execution cost corresponding to the query step from the data storage module, and on the other hand, the intelligent decision module utilizes the model to predict whether the reference execution cost of the query step is stable.
The present disclosure also provides an apparatus, an electronic device, and a computer readable storage medium corresponding to the above method, and the following description describes an apparatus, an device, and a storage medium corresponding to the above method.
The present disclosure provides an optimized query plan device, as shown in fig. 9, using features of a query step as an identifier, and constructing a correspondence between the features of the query step and a reference execution cost, where the method includes:
the execution result information obtaining model 902 is configured to obtain execution result information after the database execution engine completes any query step;
the feature and actual execution cost determining module 904 is configured to determine features of the query step according to the basic information of the query step, and determine an actual execution cost of the query step information according to the execution result information;
an updating module 906, configured to update the reference execution cost corresponding to the query step feature according to the actual execution cost of the query step; the corresponding relation between the query step characteristics and the reference execution cost is used for evaluating the cost of any query step when optimizing the query plan.
The execution cost is execution process information which can be obtained from the execution result information.
The correspondence further includes: the optimizing query planning device may further include a completion timestamp determining device (not shown) for determining a completion timestamp of the query step according to the basic information of the query step; at this time, the updating module 906 is configured to increase the actual execution cost and the completion timestamp to an identifier corresponding to the query step feature, and update a value of the reference execution cost corresponding to the query step feature to a value of the actual execution cost.
The present disclosure provides a reference execution cost determining apparatus, as shown in fig. 10, for determining a reference execution cost of any query step, including:
the parsing module 1002 is configured to parse each query step in the query plan;
a determining module 1004, for any query step, determining an alternative query step to the query step;
a reference execution cost determination module 1006, configured to determine, for any alternative query step, a feature of the alternative query step, and query a reference execution cost corresponding to the alternative query step according to the correspondence as set forth in claim 1;
The optimal query plan determining module 1008 selects the alternative query step with the minimum execution cost of each query step to obtain the optimal query plan.
The present specification provides a model construction apparatus for predicting reference execution cost stability, as shown in fig. 11, including:
the data sample obtaining module 1102 is configured to obtain, according to a preset time length T1, a query step data sample set that is completed in a history period of the time length T1, where any query step data sample includes: basic information of the query step and actual execution cost corresponding to the query step;
the training sample determination module 1104 includes three subunits for converting, for any query step data sample, the query step data sample into a training sample by:
a query subunit 1114, configured to determine a feature and a completion timestamp of the query step according to the basic information of the query step, and obtain, by querying the correspondence, a reference execution cost when the query step is evaluated;
the stability calculating subunit 1124 is configured to calculate, according to the actual execution cost and the reference execution cost, a stability of the reference execution cost corresponding to the query step;
A training sample determining subunit 1134, configured to obtain a training sample corresponding to the query step by using the feature of the query step as a feature value and using the stability of the reference execution cost of the query step as a label;
a training sample set determining module 1106, configured to perform the converting step on the data sample set traversal, to obtain a training sample set;
a model training module 1108, configured to train to obtain a model by using the training sample set, where the model is used to: and predicting the reference execution cost stability of any query step according to the characteristics of the query step.
The present disclosure provides a device for predicting reference execution cost stability, as shown in fig. 12, for predicting reference execution cost stability corresponding to any query step, including:
a feature determination module 1202 for receiving a query step and determining features of the query step;
an input module 1204 for inputting the features of the query step into a model for predicting a reference execution cost stability; the model is constructed by using the model construction device;
and the prediction result obtaining module 1206 is configured to predict the reference execution cost stability corresponding to the query step according to the output of the model.
The present specification also provides another execution cost determining apparatus, as shown in fig. 13, including:
an parsing module 1302, configured to parse each query step in the query plan;
a determining module 1304, configured to determine, for any query step, an alternative query step to the query step;
a reference execution cost determining module 1306, configured to predict, for any alternative query step, a reference execution cost stability corresponding to the alternative query step by using the device for predicting reference execution cost stability as described above; if the stability of the reference execution cost corresponding to the alternative query step is greater than a preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation;
the optimal query plan determining module 1308 is configured to select an alternative query step with the smallest execution cost of each query step, so as to obtain an optimal query plan.
The apparatus may further include an updating module (not shown) configured to update a model for predicting the reference execution cost stability if the reference execution cost stability corresponding to the querying step is less than a preset value.
The apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
The present specification provides an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
the processor is configured to execute the executable instruction to implement a method for optimizing a query plan, and construct a correspondence between features of a query step and a reference execution cost by using the features of the query step as an identifier, where the method at least includes:
acquiring execution result information after the database execution engine completes any query step;
determining the characteristics of the query step according to the basic information of the query step, and determining the actual execution cost of the query step information according to the execution result information;
Updating the reference execution cost corresponding to the characteristic of the query step according to the actual execution cost of the query step; and the corresponding relation between the query step characteristics and the reference execution cost is used for evaluating the cost of any query step when optimizing the query plan.
The present specification also provides an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements a method of determining an optimal query plan by executing the executable instructions, the method comprising at least:
analyzing each query step in the query plan;
determining an alternative query step to the query step for any query step;
for any alternative query step, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation between the query step characteristics and the reference execution cost;
and selecting the alternative query step with the minimum execution cost of each query step to obtain the optimal query plan.
The present specification also provides an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
Wherein the processor implements a model building method for predicting a reference execution cost stability by executing the executable instructions, the method comprising at least:
according to a preset time length T1, acquiring a query step data sample set which is completed in the historical time period of the time length T1, wherein any query step data sample comprises: basic information of the query step and actual execution cost corresponding to the query step;
for any query step data sample, the query step data sample is converted into a training sample by:
determining the characteristics and the completion time stamp of the query step according to the basic information of the query step, and obtaining the reference execution cost when the query step is evaluated through the corresponding relation between the characteristics of the query step and the reference execution cost;
according to the actual execution cost and the reference execution cost, calculating to obtain the stability of the reference execution cost corresponding to the query step;
taking the characteristic of the query step as a characteristic value and the stability of the reference execution cost of the query step as a label to obtain a training sample corresponding to the query step;
performing the conversion step on the data sample set traversal to obtain a training sample set;
Training with the training sample set to obtain a model, wherein the model is used for: and predicting the reference execution cost stability of any query step according to the characteristics of the query step.
The present specification also provides an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
the method for predicting the stability of the reference execution cost by the processor through running the executable instruction is used for predicting the stability of the reference execution cost corresponding to any query step, and the method at least comprises the following steps:
receiving a query step and determining characteristics of the query step;
inputting the characteristics of the query step into a model for predicting the stability of the execution cost of the reference; the model is constructed by using the construction method;
and predicting the stability of the reference execution cost corresponding to the query step according to the output of the model.
The present specification also provides an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements a method of determining an optimal query plan by executing the executable instructions, the method comprising at least:
Analyzing each query step in the query plan;
determining an alternative query step to the query step for any query step;
predicting the stability of the reference execution cost corresponding to any alternative query step; if the stability of the reference execution cost corresponding to the alternative query step is greater than a preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation between the characteristics of the query step and the reference execution cost;
and selecting the alternative query step with the minimum execution cost of each query step to obtain the optimal query plan.
Fig. 14 is a schematic structural diagram of an apparatus provided in an exemplary embodiment. Referring to fig. 14, at a hardware level, the device includes a processor 1402, an internal bus 1404, a network interface 1406, a memory 1408, and a nonvolatile storage 1410, although other hardware required by other services is also possible. One or more embodiments of the present description may be implemented in a software-based manner, such as by the processor 1402 reading a corresponding computer program from the non-volatile storage 1410 into the memory 1408 and then running. Of course, in addition to software implementation, one or more embodiments of the present disclosure do not exclude other implementation manners, such as a logic device or a combination of software and hardware, etc., that is, the execution subject of the following processing flow is not limited to each logic unit, but may also be hardware or a logic device.
The apparatus shown in fig. 9-13 may be applied to the device shown in fig. 14, so as to implement the technical solution of the present specification.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
The present specification provides a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method of optimizing a query plan, the method comprising:
acquiring execution result information after the database execution engine completes any query step;
determining the characteristics of the query step according to the basic information of the query step, and determining the actual execution cost of the query step information according to the execution result information;
updating the reference execution cost corresponding to the characteristic of the query step according to the actual execution cost of the query step; and the corresponding relation between the query step characteristics and the reference execution cost is used for evaluating the cost of any query step when optimizing the query plan.
The present specification also provides a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method of determining an optimal query plan, the method comprising at least:
analyzing each query step in the query plan;
determining an alternative query step to the query step for any query step;
for any alternative query step, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation between the query step characteristics and the reference execution cost;
and selecting the alternative query step with the minimum execution cost of each query step to obtain the optimal query plan.
The present specification also provides a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a model building method for predicting execution cost stability with reference to a reference, the method comprising at least:
according to a preset time length T1, acquiring a query step data sample set which is completed in the historical time period of the time length T1, wherein any query step data sample comprises: basic information of the query step and actual execution cost corresponding to the query step;
For any query step data sample, the query step data sample is converted into a training sample by:
determining the characteristics and the completion time stamp of the query step according to the basic information of the query step, and obtaining the reference execution cost when the query step is evaluated through the corresponding relation between the characteristics of the query step and the reference execution cost;
according to the actual execution cost and the reference execution cost, calculating to obtain the stability of the reference execution cost corresponding to the query step;
taking the characteristic of the query step as a characteristic value and the stability of the reference execution cost of the query step as a label to obtain a training sample corresponding to the query step;
performing the conversion step on the data sample set traversal to obtain a training sample set;
training with the training sample set to obtain a model, wherein the model is used for: and predicting the reference execution cost stability of any query step according to the characteristics of the query step.
The present disclosure also provides a computer readable storage medium having stored thereon computer instructions that, when executed by a processor, implement a method for predicting a reference execution cost stability corresponding to any query step, the method at least comprising:
Receiving a query step and determining characteristics of the query step;
inputting the characteristics of the query step into a model for predicting the stability of the execution cost of the reference; the model is constructed by using the construction method;
and predicting the stability of the reference execution cost corresponding to the query step according to the output of the model.
The present specification also provides a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, can execute the instructions to implement a method of determining an optimal query plan, the method comprising at least:
analyzing each query step in the query plan;
determining an alternative query step to the query step for any query step;
predicting the stability of the reference execution cost corresponding to any alternative query step; if the stability of the reference execution cost corresponding to the alternative query step is greater than a preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation between the characteristics of the query step and the reference execution cost;
and selecting the alternative query step with the minimum execution cost of each query step to obtain the optimal query plan.
Computer readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, read only compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by the computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
Furthermore, the present description provides a computer program for implementing a method as described in any of the above, when the computer program is executed by one or more processors.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at … …" or "at … …" or "responsive to a determination", depending on the context.
The foregoing description of the preferred embodiment(s) is (are) merely intended to illustrate the embodiment(s) of the present invention, and it is not intended to limit the embodiment(s) of the present invention to the particular embodiment(s) described.