CN113656437B

CN113656437B - Model construction method for predicting execution cost stability of reference

Info

Publication number: CN113656437B
Application number: CN202110751189.8A
Authority: CN
Inventors: 常剑; 董宇
Original assignee: Alibaba Singapore Holdings Pte Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2021-07-02
Filing date: 2021-07-02
Publication date: 2023-10-03
Anticipated expiration: 2041-07-02
Also published as: CN113656437A

Abstract

The present disclosure provides a model construction method for predicting stability of a reference execution cost, where basic information of the same or similar query steps has the same features, so that the features of the query steps are identified, and a correspondence between the features of the query steps and the reference execution cost is constructed. And updating the reference execution cost of the query step characteristics: the method comprises the steps of obtaining execution result information of any query step completed by a database execution engine, determining characteristics of the query step according to basic information of the query step, determining actual execution cost of the query step information according to the execution result information, and taking the actual execution cost as reference execution cost corresponding to the characteristics of the query step. Thus, when the optimizer engine needs to evaluate the execution cost of a query step, the corresponding reference execution cost can be queried through the characteristics of the query step.

Description

Model building method for predicting reference execution cost stability

技术领域Technical field

本说明书一个或多个实施例涉及数据库技术领域，尤其涉及一种用于预测参照执行代价稳定度的模型构建方法。One or more embodiments of this specification relate to the field of database technology, and in particular, to a model construction method for predicting reference execution cost stability.

背景技术Background technique

数据库一般包括解析器引擎、优化器引擎以及执行器引擎，数据库接收到针对某个数据的查询请求后，由解析器引擎解析为一个查询计划，并发送至优化器引擎，优化器引擎对查询计划进行优化后，将优化后的查询计划发送至执行引擎，执行器引擎按照接收的查询计划完成对该数据的查询，得到查询结果。The database generally includes a parser engine, an optimizer engine and an executor engine. After the database receives a query request for a certain data, the parser engine parses it into a query plan and sends it to the optimizer engine. The optimizer engine analyzes the query plan. After optimization, the optimized query plan is sent to the execution engine. The executor engine completes the query on the data according to the received query plan and obtains the query results.

优化器引擎接收到查询计划后，会对查询计划进行查询步骤分解，确定每个查询步骤是否可以优化，具体而言，当确定某个查询步骤有其他查询步骤可替代，通过评估这些查询步骤的执行代价，确定选择哪一个查询步骤更合适，例如，某个查询涉及三个表，三个表的连接顺序不同，实际执行的代价也不同，那么通过评估每种顺序的执行代价，确定表的连接顺序。After the optimizer engine receives the query plan, it will decompose the query plan into query steps and determine whether each query step can be optimized. Specifically, when it is determined that a query step can be replaced by other query steps, it will evaluate the performance of these query steps. Execution cost, determine which query step is more appropriate. For example, if a query involves three tables, the connection order of the three tables is different, and the actual execution cost is also different, then by evaluating the execution cost of each order, determine the table Connection order.

现有技术中，优化器引擎一般通过表的基本统计信息，评估各个查询步骤的执行代价信息，其中，基本统计信息是保存在数据库的系统表中，包括表的个数，每个表的行数等基本统计信息，然而，基本统计信息需要遍历各个表来进行统计的，因此，为了不浪费查询资源，一般是在数据库空闲或存在大量数据更改时，对基本统计信息进行更新，这使得在有些情况下，尽管数据库实际内容发生了变化，但是基本统计信息并不会及时更新，进而会影响优化结果的准确性。In the existing technology, the optimizer engine generally evaluates the execution cost information of each query step through the basic statistical information of the table. The basic statistical information is stored in the system table of the database, including the number of tables and the rows of each table. However, basic statistical information needs to be traversed through each table for statistics. Therefore, in order not to waste query resources, the basic statistical information is generally updated when the database is idle or there are a large number of data changes, which makes In some cases, although the actual contents of the database have changed, the basic statistical information will not be updated in time, which will affect the accuracy of the optimization results.

发明内容Contents of the invention

有鉴于此，本说明书一个或多个实施例提供参照执行代价确定方法。In view of this, one or more embodiments of this specification provide a reference execution cost determination method.

为实现上述目的，本说明书一个或多个实施例提供技术方案如下：To achieve the above objectives, one or more embodiments of this specification provide the following technical solutions:

根据本说明书一个或多个实施例的第一方面，提出了一种优化查询计划的方法，以查询步骤的特征为标识，构建查询步骤特征与参照执行代价的对应关系，所述方法包括：According to the first aspect of one or more embodiments of this specification, a method for optimizing a query plan is proposed. The characteristics of the query steps are used as identifiers to construct a corresponding relationship between the characteristics of the query steps and the reference execution cost. The method includes:

获取数据库执行引擎完成任一查询步骤后的执行结果信息；Obtain execution result information after the database execution engine completes any query step;

根据该查询步骤的基本信息确定该查询步骤的特征，并根据所述执行结果信息确定该查询步骤信息的实际执行代价；Determine the characteristics of the query step based on the basic information of the query step, and determine the actual execution cost of the query step information based on the execution result information;

根据该查询步骤的实际执行代价，对该查询步骤特征对应的参照执行代价进行更新；所述查询步骤特征与参照执行代价的对应关系，用于优化查询计划时评估任一查询步骤的代价。According to the actual execution cost of the query step, the reference execution cost corresponding to the query step feature is updated; the correspondence between the query step feature and the reference execution cost is used to evaluate the cost of any query step when optimizing the query plan.

根据本说明书一个或多个实施例的第二方面，提出了一种最优查询计划的确定方法，所述方法应用于数据库优化引擎，所述方法包括：According to the second aspect of one or more embodiments of this specification, a method for determining an optimal query plan is proposed. The method is applied to a database optimization engine. The method includes:

解析查询计划中的各个查询步骤；Parse each query step in the query plan;

针对任一查询步骤，确定该查询步骤的可替代查询步骤；For any query step, determine alternative query steps for the query step;

针对任一可替代查询步骤，确定该可替代查询步骤的特征，根据如上述的对应关系，查询与该可替代查询步骤对应的参照执行代价；For any alternative query step, determine the characteristics of the alternative query step, and query the reference execution cost corresponding to the alternative query step according to the above-mentioned correspondence;

选取各查询步骤的执行代价最小的可替代查询步骤，得到最优查询计划。Select the alternative query step with the smallest execution cost of each query step to obtain the optimal query plan.

根据本说明书一个或多个实施例的第三方面，提出了一种用于预测参照执行代价稳定度的模型构建方法，包括：According to the third aspect of one or more embodiments of this specification, a model construction method for predicting reference execution cost stability is proposed, including:

根据的预设的时长T1，获取时长T1的历史时段内执行完成的查询步骤数据样本集合，任一查询步骤数据样本包括：该查询步骤基本信息、该查询步骤对应的实际执行代价；According to the preset time period T1, obtain a data sample set of query steps completed during the historical period of time T1. The data sample of any query step includes: the basic information of the query step and the actual execution cost corresponding to the query step;

针对任一查询步骤数据样本，通过以下方式将该查询步骤数据样本转换为训练样本：For any query step data sample, convert the query step data sample into a training sample in the following way:

根据该查询步骤的基本信息，确定该查询步骤的特征以及完成时间戳，通过查询如上述的对应关系，得到该查询步骤评估时的参照执行代价；According to the basic information of the query step, determine the characteristics and completion timestamp of the query step, and obtain the reference execution cost when evaluating the query step by querying the corresponding relationship as above;

根据实际执行代价和参照执行代价，计算得到该查询步骤对应的参照执行代价稳定度；Based on the actual execution cost and the reference execution cost, the stability of the reference execution cost corresponding to the query step is calculated;

以该查询步骤特征为特征值、以该查询步骤的参照执行代价稳定度为标签，得到该查询步骤对应的训练样本；Using the characteristics of the query step as the feature value and the stability of the reference execution cost of the query step as the label, the training samples corresponding to the query step are obtained;

对所述数据样本集合遍历执行所述转换步骤，得到训练样本集合；Perform the conversion step traversally on the data sample set to obtain a training sample set;

利用所述训练样本集合训练得到模型，所述模型用于：根据任一查询步骤的特征，预测该查询步骤的参照执行代价稳定度。A model is obtained by training using the training sample set, and the model is used to predict the reference execution cost stability of any query step based on the characteristics of the query step.

根据本说明书一个或多个实施例的第四方面，提出了一种预测参照执行代价稳定度的方法，用于预测任一查询步骤对应的参照执行代价稳定度，包括：According to the fourth aspect of one or more embodiments of this specification, a method for predicting reference execution cost stability is proposed, which is used to predict the reference execution cost stability corresponding to any query step, including:

接收查询步骤，并确定所述查询步骤的特征；receiving a query step and determining characteristics of said query step;

将所述查询步骤的特征输入至用于预测参照执行代价稳定度的模型；所述模型使用如上述的构建方法构建得到；Input the characteristics of the query step into a model used to predict the stability of the reference execution cost; the model is constructed using the above construction method;

根据所述模型的输出，预测该查询步骤对应的参照执行代价稳定度。According to the output of the model, the reference execution cost stability corresponding to the query step is predicted.

根据本说明书一个或多个实施例的第五方面，提出了一种最优查询计划的确定方法，所述方法应用于优化器执行引擎，所述方法包括：According to the fifth aspect of one or more embodiments of this specification, a method for determining an optimal query plan is proposed. The method is applied to the optimizer execution engine. The method includes:

针对任一可替代查询步骤，利用上述的预测参照执行代价稳定度的方法，预测该可替代查询步骤对应的参照执行代价稳定度；若该可替代查询步骤对应的参照执行代价稳定度大于预设值，则确定该可替代查询步骤的特征，根据如上述的对应关系，查询与该可替代查询步骤对应的参照执行代价；For any alternative query step, use the above method of predicting reference execution cost stability to predict the reference execution cost stability corresponding to the alternative query step; if the reference execution cost stability corresponding to the alternative query step is greater than the preset value, then determine the characteristics of the alternative query step, and query the reference execution cost corresponding to the alternative query step according to the above-mentioned correspondence relationship;

在本说明书一个或多个实施例中，由于相同或相似的查询步骤的基本信息具有相同的特征，因此，以查询步骤的特征为标识，构建查询步骤特征与参照执行代价的对应关系。对查询步骤特征的参照执行代价的进行更新：通过获取数据库执行引擎完成任一查询步骤后的执行结果信息，并根据该查询步骤的基本信息确定该查询步骤的特征，根据执行结果信息确定该查询步骤信息的实际执行代价，然后将该实际执行代价作为该查询步骤特征对应的参照执行代价。如此，当数据库优化器引擎需要评估某查询步骤的执行代价时，就可通过该查询步骤的特征，查询对应的参照执行代价。In one or more embodiments of this specification, since the basic information of the same or similar query steps has the same characteristics, the characteristics of the query steps are used as identifiers to construct a corresponding relationship between the characteristics of the query steps and the reference execution cost. Update the reference execution cost of query step characteristics: obtain the execution result information after the database execution engine completes any query step, determine the characteristics of the query step based on the basic information of the query step, and determine the query based on the execution result information The actual execution cost of the step information is then used as the reference execution cost corresponding to the query step feature. In this way, when the database optimizer engine needs to evaluate the execution cost of a certain query step, it can query the corresponding reference execution cost through the characteristics of the query step.

在本说明一个或多个实施例中，优化器引擎在针对一个查询步骤进行执行代价评估时，参考执行代价相较于现有技术中的基本统计信息更加具体，更具有参考意义，且更新参考执行代价的时候，并不需要占用数据库的查询资源，减少了查询负载。In one or more embodiments of this description, when the optimizer engine evaluates the execution cost for a query step, the reference execution cost is more specific and has more reference significance than the basic statistical information in the prior art, and the reference is updated. When executing the cost, it does not need to occupy the query resources of the database, reducing the query load.

附图说明Description of the drawings

图1是一示例性实施例提供的一种优化查询计划的方法的流程示意图。Figure 1 is a schematic flowchart of a method for optimizing a query plan provided by an exemplary embodiment.

图2是一示例性实施例提供的一种查询计划各查询步骤关系的流程示意图。Figure 2 is a schematic flowchart of the relationship between each query step of a query plan provided by an exemplary embodiment.

图3是一示例性实施例提供的一种参照执行代价确定方法的流程示意图。FIG. 3 is a schematic flowchart of a method for determining a reference execution cost provided by an exemplary embodiment.

图4是一示例性实施例提供的一种用于预测参照执行代价稳定度的模型构建方法的流程示意图。FIG. 4 is a schematic flowchart of a model construction method for predicting reference execution cost stability provided by an exemplary embodiment.

图5是一示例性实施例提供的一种预测模型训练方法的流程示意图。Figure 5 is a schematic flowchart of a prediction model training method provided by an exemplary embodiment.

图6是一示例性实施例提供的一种参照执行代价稳定度预测方法的流程示意图。FIG. 6 is a schematic flowchart of a method for predicting execution cost stability according to an exemplary embodiment.

图7是一示例性实施例提供的另一种参照执行代价确定方法的流程示意图。FIG. 7 is a schematic flowchart of another reference execution cost determination method provided by an exemplary embodiment.

图8是一示例性实施例提供的一种参照执行代价确定方法的实施例框架示意图。FIG. 8 is a schematic diagram of an embodiment framework of a method for determining a reference execution cost provided by an exemplary embodiment.

图9是一示例性实施例提供的一种优化查询计划的装置的框图。Figure 9 is a block diagram of a device for optimizing a query plan provided in an exemplary embodiment.

图10是一示例性实施例提供的一种参照执行代价确定装置的框图。FIG. 10 is a block diagram of a device for determining a reference execution cost provided in an exemplary embodiment.

图11是一示例性实施例提供的一种用于预测参照执行代价稳定度的模型构建装置的框图。FIG. 11 is a block diagram of a model construction device for predicting reference execution cost stability provided by an exemplary embodiment.

图12是一示例性实施例提供的一种参照执行代价稳定度预测装置的框图。FIG. 12 is a block diagram of a device for predicting stability of reference execution costs provided by an exemplary embodiment.

图13是一示例性实施例提供的另一种参照执行代价确定装置的框图。FIG. 13 is a block diagram of another device for determining a reference execution cost provided by an exemplary embodiment.

图14是一示例性实施例提供的一种设备的结构示意图。Figure 14 is a schematic structural diagram of a device provided by an exemplary embodiment.

具体实施方式Detailed ways

这里将详细地对示例性实施例进行说明，其示例表示在附图中。下面的描述涉及附图时，除非另有表示，不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本说明书一个或多个实施例相一致的所有实施方式。相反，它们仅是与如所附权利要求书中所详述的、本说明书一个或多个实施例的一些方面相一致的装置和方法的例子。Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of this specification. Rather, they are merely examples of apparatus and methods consistent with some aspects of one or more embodiments of this specification as detailed in the appended claims.

需要说明的是：在其他实施例中并不一定按照本说明书示出和描述的顺序来执行相应方法的步骤。在一些其他实施例中，其方法所包括的步骤可以比本说明书所描述的更多或更少。此外，本说明书中所描述的单个步骤，在其他实施例中可能被分解为多个步骤进行描述；而本说明书中所描述的多个步骤，在其他实施例中也可能被合并为单个步骤进行描述。It should be noted that in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, methods may include more or fewer steps than described in this specification. In addition, a single step described in this specification may be broken down into multiple steps for description in other embodiments; and multiple steps described in this specification may also be combined into a single step in other embodiments. describe.

基本统计信息主要指描述数据库中表、索引的大小、规模、数据库分布状况等的一类信息，例如，表的行数、块数、平均每行的大小、索引字段的行数、不同值的大小等。其中，这些基本统计信息是需要遍历各个表才能获取的，遍历时必然需要频繁调用I/O进程访问这些表。然而在实际应用中，查询数据时，相关的计算并不是最耗时的，调用I/O进程访问各个表进行I/O才是比较耗时的部分，可见，I/O进程资源比较珍贵，因此，为了保证外部高效访问数据库，在外部频繁访问数据库时（数据库繁忙时），不会为统计基本统计信息浪费资源。Basic statistical information mainly refers to a type of information that describes the size, scale, database distribution, etc. of tables and indexes in the database, such as the number of rows and blocks in the table, the average size of each row, the number of rows in the index field, and the number of different values. size etc. Among them, these basic statistical information can only be obtained by traversing each table. During traversal, it is necessary to frequently call the I/O process to access these tables. However, in actual applications, when querying data, the relevant calculations are not the most time-consuming. Calling the I/O process to access each table for I/O is the more time-consuming part. It can be seen that the I/O process resources are relatively precious. Therefore, in order to ensure efficient external access to the database, resources will not be wasted for counting basic statistical information when the database is frequently accessed from the outside (when the database is busy).

一方面，优化器引擎依靠基本统计信息对查询计划进行代价评估，基本统计信息的准确性影响着优化的准确性，但在数据库繁忙时，基本统计信息难以更新，很难保证基本统计信息的准确性；另一方面，基本统计信息都是属于数据库较宏观的统计信息，评估的执行代价也是较宏观的执行代价。On the one hand, the optimizer engine relies on basic statistical information to evaluate the cost of the query plan. The accuracy of the basic statistical information affects the accuracy of the optimization. However, when the database is busy, the basic statistical information is difficult to update, and it is difficult to ensure the accuracy of the basic statistical information. nature; on the other hand, the basic statistical information belongs to the macroscopic statistical information of the database, and the execution cost of the evaluation is also the macroscopic execution cost.

发明人在实践中发现，在一些存在大量访问请求的场景中，查询数据及查询类型具有时间局部性以及空间局部性，以分析性数据库应用场景为例，会面临大量的select语句（查询语句），在一段时间内，可能会对数据库的某几个库或表进行频繁的访问，另外一段时间内，可能会对另外几个库或表进行频繁的访问，并且查询可能相同或相似。查询步骤是由多个查询步骤组成，例如，第一步先将A表与B表进行连接，第二步选择某个关键字不为空的行，而执行器引擎在执行查询计划时，是一步一步完成查询的，相应的，每执行完一个步骤，便会得到该步骤相关的执行过程信息，如读取了多少行，输出了多少行，耗费时长为多久等。因此，若数据库优化器引擎在评估某查询步骤的执行代价时，而近一段时间内有相同或相似的查询步骤执行完成过，那么优化器引擎如果直接使用近一段时间内相同或相似的查询步骤执行完成后的实际执行代价，来评估该完成该查询步骤的执行代价，显然会更准确。The inventor found in practice that in some scenarios where there are a large number of access requests, the query data and query types have temporal locality and spatial locality. Taking the analytical database application scenario as an example, you will face a large number of select statements (query statements) , In a period of time, certain libraries or tables of the database may be frequently accessed, and in another period of time, several other libraries or tables may be frequently accessed, and the queries may be the same or similar. The query step is composed of multiple query steps. For example, the first step is to connect table A and table B, and the second step is to select a row with a certain keyword that is not empty. When the executor engine executes the query plan, The query is completed step by step. Correspondingly, after each step is executed, the execution process information related to that step will be obtained, such as how many rows were read, how many rows were output, how long it took, etc. Therefore, if the database optimizer engine evaluates the execution cost of a certain query step, and the same or similar query steps have been completed in a recent period of time, then the optimizer engine will directly use the same or similar query steps in the recent period of time. It will obviously be more accurate to evaluate the execution cost of completing the query step based on the actual execution cost after the execution is completed.

基于此，本说明书提供了查询步骤特征的参照执行代价的更新方法，相同或相似的查询步骤的基本信息具有相同的特征，因此，以查询步骤的特征为标识，构建查询步骤特征与参照执行代价的对应关系。并对查询步骤特征的参照执行代价的进行更新：通过获取数据库执行引擎完成任一查询步骤后的执行结果信息，并根据该查询步骤的基本信息确定该查询步骤的特征，根据执行结果信息确定该查询步骤信息的实际执行代价，然后将该实际执行代价作为该查询步骤特征对应的参照执行代价。如此，当需要评估某查询步骤的执行代价时，就可通过该查询步骤的特征，查询对应的参照执行代价。Based on this, this specification provides a method for updating the reference execution cost of query step characteristics. The basic information of the same or similar query steps has the same characteristics. Therefore, using the characteristics of the query step as an identifier, the query step characteristics and the reference execution cost are constructed. corresponding relationship. And update the reference execution cost of the query step characteristics: obtain the execution result information after the database execution engine completes any query step, determine the characteristics of the query step based on the basic information of the query step, and determine the query step based on the execution result information. The actual execution cost of the query step information is then used as the reference execution cost corresponding to the query step characteristics. In this way, when it is necessary to evaluate the execution cost of a certain query step, the corresponding reference execution cost can be queried through the characteristics of the query step.

本说明书基于上述的思路，提供了优化查询计划的方法，最优查询计划的确定方法，用于预测参照执行代价稳定度的模型构建方法，预测参照执行代价稳定度的方法。Based on the above ideas, this manual provides methods for optimizing query plans, methods for determining optimal query plans, methods for building models for predicting the stability of reference execution costs, and methods for predicting stability of reference execution costs.

其中，优化查询计划的方法是利用了构建的查询步骤特征与参照执行代价的对应关系，评估任一查询步骤的代价其他方法所参照的对应关系都是依据该方法进行更新的。Among them, the method of optimizing the query plan is to use the corresponding relationship between the constructed query step characteristics and the reference execution cost. The corresponding relationship referenced by other methods to evaluate the cost of any query step is updated based on this method.

参照执行代价确定方法是根据查询步骤的特征，查询利用上述更新方法更新的对应关系，确定出查询步骤的参照执行代价。The reference execution cost determination method is to determine the reference execution cost of the query step based on the characteristics of the query step and query the corresponding relationship updated by the above update method.

在模型构建方法中，在构建训练样本时，需要根据样本查询步骤的特征，查询如上述的对应关系，然后利用构建的训练样本集合训练出于预测参照执行代价稳定度的模型。In the model construction method, when constructing training samples, it is necessary to query the above-mentioned corresponding relationships based on the characteristics of the sample query step, and then use the constructed training sample set to train a model for predicting reference execution cost stability.

预测参照执行代价稳定度的方法使用了模型构建方法构建的模型，在确定查询步骤的参照执行代价前，可以先预测一下当前参照执行代价的稳定性，稳定性越高，当前的参照执行代价越准确。The method of predicting the stability of the reference execution cost uses the model constructed by the model construction method. Before determining the reference execution cost of the query step, you can first predict the stability of the current reference execution cost. The higher the stability, the higher the current reference execution cost. precise.

下面对这些方法进行详细的说明。These methods are described in detail below.

首先对优化查询计划方法进行说明，如图1所示，为本说明说示出的优化查询计划方法的流程示意图。First, the method of optimizing the query plan is explained, as shown in Figure 1, which is a schematic flow chart of the method of optimizing the query plan shown in this explanation.

在执行该方法前，需要先以查询步骤的特征为标识，构建查询步骤特征与参照执行代价的对应关系。其中，查询步骤特征与参照执行代价的对应关系用于优化查询计划时，评估任一查询步骤的代价。Before executing this method, it is necessary to first use the characteristics of the query step as an identifier to construct a corresponding relationship between the characteristics of the query step and the reference execution cost. Among them, the correspondence between the query step characteristics and the reference execution cost is used to evaluate the cost of any query step when optimizing the query plan.

每个查询步骤都会有其基本信息，包括该查询步骤所属的查询计划，查询类型，所该查询步骤的表达式（包括涉及的运算符、数据库、表、列），该查询步骤的提交时间等，例如查询步骤为“table A inner join table B on A.key = B.key”，该查询步骤所属查询计划的ID为1（每个查询计划都有其唯一的标识），查询类型为JOIN类型，涉及的运算符为“=”，涉及有A表以及A表的key列，B表以及B表的key列。Each query step will have its basic information, including the query plan to which the query step belongs, the query type, the expression of the query step (including the operators, databases, tables, and columns involved), the submission time of the query step, etc. , for example, the query step is "table A inner join table B on A.key = B.key", the ID of the query plan to which this query step belongs is 1 (each query plan has its own unique identifier), and the query type is JOIN type , the operator involved is "=", involving table A and the key column of table A, table B and the key column of table B.

其中，相同或相似的查询步骤会有相同或相似的查询语句，即涉及数据库、表以及列相同，涉及的表达式相同或相似，因此，在实际应用中，可以将每个查询步骤涉及的数据库、表、列以及表达式作为查询步骤的特征，如List＜ExpressionType,List＜Symbol＞＞。Among them, the same or similar query steps will have the same or similar query statements, that is, the databases, tables, and columns involved are the same, and the expressions involved are the same or similar. Therefore, in practical applications, the databases involved in each query step can be , tables, columns and expressions as characteristics of query steps, such as List<ExpressionType, List<Symbol>>.

参考执行代价用于评估执行该查询步骤的执行代价，如需要输入多少行，输出多少行，耗费多少内存等。The reference execution cost is used to evaluate the execution cost of executing the query step, such as how many rows need to be input, how many rows are output, how much memory is consumed, etc.

步骤102、获取任一查询步骤的执行结果信息。Step 102: Obtain the execution result information of any query step.

执行器引擎接收到一个查询计划（也可称之为执行计划）后，会根据查询计划中的步骤执行这个查询计划，其中，每个查询步骤会有对应的若干个查询步骤，每个查询步骤都是有执行器引擎中的算子执行完成的（一个算子负责完成一个基本的数据处理逻辑，一组算子按照查询计划完成数据的一组查询步骤，换而言之，一个查询步骤至少由一个算子执行完成得到结果），每个算子执行完成后，都会有将自己执行时读取了多少行（rows），输出了多少行，总共读取了多少大小的数据（KB），总共输出了多少大小的数据，总耗时是多少，内存使用总和是多少进行上报。执行引擎一般将属于一个查询计划的各个查询步骤进行封装，作为一个查询计划的执行结果信息，即，一个查询计划的执行结果信息中，包括多个查询步骤的执行结果信息。因此，在实际应用中，可以以一个查询计划为单位，从执行引擎获取执行结果信息。After the executor engine receives a query plan (also called an execution plan), it will execute the query plan according to the steps in the query plan. Each query step will have several corresponding query steps. Each query step They are all executed by operators in the executor engine (an operator is responsible for completing a basic data processing logic, and a group of operators completes a set of query steps for the data according to the query plan. In other words, a query step is at least The result is obtained after the execution of an operator). After each operator is executed, it will show how many rows it read during its execution, how many rows it output, and how much data it read in total (KB). Report the total size of data output, the total time spent, and the total memory usage. The execution engine generally encapsulates each query step belonging to a query plan as execution result information of a query plan. That is, the execution result information of a query plan includes execution result information of multiple query steps. Therefore, in practical applications, execution result information can be obtained from the execution engine in units of one query plan.

例如，某数据库有三个表，nation、region、customer，某查询语句为：For example, a database has three tables, nation, region, and customer, and a certain query statement is:

SELECT count(*)SELECT count(*)

FROMFROM

nation, region, customernation, region, customer

WHEREWHERE

c_nationkey = n_nationkeyc_nationkey = n_nationkey

AND n_regionkey = r_regionkeyAND n_regionkey = r_regionkey

AND r_name = 'ASIA';AND r_name = 'ASIA';

优化引擎完成优化后，发往执行引擎的查询计划为：After the optimization engine completes optimization, the query plan sent to the execution engine is:

1、扫描表nation，找到列n_nationkey，n_regionkey。1. Scan the table nation and find the columns n_nationkey and n_regionkey.

2、扫描表region，找到列r_regionkey，并将r_name = 'ASIA'行筛选出来。2. Scan the table region, find the column r_regionkey, and filter out the r_name = 'ASIA' row.

3、把第一步与第二步得到的表以c_nationkey = n_nationkey进行内连接（innerjoin）。3. Innerjoin the tables obtained in the first and second steps with c_nationkey = n_nationkey.

4、扫描表customer，找到列c_nationkey。4. Scan the customer table and find column c_nationkey.

5、将第四步与第五步得到的表以n_regionkey = r_regionkey进行内连接（innerjoin）。5. Perform inner join (innerjoin) on the tables obtained in steps 4 and 5 with n_regionkey = r_regionkey.

6、计算第五步得到的表有多少行。6. Calculate how many rows the table obtained in step 5 has.

如图2所示，为上述查询计划执行的各个查询步骤的执行关系。例如步骤1、2可以同时进行，交由两个算子完成，分别得到结果，然后根据1、2步得到的结果，进行内连接，内连接这个步骤由另外一个算子执行完成，以此类推。As shown in Figure 2, it is the execution relationship of each query step executed by the above query plan. For example, steps 1 and 2 can be performed at the same time and handed over to two operators to complete the results. Then, based on the results obtained in steps 1 and 2, an inner join is performed. The inner join step is completed by another operator, and so on. .

实际应用中，数据库执行器引擎完成的查询计划数量级可能会很大，因此，在本说明说的一个或多个实施例中，可设置获取条件，例如，设置抽样比，数据库执行引擎每完成5个查询计划，获取其中一个查询计划的执行结果信息；或者设置获取周期，每隔1ms，获取一个查询计划的执行结果信息等；又或者接收到指定命令，开始获取各个查询计划的执行结果信息。In actual applications, the query plan completed by the database executor engine may be of a large order of magnitude. Therefore, in one or more embodiments of this description, acquisition conditions can be set, for example, a sampling ratio is set. Every time the database execution engine completes 5 query plan, obtain the execution result information of one of the query plans; or set the acquisition period, every 1ms, obtain the execution result information of a query plan, etc.; or receive the specified command, start to obtain the execution result information of each query plan.

步骤104、确定查询步骤的特征及实际执行代价。Step 104: Determine the characteristics and actual execution cost of the query step.

如上述，每个查询计划会有若干个查询步骤，每个查询步骤执行完成后，都会得到一个执行结果信息包括该查询步骤的执行过程，包括在执行时读取了多少行（rows），输出了多少行，总共读取了多少大小的数据（KB），总共输出了多少大小的数据，总耗时是多少，内存使用总和是多少等。As mentioned above, each query plan will have several query steps. After each query step is executed, an execution result information will be obtained including the execution process of the query step, including how many rows (rows) were read during execution, and the output How many rows have been read, the total size of data (KB) read, the total size of data output, the total time taken, the total memory usage, etc.

该查询步骤的执行过程可以反映执行该查询步骤耗费了多少代价，因此，通过获取该查询步骤的执行结果信息，选取一个或多个执行过程数据，作为执行该查询步骤的实际执行代价。The execution process of the query step can reflect the cost of executing the query step. Therefore, by obtaining the execution result information of the query step, one or more execution process data are selected as the actual execution cost of executing the query step.

步骤106、更新该查询步骤特征对应的参照执行代价。Step 106: Update the reference execution cost corresponding to the query step feature.

其中，根据步骤104确定的实际执行地代价，对该查询步骤特征对应的参照执行代价进行更新。Among them, the reference execution cost corresponding to the query step feature is updated according to the actual execution cost determined in step 104.

实际应用中，可以只更新查询步骤特征与参照执行代价的对应关系，如下表，为更新前的对应关系：In practical applications, you can only update the corresponding relationship between the query step characteristics and the reference execution cost. The following table shows the corresponding relationship before update:

当接收到一个查询步骤特征为特征A的查询步骤后，该查询步骤的实际执行代价为代价A2，那么更新之后，对应关系变为：After receiving a query step whose feature is feature A, the actual execution cost of the query step is cost A2, then after the update, the corresponding relationship becomes:

也可以在构建对应关系是，构建实际执行代价以及时间戳与所述查询步骤特征的对应关系，此时可按时间戳的时间顺序，依次保存每次获得该查询步骤特征的实际执行代价（即，也更新实际执行代价以及时间戳与所述查询步骤特征的对应关系），并将时间最新的对应的实际执行代价作为该查询步骤的参考实际执行代价，如下表，为更新前的对应关系：You can also construct the corresponding relationship between the actual execution cost and the corresponding relationship between the timestamp and the query step characteristics. At this time, the actual execution cost of each time the query step characteristics are obtained can be saved sequentially in the chronological order of the timestamp (i.e. , also update the actual execution cost and the corresponding relationship between the timestamp and the characteristics of the query step), and use the latest corresponding actual execution cost as the reference actual execution cost of the query step, as shown in the following table, which is the corresponding relationship before update:

当接收到一个查询步骤特征为特征A的查询步骤后，该查询步骤的实际执行代价为实际代价A3，时间戳为TimeA3，然后将该实际执行代价以及所述完成时间戳增加至该查询步骤特征对应的标识，那么更新之后，对应关系变为：After receiving a query step whose characteristic is characteristic A, the actual execution cost of the query step is the actual cost A3 and the timestamp is TimeA3, and then the actual execution cost and the completion timestamp are added to the query step characteristic. Corresponding identification, then after the update, the corresponding relationship becomes:

其中，每个查询步骤特征对应的参照执行代价的值为时间戳最近一个对应的实际执行代价的值，如表3中，特征A最近一个时间戳对应的实际执行代价为A2，因此，特征A对应的参照执行代价为参照代价A2（也就是说，参照代价A2=实际代价A2），特征B最近一个时间戳对应的实际执行代价为，2，因此，特征B对应的参照执行代价为参照代价B2，又如表4，特征A最近一个时间戳对应的实际参照代价为参照代价A3，因此，特征A对应的参照执行代价为参照代价A3。Among them, the value of the reference execution cost corresponding to the feature of each query step is the value of the actual execution cost corresponding to the latest timestamp. As shown in Table 3, the actual execution cost corresponding to the latest timestamp of feature A is A2. Therefore, feature A The corresponding reference execution cost is the reference cost A2 (that is, the reference cost A2 = the actual cost A2). The actual execution cost corresponding to the latest timestamp of feature B is, 2. Therefore, the reference execution cost corresponding to feature B is the reference cost. B2, as shown in Table 4, the actual reference cost corresponding to the latest timestamp of feature A is reference cost A3. Therefore, the reference execution cost corresponding to feature A is reference cost A3.

实际应用中，若不加以限制的保存每个查询步骤执行完成后的实际执行代价与时间戳，长时间积累下来，数据量会相当庞大，因此，在本说明书一个或多个实施例中，可设置保存条件，例如，若任一特征对应的实际执行代价超过20条，则删除前10条后再保存；或者若时间戳距现在的时间戳的时长超过4h，则删除对应的实际执行代价及对应的时间戳。In practical applications, if the actual execution cost and timestamp after the execution of each query step are saved without restrictions, the amount of data will be quite large if accumulated over a long period of time. Therefore, in one or more embodiments of this specification, it is possible to Set the saving conditions. For example, if the actual execution cost corresponding to any feature exceeds 20 items, delete the first 10 items before saving; or if the time stamp is more than 4 hours from the current timestamp, delete the corresponding actual execution cost and the corresponding timestamp.

本说明书示出的是将查询步骤特征与参照执行代价、实际执行代价、时间戳保存在一个表中，事实上，可将查询步骤特征与参照执行代价建立一个对应关系表，将查询步骤特征与实际执行代价、时间戳建立一个对应关系表。What this description shows is that the query step characteristics, the reference execution cost, the actual execution cost, and the timestamp are stored in a table. In fact, a correspondence table can be established between the query step characteristics and the reference execution cost, and the query step characteristics and the reference execution cost can be established in a table. A corresponding relationship table is established between the actual execution cost and timestamp.

实际应用中，一些情况下，确定的某查询步骤对应的查询步骤特征在对应关系里并没有查到（如，构建对应关系开始时，没有任何查询步骤特征，或者之前都没有执行过类似的查询步骤），此时可以在对应关系中增加该查询步骤特征的标识，并将该查询步骤对应的实际执行代价作为参照执行代价。In actual applications, in some cases, the query step characteristics corresponding to a certain query step are not found in the corresponding relationship (for example, when building the corresponding relationship, there are no query step characteristics, or similar queries have not been executed before. Step), at this time, you can add the identification of the query step characteristics in the corresponding relationship, and use the actual execution cost corresponding to the query step as the reference execution cost.

实际应用中，对应关系可以以表的形式进行存储，也可以以索引的方法进行保存，只需能依据查询步骤特找到对应的参照执行代价即可。存储时，可以如基本统计信息一样，存储至数据库的系统表，也可以存储至内存，还可以存储至硬盘等。In practical applications, the corresponding relationship can be stored in the form of a table or saved as an index, as long as the corresponding reference execution cost can be found according to the query steps. When storing, it can be stored in the system table of the database like basic statistical information, or it can be stored in the memory, or it can also be stored in the hard disk, etc.

以上是对优化查询计划方法的说明，构建并更新查询步骤特征与参照执行代价的对应关系，相对于基本统计信息，参考意义更具体，例如，在进行表连接的时候，A表有20行，B表有10行，那么进行内连接后，最多有10行，优化器引擎在进行评估时，也只会评估为10行或者一个大致的数字。而参考执行代价，是利用A表与B表进行内连接后的实际执行结果（即，实际执行代价）获得的，比如A表与B表实际进行内连接后是4行，那么之后评估A表与B表内连接的执行代价后，即可确定A表与B表内连接后会输出4行。并且，由于获取的是执行器引擎执行完任一查询步骤后的执行结果信息（即，获取的是执行引擎原本就需执行的查询步骤的执行结果信息），并不会占用数据库的查询资源，并不会数据库造成查询负载（不会进行与业务无关的查询）。The above is an explanation of the method of optimizing the query plan. It constructs and updates the corresponding relationship between the query step characteristics and the reference execution cost. Compared with the basic statistical information, the reference meaning is more specific. For example, when performing table connection, table A has 20 rows. Table B has 10 rows, so after performing the inner join, there will be at most 10 rows. When the optimizer engine performs evaluation, it will only evaluate it to 10 rows or a rough number. The reference execution cost is obtained by using the actual execution result (that is, the actual execution cost) after the inner connection between table A and table B. For example, after the actual inner connection between table A and table B, there are 4 rows, then table A will be evaluated later. After calculating the execution cost of the join within table B, it can be determined that 4 rows will be output after the join between table A and table B. Moreover, since what is obtained is the execution result information after the executor engine executes any query step (that is, what is obtained is the execution result information of the query step that the execution engine originally needs to execute), it will not occupy the query resources of the database. It will not cause query load to the database (no queries unrelated to the business will be performed).

下面对最优查询计划的确定方法进行说明，其中，该方法使用了如上述更新方法更新的查询步骤特征与参照执行代价的对应关系。上述的更新方法提高了查询步骤特征的参考执行代价的准确性（相对于基本统计信息，参考意义更具体且更新速度快，因此更准确），因此，无论是技术人员获取各查询步骤的参考执行代价确定数据库运行状况，还是优化器引擎利用参考执行代价生成最优查询计划，参考执行代价的参考意义都比基本统计信息高。The method for determining the optimal query plan is described below. This method uses the correspondence between the query step characteristics updated by the above update method and the reference execution cost. The above update method improves the accuracy of the reference execution cost of query step features (compared to basic statistical information, the reference meaning is more specific and the update speed is faster, so it is more accurate). Therefore, whether technical personnel obtain the reference execution cost of each query step Whether the cost determines the database running status or the optimizer engine uses the reference execution cost to generate the optimal query plan, the reference significance of the reference execution cost is higher than the basic statistical information.

如图3所示，为本说明书示出的一种最优查询计划的确定方法的流程示意图，该方法应用于优化器引擎，包括以下步骤：As shown in Figure 3, it is a schematic flow chart of a method for determining the optimal query plan shown in this specification. This method is applied to the optimizer engine and includes the following steps:

步骤302、解析查询计划中的各个查询步骤。Step 302: Analyze each query step in the query plan.

优化器引擎接收到一个查询计划后，会解析出查询计划中的各个查询步骤，形如树状。After the optimizer engine receives a query plan, it will parse out each query step in the query plan, shaped like a tree.

步骤304、针对任一查询步骤，确定该查询步骤的可替代查询步骤。Step 304: For any query step, determine alternative query steps for the query step.

一些查询步骤是具有可替代查询步骤的，如上述，两个表进行连接的时候，可以左连接，也可以右连接等。Some query steps have alternative query steps. As mentioned above, when two tables are connected, they can be left joined or right joined, etc.

步骤306、针对任一可替代查询步骤，确定该可替代查询步骤的特征，根据如上述的对应关系，查询与该可替代查询步骤对应的参照执行代价。Step 306: For any alternative query step, determine the characteristics of the alternative query step, and query the reference execution cost corresponding to the alternative query step according to the above-mentioned correspondence relationship.

优化器引擎优化各个查询步骤时，是基于代价的，因此，需要评估执行每个查询步骤的代价，以选出执行代价最小的可替代查询步骤。When the optimizer engine optimizes each query step, it is based on cost. Therefore, the cost of executing each query step needs to be evaluated to select the alternative query step with the smallest execution cost.

本说明书中，优化器引擎优化评估执行每个查询步骤的代价，是查询如上述的查询步骤特征与参照执行代价的对应关系，得到各个查询步骤的执行代价。In this specification, the optimizer engine optimizes and evaluates the cost of executing each query step by querying the correspondence between the above query step characteristics and the reference execution cost to obtain the execution cost of each query step.

其中，查询步骤与参照执行代价的对应关系为如上述更新方法得到的对应关系。Among them, the corresponding relationship between the query step and the reference execution cost is the corresponding relationship obtained by the above update method.

步骤308、选取各查询步骤的执行代价最小的可替代查询步骤，确定最优查询计划。Step 308: Select the alternative query step with the smallest execution cost of each query step to determine the optimal query plan.

实际应用中，当需要确定某查询步骤的参考执行代价时，会发现该查询步骤的查询步骤特征在对应关系里并没有（也就是说之前没有类似的查询步骤执行过），此时，可参考基本统计信息作为评估依据，如此，该查询步骤执行完成后，便可得到实际执行代价，作为后续相同或相似的查询步骤的参考执行代价。In practical applications, when it is necessary to determine the reference execution cost of a certain query step, it will be found that the query step characteristics of the query step do not exist in the corresponding relationship (that is, no similar query step has been executed before). In this case, you can refer to Basic statistical information is used as the basis for evaluation. In this way, after the query step is executed, the actual execution cost can be obtained, which can be used as the reference execution cost for subsequent identical or similar query steps.

使用如上述的优化查询计划方法以及确定方法，可以很大程度的提高优化器引擎在评估各查询步骤时的准确性，但在一些情况下，数据变化快，使得实际执行代价前后差异也较大，基于此，本说明书提供了用于预测参照执行代价稳定度的模型构建方法。Using the above optimized query plan method and determination method can greatly improve the accuracy of the optimizer engine in evaluating each query step. However, in some cases, the data changes rapidly, making the actual execution cost significantly different. , Based on this, this specification provides a model construction method for predicting the stability of the reference execution cost.

下面对用于预测参照执行代价稳定度的模型构建方法进行详细的说明，如图4所示，为本说明书示出的用于预测参照执行代价稳定度的模型构建方法的流程示意图，包括以下步骤：The following is a detailed description of the model construction method for predicting the stability of the reference execution cost. As shown in Figure 4, it is a schematic flow chart of the model construction method for predicting the stability of the reference execution cost shown in this specification, including the following step:

步骤402、获取查询步骤数据样本。Step 402: Obtain the query step data sample.

获取数据样本时，需要根据的预设的时长T1，获取时长T1的历史时段内执行完成的查询步骤数据样本集合，其中，任一查询步骤数据样本包括：该查询步骤基本信息、该查询步骤对应的实际执行代价。When obtaining data samples, it is necessary to obtain a data sample set of query steps completed within the historical period of time T1 based on the preset time length T1. Among them, the data sample of any query step includes: the basic information of the query step, the corresponding query step actual execution cost.

由于数据变化是有时间局部性与空间局部性，所以，构建模型所选取的查询步骤数据样本需要是近一段时间内的样本，比如近30mis以内等。对于很久之前的查询数据样本，对模型构建的意义不大，例如，6:00-6:10期间某些有大量的数据改动操作，使得部分查询步骤的参考执行代价非常不稳定，若使用6:00-6:10这期间查询步骤数据样本构建模型，则预测一些查询步骤的参照执行代价时，会得到不稳定的预测结果。如果16:00-16:10期间数据改动并不多，也就是说各查询步骤的参考执行代价是稳定的，如果使用6:00-6:10构建的模型，会得到错误的预测结果。因此，在构建模型时，需要获取时长T1的历史时段内执行完成的查询步骤数据样本集合。Since data changes are localized in time and space, the data samples selected for the query step to build the model need to be samples within a recent period of time, such as within the past 30ms. For query data samples long ago, it is of little significance to model construction. For example, there are a large number of data modification operations during 6:00-6:10, making the reference execution cost of some query steps very unstable. If 6 is used :00-6:10 During this period, query step data samples are used to build a model. When predicting the reference execution cost of some query steps, unstable prediction results will be obtained. If there are not many data changes between 16:00 and 16:10, which means that the reference execution cost of each query step is stable, if you use the model built from 6:00 to 6:10, you will get wrong prediction results. Therefore, when building the model, it is necessary to obtain a collection of data samples of query steps completed within the historical period of time T1.

步骤404、构建训练样本集合。Step 404: Construct a training sample set.

其中，针对任一查询步骤数据样本，通过以下方式将该查询步骤数据样本转换为训练样本：Among them, for any query step data sample, the query step data sample is converted into a training sample in the following way:

根据该查询步骤的基本信息，确定该查询步骤的特征以及完成时间戳，通过查询如表3、表4所示的对应关系，得到该查询步骤评估时的参照执行代价；According to the basic information of the query step, determine the characteristics and completion timestamp of the query step, and obtain the reference execution cost when evaluating the query step by querying the corresponding relationships shown in Table 3 and Table 4;

对所述数据样本集合遍历执行所述转换步骤，得到训练样本集合。The conversion step is traversed and performed on the data sample set to obtain a training sample set.

其中，参照执行代价是随着时间更新的，因此，确定每个查询步骤的参照执行代价时，需要确定该查询步骤的时间，时间不同，评估执行代价时的参照执行代价不同，如表4所示，假设某查询步骤的完成时间戳为Time5，那么该查询步骤在评估执行代价时，使用的是Time2对应的数据集2。Among them, the reference execution cost is updated with time. Therefore, when determining the reference execution cost of each query step, it is necessary to determine the time of the query step. The time is different, and the reference execution cost when evaluating the execution cost is different, as shown in Table 4. As shown, assuming that the completion timestamp of a certain query step is Time5, then when evaluating the execution cost of this query step, the data set 2 corresponding to Time2 is used.

如图5所示，为本说明书示出的一种预测模型训练方法的流程示意图，任一查询步骤数据样本包括该查询步骤的基本信息以及该步骤执行完成后的实际执行代价，通过该查询步骤的基本信息，提取出该查询步骤的特征，通过该查询步骤的执行结果信息，确定实际执行该查询步骤的实际执行代价。As shown in Figure 5, it is a flow chart of a prediction model training method shown in this specification. The data sample of any query step includes the basic information of the query step and the actual execution cost after the step is completed. Through the query step Based on the basic information, the characteristics of the query step are extracted, and the actual execution cost of actually executing the query step is determined through the execution result information of the query step.

以执行代价为输入行数与输出行数为例，令StatsDiff = （PlanNodeStates-OperatorStates）/Max（PlanNodeStates，OperatorStates），其中，StatsDiff用于表征实际执行代价与参照执行代价的不同度，PlanNodeStates表示获取的参照执行代价，若以1-StatsDiff为标签，那么标签越接近于1，参照执行代价越稳定，越接近于0，参照执行代价越不稳定。Taking the execution cost as the number of input rows and the number of output rows as an example, let StatsDiff = (PlanNodeStates-OperatorStates)/Max (PlanNodeStates, OperatorStates), where StatsDiff is used to represent the difference between the actual execution cost and the reference execution cost, and PlanNodeStates represents the acquisition The reference execution cost of , if 1-StatsDiff is used as the label, then the closer the label is to 1, the more stable the reference execution cost is, and the closer it is to 0, the more unstable the reference execution cost is.

此外，在本说明书一个或多个实施例中，可以在每个查询步骤完成时，既保存该查询步骤在评估时的参照执行代价以及该查询步骤的实际执行代价，作为一个查询步骤数据样本。In addition, in one or more embodiments of this specification, when each query step is completed, both the reference execution cost of the query step during evaluation and the actual execution cost of the query step can be saved as a query step data sample.

步骤406、训练用于预测任一查询步骤的参考执行代价稳定度的模型。Step 406: Train a model for predicting the stability of the reference execution cost of any query step.

利用构建好的训练样本集合，训练用于预测任一查询步骤的参考执行代价稳定度的模型。其中，可基于不同算法构建如上述用于预测参照执行代价稳定度的模型，如可以基于支持向量机的方式构建模型，基于神经网络的方式构建模型，基于随机森林的方式构建模型等，构建的模型能够实现本说明书所示的目的即可。Use the constructed training sample set to train a model for predicting the stability of the reference execution cost of any query step. Among them, the above-mentioned models for predicting reference execution cost stability can be constructed based on different algorithms. For example, the model can be constructed based on support vector machine, the model can be constructed based on neural network, the model can be constructed based on random forest, etc. It is sufficient that the model can achieve the purpose shown in this manual.

以随机森林为例：Take random forest as an example:

假设构建的训练样本集合的样本数量为N，对于任一颗树，随机且有放回地从训练样本集合中抽取n（n＜N）个训练样本，作为该树的训练样本集合。Assume that the number of samples in the constructed training sample set is N. For any tree, n (n < N) training samples are randomly selected from the training sample set with replacement as the training sample set of the tree.

如果每个样本的特征维度为M，指定一个常数m＜＜M，随机地从M个特征猴子那个选取m个特征子集，每次树进行分裂是，从这m个特征汇总选择最优的。If the feature dimension of each sample is M, specify a constant m<<M, and randomly select m feature subsets from the M feature monkeys. Each time the tree is split, the optimal one is selected from these m feature summaries. .

每棵树尽最大程度的生长，不对其进行剪枝过程。Each tree is grown to its maximum extent without any pruning process.

实际应用中，可在预设条件下进行模型的构建，例如，预设条件可以是在预设周期的时间点，进行模型的构建，每经过一个时间周期，构建一次预测模型。还可以是构建时长，当确定该模型已长时间未更新了，为保证模型预测的准确性，需要对模型进行更新，即，重新构建。In practical applications, the model can be constructed under preset conditions. For example, the preset condition can be that the model is constructed at a time point in a preset period, and a prediction model is constructed once every time period passes. It can also be the construction time. When it is determined that the model has not been updated for a long time, in order to ensure the accuracy of model prediction, the model needs to be updated, that is, rebuilt.

以上是对用于预测参照执行代价稳定度的模型构建方法的说明，基于上述的模型构建方法，本说明还提供了相应的模型使用方法，即，预测参考执行代价稳定度的方法。接下来对预测参考执行代价稳定度的方法进行详细的说明。The above is an explanation of the model construction method for predicting the stability of the reference execution cost. Based on the above model construction method, this explanation also provides a corresponding method of using the model, that is, a method of predicting the stability of the reference execution cost. Next, the method of predicting the stability of the reference execution cost is explained in detail.

实际应用中，数据库的一些表、列的数据会发生急剧变化，而涉及这些表或列的查询步骤的参照执行代价也会发生很大的改变。由于数据快速急剧的变化，使得此次执行完后的实际执行代价与上一次执行完成后的实际执行代价也相差甚远。In actual applications, the data of some tables and columns in the database will change drastically, and the reference execution cost of query steps involving these tables or columns will also change greatly. Due to the rapid and drastic changes in data, the actual execution cost after this execution is also far different from the actual execution cost after the last execution.

如图6所示，为本说明示出的预测参照执行代价稳定度的方法的流程示意图，包括以下步骤：As shown in Figure 6, a schematic flowchart of the method for predicting reference execution cost stability shown in this description includes the following steps:

步骤602、确定查询步骤的特征。Step 602: Determine the characteristics of the query step.

其中，接收的查询步骤可以是由优化器引擎发送来的，也可是测试人员输入用来测试的，根据实际应用场景而定。Among them, the received query steps can be sent by the optimizer engine, or they can be input by testers for testing, depending on the actual application scenario.

查询步骤的特征是基于查询步骤的基本信息确定的，与上述确定查询步骤特征的方法相同。The characteristics of the query step are determined based on the basic information of the query step, which is the same as the method of determining the characteristics of the query step mentioned above.

步骤604、将查询步骤的特征输入至模型。Step 604: Input the features of the query step into the model.

其中，预测模型使用如上述的用于预测参照执行代价稳定度的模型构建方法，此处不进行详细说明。The prediction model uses the above-mentioned model construction method for predicting reference execution cost stability, which will not be described in detail here.

步骤606、根据模型的输出，预测所述查询步骤的参照执行代价稳定度。Step 606: According to the output of the model, predict the stability of the reference execution cost of the query step.

由于构建预测模型的数据样本是预设历史时长内的查询步骤，若数据库近一段历史时间内数据稳定，那么各查询步骤的参照执行代价也是稳定的，使用模型预测的未来近一段时间内的参照执行代价也是稳定的。若数据库近一段历史时间内数据不稳定，那么各查询步骤的参照执行代价也是不稳定的，使用模型预测的未来近一段时间内的参照执行代价也是不稳定的。Since the data samples used to build the prediction model are query steps within a preset historical period, if the database data is stable in the recent historical period, then the reference execution cost of each query step will also be stable. Use the model to predict the reference in the near future. The execution cost is also stable. If the database data in the recent historical period is unstable, then the reference execution cost of each query step is also unstable, and the reference execution cost predicted by the model in the near future is also unstable.

在本说明书一个或多个实施例中，在使用预测模型进行预测时，可以确定预测模型的构建时长，若该预测模型构建时长很久了，说明该模型长时间没有进行更新，此时使用该模型预测参照执行代价稳定度，得到的结果并不具有很大的参考意义，因此，在确定预测模型的构建时长超过预设时长时，可对预测模型进行更新。In one or more embodiments of this specification, when using a prediction model for prediction, the construction time of the prediction model can be determined. If the prediction model has been constructed for a long time, it means that the model has not been updated for a long time. At this time, the model is used The prediction refers to the execution cost stability, and the results obtained do not have great reference significance. Therefore, when it is determined that the construction time of the prediction model exceeds the preset time, the prediction model can be updated.

以上是对预测参考执行代价稳定度的方法的说明，实际应用中，在利用上述的参照执行代价确定方法确定参照执行代价前，还可以使用上述的预测参照执行代价稳定度的方法预测参照执行代价是否稳定，基于此，本说明书还提供了一种参照执行代价确定方法，接下来对其进行详细的说明。The above is an explanation of the method of predicting the stability of the reference execution cost. In practical applications, before using the above-mentioned method of determining the reference execution cost to determine the reference execution cost, the above-mentioned method of predicting the stability of the reference execution cost can also be used to predict the reference execution cost. Whether it is stable or not, based on this, this specification also provides a method for determining the execution cost by reference, which will be described in detail next.

如图7所示，为本说明书示出的另一种最优查询计划的确定方法，该方法应用于优化器执行引擎，包括以下步骤：As shown in Figure 7, it is another method of determining the optimal query plan shown in this specification. This method is applied to the optimizer execution engine and includes the following steps:

步骤702、解析查询计划中的各个查询步骤。Step 702: Analyze each query step in the query plan.

步骤704、针对任一查询步骤，确定该查询步骤的可替代查询步骤。Step 704: For any query step, determine alternative query steps for the query step.

步骤706、针对任一可替代查询步骤，利用如上述的预测参照执行代价稳定度的方法，预测该可替代查询步骤对应的参照执行代价稳定度；若该可替代查询步骤对应的参照执行代价稳定度大于预设值，则确定该可替代查询步骤的特征，根据如上述的对应关系，查询与该可替代查询步骤对应的参照执行代价。Step 706: For any alternative query step, use the method of predicting reference execution cost stability as described above to predict the reference execution cost stability corresponding to the alternative query step; if the reference execution cost corresponding to the alternative query step is stable If the degree is greater than the preset value, then the characteristics of the alternative query step are determined, and based on the above-mentioned corresponding relationship, the reference execution cost corresponding to the alternative query step is queried.

步骤708、选取各查询步骤的执行代价最小的可替代查询步骤，确定最优查询计划。Step 708: Select the alternative query step with the smallest execution cost of each query step to determine the optimal query plan.

若稳定度大于预设值，说明近一段历史时间内，该查询步骤特征对应的参考执行代价比较稳定，因此，此时保存的参照执行代价可信度高（准确性高），可直接查询获取。If the stability is greater than the preset value, it means that the reference execution cost corresponding to the query step feature is relatively stable in the recent historical period. Therefore, the reference execution cost saved at this time has high credibility (high accuracy) and can be directly queried and obtained. .

其中，获取所述查询步骤的参考执行代价是，使用如上述的参考执行代价确定方法。Wherein, obtaining the reference execution cost of the query step is to use the reference execution cost determination method as mentioned above.

若稳定度小于预测值，说明近一段历史时间内，该查询步骤特征对应的参考执行代价比较不稳定，因此，此时保存的参照执行代价可信度不高（准确性低），参考意义不大。If the stability is less than the predicted value, it means that the reference execution cost corresponding to the query step feature is relatively unstable in the recent historical period. Therefore, the credibility of the saved reference execution cost at this time is not high (low accuracy) and the reference significance is not high. big.

实际应用中，若稳定度小于预设值时，则更新用于预测参照执行代价稳定度的模型。In actual applications, if the stability is less than the preset value, the model used to predict the stability of the reference execution cost is updated.

在本说明书一个或多个实施例中，用于预测参照执行代价稳定度的模型的预测结果与构建模型使用的样本数据直接相关，若构建预测模型的训练样本是不稳定的，那么得到的预测结果也是不稳定的，若构建预测模型的训练样本是稳定的，那么得到的预测结果也是稳定的。因此，在确定稳定度小于时，说明涉及该查询步骤的数据是不稳定的，等涉及该查询步骤的数据稳定了，然后利用稳定后的查询步骤数据样本构建模型，那么使用更新后的模型预测该查询步骤的参考执行代价也是稳定的。In one or more embodiments of this specification, the prediction results of the model used to predict the stability of the reference execution cost are directly related to the sample data used to build the model. If the training samples used to build the prediction model are unstable, then the obtained prediction The results are also unstable. If the training samples used to build the prediction model are stable, then the prediction results obtained will also be stable. Therefore, when the stability is determined to be less than The reference execution cost of this query step is also stable.

此外，实际应用中，优化器引擎在针对任一可替代查询步骤，确定参照执行代价和预测参考执行代价稳定度的先后顺序可以不是一定的，可以先获取参考执行代价，然后预测参照执行代价的稳定度，由优化器引擎确定是否需要获取的参考执行代价。In addition, in actual applications, the order in which the optimizer engine determines the reference execution cost and predicts the stability of the reference execution cost for any alternative query step may not be certain. It can first obtain the reference execution cost, and then predict the reference execution cost. Stability, the reference execution cost determined by the optimizer engine whether it needs to be obtained.

图8为本说明书示出的一种参照执行代价确定方法的实施例框架示意图，接下来对其进行详细的说明。FIG. 8 is a schematic diagram of an embodiment of a reference execution cost determination method shown in this specification, which will be described in detail next.

数据采集模块负责在数据库执行器引擎采集数据，包括任一查询步骤的执行结果信息，可以以[QueryId，TimeStamp，PlanInfo，ExecutionInfo]的形式进行组织，其中QueryId为每个查询计划的唯一标识，TimeStamp为查询提交的时间戳，PlanInfo为该查询计划的计划信息，包括多个PlanNodeInfo（查询步骤信息），ExecutionInfo为执行结果信息，包括各个查询步骤执行完成后的执行结果信息。The data collection module is responsible for collecting data in the database executor engine, including the execution result information of any query step, which can be organized in the form of [QueryId, TimeStamp, PlanInfo, ExecutionInfo], where QueryId is the unique identifier of each query plan, and TimeStamp is the timestamp of query submission, PlanInfo is the plan information of the query plan, including multiple PlanNodeInfo (query step information), and ExecutionInfo is the execution result information, including the execution result information after each query step is completed.

其中，每个PlanNodeInfo该查询步骤的唯一标识PlanNodeId，该查询步骤的类型PlanNodeType，以及该查询步骤的参照执行代价PlanNodeStats，该查询步骤的表达式列表List＜ExpressionType, List＜Symbol＞＞。以参照执行代价为输入行数以及输出行数为例，PlanNodeStats=[InputRowCount，OutputRowCount]，其中以输入数据行数InputRowCount，输出数据行数OutputRowCount为例。Among them, each PlanNodeInfo has the unique identifier PlanNodeId of the query step, the type of the query step PlanNodeType, the reference execution cost of the query step PlanNodeStats, the expression list of the query step List<ExpressionType, List<Symbol>>. Taking the reference execution cost as the number of input rows and the number of output rows as an example, PlanNodeStats=[InputRowCount, OutputRowCount], where the number of input data rows InputRowCount and the number of output data rows OutputRowCount are taken as an example.

上述的Symbol可以包含SchemaId, TableId, ColumnId，分别为数据库、表、列的唯一标识符,ExpressionType为数据库支持的各类表达式运算符。The above Symbol can include SchemaId, TableId, and ColumnId, which are the unique identifiers of the database, table, and column respectively. ExpressionType is the various expression operators supported by the database.

ExecutionInfo包括每个查询步骤完成后的实际执行代价，与参照实行代价相对应，实际执行代价OperatorStats=[InputRowCount, OutputRowCount]。ExecutionInfo includes the actual execution cost after each query step is completed, corresponding to the reference execution cost, the actual execution cost OperatorStats=[InputRowCount, OutputRowCount].

数据存储模块对数据采集模块采集的数据进行存储，以每个查询步骤的PlanNodeId为唯一标识进行存储，并以[QueryId, TimeStamp, PlanNodeType, List＜ExpressionType, List＜Symbol＞＞, PlanNodeStats, OperatorStats]为结构进行存储。The data storage module stores the data collected by the data collection module, using the PlanNodeId of each query step as the unique identifier, and uses [QueryId, TimeStamp, PlanNodeType, List＜ExpressionType, List＜Symbol＞＞, PlanNodeStats, OperatorStats] as structure for storage.

智能决策模块可以控制数据采集模块的数据采集（是否采集以及采集抽样比），还可以利用数据存储模块存储的数据进行模型训练（其中，模型需要一直更新，以确保预测的准确性），数据库优化器引擎向智能决策模块查询任一查询步骤的参照执行代价时，一方面，智能决策模块从数据存储模块找到该查询步骤对应的参照执行代价，另一方面智能决策模块利用模型预测该查询步骤的参照执行代价是否稳定。The intelligent decision-making module can control the data collection of the data collection module (whether to collect and the sampling ratio), and can also use the data stored in the data storage module for model training (where the model needs to be updated all the time to ensure the accuracy of prediction), database optimization When the server engine queries the intelligent decision-making module for the reference execution cost of any query step, on the one hand, the intelligent decision-making module finds the reference execution cost corresponding to the query step from the data storage module, and on the other hand, the intelligent decision-making module uses the model to predict the cost of the query step. Refer to whether the execution cost is stable.

本说明书还提供了上述方法对应的装置、电子设备及计算机可读存储介质，以下是对上述方法对应的装置、设备及存储介质的说明。This specification also provides devices, electronic equipment, and computer-readable storage media corresponding to the above methods. The following is a description of the devices, equipment, and storage media corresponding to the above methods.

本说明书提供了一种优化查询计划装置，如图9所示，以查询步骤的特征为标识，构建查询步骤特征与参照执行代价的对应关系，所述方法包括：This specification provides an optimized query plan device. As shown in Figure 9, the characteristics of the query steps are used as identifiers to construct a corresponding relationship between the characteristics of the query steps and the reference execution cost. The method includes:

执行结果信息获取模型902，用于获取数据库执行引擎完成任一查询步骤后的执行结果信息；Execution result information acquisition model 902 is used to obtain execution result information after the database execution engine completes any query step;

特征及实际执行代价确定模块904，用于根据该查询步骤的基本信息确定该查询步骤的特征，并根据执行结果信息确定该查询步骤信息的实际执行代价；The characteristic and actual execution cost determination module 904 is used to determine the characteristics of the query step based on the basic information of the query step, and determine the actual execution cost of the query step information based on the execution result information;

更新模块906，用于根据该查询步骤的实际执行代价，对该查询步骤特征对应的参照执行代价进行更新；查询步骤特征与参照执行代价的对应关系，用于优化查询计划时评估任一查询步骤的代价。The update module 906 is used to update the reference execution cost corresponding to the query step characteristics according to the actual execution cost of the query step; the correspondence between the query step characteristics and the reference execution cost is used to evaluate any query step when optimizing the query plan. price.

其中，所述执行代价为可从执行结果信息中获取的执行过程信息。The execution cost is execution process information that can be obtained from execution result information.

所述对应关系还包括：实际执行代价以及时间戳与所述查询步骤特征的对应关系，此时所述优化查询计划装置还可以包括完成时间戳确定装置（未示出），用于根据该查询步骤的基本信息确定该查询步骤的完成时间戳；此时，所述更新模块906，用于将所述实际执行代价以及所述完成时间戳增加至该查询步骤特征对应的标识，并将该查询步骤特征对应的参照执行代价的值更新为所述实际执行代价的值。The corresponding relationship also includes: the actual execution cost and the corresponding relationship between the timestamp and the query step characteristics. At this time, the optimized query plan device may also include a completion timestamp determining device (not shown) for determining according to the query. The basic information of the step determines the completion timestamp of the query step; at this time, the update module 906 is used to add the actual execution cost and the completion timestamp to the identifier corresponding to the query step feature, and add the query step The value of the reference execution cost corresponding to the step feature is updated to the value of the actual execution cost.

本说明书提供了一种参照执行代价确定装置，如图10所示，用于确定任一查询步骤的参照执行代价，包括：This specification provides a reference execution cost determination device, as shown in Figure 10, used to determine the reference execution cost of any query step, including:

解析模块1002，用于解析查询计划中的各个查询步骤；The parsing module 1002 is used to parse each query step in the query plan;

确定模块1004，针对任一查询步骤，确定该查询步骤的可替代查询步骤；The determination module 1004 determines, for any query step, alternative query steps for the query step;

参照执行代价确定模块1006，针对任一可替代查询步骤，确定该可替代查询步骤的特征，根据如权利要求1所述的对应关系，查询与该可替代查询步骤对应的参照执行代价；The reference execution cost determination module 1006 determines the characteristics of any alternative query step for any alternative query step, and queries the reference execution cost corresponding to the alternative query step according to the corresponding relationship as claimed in claim 1;

最优查询计划确定模块1008，选取各查询步骤的执行代价最小的可替代查询步骤，得到最优查询计划。The optimal query plan determination module 1008 selects the alternative query step with the smallest execution cost of each query step to obtain the optimal query plan.

本说明书提供了一种用于预测参照执行代价稳定度的模型构建装置，如图11所示，包括：This specification provides a model construction device for predicting the stability of reference execution costs, as shown in Figure 11, including:

数据样本获取模块1102，用于根据的预设的时长T1，获取时长T1的历史时段内执行完成的查询步骤数据样本集合，任一查询步骤数据样本包括：该查询步骤基本信息、该查询步骤对应的实际执行代价；The data sample acquisition module 1102 is used to obtain a data sample set of query steps completed within the historical period of time T1 according to the preset time length T1. The data sample of any query step includes: the basic information of the query step, the corresponding query step actual execution cost;

训练样本确定模块1104，包括三个子单元，用于针对任一查询步骤数据样本，通过以下方式将该查询步骤数据样本转换为训练样本：The training sample determination module 1104 includes three sub-units, which are used to convert any query step data sample into a training sample in the following manner:

查询子单元1114，用于根据该查询步骤的基本信息，确定该查询步骤的特征以及完成时间戳，通过查询如上述的对应关系，得到该查询步骤评估时的参照执行代价；The query subunit 1114 is used to determine the characteristics and completion timestamp of the query step based on the basic information of the query step, and obtain the reference execution cost when evaluating the query step by querying the above-mentioned correspondence;

稳定度计算子单元1124，用于根据实际执行代价和参照执行代价，计算得到该查询步骤对应的参照执行代价稳定度；The stability calculation subunit 1124 is used to calculate the stability of the reference execution cost corresponding to the query step based on the actual execution cost and the reference execution cost;

训练样本确定子单元1134，用于以该查询步骤特征为特征值、以该查询步骤的参照执行代价稳定度为标签，得到该查询步骤对应的训练样本；The training sample determination subunit 1134 is used to obtain the training sample corresponding to the query step using the feature of the query step as the feature value and the stability of the reference execution cost of the query step as the label;

训练样本集合确定模块1106，用于对所述数据样本集合遍历执行所述转换步骤，得到训练样本集合；The training sample set determination module 1106 is used to traverse and perform the conversion step on the data sample set to obtain a training sample set;

模型训练模块1108，用于利用所述训练样本集合训练得到模型，所述模型用于：根据任一查询步骤的特征，预测该查询步骤的参照执行代价稳定度。The model training module 1108 is used to train a model using the training sample set, and the model is used to predict the stability of the reference execution cost of any query step based on the characteristics of the query step.

本说明提供了一种预测参照执行代价稳定度的装置，如图12所示，用于预测任一查询步骤对应的参照执行代价稳定度，包括：This description provides a device for predicting the stability of the reference execution cost, as shown in Figure 12. It is used to predict the stability of the reference execution cost corresponding to any query step, including:

特征确定模块1202，用于接收查询步骤，并确定所述查询步骤的特征；Feature determination module 1202, used to receive the query step and determine the characteristics of the query step;

输入模块1204，用于将所述查询步骤的特征输入至用于预测参照执行代价稳定度的模型；所述模型使用如上述的模型构建装置构建得到；The input module 1204 is used to input the characteristics of the query step into a model used to predict the stability of the reference execution cost; the model is constructed using the above-mentioned model construction device;

预测结果获取模块1206，用于根据所述模型的输出，预测该查询步骤对应的参照执行代价稳定度。The prediction result acquisition module 1206 is used to predict the stability of the reference execution cost corresponding to the query step according to the output of the model.

本说明书还提供了另一种参照执行代价确定装置，如图13所示，包括：This specification also provides another reference execution cost determination device, as shown in Figure 13, including:

解析模块1302，用于解析查询计划中的各个查询步骤；The parsing module 1302 is used to parse each query step in the query plan;

确定模块1304，用于针对任一查询步骤，确定该查询步骤的可替代查询步骤；The determination module 1304 is used to determine, for any query step, alternative query steps for the query step;

参照执行代价确定模块1306，用于针对任一可替代查询步骤，利用如上述的预测参照执行代价稳定度的装置，预测该可替代查询步骤对应的参照执行代价稳定度；若该可替代查询步骤对应的参照执行代价稳定度大于预设值，则确定该可替代查询步骤的特征，根据如上述的对应关系，查询与该可替代查询步骤对应的参照执行代价；The reference execution cost determination module 1306 is used for predicting the reference execution cost stability corresponding to the alternative query step using the above-mentioned device for predicting the stability of the reference execution cost for any alternative query step; if the alternative query step If the stability of the corresponding reference execution cost is greater than the preset value, then the characteristics of the alternative query step are determined, and based on the above-mentioned corresponding relationship, the reference execution cost corresponding to the alternative query step is queried;

最优查询计划确定模块1308，用于选取各查询步骤的执行代价最小的可替代查询步骤，得到最优查询计划。The optimal query plan determination module 1308 is used to select alternative query steps with the smallest execution cost of each query step to obtain the optimal query plan.

其中，该装置还可以包括更新模块（未示出），用于若该查询步骤对应的参照执行代价稳定度小于预设值，则更新用于预测参照执行代价稳定度的模型。The device may further include an update module (not shown), configured to update a model used to predict the stability of the reference execution cost if the stability of the reference execution cost corresponding to the query step is less than a preset value.

上述实施例阐明的装置、模块或单元，具体可以由计算机芯片或实体实现，或者由具有某种功能的产品来实现。一种典型的实现设备为计算机，计算机的具体形式可以是个人计算机、膝上型计算机、蜂窝电话、相机电话、智能电话、个人数字助理、媒体播放器、导航设备、电子邮件收发设备、游戏控制台、平板计算机、可穿戴设备或者这些设备中的任意几种设备的组合。The devices, modules or units described in the above embodiments may be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer, which may be in the form of a personal computer, a laptop, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, or a game controller. desktop, tablet, wearable device, or a combination of any of these devices.

本说明书提供了一种电子设备，包括：This instruction manual provides an electronic device, including:

处理器；processor;

用于存储处理器可执行指令的存储器；Memory used to store instructions executable by the processor;

其中，所述处理器通过运行所述可执行指令以实现优化查询计划的方法，以查询步骤的特征为标识，构建查询步骤特征与参照执行代价的对应关系，所述方法至少包括：Wherein, the processor implements a method of optimizing the query plan by running the executable instructions, using the characteristics of the query step as an identifier, and constructing a corresponding relationship between the characteristics of the query step and the reference execution cost. The method at least includes:

本说明书还提供了一种电子设备，包括：This instruction manual also provides an electronic device, including:

处理器；processor;

其中，所述处理器通过运行所述可执行指令以实现最优查询计划的确定方法，所述方法至少包括：Wherein, the processor implements a method for determining the optimal query plan by running the executable instructions, and the method at least includes:

针对任一可替代查询步骤，确定该可替代查询步骤的特征，根据查询步骤特征与参照执行代价的对应关系，查询与该可替代查询步骤对应的参照执行代价；For any alternative query step, determine the characteristics of the alternative query step, and query the reference execution cost corresponding to the alternative query step according to the corresponding relationship between the query step characteristics and the reference execution cost;

处理器；processor;

其中，所述处理器通过运行所述可执行指令以实现用于预测参照执行代价稳定度的模型构建方法，所述方法至少包括：Wherein, the processor executes the executable instructions to implement a model construction method for predicting reference execution cost stability, and the method at least includes:

根据该查询步骤的基本信息，确定该查询步骤的特征以及完成时间戳，通过查询查询步骤特征与参照执行代价的对应关系，得到该查询步骤评估时的参照执行代价；According to the basic information of the query step, determine the characteristics and completion timestamp of the query step, and obtain the reference execution cost when evaluating the query step by querying the correspondence between the query step characteristics and the reference execution cost;

处理器；processor;

其中，所述处理器通过运行所述可执行指令以实现预测参照执行代价稳定度的方法，用于预测任一查询步骤对应的参照执行代价稳定度，所述方法至少包括：Wherein, the processor implements a method of predicting reference execution cost stability by running the executable instructions, and is used to predict the reference execution cost stability corresponding to any query step. The method at least includes:

处理器；processor;

针对任一可替代查询步骤，预测该可替代查询步骤对应的参照执行代价稳定度；若该可替代查询步骤对应的参照执行代价稳定度大于预设值，则确定该可替代查询步骤的特征，根据查询步骤特征与参照执行代价的对应关系，查询与该可替代查询步骤对应的参照执行代价；For any alternative query step, predict the stability of the reference execution cost corresponding to the alternative query step; if the stability of the reference execution cost corresponding to the alternative query step is greater than the preset value, determine the characteristics of the alternative query step, According to the corresponding relationship between the query step characteristics and the reference execution cost, query the reference execution cost corresponding to the alternative query step;

图14是一示例性实施例提供的一种设备的示意结构图。请参考图14，在硬件层面，该设备包括处理器1402、内部总线1404、网络接口1406、内存1408以及非易失性存储器1410，当然还可能包括其他业务所需要的硬件。本说明书一个或多个实施例可以基于软件方式来实现，比如由处理器1402从非易失性存储器1410中读取对应的计算机程序到内存1408中然后运行。当然，除了软件实现方式之外，本说明书一个或多个实施例并不排除其他实现方式，比如逻辑器件抑或软硬件结合的方式等等，也就是说以下处理流程的执行主体并不限定于各个逻辑单元，也可以是硬件或逻辑器件。Figure 14 is a schematic structural diagram of a device provided by an exemplary embodiment. Please refer to Figure 14. At the hardware level, the device includes a processor 1402, an internal bus 1404, a network interface 1406, a memory 1408, and a non-volatile memory 1410. Of course, it may also include other hardware required by the business. One or more embodiments of this specification may be implemented based on software. For example, the processor 1402 reads the corresponding computer program from the non-volatile memory 1410 into the memory 1408 and then runs it. Of course, in addition to software implementation, one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware, etc. That is to say, the execution subject of the following processing flow is not limited to each A logic unit can also be a hardware or logic device.

其中，如图9–图13的装置可以应用于如图14所示的设备中，以实现本说明书的技术方案。Among them, the devices shown in Figures 9 to 13 can be applied to the equipment shown in Figure 14 to implement the technical solution of this specification.

在一个典型的配置中，计算机包括一个或多个处理器 (CPU)、输入/输出接口、网络接口和内存。In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

内存可能包括计算机可读介质中的非永久性存储器，随机存取存储器 (RAM) 和/或非易失性内存等形式，如只读存储器 (ROM) 或闪存(flash RAM)。内存是计算机可读介质的示例。Memory may include non-permanent storage in computer-readable media, random access memory (RAM), and/or non-volatile memory in the form of read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.

本说明书提供了一种计算机可读存储介质，其上存储有计算机指令，该指令被处理器执行时，实现优化查询计划的方法，以查询步骤的特征为标识，构建查询步骤特征与参照执行代价的对应关系，所述方法包括：This specification provides a computer-readable storage medium on which computer instructions are stored. When the instructions are executed by a processor, a method for optimizing a query plan is implemented. The characteristics of the query steps are used as identifiers to construct the characteristics of the query steps and the reference execution cost. corresponding relationship, the method includes:

本说明书还提供了一种计算机可读存储介质，其上存储有计算机指令，该指令被处理器执行时，实现最优查询计划的确定方法，所述方法至少包括：This specification also provides a computer-readable storage medium on which computer instructions are stored. When the instructions are executed by the processor, a method for determining the optimal query plan is implemented. The method at least includes:

本说明书还提供了一种计算机可读存储介质，其上存储有计算机指令，该指令被处理器执行时，实现用于预测参照执行代价稳定度的模型构建方法，所述方法至少包括：This specification also provides a computer-readable storage medium on which computer instructions are stored. When the instructions are executed by a processor, a model construction method for predicting reference execution cost stability is implemented. The method at least includes:

本说明书还提供了一种计算机可读存储介质，其上存储有计算机指令，该指令被处理器执行时，实现预测参照执行代价稳定度的方法，用于预测任一查询步骤对应的参照执行代价稳定度，所述方法至少包括：This specification also provides a computer-readable storage medium on which computer instructions are stored. When the instructions are executed by the processor, a method for predicting the stability of the reference execution cost is used to predict the reference execution cost corresponding to any query step. Stability, the method at least includes:

本说明书还提供了一种计算机可读存储介质，其上存储有计算机指令，该指令被处理器执行时，可执行指令以实现最优查询计划的确定方法，所述方法至少包括：This specification also provides a computer-readable storage medium on which computer instructions are stored. When the instructions are executed by the processor, the instructions can be executed to achieve a method for determining the optimal query plan. The method at least includes:

计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括，但不限于相变内存 (PRAM)、静态随机存取存储器 (SRAM)、动态随机存取存储器 (DRAM)、其他类型的随机存取存储器 (RAM)、只读存储器 (ROM)、电可擦除可编程只读存储器 (EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘 (DVD) 或其他光学存储、磁盒式磁带、磁盘存储、量子存储器、基于石墨烯的存储介质或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。按照本文中的界定，计算机可读介质不包括暂存电脑可读媒体(transitory media)，如调制的数据信号和载波。Computer-readable media includes both persistent and non-volatile, removable and non-removable media that can be implemented by any method or technology for storage of information. Information may be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), and read-only memory. (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, Magnetic tape cartridges, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium, can be used to store information that can be accessed by computing devices. As defined in this article, computer-readable media does not include transitory media, such as modulated data signals and carrier waves.

此外，本说明还提供了一种计算机程序，该计算机程序被一个或多个处理器执行时，用于实现如上述任一所述的方法。In addition, this description also provides a computer program, which, when executed by one or more processors, is used to implement any of the above methods.

还需要说明的是，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "comprises," "comprises," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that includes a list of elements not only includes those elements, but also includes Other elements are not expressly listed or are inherent to the process, method, article or equipment. Without further limitation, an element defined by the statement "comprises a..." does not exclude the presence of additional identical elements in a process, method, article, or device that includes the stated element.

上述对本说明书特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下，在权利要求书中记载的动作或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外，在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中，多任务处理和并行处理也是可以的或者可能是有利的。The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desired results. Additionally, the processes depicted in the figures do not necessarily require the specific order shown, or sequential order, to achieve desirable results. Multitasking and parallel processing are also possible or may be advantageous in certain implementations.

在本说明书一个或多个实施例使用的术语是仅仅出于描述特定实施例的目的，而非旨在限制本说明书一个或多个实施例。在本说明书一个或多个实施例和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式，除非上下文清楚地表示其他含义。还应当理解，本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit the one or more embodiments of this specification. As used in one or more embodiments of this specification and the appended claims, the singular forms "a," "the" and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It will also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.

应当理解，尽管在本说明书一个或多个实施例可能采用术语第一、第二、第三等来描述各种信息，但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如，在不脱离本说明书一个或多个实施例范围的情况下，第一信息也可以被称为第二信息，类似地，第二信息也可以被称为第一信息。取决于语境，如在此所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。It should be understood that although one or more embodiments of this specification may use the terms first, second, third, etc. to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of one or more embodiments of this specification, the first information may also be called second information, and similarly, the second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when" or "when" or "in response to determining."

以上所述仅为本说明书一个或多个实施例的较佳实施例而已，并不用以限制本说明书一个或多个实施例，凡在本说明书一个或多个实施例的精神和原则之内，所做的任何修改、等同替换、改进等，均应包含在本说明书一个或多个实施例保护的范围之内。The above are only preferred embodiments of one or more embodiments of this specification, and are not intended to limit one or more embodiments of this specification. Within the spirit and principles of one or more embodiments of this specification, Any modifications, equivalent substitutions, improvements, etc. shall be included in the scope of protection of one or more embodiments of this specification.

Claims

1. A model building method for predicting reference execution cost stability, comprising:

according to a preset time length T1, acquiring a query step data sample set which is completed in the historical time period of the time length T1, wherein any query step data sample comprises: basic information of the query step and actual execution cost corresponding to the query step;

for any query step data sample, the query step data sample is converted into a training sample by:

determining the characteristics and the completion time stamp of the query step according to the basic information of the query step, and obtaining the reference execution cost when the query step is evaluated through the corresponding relation between the characteristics of the query step and the reference execution cost;

according to the actual execution cost and the reference execution cost, calculating to obtain the stability of the reference execution cost corresponding to the query step;

Taking the characteristic of the query step as a characteristic value and the stability of the reference execution cost of the query step as a label to obtain a training sample corresponding to the query step;

performing the conversion step on the data sample set traversal to obtain a training sample set;

training with the training sample set to obtain a model, wherein the model is used for: predicting the stability of the reference execution cost of any query step according to the characteristics of the query step;

the correspondence is updated based on a method for optimizing a query plan, and the method for optimizing the query plan comprises the following steps:

acquiring execution result information after the database execution engine completes any query step;

determining the characteristics of the query step according to the basic information of the query step, and determining the actual execution cost of the query step information according to the execution result information;

and updating the reference execution cost corresponding to the characteristic of the query step according to the actual execution cost of the query step.

2. The model construction method according to claim 1, the correspondence further comprising: the method for optimizing the query plan further comprises the following steps of:

Determining a completion time stamp of the query step according to the basic information of the query step;

updating the reference execution cost corresponding to the query step feature according to the actual execution cost of the query step, including:

and adding the actual execution cost and the completion time stamp to an identifier corresponding to the query step feature, and updating the value of the reference execution cost corresponding to the query step feature to the value of the actual execution cost.

3. A method of determining an optimal query plan, the method being applied to a database optimization engine, the method comprising:

analyzing each query step in the query plan;

determining an alternative query step to the query step for any query step;

for any alternative query step, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation as set forth in claim 1;

and selecting the alternative query step with the minimum execution cost of each query step, and determining the optimal query plan.

4. A method for predicting reference execution cost stability corresponding to any query step, comprising:

Receiving a query step and determining characteristics of the query step;

inputting the characteristics of the query step into a model for predicting the stability of the execution cost of the reference; the model is constructed by using the construction method according to claim 1;

and predicting the stability of the reference execution cost corresponding to the query step according to the output of the model.

5. A method of determining an optimal query plan, the method being applied to an optimizer execution engine, the method comprising:

analyzing each query step in the query plan;

determining an alternative query step to the query step for any query step;

for any alternative query step, predicting the reference execution cost stability corresponding to the alternative query step by using the method as claimed in claim 4; if the stability of the reference execution cost corresponding to the alternative query step is greater than a preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation as set forth in claim 1;

6. The method of claim 5, further comprising:

And if the stability of the reference execution cost corresponding to the query step is smaller than a preset value, updating a model for predicting the stability of the reference execution cost.

7. A model construction apparatus for predicting reference execution cost stability, comprising:

the data sample acquisition module is configured to acquire a query step data sample set that is completed in a history period of the duration T1 according to a preset duration T1, where any query step data sample includes: basic information of the query step and actual execution cost corresponding to the query step;

the training sample determining module comprises three subunits, and is used for converting any query step data sample into a training sample by the following modes:

the query subunit is used for determining the characteristics and the completion time stamp of the query step according to the basic information of the query step, and obtaining the reference execution cost when the query step is evaluated through the corresponding relation between the characteristics of the query step and the reference execution cost;

the stability calculating subunit is used for calculating the stability of the reference execution cost corresponding to the query step according to the actual execution cost and the reference execution cost;

The training sample determining subunit is used for obtaining a training sample corresponding to the query step by taking the characteristic of the query step as a characteristic value and taking the reference execution cost stability of the query step as a label;

the training sample set determining module is used for performing the conversion step on the data sample set in a traversing way to obtain a training sample set;

the model training module is used for training to obtain a model by using the training sample set, and the model is used for: predicting the stability of the reference execution cost of any query step according to the characteristics of the query step;

8. A device for predicting reference execution cost stability, configured to predict reference execution cost stability corresponding to any query step, including:

The characteristic determining module is used for receiving the query step and determining the characteristics of the query step;

the input module is used for inputting the characteristics of the query step into a model for predicting the stability of the reference execution cost; the model is constructed by using the model construction method according to claim 1;

and the prediction result acquisition module is used for predicting the stability of the reference execution cost corresponding to the query step according to the output of the model.

9. A device for determining an optimal query plan, the device comprising:

the analysis module is used for analyzing each query step in the query plan;

the determining module is used for determining an alternative query step of any query step;

a reference execution cost determining module, configured to predict, for any alternative query step, a reference execution cost stability corresponding to the alternative query step by using the method as set forth in claim 4; if the stability of the reference execution cost corresponding to the alternative query step is greater than a preset value, determining the characteristics of the alternative query step, and querying the reference execution cost corresponding to the alternative query step according to the corresponding relation as set forth in claim 1;

And the optimal query plan determining module is used for selecting the alternative query step with the minimum execution cost of each query step and determining the optimal query plan.

10. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to implement the method of any of claims 1-6 by executing the executable instructions.

11. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any of claims 1-6.