CN116431448A - Evaluation method and device of execution cost and electronic equipment - Google Patents

Evaluation method and device of execution cost and electronic equipment Download PDF

Publication number
CN116431448A
Authority
CN
China
Prior art keywords
execution
target
cost
plan
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210002360.XA
Other languages
Chinese (zh)
Inventor
张健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202210002360.XA
Publication of CN116431448A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide an execution cost evaluation method, an execution cost evaluation apparatus, and an electronic device, applied to a computing node in a distributed storage system that further includes a plurality of storage nodes. The method comprises: determining a target execution plan whose execution cost is to be evaluated, the target execution plan being an execution plan generated for an SQL statement; for each execution action included in the target execution plan, if the execution action is an access operation on a target data table, determining the execution cost of the execution action according to target table information of each designated sub-table and meta-information of the target data table related to the access operation; and determining the execution cost of the target execution plan based on the determined execution costs of the execution actions. This scheme can improve the accuracy of execution cost evaluation for SQL statements.

Description

Evaluation method and device of execution cost and electronic equipment
Technical Field
The present invention relates to the field of database technologies, and in particular, to a method and an apparatus for evaluating execution cost, and an electronic device.
Background
When a database is accessed with an SQL (Structured Query Language) statement, several execution plans are usually generated for the statement before it is executed, and the execution cost of each plan is analyzed so that a plan with a lower execution cost can be selected as the plan used to execute the statement, which improves execution efficiency. An execution plan describes the execution actions carried out while the SQL statement is executed. The execution cost of an execution plan is an estimate of the I/O (Input/Output), CPU (central processing unit), network, and other resources that would be consumed if that plan were adopted; the lower the execution cost, the higher the execution efficiency of the SQL statement.
In the related art, when a distributed storage system processes an SQL statement, the execution cost of each execution plan of the statement is calculated from static meta-information only, such as table size, row length, number of rows, and data distribution.
However, because only static information is considered, the estimated execution cost is inaccurate, which ultimately degrades the execution efficiency of the SQL statement.
Disclosure of Invention
The embodiment of the invention aims to provide an execution cost evaluation method, an execution cost evaluation device and electronic equipment so as to improve the accuracy of execution cost evaluation. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides an execution cost evaluation method, which is applied to a computing node in a distributed storage system; the distributed storage system further includes a plurality of storage nodes; the method comprises the following steps:
determining a target execution plan whose execution cost is to be evaluated, wherein the target execution plan is an execution plan generated for a Structured Query Language (SQL) statement;
for each execution action included in the target execution plan, if the execution action is an access operation on a target data table, determining the execution cost of the execution action according to target table information of each designated sub-table and meta-information of the target data table related to the access operation; wherein each designated sub-table is the sub-table of the target data table in a storage node, and the target table information of each designated sub-table characterizes the time consumed when the access operation is performed on that designated sub-table;
determining the execution cost of the target execution plan based on the determined execution cost of each execution action.
Optionally, before determining the execution cost of the execution action according to the target table information of each designated sub-table and the meta information of the target data table related to the access operation, the method further includes:
requesting, from a preset global management node, the target table information of each designated sub-table and the meta-information of the target data table related to the access operation;
wherein the global management node stores in advance the target table information of the sub-tables of each data table and the meta-information of each data table related to the access operation.
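The lookup described above can be pictured with a small in-memory stand-in. This is only a sketch under stated assumptions: the class name `GlobalManagementNode`, its `publish` and `request` methods, and the dictionary layout are hypothetical and are not specified by the source, which only states that the node stores the information in advance and returns it on request.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class GlobalManagementNode:
    """In-memory stand-in for the global management node (hypothetical API)."""

    # table name -> {storage node id -> time-consumption factor of its sub-table}
    sub_table_info: Dict[str, Dict[str, float]] = field(default_factory=dict)
    # table name -> static meta-information, e.g. total row count
    meta_info: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def publish(self, table: str, node_id: str, factor: float) -> None:
        """Store the target table information of one sub-table in advance."""
        self.sub_table_info.setdefault(table, {})[node_id] = factor

    def request(self, table: str) -> Tuple[Dict[str, float], Dict[str, int]]:
        """Return copies of the sub-table information and meta-information."""
        return dict(self.sub_table_info[table]), dict(self.meta_info[table])


gmn = GlobalManagementNode()
gmn.meta_info["orders"] = {"total_rows": 10000}
gmn.publish("orders", "node-1", 2.0)
gmn.publish("orders", "node-2", 5.0)
factors, meta = gmn.request("orders")
```

Returning copies keeps the computing node from accidentally mutating the stored statistics while it evaluates a plan.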
Optionally, the generating manner of the target table information of each designated sub-table includes:
performing the access operation on each designated sub-table;
and determining target table information of each designated sub-table based on time consumption generated in the access operation execution process.
Optionally, the determining the execution cost of the execution action according to the target table information of each designated sub-table and the meta information of the target data table related to the access operation includes:
if the access operation is a first type of read operation, determining the execution cost of the execution action according to the read cost factor of each designated sub-table and the total number of rows of the target data table; wherein the first type of read operation is a read operation on the entire target data table, and the read cost factor of each designated sub-table characterizes the time consumed in reading a specified number of rows from that designated sub-table;
if the access operation is a second type of read operation, determining the execution cost of the execution action according to a column interval factor to be utilized, and the number of rows and the data size to be read within the column value range; wherein the second type of read operation is a read operation on a column value range of the target data table; the column interval factor to be utilized is the column interval factor, for that column value range, of the designated sub-table that holds the column value range; and the column interval factor characterizes the time consumed in reading a specified number of rows from the column value range.
Optionally, the determining the execution cost of the execution action according to the read cost factor of each designated sub-table and the total number of rows of the target data table includes:
determining the maximum value among the read cost factors of the designated sub-tables as the read cost factor to be utilized;
substituting the read cost factor to be utilized and the total number of rows of the target data table into a preset first cost calculation formula, and calculating the execution cost of the execution action, wherein the first cost calculation formula is a formula set for the read operation;
the determining the execution cost of the execution action according to the column interval factor to be utilized and the number of rows and the data size to be read within the column value range comprises the following steps:
substituting the column interval factor to be utilized, the number of rows to be read within the column value range, and the data size into a preset second cost calculation formula, and calculating the execution cost of the execution action, wherein the second cost calculation formula is a formula set for the read operation on the column value range.
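The two read cases can be sketched as follows. The text here names the inputs of the first and second cost calculation formulas but does not give the formulas themselves, so the linear forms below, the constant `SAMPLE_ROWS`, and the parameter `per_byte_cost` are assumptions made purely for illustration. Only the choice of the maximum read cost factor comes from the text.

```python
from typing import Mapping

SAMPLE_ROWS = 1000  # assumed number of rows over which each factor was measured


def full_table_read_cost(read_cost_factors: Mapping[str, float],
                         total_rows: int) -> float:
    """First-type read (whole table): use the slowest sub-table's factor."""
    if not read_cost_factors:
        raise ValueError("at least one designated sub-table is required")
    factor_to_use = max(read_cost_factors.values())
    # Assumed linear formula: time per SAMPLE_ROWS rows scaled to the whole table.
    return factor_to_use * total_rows / SAMPLE_ROWS


def column_range_read_cost(column_interval_factor: float, rows_to_read: int,
                           data_size_bytes: int,
                           per_byte_cost: float = 0.0) -> float:
    """Second-type read (column value range)."""
    # Assumed formula: a row-proportional term plus an optional size-proportional term.
    row_term = column_interval_factor * rows_to_read / SAMPLE_ROWS
    return row_term + per_byte_cost * data_size_bytes


whole_table = full_table_read_cost({"node-1": 2.0, "node-2": 5.0}, 10000)
range_scan = column_range_read_cost(3.0, 2000, 4096)
```

Taking the maximum factor reflects that a whole-table read finishes only when the slowest sub-table has been read.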
In a second aspect, an embodiment of the present invention provides a response method for an access request, which is applied to a computing node in a distributed storage system; the method comprises the following steps:
when a Structured Query Language (SQL) statement sent by an access terminal is received, generating the target execution plans of the SQL statement;
evaluating the execution cost of each target execution plan according to the above execution cost evaluation method, to obtain the execution cost of each target execution plan;
selecting, from the target execution plans, the target execution plan with the lowest execution cost;
executing the SQL statement according to the selected target execution plan to obtain an execution result;
and feeding back the execution result to the access terminal.
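The response flow above amounts to generating candidates, scoring them, and running the cheapest one. A minimal sketch follows, assuming the plan generator, cost evaluator, and executor are supplied as callables; the function name `respond` and its parameters are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

Plan = TypeVar("Plan")
Result = TypeVar("Result")


def respond(sql: str,
            generate_plans: Callable[[str], Sequence[Plan]],
            evaluate_cost: Callable[[Plan], float],
            execute: Callable[[str, Plan], Result]) -> Result:
    """Generate plans, pick the one with the lowest cost, execute it."""
    plans = generate_plans(sql)
    if not plans:
        raise ValueError("no execution plan was generated")
    best_plan = min(plans, key=evaluate_cost)
    # The returned value is what would be fed back to the access terminal.
    return execute(sql, best_plan)


# Toy usage with made-up plan names and costs.
costs = {"full_scan": 50.0, "index_scan": 8.0}
result = respond("SELECT * FROM orders",
                 lambda sql: list(costs),
                 costs.__getitem__,
                 lambda sql, plan: f"ran {plan}")
```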
In a third aspect, an embodiment of the present invention provides an evaluation apparatus for execution cost, which is applied to a computing node in a distributed storage system; the distributed storage system further includes a plurality of storage nodes; the device comprises:
a first determining module, configured to determine a target execution plan whose execution cost is to be evaluated, wherein the target execution plan is an execution plan generated for a Structured Query Language (SQL) statement;
a second determining module, configured to determine, for each execution action included in the target execution plan, if the execution action is an access operation on a target data table, the execution cost of the execution action according to target table information of each designated sub-table and meta-information of the target data table related to the access operation; wherein each designated sub-table is the sub-table of the target data table in a storage node, and the target table information of each designated sub-table characterizes the time consumed when the access operation is performed on that designated sub-table;
and a third determining module, configured to determine the execution cost of the target execution plan based on the determined execution cost of each execution action.
In a fourth aspect, an embodiment of the present invention provides a response apparatus for an access request, which is applied to a computing node in a distributed storage system; the device comprises:
a generation module, configured to generate the target execution plans of an SQL statement when the Structured Query Language (SQL) statement sent by an access terminal is received;
an evaluation module, configured to evaluate the execution cost of each target execution plan according to the above execution cost evaluation method, to obtain the execution cost of each target execution plan;
a selection module, configured to select, from the target execution plans, the target execution plan with the lowest execution cost;
an execution module, configured to execute the SQL statement according to the selected target execution plan to obtain an execution result;
and a feedback module, configured to feed back the execution result to the access terminal.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement, when executing the program stored in the memory, the steps of the above execution cost evaluation method or of the above response method for an access request.
In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of the above-described execution cost evaluation method or response method to an access request.
The embodiment of the invention also provides a computer program product containing instructions, which when run on a computer, cause the computer to execute the above-mentioned evaluation method of the execution cost or the response method for the access request.
The embodiment of the invention has the beneficial effects that:
In the execution cost evaluation method provided by the embodiment of the invention, for each execution action included in a target execution plan, if the execution action is an access operation on a target data table, the execution cost of the execution action is determined by jointly considering the target table information of each designated sub-table and the meta-information of the target data table related to the access operation. Each designated sub-table is the sub-table of the target data table in a storage node, and the target table information of each designated sub-table characterizes the time consumed when the access operation is performed on that sub-table. The execution cost of the target execution plan is then determined based on the determined execution costs of the execution actions. Because the time consumed by an access operation on a designated sub-table is affected by factors such as the load of the computing node and the storage nodes and the state of the communication link, the cost determination takes both static information and dynamic information into account. The scheme can therefore improve the accuracy of execution cost evaluation for SQL statements.
Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.
Drawings
To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention, and those skilled in the art may obtain other embodiments from these drawings.
FIG. 1 is a flowchart of a method for evaluating an execution cost according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a distributed storage system according to an embodiment of the present invention;
FIG. 3 is a flow chart of a response method for an access request according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a distribution of two data tables in a distributed storage system according to an embodiment of the present invention;
FIG. 5 (a) is the statistics of the Commodity table;
FIG. 5 (b) is the statistics of an Order table;
FIG. 6 is a plan tree of an un-optimized SQL statement according to an embodiment of the invention;
FIG. 7 is a flow chart of a response method for an access request according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an apparatus for evaluating execution cost according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a response device for an access request according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention fall within the scope of protection of the present invention.
Currently, when a distributed storage system executes an SQL statement, an optimizer is used to reduce the execution cost of the statement as far as possible according to certain rules and statistical information. Commonly used optimizers are the RBO (Rule-Based Optimizer) and the CBO (Cost-Based Optimizer). An RBO contains a set of optimization rules; its optimization is not affected by the data distribution in the tables being read, and it generates an execution plan for an SQL statement purely according to preset rules, so it is a relatively coarse optimization strategy. A CBO generates a group of usable execution plans according to statistical information that is stored in the database in advance and updated in real time, evaluates the cost of each plan, and selects the plan with the lowest cost to execute the SQL statement. The statistical information of a table generally includes meta-information such as table size, number of rows, single-row length, single-column data distribution, and index information. In practice, RBO and CBO are generally used in combination.
In the related art, a distributed storage system generally optimizes SQL execution, that is, identifies the optimal execution plan (the one with the lowest execution cost), using one of two schemes:
(1) Establishing a global index and combining it with RBO rules to improve data retrieval efficiency;
(2) Based on meta-information of the data table (such as table size, number of rows, single-row length, and single-column data distribution), scanning the execution plan tree of the SQL statement from top to bottom, evaluating the execution cost, and selecting the execution plan tree with the lowest cost. An execution plan tree is a tree-structured representation of an execution plan.
The first scheme, based on a global index, can greatly improve data retrieval efficiency, but it reduces data update efficiency and incurs additional storage cost. The second scheme evaluates the execution cost from the meta-information of the data table alone. It considers only static information and does not factor in information such as the load of the storage nodes, the state of the communication link, or the load of the computing node, so the evaluated cost is inaccurate, which ultimately degrades the execution efficiency of the SQL statement.
To address the low accuracy of the evaluated execution cost, embodiments of the invention provide an execution cost evaluation method, an apparatus, an electronic device, and a readable storage medium. The execution cost evaluation method provided by the embodiment of the invention is described first.
The method for evaluating the execution cost is applied to the computing nodes in the distributed storage system; the distributed storage system also includes a plurality of storage nodes.
It will be appreciated that for a distributed storage system, at least one computing node, as well as a plurality of storage nodes, may be included in the system. The computing node is used for receiving an access request of an access terminal, namely an SQL statement, and responding to the access request based on data stored in each storage node; and sub-tables of the respective data tables are stored in the respective storage nodes, and the storage nodes may also be referred to as database instances.
In addition, the functional software for implementing the execution cost evaluation method provided by the embodiment of the invention can be CBO in a database computing engine running in a computing node. It should be noted that CBOs in a compute node may also be referred to as cost computation engines.
The method for evaluating the execution cost provided by the embodiment of the invention can comprise the following steps:
determining a target execution plan whose execution cost is to be evaluated, wherein the target execution plan is an execution plan generated for a Structured Query Language (SQL) statement;
for each execution action included in the target execution plan, if the execution action is an access operation on a target data table, determining the execution cost of the execution action according to target table information of each designated sub-table and meta-information of the target data table related to the access operation; wherein each designated sub-table is the sub-table of the target data table in a storage node, and the target table information of each designated sub-table characterizes the time consumed when the access operation is performed on that designated sub-table;
based on the determined execution cost of each execution action, an execution cost of the target execution plan is determined.
In this scheme, for each execution action included in a target execution plan, if the execution action is an access operation on a target data table, the execution cost of the execution action is determined by jointly considering the target table information of each designated sub-table and the meta-information of the target data table related to the access operation. Each designated sub-table is the sub-table of the target data table in a storage node, and the target table information of each designated sub-table characterizes the time consumed when the access operation is performed on that sub-table. The execution cost of the target execution plan is then determined based on the determined execution costs of the execution actions. Because the time consumed by an access operation on a designated sub-table is affected by factors such as the load of the computing node and the storage nodes and the state of the communication link, the cost determination takes both static information and dynamic information into account. The scheme can therefore improve the accuracy of execution cost evaluation for SQL statements.
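The overall flow can be sketched as a loop over the execution actions of a plan. This is a sketch only: the `Action` type, the `table` field used to mark access operations, and the `other_cost` fallback for actions that are not access operations are hypothetical, since the text here only specifies how access operations on a data table are costed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Action:
    name: str
    table: Optional[str] = None  # set only for access operations on a data table


def evaluate_plan(actions: Iterable[Action],
                  access_cost: Callable[[Action], float],
                  other_cost: Callable[[Action], float]) -> float:
    """Cost each action, then combine the costs into the plan cost by summing."""
    total = 0.0
    for action in actions:
        if action.table is not None:
            # Access operation: cost comes from sub-table timing plus table meta-information.
            total += access_cost(action)
        else:
            # Not covered by the text; delegated to an assumed fallback.
            total += other_cost(action)
    return total


plan = [Action("scan", table="orders"), Action("sort")]
cost = evaluate_plan(plan, access_cost=lambda a: 10.0, other_cost=lambda a: 1.5)
```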
An evaluation method for execution cost according to an embodiment of the present invention is described below with reference to the accompanying drawings.
As shown in FIG. 1, the embodiment of the invention provides an execution cost evaluation method, which is applied to a computing node in a distributed storage system; the distributed storage system further includes a plurality of storage nodes; the method comprises the following steps:
S101, determining a target execution plan whose execution cost is to be evaluated, wherein the target execution plan is an execution plan generated for a Structured Query Language (SQL) statement;
After receiving an SQL statement, the computing node can generate at least one execution plan for it, and each execution plan whose execution cost is to be evaluated can serve as a target execution plan. Each execution action of an execution plan is a physical step, for example reading data from a data table or writing data to a data table. An execution plan specifies that the execution actions are carried out in a certain order; when an execution plan is represented as a tree structure, it may be called an execution plan tree.
When the execution plans are generated for the SQL statement, the statement may first be parsed to produce a syntax tree, and the execution plans may then be generated from the syntax tree. The specific process of generating execution plans from the syntax tree may be the same as in the prior art; it is not described in detail here and is not limited by this embodiment.
In addition, the SQL statement in the embodiment of the invention is a query statement. That is, when there is a content query requirement, multiple execution plans are determined for the corresponding query statement, and the execution cost of each plan can be determined through this scheme, so that a plan with a low execution cost is selected as the plan used to respond to the query statement, which ultimately ensures the execution efficiency of the SQL statement.
S102, for each execution action included in the target execution plan, if the execution action belongs to an access operation for a target data table, determining the execution cost of the execution action according to target table information of each designated sub-table and meta-information of the target data table related to the access operation; wherein each designated sub-table is a sub-table in each storage node concerning the target data table, and target table information of each designated sub-table is used for characterizing: time consuming when the access operation is performed on the designated sub-table;
the target execution plan includes a plurality of execution actions, and the execution cost of the target execution plan is a summary of the execution cost of each execution action included in the target execution plan, so after the target execution plan is determined, the execution cost can be calculated for each execution action included in the target execution plan. And, it can be understood that when any execution action is an access operation for the target data table, the execution action consumes various software and hardware resources, so that the execution cost of the execution action can be calculated.
It should be noted that the target data table may be any data table existing in the storage node. In a distributed storage system, a data table is generally split into a plurality of sub-tables according to a predetermined rule, such as hash distribution, and is stored in a plurality of storage nodes respectively. For convenience of description, in this embodiment, the sub-tables related to the target data table in each storage node are named as each designated sub-table.
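Hash distribution of a table across storage nodes might look like the sketch below. The row shape, the use of the first field as the distribution key, and the CRC32-modulo rule are assumptions for illustration; the text only says that a table is split according to a predetermined rule such as hash distribution.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]  # (distribution key, payload); hypothetical row shape


def node_for_key(key: int, node_count: int) -> int:
    """Map a key to a storage node index with a stable hash."""
    digest = zlib.crc32(str(key).encode("utf-8"))
    return digest % node_count


def split_into_sub_tables(rows: Iterable[Row],
                          node_count: int) -> Dict[int, List[Row]]:
    """Group rows into one sub-table per storage node."""
    sub_tables: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        sub_tables[node_for_key(row[0], node_count)].append(row)
    return dict(sub_tables)


rows = [(i, f"item-{i}") for i in range(100)]
sub_tables = split_into_sub_tables(rows, node_count=3)
```

A stable hash such as CRC32 is used here instead of Python's built-in `hash` so that the same key maps to the same node across processes.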
In this embodiment, in order to improve the accuracy of the evaluation of the execution cost of the SQL statement, if any execution action belongs to the access operation for the target data table, not only the meta information related to the access operation of the target data table is considered, but also the target table information of each designated sub-table is considered. Because the time consumption of executing the access operation on the designated sub-table is influenced by the load of the computing node and the storage node, the state of the communication link and other factors, the static information and the dynamic information are simultaneously considered in the process of determining the execution cost, and the considered information is richer.
Wherein the target table information of each designated sub-table is used to characterize: the time consuming operation is performed on the designated sub-table, for example, the time consuming operation of reading data of a predetermined size, the time consuming operation of reading data of a predetermined number of lines, and the like. Illustratively, a target data table is split into two sub-tables, sub-table 1 and sub-table 2, sub-table 1 being located at storage node 1 and sub-table 2 being located at storage node 2; then, the target table information of sub-table 1 is used for characterization, the time consumption of performing the access operation on sub-table 1 in storage node 1; the target table information of the sub-table 2 is used for characterization, and the time consumed for performing said access operation on the sub-table 2 in the storage node 2. In addition, meta information of the target data table related to the access operation may include one or more of the following information: total number and total size of target data table; the number of rows, the size, etc. of each sub-table. Moreover, since the target table information of each of the designated sub-tables is used for evaluating the execution cost of a certain execution action, the related access operation cannot be truly executed in the evaluation process to obtain the time consumption of the execution action, and thus the target table information of each of the designated sub-tables can be generated in advance. The generation mode of the target table information of each designated sub-table comprises the following steps of:
A1, performing the access operation on each designated sub-table;
A2, determining target table information of each designated sub-table based on time consumption generated in the execution process of the access operation.
It will be appreciated that the generation of the target table information for each of the designated sub-tables may be accomplished by any computing node in the distributed storage system.
The access operation may be a read operation on the entire target data table, or a read operation on a column value field of the target data table, where a column value field is a certain interval of a data column. For different access operations, the target table information of each designated sub-table differs, and the specific process of calculating the execution cost may differ as well. For clarity of the solution and of the layout, the specific implementation of determining the execution cost of the execution action according to the target table information of each designated sub-table and the meta information of the target data table related to the access operation is described below.
S103, determining the execution cost of the target execution plan based on the determined execution cost of each execution action.
After determining the execution costs of the plurality of execution actions included in the target execution plan, the execution costs of the execution actions may be comprehensively calculated, so as to obtain the execution costs of the target execution plan. For example, the execution cost of each execution action in the target execution plan can be added together to obtain the execution cost of the target execution plan; or weighting the execution cost of each execution action in the target execution plan to obtain the execution cost of the target execution plan.
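Illustratively, the two aggregation options described above (plain summation and weighted summation) can be sketched as follows. This is a minimal sketch only; the function name `plan_cost` and the way the weights are supplied are assumptions made for illustration and are not specified by this embodiment.

```python
from typing import Optional, Sequence


def plan_cost(action_costs: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-action execution costs into one plan-level cost.

    Without weights the costs are simply added together; with weights
    each cost is multiplied by its weight before being added.
    """
    if weights is None:
        return float(sum(action_costs))
    if len(weights) != len(action_costs):
        raise ValueError("weights and action_costs must have the same length")
    return float(sum(w * c for w, c in zip(weights, action_costs)))
```

For instance, `plan_cost([1.0, 2.0, 3.0])` adds the three action costs, while `plan_cost([1.0, 2.0], [0.5, 2.0])` applies a hypothetical weight to each action first.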
In the scheme, aiming at each execution action included in a target execution plan, if the execution action belongs to access operation aiming at a target data table, comprehensively considering target table information of each designated sub-table and meta-information related to the access operation of the target data table, and determining the execution cost of the execution action; wherein each designated sub-table is a sub-table in each storage node concerning the target data table, and target table information of each designated sub-table is used for characterizing: time consuming when performing access operations on the specified sub-table; and determining the execution cost of the target execution plan based on the determined execution cost of each execution action. In the scheme, when determining the execution cost of each execution action, the time consumption of executing the access operation on the designated sub-table and the meta-information of the target data table related to the access operation are comprehensively considered, and the time consumption of executing the access operation on the designated sub-table is influenced by factors such as loads of the computing node and the storage node, the communication link state and the like, namely, the determination process of the execution cost simultaneously considers static information and dynamic information. Therefore, through the scheme, the accuracy of the execution cost evaluation of the SQL statement can be improved.
Optionally, in another embodiment of the present invention, before determining the execution cost of the execution action according to the target table information of each designated sub-table and the meta information of the target data table related to the access operation, the method further includes:
requesting target table information of each designated sub-table and meta-information of the target data table related to the access operation from the preset global management node; the global management node stores target table information of sub-tables of each data table and meta-information of each data table related to the access operation in advance.
In one implementation, as shown in fig. 2, there are multiple computing nodes in the distributed storage system. Each computing node reads and writes data from each storage node and, before execution costs are evaluated, asynchronously and dynamically collects the target table information of the sub-tables of each data table and the meta-information of each data table related to access operations, and uploads this information to the global management node, where it is summarized. The computing nodes can also update and report this information in real time. When the execution cost is evaluated, the information stored in the global management node can be cached in a designated caching unit, and the computing node obtains the target table information of each designated sub-table and the meta-information of the target data table related to the access operation from the caching unit. This improves the efficiency of the execution cost evaluation and reduces the load that computing nodes place on the global management node when fetching the target table information and meta-information.
In addition, it can be understood that the global management node stores target table information related to various access operations of sub-tables of each data table and meta-information related to various access operations of each data table in advance. Various types of access operations may include: a read operation for the data table, a read operation for each column value range of the data table.
In this embodiment, the global management node stores in advance the target table information of the sub-tables of each data table and the meta-information of each data table related to the access operation, so that the computing node can quickly and conveniently obtain the target table information of each designated sub-table and the meta-information of the target data table related to the access operation, thereby ensuring a fast evaluation of the execution cost.
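Illustratively, the caching of information obtained from the global management node can be sketched as follows. This is a minimal sketch under stated assumptions: the class name `StatsCache`, the `fetch` callable standing in for a request to the global management node, and the key format are all hypothetical and are not part of this embodiment.

```python
from typing import Any, Callable, Dict, Hashable


class StatsCache:
    """Caching unit placed in front of the global management node.

    `fetch` is a hypothetical callable that asks the global management
    node for the statistics stored under `key` (for example, a sub-table
    identifier). It is called only when the key is not cached yet.
    """

    def __init__(self, fetch: Callable[[Hashable], Any]) -> None:
        self._fetch = fetch
        self._entries: Dict[Hashable, Any] = {}

    def get(self, key: Hashable) -> Any:
        # Serve from the cache when possible to spare the management node.
        if key not in self._entries:
            self._entries[key] = self._fetch(key)
        return self._entries[key]

    def invalidate(self, key: Hashable) -> None:
        # Drop a stale entry, e.g. after a computing node reports an update.
        self._entries.pop(key, None)
```

Repeated evaluations of the same sub-table then reach the global management node only once until the entry is invalidated.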
In order to facilitate understanding of the solution, a specific implementation procedure of determining the execution cost of the execution action according to the target table information of each designated sub-table and the meta information of the target data table related to the access operation is described in the following by way of example in connection with different access operations.
For example, if the access operation is a first type of read operation, that is, a read operation on the entire target data table, the target table information of each designated sub-table may be a read cost factor, which characterizes the time consumed to read a specified number of rows from that sub-table. For example, if the specified number of rows is 100 and reading 100 rows from the designated sub-table takes 3 seconds, the read cost factor is 3 seconds per 100 rows; if the specified number of rows is 50 and reading 50 rows takes 2 seconds, the read cost factor is 2 seconds per 50 rows, and so on. In addition, when the read cost factor is generated, the specified number of rows can be read from the designated sub-table multiple times, and the average of the measured times is taken as the read cost factor.
Accordingly, if the access operation is a first type of read operation, the meta information of the target data table related to the access operation may be the total number of rows of the target data table. The first type of read operation is an operation that reads all rows of the target data table.
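Illustratively, generating a read cost factor by reading the specified number of rows several times and averaging the elapsed times can be sketched as follows. This is a minimal sketch: `read_rows` is a hypothetical callable that reads the given number of rows from one designated sub-table, and the default values for `rows` and `repeats` are assumptions. The result is in seconds per `rows` rows; a unit conversion may be needed depending on the unit the cost formula expects.

```python
import time
from statistics import mean
from typing import Callable, List


def measure_read_cost_factor(read_rows: Callable[[int], object],
                             rows: int = 100,
                             repeats: int = 3) -> float:
    """Return the average time (seconds) taken to read `rows` rows.

    `read_rows(n)` is assumed to read n rows from a designated sub-table;
    it is invoked `repeats` times and the elapsed times are averaged.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_rows(rows)
        samples.append(time.perf_counter() - start)
    return mean(samples)
```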
Correspondingly, the determining the execution cost of the execution action according to the target table information of each designated sub-table and the meta-information of the target data table related to the access operation includes:
if the access operation is a read operation on the entire target data table, determining the execution cost of the execution action according to the read cost factors of the designated sub-tables and the total number of rows of the target data table.
Optionally, in an implementation manner, determining the execution cost of the execution action according to the read cost factor of each designated sub-table and the total number of rows of the target data table may include:
determining the maximum value from the read cost factors of each designated sub-table as the read cost factor to be utilized;
substituting the read cost factor to be utilized and the total number of rows of the target data table into a preset first cost calculation formula, and calculating the execution cost of the execution action, wherein the first cost calculation formula is a formula set for the read operation.
Because the target data table is split into the designated sub-tables, the read cost factors of the designated sub-tables may differ. In this implementation, the data access cost is measured globally according to the short-plate (bottleneck) effect: the maximum of the read cost factors of the designated sub-tables is selected as the read cost factor to be utilized, so that the estimate is closer to the actual cost. The read cost factor to be utilized and the total number of rows of the target data table are then substituted into the preset first cost calculation formula to calculate the execution cost of the execution action; the first cost calculation formula can be set according to actual conditions. Illustratively, the first cost calculation formula may be as follows:
execution cost of the execution action (read operation) = ScanFactor * R_max / 1000 * (ROW / N_2 + 1); where ScanFactor is a preset weight factor for the read operation and is a constant, for example 0.2 or 0.25; ROW is the total number of rows of the target data table; R_max is the read cost factor to be utilized; and N_2 is the specified number of rows.
The above specific implementation manner of determining the execution cost of the execution action according to the reading cost factor of each designated sub-table and the total number of rows of the target data table is merely an example, and should not be construed as limiting the embodiment of the present invention.
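Illustratively, the first cost calculation above (take the largest read cost factor among the designated sub-tables, then scale it by the total number of rows) can be sketched as follows. This is a minimal sketch; the function name, the parameter names, and the default values are assumptions, with the ScanFactor default taken from the example constant 0.2.

```python
from typing import Sequence


def full_table_read_cost(read_cost_factors: Sequence[float],
                         total_rows: int,
                         specified_rows: int = 100,
                         scan_factor: float = 0.2) -> float:
    """Cost of reading the entire target data table.

    read_cost_factors: one read cost factor per designated sub-table.
    total_rows:        ROW, the total number of rows of the target table.
    specified_rows:    N_2, the row count the read cost factor refers to.
    scan_factor:       ScanFactor, a preset constant weight.
    """
    if not read_cost_factors:
        raise ValueError("at least one read cost factor is required")
    # Short-plate effect: the slowest sub-table dominates the cost.
    r_max = max(read_cost_factors)
    return scan_factor * r_max / 1000 * (total_rows / specified_rows + 1)
```

With hypothetical factors `[3.0, 5.0]` and a 100-row table, the slower sub-table (5.0) determines the result.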
For example, if the access operation is a second type of read operation, that is, a read operation on a column value field of the target data table, the target table information of each designated sub-table is the column interval factor for the column value field, taken from the designated sub-table in which the column value field exists; the column interval factor characterizes the time consumed to read a specified number of rows from the column value field. For ease of reference, this column interval factor is referred to as the column interval factor to be utilized.
Considering that the data to be read may be only part of the data in the target data table, for evaluation accuracy each designated sub-table of the target data table can be divided into a plurality of column value fields. For example, if a designated sub-table records data for August and September, it can be divided by month into two column value fields, August and September. The column interval factor of each column value field is collected separately, and may be the time consumed to read one hundred rows of data in that column value field.
Correspondingly, for the second type of read operation, the meta information of the target data table related to the access operation is the number of rows and the data size to be read within the column value field. The second type of read operation may be an operation that reads a specified number of rows of data within the column value field.
Correspondingly, the determining the execution cost of the execution action according to the target table information of each designated sub-table and the meta-information of the target data table related to the access operation may include:
determining the execution cost of the execution action according to the column interval factor to be utilized and the number of rows and the data size to be read in the column value domain.
Optionally, in an implementation manner, determining the execution cost of the execution action according to the column interval factor to be utilized and the number of rows and the data size to be read in the column value domain may include:
substituting a column interval factor to be utilized, the number of rows to be read and the data size in the column value domain into a preset second cost calculation formula, and calculating the execution cost of the execution action, wherein the second cost calculation formula is a formula set for the reading operation of the column value domain.
Because the data to be read is only part of the data of the target data table, the execution cost of a read operation within a column value domain can be calculated by substituting the column interval factor of the column value domain and the number of rows and the data size to be read in the column value domain into a preset second cost calculation formula.
The second cost calculation formula may be set according to the actual situation. Illustratively, the second cost calculation formula may be as follows:
execution cost of the execution action (read operation on a column value field) = ScanFactor * M * (N / 100 + 1) / 1000 + X / machine memory size * MemFactor;
where ScanFactor is a preset weight factor (constant) for the read operation; MemFactor is a memory consumption factor (constant), for example 0.3; M is the column interval factor to be utilized; N is the number of rows to be read; and X is the size of the data to be read.
The above specific implementation of determining the execution cost of the execution action according to the column interval factor to be utilized and the number of rows and the data size to be read in the column value field is merely an example, and should not be construed as limiting the embodiment of the present invention.
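Illustratively, the second cost calculation above can be sketched as follows. This is a minimal sketch: the function and parameter names are assumptions, the defaults reuse the example constants 0.2 and 0.3, and the memory term is read left to right as (X / machine memory size) * MemFactor, which is an assumption about the intended operator grouping. X and the machine memory size must be expressed in the same unit.

```python
def column_range_read_cost(column_interval_factor: float,
                           rows_to_read: int,
                           data_size: float,
                           machine_memory_size: float,
                           scan_factor: float = 0.2,
                           mem_factor: float = 0.3) -> float:
    """Cost of reading part of a table within one column value field.

    column_interval_factor: M, time to read one hundred rows of the field.
    rows_to_read:           N, estimated number of rows to be read.
    data_size:              X, estimated size of the data to be read.
    machine_memory_size:    memory size of the machine, same unit as X.
    """
    if machine_memory_size <= 0:
        raise ValueError("machine_memory_size must be positive")
    scan_part = scan_factor * column_interval_factor * (rows_to_read / 100 + 1) / 1000
    # Assumed grouping: (X / machine memory size) * MemFactor.
    memory_part = data_size / machine_memory_size * mem_factor
    return scan_part + memory_part
```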
It should be noted that, in some scenarios, the access operation may also involve a write operation on a temporary table. For example, one execution action in the execution plan may be pushing down a specified table to a storage node; in this case, the target table information of each specified table may also include a write cost factor, which characterizes the time consumed to write the data of the specified table to the storage node as a temporary calculation object. Accordingly, if an access operation is a write operation on a temporary table, the meta information of the target data table related to the access operation may be the amount of data of the target data table to be written. The acquisition of the write cost factor and the manner of estimating the cost of the write operation are similar to the processing of the read operation described above, and are not described in detail herein.
Based on the evaluation method of the execution cost, the invention also provides a response method aiming at the access request, which is applied to the computing nodes in the distributed storage system; as shown in fig. 3, the method includes:
S301, when receiving an SQL statement sent by an access terminal, generating each target execution plan of the SQL statement;
The access terminal may be a user terminal, such as an app (application program) terminal, that queries data in the distributed storage system. The access terminal generates an SQL statement for the data to be queried in the distributed storage system and sends the SQL statement to the distributed database. When the computing node receives the SQL statement sent by the access terminal, it generates each target execution plan of the SQL statement. The target execution plan generation process has been described above and is not repeated here.
S302, executing the cost evaluation method, and evaluating the execution cost of each target execution plan to obtain the execution cost of each target execution plan;
S303, selecting a target execution plan with the minimum execution cost from all target execution plans;
The selected target execution plan with the smallest execution cost is the plan that, according to the evaluation, requires the lowest resource cost and has the highest execution efficiency.
S304, executing the SQL sentence according to the selected target execution plan to obtain an execution result;
Each execution action contained in the selected target execution plan is executed in sequence to finally obtain the execution result. The execution result may be the data that the access terminal needs to query.
S305, feeding back the execution result to the access terminal.
It can be understood that the execution result of the SQL statement, that is, the result to be queried by the access terminal, is irrelevant to the target execution plans, each target execution plan only affects the specific process of executing the SQL statement, and the execution results obtained by different target execution plans are the same.
In this embodiment, the execution cost of each target execution plan is evaluated, so as to obtain the execution cost of each target execution plan; selecting a target execution plan with the minimum execution cost from all the target execution plans; and executing the SQL statement according to the selected target execution plan to obtain an execution result. It can be seen that, in this embodiment, the execution cost evaluation method of the present invention is used to evaluate each target execution plan, and the target execution plan with the smallest execution cost is selected from each target execution plan. Therefore, by the scheme, the execution cost can be evaluated more accurately, so that a target execution plan with the minimum execution cost is selected, and the efficiency of executing SQL sentences is improved.
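Illustratively, the evaluation and selection in steps S302 and S303 can be sketched as follows. This is a minimal sketch: the function name `choose_cheapest_plan` and the `evaluate` callable (standing in for the execution cost evaluation method) are assumptions, and breaking ties in favor of the earlier plan is a design choice of the sketch rather than something this embodiment specifies.

```python
from typing import Callable, Iterable, Tuple, TypeVar

P = TypeVar("P")


def choose_cheapest_plan(plans: Iterable[P],
                         evaluate: Callable[[P], float]) -> Tuple[P, float]:
    """Evaluate every candidate plan and return (plan, cost) of the cheapest.

    `evaluate` is assumed to return the execution cost of one plan.
    When two plans have equal cost, the one generated first is kept.
    """
    scored = [(evaluate(plan), index, plan) for index, plan in enumerate(plans)]
    if not scored:
        raise ValueError("no target execution plan to choose from")
    cost, _, plan = min(scored, key=lambda item: (item[0], item[1]))
    return plan, cost
```

A caller would then execute the returned plan and feed the result back to the access terminal, as in steps S304 and S305.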
To facilitate an understanding of the method of responding to an access request of the present invention, an exemplary description is provided below in connection with fig. 4-7.
For example, in a distributed storage system, there are two data tables: the Commodity table and the Order table.
Commodity table

a_id     type     region   Describtion   ……
000001   Type_1   bj       item1         ……
000002   Type_2   gz       item2         ……
000003   Type_3   sh       item3         ……
000004   Type_4   cs       item4         ……
000005   Type_5   wh       item5         ……
……       ……       ……       ……            ……

Order table

o_id   c_id    count   time       ……
1      c_123   10      20210720   ……
2      c_456   8       20210720   ……
3      c_789   50      20210721   ……
4      c_234   30      20210721   ……
5      c_567   20      20210722   ……
……     ……      ……      ……         ……
The Commodity table is split into sub-tables by hashing the a_id column, and the Order table is split into sub-tables by month according to the time column. Their distribution in the distributed storage system is shown in fig. 4: Commodity sub-table 1 is located in storage node DB1, Commodity sub-table 2 is located in storage node DB2, Order sub-table 1 and Order sub-table 2 are located in storage node DB1, and Order sub-table 3 and Order sub-table 4 are located in storage node DB2. The statistics of the two tables include the target table information of each designated sub-table and the meta information, such as the statistics of the Commodity table in FIG. 5 (a) and the statistics of the Order table in FIG. 5 (b).
At this time, an SQL statement is received:
SELECT * FROM Commodity c JOIN Order o ON c.a_id = o.a_id WHERE TO_DAYS(NOW()) - TO_DAYS(o.time) < 2
ORDER BY o.count;
That is, the commodity order counts (count) of the last two days are queried, joined on the a_id column. The non-optimized plan tree (one optional plan tree) for executing the SQL statement is shown in FIG. 6: the computing node scans the full table data of the Commodity and Order tables from the database (applying the where filtering condition) and stores it in the memory of the computing node; performs the join calculation on the scanned data; and sorts the join result by the count column and outputs the result to the SQL request terminal. This process spends a long time scanning large-table data and consumes a large amount of computing-node memory, so OOM (Out Of Memory) occurs easily under high concurrency; if conventional memory-limiting means are used, computing efficiency suffers. The process also consumes a large amount of CPU, which affects the efficiency of other service requests. In addition, the process fails to fully exploit the parallel computing power of multiple database partitions.
Thus, the response to SQL statement access requests needs to be optimized. The response of this scheme to the access request is shown in fig. 7. For the SQL statement, two execution plans (two alternative execution plans) may be generated, which execute the SQL statement by pushing down the small table or by querying the column value field of the large table, respectively. The execution actions include: identifying the small table, reading the small table, pushing down the small table or querying the column value field of the large table, and summarizing the results. The specific steps and the cost evaluation manner of each execution action are exemplarily described below; the two plan trees are mainly characterized by, respectively, reading data in the target column value field of the Order table and pushing down the Commodity table, as follows:
Identifying the small table: the number of rows and the data volume of each table are obtained from the meta information: the Order table has 1,000,000 rows and a size of 100 GB, and the Commodity table has 100 rows and a size of 10 MB. Taking a table with fewer than 10,000 rows or a data size smaller than 50 MB as a small table, the Commodity table is determined to be the small table;
Reading the Commodity table from the database: the Commodity table involves 2 designated sub-tables, and the cost calculation formula is: cost = ScanFactor * MAX(DB1 read cost factor, DB2 read cost factor) / 1000 * (Commodity row count / 100 + 1), where ScanFactor is the weight factor for scanning a table, e.g., a constant of 0.2;
Reading data in the target column value field of the Order table: the column value field and the storage node where the corresponding data is located are calculated according to the where condition (where TO_DAYS(now()) - TO_DAYS(o.time) < 2); the column interval factor M (ms per hundred rows) corresponding to the column value field of the data to be read is then obtained from the statistical information, and the number of result rows N and the data size X are estimated according to the data volume and the number of sub-tables; cost = ScanFactor * M * (N / 100 + 1) / 1000 + X / machine memory size * MemFactor, where ScanFactor is the scan-table weight factor (constant) and MemFactor is the memory consumption factor (constant, e.g., 0.3);
Pushing down the Commodity table: this involves a write operation that writes the Commodity table to the two DB storage nodes holding the Order table; cost = PushFactor * size (MB) * MAX(DB1 write cost factor, DB2 write cost factor) / 1000, where PushFactor is the push-down SQL cost factor (constant, e.g., 0.2);
Summarizing the join results of the designated sub-tables, that is, reading the join result of each designated sub-table: assume that the number of result rows corresponding to the Order table is N (N1 in DB1 and N2 in DB2, N1 + N2 = N) and the data size is X; the cost at this stage is cost = MergeFactor * (N1 * DB1 read cost factor, N2 * DB2 read cost factor) / 1000 + X / compute node memory size * MemFactor, where MergeFactor is a scanning weight factor (constant, e.g., 0.1).
The costs of the steps of the two plan trees are summed, the target execution plan with the minimum execution cost is selected from the two, and finally the result obtained after execution is output to the access terminal.
The cost calculation formulas and constants are preset according to actual conditions.
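Illustratively, the small-table identification and the push-down cost from the example above can be sketched as follows. This is a minimal sketch: the function names are assumptions, the thresholds (10,000 rows, 50 MB) and PushFactor = 0.2 come from the example, the push-down formula is read as PushFactor * size in MB * MAX(write cost factors) / 1000, and the write cost factor values in the usage lines are hypothetical because the example does not give them.

```python
from typing import Sequence


def is_small_table(row_count: int, size_mb: float,
                   max_rows: int = 10_000, max_size_mb: float = 50.0) -> bool:
    """A table is small if it has few rows OR a small data size."""
    return row_count < max_rows or size_mb < max_size_mb


def push_down_cost(size_mb: float, write_cost_factors: Sequence[float],
                   push_factor: float = 0.2) -> float:
    """Cost of writing the small table to every storage node involved.

    As with reads, the slowest storage node (largest write cost factor)
    determines the cost.
    """
    if not write_cost_factors:
        raise ValueError("at least one write cost factor is required")
    return push_factor * size_mb * max(write_cost_factors) / 1000


# Figures from the example: Commodity has 100 rows / 10 MB,
# Order has 1,000,000 rows / 100 GB.
commodity_is_small = is_small_table(100, 10.0)
order_is_small = is_small_table(1_000_000, 100 * 1024.0)
# Hypothetical write cost factors for DB1 and DB2.
commodity_push_cost = push_down_cost(10.0, [4.0, 6.0])
```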
In this embodiment, when the execution cost of the execution plan is evaluated, the cost of pushing down the Commodity table and reading the Order table value field is compared, so that an execution plan with smaller cost is selected.
The embodiment of the invention also provides an evaluation device of the execution cost, which is applied to the computing nodes in the distributed storage system; the distributed storage system further includes a plurality of storage nodes; as shown in fig. 8, the apparatus includes:
a first determining module 810, configured to determine a target execution plan whose execution cost is to be evaluated, where the target execution plan is an execution plan generated for a structured query language SQL statement;
a second determining module 820, configured to determine, for each execution action included in the target execution plan, if the execution action belongs to an access operation for a target data table, an execution cost of the execution action according to target table information of each designated sub-table and meta information of the target data table related to the access operation; wherein each designated sub-table is a sub-table in each storage node concerning the target data table, and target table information of each designated sub-table is used for characterizing: time consuming when the access operation is performed on the designated sub-table;
a third determining module 830, configured to determine an execution cost of the target execution plan based on the determined execution cost of each execution action.
Optionally, the apparatus further comprises:
the request module is used for requesting the target table information of each designated sub-table and the meta-information of the target data table related to the access operation from the preset global management node before the second determination module determines the execution cost of the execution action according to the target table information of each designated sub-table and the meta-information of the target data table related to the access operation;
the global management node stores target table information of sub-tables of each data table in advance, and meta-information of the target data table related to the access operation.
Optionally, the generating manner of the target table information of each designated sub-table includes:
performing the access operation on each designated sub-table;
and determining target table information of each designated sub-table based on time consumption generated in the access operation execution process.
Optionally, the second determining module determines, according to the target table information of each designated sub-table and the meta information of the target data table related to the access operation, an execution cost of the execution action, including:
If the access operation is a first type read operation, determining the execution cost of the execution action according to the read cost factors of each designated sub-table and the total row number of the target data table; wherein the first type of read operation is a read operation for the entire table of the target data table; the read cost factor for each designated sub-table is used to characterize: time consuming reading data of a specified line number from the specified sub-table;
if the access operation is a second type of read operation, determining the execution cost of the execution action according to a column interval factor to be utilized and the number of rows and the data size to be read in the column value domain; wherein the second type of read operation is a read operation on a column value domain of the target data table; the column interval factor to be utilized is the column interval factor for the column value domain, taken from the designated sub-table in which the column value domain exists; the column interval factor is used to characterize the time consumed to read a specified number of rows from the column value domain.
Optionally, the second determining module determines the execution cost of the execution action according to the read cost factor of each designated sub-table and the total number of rows of the target data table, including:
Determining the maximum value from the read cost factors of each designated sub-table as the read cost factor to be utilized;
substituting the read cost factor to be utilized and the total number of rows of the target data table into a preset first cost calculation formula, and calculating the execution cost of the execution action, wherein the first cost calculation formula is a formula set for the read operation;
the second determining module determines the execution cost of the execution action according to the column interval factor to be utilized and the number of rows and the data size to be read in the column value domain, and the second determining module comprises the following steps:
substituting a column interval factor to be utilized, the number of lines to be read in the column value domain and the data size into a preset second cost calculation formula, and calculating the execution cost of the execution action, wherein the second cost calculation formula is a formula set for the reading operation of the column value domain.
The embodiment of the invention also provides a response device aiming at the access request, which is applied to the computing nodes in the distributed storage system; as shown in fig. 9, the apparatus includes:
the generating module 910 is configured to generate each target execution plan of an SQL statement when receiving a structured query language SQL statement sent by an access terminal;
The evaluation module 920 is configured to evaluate the execution cost of each target execution plan according to the above-mentioned execution cost evaluation method, so as to obtain the execution cost of each target execution plan;
a selecting module 930, configured to select a target execution plan with the smallest execution cost from the target execution plans;
the execution module 940 is configured to execute the SQL statement according to the selected target execution plan to obtain an execution result;
and a feedback module 950, configured to feed back the execution result to the access terminal.
The embodiment of the invention also provides an electronic device, as shown in fig. 10, which comprises a processor 1001, a communication interface 1002, a memory 1003 and a communication bus 1004, wherein the processor 1001, the communication interface 1002 and the memory 1003 communicate with each other through the communication bus 1004;
the memory 1003 is configured to store a computer program;
the processor 1001 is configured to implement the steps of the above-described execution cost evaluation method or response method for an access request when executing the program stored in the memory 1003.
The communication bus of the electronic device mentioned above may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, etc. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, the bus is represented in the figure by a single bold line, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include a random access memory (Random Access Memory, RAM) or a non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, there is also provided a computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of the above-described execution cost evaluation method or response method to an access request.
In a further embodiment of the present invention, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the execution cost evaluation method or the response method for an access request of the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments. In particular, the device and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, see the corresponding description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. An evaluation method of execution cost, characterized by being applied to a computing node in a distributed storage system; the distributed storage system further includes a plurality of storage nodes; the method comprises the following steps:
determining a target execution plan whose execution cost is to be evaluated, wherein the target execution plan is an execution plan generated for a Structured Query Language (SQL) statement;
for each execution action included in the target execution plan, if the execution action belongs to an access operation for a target data table, determining the execution cost of the execution action according to target table information of each designated sub-table and meta-information of the target data table related to the access operation; wherein each designated sub-table is a sub-table of the target data table in a respective storage node, and the target table information of each designated sub-table is used for characterizing: the time consumed when the access operation is performed on the designated sub-table;
and determining the execution cost of the target execution plan based on the determined execution cost of each execution action.
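As an illustration of the last step of claim 1, the per-action costs can be combined into a plan cost. The claim only states that the plan cost is determined based on the action costs; the simple sum used below is an assumption, as are the `Action` type, the `plan_cost` function, and the treatment of non-access actions through a caller-supplied `other_cost` callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Action:
    """Hypothetical execution action; `table` is set only for table access operations."""
    name: str
    table: Optional[str] = None


def plan_cost(
    actions: Sequence[Action],
    access_cost: Callable[[Action], float],
    other_cost: Callable[[Action], float] = lambda action: 0.0,
) -> float:
    """Assumed aggregation: the plan cost is the sum of the costs of its actions."""
    total = 0.0
    for action in actions:
        if action.table is not None:
            # Access operation: cost derived from sub-table information and table meta-information.
            total += access_cost(action)
        else:
            # Any other action: cost supplied by the caller (zero by default).
            total += other_cost(action)
    return total
```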
2. The method of claim 1, wherein before determining the execution cost of the execution action according to the target table information of each designated sub-table and the meta-information of the target data table related to the access operation, the method further comprises:
requesting the target table information of each designated sub-table and the meta-information of the target data table related to the access operation from a preset global management node;
wherein the global management node stores in advance the target table information of the sub-tables of each data table and the meta-information of each data table related to the access operation.
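The role of the global management node in claim 2 can be pictured with a small in-memory stand-in. Everything in this sketch is hypothetical: the class name `GlobalManagementNode`, its methods, and the choice of a single cost factor per sub-table and a `total_rows` entry as meta-information. A real deployment would expose the same lookup over a network interface.

```python
from typing import Dict, Tuple


class GlobalManagementNode:
    """In-memory stand-in that stores sub-table information and table meta-information."""

    def __init__(self) -> None:
        # table name -> {storage node id -> cost factor of that node's sub-table}
        self._table_info: Dict[str, Dict[str, float]] = {}
        # table name -> meta-information (only the total row count in this sketch)
        self._meta: Dict[str, Dict[str, int]] = {}

    def report_table_info(self, table: str, storage_node: str, cost_factor: float) -> None:
        """Store, in advance, the information measured for one sub-table."""
        self._table_info.setdefault(table, {})[storage_node] = cost_factor

    def set_meta(self, table: str, total_rows: int) -> None:
        """Store, in advance, the meta-information of one data table."""
        self._meta[table] = {"total_rows": total_rows}

    def request(self, table: str) -> Tuple[Dict[str, float], Dict[str, int]]:
        """Return copies of the stored sub-table information and meta-information."""
        return dict(self._table_info.get(table, {})), dict(self._meta.get(table, {}))
```

Because the information is stored in advance, the computing node only performs a lookup when it evaluates a plan and does not need to contact each storage node.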
3. The method of claim 1, wherein the manner of generating the target table information of each designated sub-table includes:
performing the access operation on each designated sub-table;
and determining the target table information of each designated sub-table based on the time consumed during execution of the access operation.
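One possible way to obtain such timing information is to run the access operation on a sub-table and record the elapsed time. The sketch below uses Python's `time.perf_counter`; the function name `measure_cost_factor`, the sample size, the repeat count, and the choice of keeping the fastest run are all assumptions made for illustration.

```python
import time
from typing import Callable


def measure_cost_factor(access_op: Callable[[int], object],
                        sample_rows: int = 1000, repeats: int = 3) -> float:
    """Time `access_op(sample_rows)` several times and return the fastest run in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        access_op(sample_rows)  # perform the access operation on the designated sub-table
        timings.append(time.perf_counter() - start)
    # Keeping the minimum reduces the influence of transient load on the measurement.
    return min(timings)
```

A storage node could run this once per sub-table and report the result so that it is available when plans are evaluated.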
4. The method according to any one of claims 1-3, wherein said determining the execution cost of the execution action according to the target table information of each designated sub-table and the meta-information of the target data table related to the access operation comprises:
if the access operation is a first type of read operation, determining the execution cost of the execution action according to the read cost factor of each designated sub-table and the total number of rows of the target data table; wherein the first type of read operation is a read operation on the entire target data table; the read cost factor of each designated sub-table is used to characterize: the time consumed in reading a specified number of rows of data from the designated sub-table;
if the access operation is a second type of read operation, determining the execution cost of the execution action according to a column interval factor to be utilized and the number of rows and the data size to be read in a column value domain; wherein the second type of read operation is a read operation on the column value domain of the target data table; the column interval factor to be utilized is the column interval factor, for the column value domain, of the designated sub-table having the column value domain; the column interval factor is used to characterize: the time consumed in reading a specified number of rows of data from the column value domain.
5. The method of claim 4, wherein said determining the execution cost of the execution action according to the read cost factor of each designated sub-table and the total number of rows of the target data table comprises:
determining the maximum value among the read cost factors of the designated sub-tables as the read cost factor to be utilized;
substituting the read cost factor to be utilized and the total number of rows of the target data table into a preset first cost calculation formula, and calculating the execution cost of the execution action, wherein the first cost calculation formula is a formula set for the read operation;
said determining the execution cost of the execution action according to the column interval factor to be utilized and the number of rows and the data size to be read in the column value domain comprises:
substituting the column interval factor to be utilized, the number of rows to be read in the column value domain, and the data size into a preset second cost calculation formula, and calculating the execution cost of the execution action, wherein the second cost calculation formula is a formula set for the read operation on the column value domain.
6. A response method for an access request, characterized by being applied to a computing node in a distributed storage system; the method comprises the following steps:
when receiving a Structured Query Language (SQL) statement sent by an access terminal, generating each target execution plan of the SQL statement;
evaluating the execution cost of each target execution plan according to the method of any one of claims 1-5, to obtain the execution cost of each target execution plan;
selecting the target execution plan with the minimum execution cost from the target execution plans;
executing the SQL statement according to the selected target execution plan to obtain an execution result;
and feeding back the execution result to the access terminal.
7. An evaluation device of execution cost, characterized by being applied to a computing node in a distributed storage system; the distributed storage system further includes a plurality of storage nodes; the device comprises:
a first determining module, configured to determine a target execution plan whose execution cost is to be evaluated, wherein the target execution plan is an execution plan generated for a Structured Query Language (SQL) statement;
a second determining module, configured to determine, for each execution action included in the target execution plan, if the execution action belongs to an access operation for a target data table, the execution cost of the execution action according to target table information of each designated sub-table and meta-information of the target data table related to the access operation; wherein each designated sub-table is a sub-table of the target data table in a respective storage node, and the target table information of each designated sub-table is used for characterizing: the time consumed when the access operation is performed on the designated sub-table;
and a third determining module, configured to determine the execution cost of the target execution plan based on the determined execution cost of each execution action.
8. A response device for an access request, characterized by being applied to a computing node in a distributed storage system; the device comprises:
a generating module, configured to generate each target execution plan of a Structured Query Language (SQL) statement when receiving the SQL statement sent by an access terminal;
an evaluation module, configured to evaluate the execution cost of each target execution plan according to the method of any one of claims 1-5, to obtain the execution cost of each target execution plan;
a selecting module, configured to select the target execution plan with the minimum execution cost from the target execution plans;
an execution module, configured to execute the SQL statement according to the selected target execution plan to obtain an execution result;
and a feedback module, configured to feed back the execution result to the access terminal.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor is configured to carry out the method steps of any one of claims 1-6 when executing the program stored in the memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any one of claims 1-6.
CN202210002360.XA 2022-01-04 2022-01-04 Evaluation method and device of execution cost and electronic equipment Pending CN116431448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210002360.XA CN116431448A (en) 2022-01-04 2022-01-04 Evaluation method and device of execution cost and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210002360.XA CN116431448A (en) 2022-01-04 2022-01-04 Evaluation method and device of execution cost and electronic equipment

Publications (1)

Publication Number Publication Date
CN116431448A true CN116431448A (en) 2023-07-14

Family

ID=87084212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210002360.XA Pending CN116431448A (en) 2022-01-04 2022-01-04 Evaluation method and device of execution cost and electronic equipment

Country Status (1)

Country Link
CN (1) CN116431448A (en)

Similar Documents

Publication Publication Date Title
US20220405284A1 (en) Geo-scale analytics with bandwidth and regulatory constraints
US7383247B2 (en) Query routing of federated information systems for fast response time, load balance, availability, and reliability
US10509804B2 (en) Method and apparatus for storing sparse graph data as multi-dimensional cluster
US8732163B2 (en) Query optimization with memory I/O awareness
US9875186B2 (en) System and method for data caching in processing nodes of a massively parallel processing (MPP) database system
US20070185912A1 (en) Off-loading I/O and computationally intensive operations to secondary systems
US8868595B2 (en) Enhanced control to users to populate a cache in a database system
US9652498B2 (en) Processing queries using hybrid access paths
US20140006379A1 (en) Efficient partitioned joins in a database with column-major layout
US20100228764A1 (en) Offline Validation of Data in a Database System for Foreign Key Constraints
CN102479239A (en) Method and device for per-storing RDF ternary data
CN102436494A (en) Device and method for optimizing execution plan and based on practice testing
US20070219973A1 (en) Dynamic statement processing in database systems
CN109885585B (en) Distributed database system and method supporting stored procedures, triggers and views
CN114356921A (en) Data processing method, device, server and storage medium
US10877973B2 (en) Method for efficient one-to-one join
EP3940547A1 (en) Workload aware data partitioning
CN114860764A (en) Optimization method and system for distributed database query and electronic equipment
US8548980B2 (en) Accelerating queries based on exact knowledge of specific rows satisfying local conditions
CN116431448A (en) Evaluation method and device of execution cost and electronic equipment
US20070220058A1 (en) Management of statistical views in a database system
US8046394B1 (en) Dynamic partitioning for an ordered analytic function
EP4174678A1 (en) Cloud analysis scenario-based hybrid query method and system, and storage medium
CN114238387A (en) Data query method and device, electronic equipment and storage medium
CN113076332A (en) Execution method of database precompiled query statement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination