CN110069522A - Data query method and apparatus - Google Patents

Data query method and apparatus Download PDF

Info

Publication number
CN110069522A
CN110069522A CN201711105172.5A CN201711105172A CN110069522A CN 110069522 A CN110069522 A CN 110069522A CN 201711105172 A CN201711105172 A CN 201711105172A CN 110069522 A CN110069522 A CN 110069522A
Authority
CN
China
Prior art keywords
execution
cost
scanned
determining
execution plan
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711105172.5A
Other languages
Chinese (zh)
Inventor
周正中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201711105172.5A priority Critical patent/CN110069522A/en
Publication of CN110069522A publication Critical patent/CN110069522A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24549Run-time optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the present invention provides a kind of data query method and apparatus, this method comprises: obtaining multiple executive plans corresponding with query statement;Obtain the corresponding hardware environment index of multiple executive plans;The corresponding executory cost of multiple executive plans is determined according to hardware environment index;Query statement is responded with the minimum executive plan of executory cost in multiple executive plans, obtains query result.In cost calculating process by the way that hardware environment factor to be considered to executive plan, how to change from but regardless of hardware environment, the cost for each executive plan determined all matches with involved hardware environment when executing, and enables the cost calculated consumed cost when really reflecting practical execution.

Description

Data query method and device
Technical Field
The invention relates to the technical field of internet, in particular to a data query method and a data query device.
Background
When executing a Structured Query Language (SQL) statement, a relational database may obtain Query results through various execution plans, for example, through index scanning or through full-table scanning. The consumed costs are often different for different execution plans, such as the consumed IO cost, the CPU cost, and the like, and therefore, it is generally necessary to select the execution plan with the lowest consumed cost from a plurality of execution plans of the SQL statement.
Currently, a plurality of execution plans corresponding to SQL statements are generated by using a Cost-Based Optimizer (CBO) and an execution plan with the minimum execution Cost is selected from the plurality of execution plans. The CBO may obtain influencing factors influencing the execution of each execution plan and estimate the cost that may be consumed by the execution plans based on the weights of the influencing factors. However, the CBO may preset the weight values of various influencing factors, and the preset weight values may not truly reflect the actual consumption of various resources, so the execution cost estimated by the CBO is only a hypothetical cost, which is probably not consistent with the actual consumption cost.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data query method and apparatus, so as to improve accuracy of estimating execution plan cost.
In a first aspect, an embodiment of the present invention provides a data query method, including:
acquiring a plurality of execution plans corresponding to the query statement;
acquiring hardware environment indexes corresponding to the execution plans;
determining the execution cost corresponding to each of the plurality of execution plans according to the hardware environment index;
responding to the query statement with an execution plan of the plurality of execution plans that has a lowest execution cost.
In a second aspect, an embodiment of the present invention provides a data query apparatus, including:
a first obtaining module, configured to obtain multiple execution plans corresponding to the query statement;
a second obtaining module, configured to obtain hardware environment indicators corresponding to the multiple execution plans;
the determining module is used for determining the execution cost corresponding to each of the plurality of execution plans according to the hardware environment index;
and the execution module is used for responding to the query statement by using the execution plan with the lowest execution cost in the plurality of execution plans.
In a possible design, the data query apparatus includes a processor and a memory, the memory is used for storing a program that supports the data query apparatus to execute the data query method in the first aspect, and the processor is configured to execute the program stored in the memory. The data query apparatus may further include a communication interface for the data query apparatus to communicate with other devices or a communication network.
In a third aspect, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for a data query device, which contains a program for executing the data query method in the first aspect.
According to the data query method and device provided by the embodiment of the invention, aiming at the SQL query statement input by a user, a plurality of execution plans corresponding to the query statement are firstly obtained, then hardware environment indexes corresponding to the execution plans are further obtained, the execution cost corresponding to each of the execution plans is determined by combining the hardware environment indexes corresponding to each of the current execution plans, and finally the execution plan with the lowest execution cost in the execution plans responds to the query statement to obtain a query result. By considering the hardware environment factors in the cost calculation process of the execution plan, the determined cost of each execution plan is matched with the hardware environment involved in execution no matter how the hardware environment changes, so that the calculated cost can truly reflect the cost consumed in actual execution.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a first embodiment of a data query method according to the present invention;
FIG. 2 is a diagram illustrating a database storage situation according to an embodiment of the present invention;
FIG. 3 is a flow chart of one implementation of step 103 in the embodiment shown in FIG. 1;
FIG. 4 is a flow chart of another implementation of step 103 in the embodiment shown in FIG. 1;
fig. 5 is a schematic structural diagram of a data query apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device corresponding to a data query apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and "a" and "an" generally include at least two, but do not exclude at least one, unless the context clearly dictates otherwise.
It should be understood that the term "and/or" as used herein is merely one type of association that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a good or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such good or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a commodity or system that includes the element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
Fig. 1 is a flowchart of a first embodiment of a data query method according to an embodiment of the present invention, where the data query method provided in this embodiment may be executed by a data query device, and the data query device may be implemented as software or as a combination of software and hardware, and the data query device may be disposed in a server. As shown in fig. 1, the method comprises the steps of:
101. a plurality of execution plans corresponding to the query statement are obtained.
The query statement in this embodiment may be an SQL query statement input by a user. Because the SQL query statement is a high-level abstract declarative language, it merely defines what data to retrieve from the database, and does not tell how to retrieve the data, or the method and steps of retrieving the data, are not defined. Therefore, when receiving an SQL query statement, first, an execution plan corresponding to the query statement is generated, and the execution plan describes how to execute the query statement. The generation process of the execution plan can be implemented by referring to the existing CBO generation mode, which is not described in detail in the embodiment of the present invention.
102. And acquiring hardware environment indexes corresponding to the multiple execution plans.
Each generated execution plan describes respective operations required to execute the query statement, and the hardware environment of the operations corresponding to each execution plan may be different. In the embodiment of the present invention, the influence of the actual hardware environment on the execution cost of the execution plan is considered, and therefore, a hardware environment index corresponding to each execution plan needs to be acquired.
The influence of the hardware environment on the execution cost of the execution plan can be embodied as two levels: first, the server is upgraded by the user, so as to change the hardware composition of the server, for example, the mechanical hard disk is updated to a Solid State Drive (SSD); the server is replaced, and the CPU dominant frequency and the core number of the server are improved. Therefore, from the perspective before and after the hardware upgrade, the execution cost consumption situation may be different for the same execution plan due to different hardware environments. Secondly, for a plurality of execution plans corresponding to one query statement, the hardware environment involved in the execution of each execution plan may also be different, which may also result in different execution costs for the execution plans.
Optionally, the hardware environment involved in the embodiment of the present invention may include a storage carrier storing each table in the database, that is, storing the data block corresponding to each table, and/or a processor responding to the query statement, for example, a CPU in the server.
As shown in fig. 2, assuming that the server receiving the query statement of the user is a, the CPU in the server a is the processor. Assuming that the database corresponds to three tables, namely table 1, table 2 and table 3, the three tables may or may not be all stored in server a, for example, it is assumed in fig. 2 that table 1 and table 2 are stored in server a and table 3 is stored in server B. In addition, one or more storage hard disks may be deployed in the same server, and the types and the performances of the storage hard disks may be the same or different, and it is assumed that table 1 is stored in the mechanical hard disk of server a, table 2 is stored in the SSD hard disk of server a, and table 3 is stored in the SSD hard disk of server B. Thus, in this example, the mechanical hard disk of server a, the SSD hard disk, and the SSD hard disk of server B are all storage carriers.
In addition, the storage carrier in the embodiment of the present invention may include, in addition to the hard disk, a memory in the server that processes the query statement, that is, executes the multiple execution plans, because in some cases, the data block to be scanned corresponding to a certain execution plan may partially exist in the memory.
Based on this, obtaining the hardware environment index corresponding to each of the plurality of execution plans may be implemented as:
determining a storage carrier where a data block to be scanned corresponding to each of the plurality of execution plans is located;
a preset performance index of the storage carrier and/or a preset performance index of a processor executing a plurality of execution plans is obtained.
Thus, for each execution plan, the hardware environment index corresponding to the execution plan is a preset performance index of the storage carrier on which the corresponding data block to be scanned is located and/or a preset performance index of the processor executing the execution plan. Multiple execution plans are executed by the same processor. Assuming that the data blocks to be scanned corresponding to a certain execution plan are all the data blocks corresponding to table 1 in the above example, the hardware environment index corresponding to the execution plan may be: the method comprises the following steps that a preset performance index of a processor in a server A and/or a preset performance index of a mechanical hard disk in the server A are/is assumed, and at the moment, all data blocks to be scanned are supposed to be stored in the mechanical hard disk in the server A. The determination process of the data block to be scanned corresponding to each execution plan will be described later.
Optionally, the preset performance index of the hard disk in the storage carrier includes, for example, an IOPS index, i.e., the number of times that read and write can be performed per second, and a throughput, i.e., the amount of data transmitted per second, where the throughput is IOPS × data block size, and the data block size is a known amount. The preset performance indexes of the memory in the storage carrier include, for example, a main frequency and a capacity of the memory, where the main frequency of the memory determines a scanning speed of the data block, and the capacity of the memory determines a hit rate of the data block to be scanned in the memory. The preset performance index of the processor may include, for example, the number of CPU cores and CPU dominant frequency. In addition, optionally, the preset performance index corresponding to the storage carrier may further include a network card rate corresponding to the device where the storage carrier is located, and the network card rate may be embodied as: corresponding weighting coefficients are set according to different network card rate levels, and the influence of the weighting coefficients on the reading and writing cost can be as follows: multiplied by IOPS or throughput.
103. And determining the execution cost corresponding to each of the plurality of execution plans according to the hardware environment index corresponding to each of the plurality of execution plans.
In the embodiment of the present invention, the execution cost may include two types of costs: the read-write cost may alternatively be referred to as the IO cost and the arithmetic cost or may alternatively be referred to as the CPU cost. The read-write cost and the operation cost may be specifically expressed as time consumed by read-write and time consumed by operation.
And the influence of the hardware environment index on the execution cost of the execution plan can be represented as: for a certain execution plan, the number of data blocks to be scanned and the number of records to be operated or referred to as the number of rows to be operated are determined, so that the preset performance index of the storage carrier corresponding to the execution plan will act on the number of data blocks to be scanned to obtain the read-write cost of the execution plan, and the preset performance index of the processor will act on the number of records to be operated to obtain the operation cost of the execution plan. Thus, the execution cost of the execution plan may appear as an accumulation of the read-write cost and the computational cost.
Alternatively, only the preset performance index of the storage carrier may be considered, and at this time, the influence on the read-write cost of the execution plan is considered; alternatively, only the preset performance index of the processor may be considered, and at this time, the influence on the operation cost of the execution plan is considered. Optionally, the preset performance index of the storage carrier may be considered as well as the preset performance index of the processor, and at this time, the influence on the operation cost and the read-write cost of the execution plan is considered.
The function of the preset performance index of the storage carrier on the number of data blocks to be scanned is represented as follows: determining a read-write cost required for scanning a data block according to a preset performance index of the storage carrier, so that the read-write cost of the execution plan can be expressed as: the number of data blocks to be scanned is the read-write cost required for scanning one data block. In practical application, taking a storage carrier as a hard disk as an example, the read-write cost corresponding to the hard disk can also be expressed as: data block size/hard disk throughput to be scanned.
Similarly, the action of the preset performance index of the processor on the number of records to be operated is embodied as follows: the computation cost required to compute a record is determined according to the preset performance index of the processor, and thus, the computation cost of executing the plan can be expressed as: the number of records to be operated is the operation cost required for operating one record.
The determination process of the number of data blocks to be scanned and the number of records to be calculated will be described in the following embodiments, and in this embodiment, only how the hardware environment index acts in the cost calculation process of the execution plan is emphasized.
In addition, it should be noted that different operation types may be preset with different weighting values, and therefore, when calculating the operation cost, the influence of the operation type may also be considered, and thus, the operation cost of the execution plan may also be expressed as: the record number to be calculated calculates the type weighted value and calculates the calculation cost required by one record. Each execution plan includes information of operation types, such as sorting, comparison, and arithmetic of four rules, and the operation types may also be referred to as operators.
104. The query statement is responded to with the execution plan having the lowest execution cost among the plurality of execution plans.
And after obtaining the execution cost corresponding to each of the plurality of execution plans, selecting the execution plan with the minimum execution cost from the plurality of execution plans, responding to the query statement by using the execution plan, and obtaining a corresponding query result to feed back to the user.
In summary, for an SQL query statement input by a user, first, a plurality of execution plans corresponding to the query statement are obtained, and then, hardware environment indexes corresponding to the plurality of execution plans are obtained, an execution cost corresponding to each of the plurality of execution plans is determined by combining the hardware environment indexes corresponding to each of the plurality of current execution plans, and finally, the execution plan with the lowest execution cost in the plurality of execution plans responds to the query statement to obtain a query result. By considering the hardware environment factors in the cost calculation process of the execution plan, the determined cost of each execution plan is matched with the hardware environment involved in execution no matter how the hardware environment changes, so that the calculated cost can truly reflect the cost consumed in actual execution.
The following describes the above determination process of the number of data blocks to be scanned and the number of records to be operated in detail with reference to the embodiments shown in fig. 3 and 4.
For a plurality of execution plans obtained, the scanning modes adopted by different execution plans may be different, and generally, there are two scanning modes: the full-table scan mode and the index scan mode may be alternatively referred to as a sequential scan mode and a discrete scan mode. The determination methods of the number of the data blocks to be scanned and the number of records to be operated corresponding to different scanning methods are also different.
Fig. 3 illustrates how to determine the number of data blocks to be scanned, the number of records to be operated, and how to perform the cost calculation in the full-table scanning mode, including:
301. and if the plurality of execution plans comprise a first execution plan corresponding to the full-table scanning mode, determining the read-write cost of the first execution plan according to the total number of the data blocks contained in the first table to be scanned corresponding to the first execution plan and the preset performance index of the storage carrier where the data blocks contained in the first table to be scanned are located.
302. And determining the operation cost of the first execution plan according to the total number of records contained in the first table to be scanned and the preset performance index of the processor.
It is assumed in this embodiment that the preset performance level of the storage carrier and the preset performance level of the processor are taken into account.
For an execution plan adopting a full-table scanning mode, all data blocks of a first table to be scanned corresponding to the execution plan need to be scanned, and correspondingly, all records contained in the first table to be scanned need to be operated. At this time, the number of data blocks to be scanned corresponding to the first execution plan is the total number of data blocks corresponding to the first table to be scanned, and the number of records to be calculated is the total number of records, which is the number of rows included in the first table to be scanned.
In addition, for the same hard disk, the performance indexes corresponding to different scanning modes are different, so for an execution plan adopting the full-table scanning mode, it is necessary to obtain the performance index of the full-table scanning mode corresponding to the hard disk where the first table to be scanned of the execution plan is located.
Assuming that a first table to be scanned corresponding to a first execution plan includes 1000 data blocks, and that 800 data blocks are stored in the memory and 200 data blocks are stored in a hard disk, the read-write cost at this time includes: the memory read-write cost + the hard disk read-write cost, where the memory read-write cost may be represented as 800 × the read-write cost required for reading one data block in the memory, and the hard disk read-write cost may be represented as the read-write cost required for reading one data block in the 800 × hard disk full-table scanning mode.
Fig. 4 illustrates how to determine the number of data blocks to be scanned, the number of records to be operated, and how to perform the cost calculation in the index scanning mode, including:
401. and if the plurality of execution plans comprise a second execution plan corresponding to the index scanning mode, acquiring the statistical information of a second table to be scanned corresponding to the second execution plan.
Furthermore, the execution cost corresponding to the second execution plan is determined according to the obtained statistical information and the hardware environment index corresponding to the second execution plan, which is specifically as follows:
402. and determining the number of the data blocks to be scanned and the number of records to be operated corresponding to the second execution plan according to the statistical information.
403. And determining the read-write cost of the second execution plan according to the number of the data blocks to be scanned and the preset performance index of the storage carrier where the data blocks to be scanned are located.
404. And determining the operation cost of the second execution plan according to the number of records to be operated and the preset performance index of the processor.
It is assumed in this embodiment that the preset performance level of the storage carrier and the preset performance level of the processor are taken into account.
According to the foregoing embodiment, for an execution plan adopting an index scanning manner, it is necessary to obtain a performance index of a hard disk in which a data block to be scanned of the execution plan is located, where the hard disk corresponds to the index scanning manner, so as to calculate the read-write cost of the execution plan based on the performance index.
In addition, in the index scanning mode, the number of data blocks to be scanned and the number of records to be operated are not all data blocks and all records corresponding to the second table to be scanned corresponding to the execution plan, but the number of data blocks to be scanned and the number of records to be operated at this time need to be determined by combining the statistical information of the second table to be scanned.
The statistical information of the second table to be scanned mainly reflects the distribution characteristics of the data in the table, and may include the total number of records in the table, i.e., the total number of rows, a histogram of columns, the ratio of high-frequency words and high-frequency words in the columns, the linear correlation between the data writing sequence and the row number updating sequence, and so on.
The process of determining the number of data blocks to be scanned and the number of records to be operated according to the statistical information of the table can refer to the implementation process of CBO. In the present embodiment, only a simple example is shown: assuming that the index condition of a certain execution plan includes a user name a, a is a high-frequency word, and the ratio corresponding to the high-frequency word is 1/1000, the total number of rows included in the second to-be-scanned table corresponding to the execution plan is assumed to be 10000 times the ratio, so that the number of records to be calculated is 10. In addition, it is assumed that each data block includes 100 records, and the linear correlation value is 1, which indicates that the 10 records to be operated are located in one data block, and the number of the data blocks to be scanned is 1. Thus, after obtaining the number of data blocks to be scanned and the number of records to be operated, the execution cost of the execution plan can be determined based on the performance index of the storage carrier and the performance index of the processor.
The data query device of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these data querying devices may each be configured using commercially available hardware components through the steps taught in this scheme.
Fig. 5 is a schematic structural diagram of a data query apparatus according to an embodiment of the present invention, and as shown in fig. 5, the apparatus includes: the device comprises a first acquisition module 11, a second acquisition module 12, a determination module 13 and an execution module 14.
The first obtaining module 11 is configured to obtain a plurality of execution plans corresponding to the query statement.
A second obtaining module 12, configured to obtain a hardware environment index corresponding to each of the multiple execution plans.
And the determining module 13 is configured to determine, according to the hardware environment index, an execution cost corresponding to each of the plurality of execution plans.
And the execution module 14 is used for responding to the query statement by using the execution plan with the lowest execution cost in the plurality of execution plans.
Optionally, the second obtaining module 12 is specifically configured to:
determining a storage carrier where a data block to be scanned corresponding to each of the plurality of execution plans is located;
and acquiring preset performance indexes of the storage carrier and/or preset performance indexes of a processor executing the multiple execution plans as the hardware environment indexes.
Optionally, the determining module 13 includes: a first determining unit 131 and a second determining unit 132.
A first determining unit 131, configured to determine, if the plurality of execution plans includes a first execution plan corresponding to a full-table scanning manner, a read-write cost of the first execution plan according to a total number of data blocks included in a first table to be scanned corresponding to the first execution plan and a preset performance index of a storage carrier in which the data blocks included in the first table to be scanned are located; and determining the operation cost of the first execution plan according to the total number of records contained in the first table to be scanned and a preset performance index of the processor.
A second determining unit 132, configured to obtain statistical information of a second table to be scanned corresponding to a second execution plan if the plurality of execution plans include the second execution plan corresponding to the index scanning manner;
and determining the execution cost corresponding to the second execution plan according to the statistical information and the hardware environment index corresponding to the second execution plan.
Optionally, the second determining unit 132 is specifically configured to:
determining the number of data blocks to be scanned and the number of records to be operated corresponding to the second execution plan according to the statistical information;
determining the read-write cost of the second execution plan according to the number of the data blocks to be scanned and the preset performance index of the storage carrier where the data blocks to be scanned are located;
and determining the operation cost of the second execution plan according to the number of the records to be operated and the preset performance index of the processor.
The apparatus shown in fig. 5 can perform the method of the embodiments shown in fig. 1, fig. 3, and fig. 4, and the related descriptions of the embodiments shown in fig. 1, fig. 3, and fig. 4 can be referred to for the parts of this embodiment not described in detail. The implementation process and technical effect of the technical solution refer to the descriptions in the embodiments shown in fig. 1, fig. 3, and fig. 4, and are not described herein again.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The internal functions and structures of the data query device are described above, and in one possible design, the structure of the data query device may be implemented as an electronic device, such as a server, as shown in fig. 6, and the electronic device may include: a processor 21 and a memory 22. Wherein the memory 22 is used for storing a program for supporting a data query device to execute the data query method provided in any one of the above embodiments, and the processor 21 is configured to execute the program stored in the memory 22.
The program comprises one or more computer instructions which, when executed by the processor 21, are capable of performing the steps of:
acquiring a plurality of execution plans corresponding to the query statement;
acquiring hardware environment indexes corresponding to the execution plans;
determining the execution cost corresponding to each of the plurality of execution plans according to the hardware environment index;
responding to the query statement with an execution plan of the plurality of execution plans that has a lowest execution cost.
Optionally, the processor 21 is further configured to perform all or part of the steps of the foregoing methods.
The data query apparatus may further include a communication interface 23, configured to communicate with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for a data query apparatus, which includes a program for executing the data query method in the foregoing method embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by adding a necessary general hardware platform, and of course, can also be implemented by a combination of hardware and software. With this understanding in mind, the above-described aspects and portions of the present technology which contribute substantially or in part to the prior art may be embodied in the form of a computer program product, which may be embodied on one or more computer-usable storage media having computer-usable program code embodied therein, including without limitation disk storage, CD-ROM, optical storage, and the like.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for querying data, comprising:
acquiring a plurality of execution plans corresponding to the query statement;
acquiring hardware environment indexes corresponding to the execution plans;
determining the execution cost corresponding to each of the plurality of execution plans according to the hardware environment index;
responding to the query statement with an execution plan of the plurality of execution plans that has a lowest execution cost.
2. The method of claim 1, wherein obtaining hardware environment metrics corresponding to each of the plurality of execution plans comprises:
determining a storage carrier where a data block to be scanned corresponding to each of the plurality of execution plans is located;
and acquiring preset performance indexes of the storage carrier and/or preset performance indexes of a processor executing the multiple execution plans as the hardware environment indexes.
3. The method of claim 2, wherein if the hardware environment indicator includes a preset performance indicator of the storage carrier and a preset performance indicator of a processor, the determining the execution cost corresponding to each of the plurality of execution plans according to the hardware environment indicator includes:
if the plurality of execution plans include a first execution plan corresponding to a full-table scanning mode, determining the read-write cost of the first execution plan according to the total number of data blocks contained in a first table to be scanned corresponding to the first execution plan and a preset performance index of a storage carrier where the data blocks contained in the first table to be scanned are located;
and determining the operation cost of the first execution plan according to the total number of records contained in the first table to be scanned and a preset performance index of the processor.
4. The method of claim 2, wherein determining the execution cost for each of the plurality of execution plans based on the hardware environment metrics comprises:
if the plurality of execution plans comprise a second execution plan corresponding to an index scanning mode, acquiring statistical information of a second table to be scanned corresponding to the second execution plan;
and determining the execution cost corresponding to the second execution plan according to the statistical information and the hardware environment index corresponding to the second execution plan.
5. The method according to claim 4, wherein if the hardware environment index includes a preset performance index of the storage carrier and a preset performance index of a processor, the determining the execution cost corresponding to the second execution plan according to the statistical information and the hardware environment index corresponding to the second execution plan includes:
determining the number of data blocks to be scanned and the number of records to be operated corresponding to the second execution plan according to the statistical information;
determining the read-write cost of the second execution plan according to the number of the data blocks to be scanned and the preset performance index of the storage carrier where the data blocks to be scanned are located;
and determining the operation cost of the second execution plan according to the number of the records to be operated and the preset performance index of the processor.
6. A data query apparatus, comprising:
a first obtaining module, configured to obtain multiple execution plans corresponding to the query statement;
a second obtaining module, configured to obtain hardware environment indicators corresponding to the multiple execution plans;
the determining module is used for determining the execution cost corresponding to each of the plurality of execution plans according to the hardware environment index;
and the execution module is used for responding to the query statement by using the execution plan with the lowest execution cost in the plurality of execution plans.
7. The apparatus of claim 6, wherein the second obtaining module is specifically configured to:
determining a storage carrier where a data block to be scanned corresponding to each of the plurality of execution plans is located;
and acquiring preset performance indexes of the storage carrier and/or preset performance indexes of a processor executing the multiple execution plans as the hardware environment indexes.
8. The apparatus of claim 7, wherein the determining module comprises:
a first determining unit, configured to determine, if the plurality of execution plans include a first execution plan corresponding to a full-table scanning manner, a read-write cost of the first execution plan according to a total number of data blocks included in a first table to be scanned corresponding to the first execution plan and a preset performance index of a storage carrier in which the data blocks included in the first table to be scanned are located; determining the operation cost of the first execution plan according to the total number of records contained in the first table to be scanned and a preset performance index of the processor;
a second determining unit, configured to, if a second execution plan corresponding to an index scanning manner is included in the multiple execution plans, obtain statistical information of a second table to be scanned corresponding to the second execution plan; determining the number of data blocks to be scanned and the number of records to be operated corresponding to the second execution plan according to the statistical information; determining the read-write cost of the second execution plan according to the number of the data blocks to be scanned and the preset performance index of the storage carrier where the data blocks to be scanned are located; and determining the operation cost of the second execution plan according to the number of the records to be operated and the preset performance index of the processor.
9. An electronic device comprising a memory and a processor; wherein,
the memory is to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the data query method of any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, wherein the computer program causes a computer to implement the data query method according to any one of claims 1 to 5 when executed.
CN201711105172.5A 2017-11-10 2017-11-10 Data query method and apparatus Pending CN110069522A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711105172.5A CN110069522A (en) 2017-11-10 2017-11-10 Data query method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711105172.5A CN110069522A (en) 2017-11-10 2017-11-10 Data query method and apparatus

Publications (1)

Publication Number Publication Date
CN110069522A true CN110069522A (en) 2019-07-30

Family

ID=67364506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711105172.5A Pending CN110069522A (en) 2017-11-10 2017-11-10 Data query method and apparatus

Country Status (1)

Country Link
CN (1) CN110069522A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639096A (en) * 2020-05-29 2020-09-08 北京奇艺世纪科技有限公司 SQL statement execution method and device, electronic equipment and storage medium
WO2021044263A1 (en) * 2019-09-03 2021-03-11 International Business Machines Corporation Reducing temp size over base table
CN112699140A (en) * 2019-10-23 2021-04-23 阿里巴巴集团控股有限公司 Data processing method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436494A (en) * 2011-11-11 2012-05-02 中国工商银行股份有限公司 Device and method for optimizing execution plan and based on practice testing
CN103176974A (en) * 2011-12-20 2013-06-26 国际商业机器公司 Method and device used for optimizing access path in data base
US20150254295A1 (en) * 2014-03-04 2015-09-10 International Business Machines Corporation Regression testing of sql execution plans for sql statements
CN105122239A (en) * 2013-03-13 2015-12-02 华为技术有限公司 System and method for adaptive vector size selection for vectorized query execution
CN106897343A (en) * 2016-07-20 2017-06-27 阿里巴巴集团控股有限公司 The lookup method of executive plan, storage method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436494A (en) * 2011-11-11 2012-05-02 中国工商银行股份有限公司 Device and method for optimizing execution plan and based on practice testing
CN103176974A (en) * 2011-12-20 2013-06-26 国际商业机器公司 Method and device used for optimizing access path in data base
CN105122239A (en) * 2013-03-13 2015-12-02 华为技术有限公司 System and method for adaptive vector size selection for vectorized query execution
US20150254295A1 (en) * 2014-03-04 2015-09-10 International Business Machines Corporation Regression testing of sql execution plans for sql statements
CN106897343A (en) * 2016-07-20 2017-06-27 阿里巴巴集团控股有限公司 The lookup method of executive plan, storage method and device

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021044263A1 (en) * 2019-09-03 2021-03-11 International Business Machines Corporation Reducing temp size over base table
US11023464B2 (en) 2019-09-03 2021-06-01 International Business Machines Corporation Reducing temp size over a base table
CN114341833A (en) * 2019-09-03 2022-04-12 国际商业机器公司 Reducing TEMP size on base tables
GB2603361A (en) * 2019-09-03 2022-08-03 Ibm Reducing temp size over base table
GB2603361B (en) * 2019-09-03 2023-01-04 Ibm Reducing temp size over base table
CN114341833B (en) * 2019-09-03 2023-09-19 国际商业机器公司 Reducing TEMP size on base table
CN112699140A (en) * 2019-10-23 2021-04-23 阿里巴巴集团控股有限公司 Data processing method, device, equipment and storage medium
CN112699140B (en) * 2019-10-23 2023-12-26 阿里巴巴集团控股有限公司 Data processing method, device, equipment and storage medium
CN111639096A (en) * 2020-05-29 2020-09-08 北京奇艺世纪科技有限公司 SQL statement execution method and device, electronic equipment and storage medium
CN111639096B (en) * 2020-05-29 2024-03-08 北京奇艺世纪科技有限公司 SQL sentence execution method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
JP5088668B2 (en) Computer load estimation system, computer load estimation method, computer load estimation program
US20170161323A1 (en) Automatic generation of sub-queries
CN110704336B (en) Data caching method and device
CN111930848B (en) Data partition storage method, device and system
US20200057782A1 (en) Optimized navigable key-value store
CN115129782A (en) Partition level connection method and device for distributed database
WO2014077807A1 (en) Updating statistics in distributed databases
US8560506B2 (en) Automatic selection of blocking column for de-duplication
CN110069522A (en) Data query method and apparatus
CN109117433B (en) Index tree object creation and index method and related device thereof
US20150019528A1 (en) Prioritization of data from in-memory databases
CN103793439A (en) Real-time retrieval information acquisition method, real-time retrieval device, and real-time retrieval server
CN110737717A (en) database migration method and device
CN114238389A (en) Database query optimization method, apparatus, electronic device, medium, and program product
CN110362569A (en) The method of calibration and device of tables of data, electronic equipment, storage medium
US20190101911A1 (en) Optimization of virtual sensing in a multi-device environment
CN107193857B (en) Method and equipment for database traversal
CN111737266B (en) Block data access method, block data storage method and device
US20140310261A1 (en) Database index profile based weights for cost computation
CN117763024A (en) Data fragment extraction method and device
US20230153286A1 (en) Method and system for hybrid query based on cloud analysis scene, and storage medium
CN108021464B (en) Bottom-pocketing processing method and device for application response data
CN108241622B (en) Query script generation method and device
CN114818458A (en) System parameter optimization method, device, computing equipment and medium
US10262035B2 (en) Estimating data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210907

Address after: Room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Alibaba (China) Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: ALIBABA GROUP HOLDING Ltd.

TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20211117

Address after: No.12, Zhuantang science and technology economic block, Xihu District, Hangzhou City, Zhejiang Province, 310012

Applicant after: Aliyun Computing Co.,Ltd.

Address before: 310052 room 508, 5th floor, building 4, No. 699 Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant before: Alibaba (China) Co.,Ltd.

TA01 Transfer of patent application right
RJ01 Rejection of invention patent application after publication

Application publication date: 20190730

RJ01 Rejection of invention patent application after publication