Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and "a" and "an" generally include at least two, but do not exclude at least one, unless the context clearly dictates otherwise.
It should be understood that the term "and/or" as used herein is merely one type of association that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a good or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such good or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a commodity or system that includes the element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
Fig. 1 is a flowchart of a first embodiment of a data query method according to an embodiment of the present invention, where the data query method provided in this embodiment may be executed by a data query device, and the data query device may be implemented as software or as a combination of software and hardware, and the data query device may be disposed in a server. As shown in fig. 1, the method comprises the steps of:
101. a plurality of execution plans corresponding to the query statement are obtained.
The query statement in this embodiment may be an SQL query statement input by a user. Because the SQL query statement is a high-level abstract declarative language, it merely defines what data to retrieve from the database, and does not tell how to retrieve the data, or the method and steps of retrieving the data, are not defined. Therefore, when receiving an SQL query statement, first, an execution plan corresponding to the query statement is generated, and the execution plan describes how to execute the query statement. The generation process of the execution plan can be implemented by referring to the existing CBO generation mode, which is not described in detail in the embodiment of the present invention.
102. And acquiring hardware environment indexes corresponding to the multiple execution plans.
Each generated execution plan describes respective operations required to execute the query statement, and the hardware environment of the operations corresponding to each execution plan may be different. In the embodiment of the present invention, the influence of the actual hardware environment on the execution cost of the execution plan is considered, and therefore, a hardware environment index corresponding to each execution plan needs to be acquired.
The influence of the hardware environment on the execution cost of the execution plan can be embodied as two levels: first, the server is upgraded by the user, so as to change the hardware composition of the server, for example, the mechanical hard disk is updated to a Solid State Drive (SSD); the server is replaced, and the CPU dominant frequency and the core number of the server are improved. Therefore, from the perspective before and after the hardware upgrade, the execution cost consumption situation may be different for the same execution plan due to different hardware environments. Secondly, for a plurality of execution plans corresponding to one query statement, the hardware environment involved in the execution of each execution plan may also be different, which may also result in different execution costs for the execution plans.
Optionally, the hardware environment involved in the embodiment of the present invention may include a storage carrier storing each table in the database, that is, storing the data block corresponding to each table, and/or a processor responding to the query statement, for example, a CPU in the server.
As shown in fig. 2, assuming that the server receiving the query statement of the user is a, the CPU in the server a is the processor. Assuming that the database corresponds to three tables, namely table 1, table 2 and table 3, the three tables may or may not be all stored in server a, for example, it is assumed in fig. 2 that table 1 and table 2 are stored in server a and table 3 is stored in server B. In addition, one or more storage hard disks may be deployed in the same server, and the types and the performances of the storage hard disks may be the same or different, and it is assumed that table 1 is stored in the mechanical hard disk of server a, table 2 is stored in the SSD hard disk of server a, and table 3 is stored in the SSD hard disk of server B. Thus, in this example, the mechanical hard disk of server a, the SSD hard disk, and the SSD hard disk of server B are all storage carriers.
In addition, the storage carrier in the embodiment of the present invention may include, in addition to the hard disk, a memory in the server that processes the query statement, that is, executes the multiple execution plans, because in some cases, the data block to be scanned corresponding to a certain execution plan may partially exist in the memory.
Based on this, obtaining the hardware environment index corresponding to each of the plurality of execution plans may be implemented as:
determining a storage carrier where a data block to be scanned corresponding to each of the plurality of execution plans is located;
a preset performance index of the storage carrier and/or a preset performance index of a processor executing a plurality of execution plans is obtained.
Thus, for each execution plan, the hardware environment index corresponding to the execution plan is a preset performance index of the storage carrier on which the corresponding data block to be scanned is located and/or a preset performance index of the processor executing the execution plan. Multiple execution plans are executed by the same processor. Assuming that the data blocks to be scanned corresponding to a certain execution plan are all the data blocks corresponding to table 1 in the above example, the hardware environment index corresponding to the execution plan may be: the method comprises the following steps that a preset performance index of a processor in a server A and/or a preset performance index of a mechanical hard disk in the server A are/is assumed, and at the moment, all data blocks to be scanned are supposed to be stored in the mechanical hard disk in the server A. The determination process of the data block to be scanned corresponding to each execution plan will be described later.
Optionally, the preset performance index of the hard disk in the storage carrier includes, for example, an IOPS index, i.e., the number of times that read and write can be performed per second, and a throughput, i.e., the amount of data transmitted per second, where the throughput is IOPS × data block size, and the data block size is a known amount. The preset performance indexes of the memory in the storage carrier include, for example, a main frequency and a capacity of the memory, where the main frequency of the memory determines a scanning speed of the data block, and the capacity of the memory determines a hit rate of the data block to be scanned in the memory. The preset performance index of the processor may include, for example, the number of CPU cores and CPU dominant frequency. In addition, optionally, the preset performance index corresponding to the storage carrier may further include a network card rate corresponding to the device where the storage carrier is located, and the network card rate may be embodied as: corresponding weighting coefficients are set according to different network card rate levels, and the influence of the weighting coefficients on the reading and writing cost can be as follows: multiplied by IOPS or throughput.
103. And determining the execution cost corresponding to each of the plurality of execution plans according to the hardware environment index corresponding to each of the plurality of execution plans.
In the embodiment of the present invention, the execution cost may include two types of costs: the read-write cost may alternatively be referred to as the IO cost and the arithmetic cost or may alternatively be referred to as the CPU cost. The read-write cost and the operation cost may be specifically expressed as time consumed by read-write and time consumed by operation.
And the influence of the hardware environment index on the execution cost of the execution plan can be represented as: for a certain execution plan, the number of data blocks to be scanned and the number of records to be operated or referred to as the number of rows to be operated are determined, so that the preset performance index of the storage carrier corresponding to the execution plan will act on the number of data blocks to be scanned to obtain the read-write cost of the execution plan, and the preset performance index of the processor will act on the number of records to be operated to obtain the operation cost of the execution plan. Thus, the execution cost of the execution plan may appear as an accumulation of the read-write cost and the computational cost.
Alternatively, only the preset performance index of the storage carrier may be considered, and at this time, the influence on the read-write cost of the execution plan is considered; alternatively, only the preset performance index of the processor may be considered, and at this time, the influence on the operation cost of the execution plan is considered. Optionally, the preset performance index of the storage carrier may be considered as well as the preset performance index of the processor, and at this time, the influence on the operation cost and the read-write cost of the execution plan is considered.
The function of the preset performance index of the storage carrier on the number of data blocks to be scanned is represented as follows: determining a read-write cost required for scanning a data block according to a preset performance index of the storage carrier, so that the read-write cost of the execution plan can be expressed as: the number of data blocks to be scanned is the read-write cost required for scanning one data block. In practical application, taking a storage carrier as a hard disk as an example, the read-write cost corresponding to the hard disk can also be expressed as: data block size/hard disk throughput to be scanned.
Similarly, the action of the preset performance index of the processor on the number of records to be operated is embodied as follows: the computation cost required to compute a record is determined according to the preset performance index of the processor, and thus, the computation cost of executing the plan can be expressed as: the number of records to be operated is the operation cost required for operating one record.
The determination process of the number of data blocks to be scanned and the number of records to be calculated will be described in the following embodiments, and in this embodiment, only how the hardware environment index acts in the cost calculation process of the execution plan is emphasized.
In addition, it should be noted that different operation types may be preset with different weighting values, and therefore, when calculating the operation cost, the influence of the operation type may also be considered, and thus, the operation cost of the execution plan may also be expressed as: the record number to be calculated calculates the type weighted value and calculates the calculation cost required by one record. Each execution plan includes information of operation types, such as sorting, comparison, and arithmetic of four rules, and the operation types may also be referred to as operators.
104. The query statement is responded to with the execution plan having the lowest execution cost among the plurality of execution plans.
And after obtaining the execution cost corresponding to each of the plurality of execution plans, selecting the execution plan with the minimum execution cost from the plurality of execution plans, responding to the query statement by using the execution plan, and obtaining a corresponding query result to feed back to the user.
In summary, for an SQL query statement input by a user, first, a plurality of execution plans corresponding to the query statement are obtained, and then, hardware environment indexes corresponding to the plurality of execution plans are obtained, an execution cost corresponding to each of the plurality of execution plans is determined by combining the hardware environment indexes corresponding to each of the plurality of current execution plans, and finally, the execution plan with the lowest execution cost in the plurality of execution plans responds to the query statement to obtain a query result. By considering the hardware environment factors in the cost calculation process of the execution plan, the determined cost of each execution plan is matched with the hardware environment involved in execution no matter how the hardware environment changes, so that the calculated cost can truly reflect the cost consumed in actual execution.
The following describes the above determination process of the number of data blocks to be scanned and the number of records to be operated in detail with reference to the embodiments shown in fig. 3 and 4.
For a plurality of execution plans obtained, the scanning modes adopted by different execution plans may be different, and generally, there are two scanning modes: the full-table scan mode and the index scan mode may be alternatively referred to as a sequential scan mode and a discrete scan mode. The determination methods of the number of the data blocks to be scanned and the number of records to be operated corresponding to different scanning methods are also different.
Fig. 3 illustrates how to determine the number of data blocks to be scanned, the number of records to be operated, and how to perform the cost calculation in the full-table scanning mode, including:
301. and if the plurality of execution plans comprise a first execution plan corresponding to the full-table scanning mode, determining the read-write cost of the first execution plan according to the total number of the data blocks contained in the first table to be scanned corresponding to the first execution plan and the preset performance index of the storage carrier where the data blocks contained in the first table to be scanned are located.
302. And determining the operation cost of the first execution plan according to the total number of records contained in the first table to be scanned and the preset performance index of the processor.
It is assumed in this embodiment that the preset performance level of the storage carrier and the preset performance level of the processor are taken into account.
For an execution plan adopting a full-table scanning mode, all data blocks of a first table to be scanned corresponding to the execution plan need to be scanned, and correspondingly, all records contained in the first table to be scanned need to be operated. At this time, the number of data blocks to be scanned corresponding to the first execution plan is the total number of data blocks corresponding to the first table to be scanned, and the number of records to be calculated is the total number of records, which is the number of rows included in the first table to be scanned.
In addition, for the same hard disk, the performance indexes corresponding to different scanning modes are different, so for an execution plan adopting the full-table scanning mode, it is necessary to obtain the performance index of the full-table scanning mode corresponding to the hard disk where the first table to be scanned of the execution plan is located.
Assuming that a first table to be scanned corresponding to a first execution plan includes 1000 data blocks, and that 800 data blocks are stored in the memory and 200 data blocks are stored in a hard disk, the read-write cost at this time includes: the memory read-write cost + the hard disk read-write cost, where the memory read-write cost may be represented as 800 × the read-write cost required for reading one data block in the memory, and the hard disk read-write cost may be represented as the read-write cost required for reading one data block in the 800 × hard disk full-table scanning mode.
Fig. 4 illustrates how to determine the number of data blocks to be scanned, the number of records to be operated, and how to perform the cost calculation in the index scanning mode, including:
401. and if the plurality of execution plans comprise a second execution plan corresponding to the index scanning mode, acquiring the statistical information of a second table to be scanned corresponding to the second execution plan.
Furthermore, the execution cost corresponding to the second execution plan is determined according to the obtained statistical information and the hardware environment index corresponding to the second execution plan, which is specifically as follows:
402. and determining the number of the data blocks to be scanned and the number of records to be operated corresponding to the second execution plan according to the statistical information.
403. And determining the read-write cost of the second execution plan according to the number of the data blocks to be scanned and the preset performance index of the storage carrier where the data blocks to be scanned are located.
404. And determining the operation cost of the second execution plan according to the number of records to be operated and the preset performance index of the processor.
It is assumed in this embodiment that the preset performance level of the storage carrier and the preset performance level of the processor are taken into account.
According to the foregoing embodiment, for an execution plan adopting an index scanning manner, it is necessary to obtain a performance index of a hard disk in which a data block to be scanned of the execution plan is located, where the hard disk corresponds to the index scanning manner, so as to calculate the read-write cost of the execution plan based on the performance index.
In addition, in the index scanning mode, the number of data blocks to be scanned and the number of records to be operated are not all data blocks and all records corresponding to the second table to be scanned corresponding to the execution plan, but the number of data blocks to be scanned and the number of records to be operated at this time need to be determined by combining the statistical information of the second table to be scanned.
The statistical information of the second table to be scanned mainly reflects the distribution characteristics of the data in the table, and may include the total number of records in the table, i.e., the total number of rows, a histogram of columns, the ratio of high-frequency words and high-frequency words in the columns, the linear correlation between the data writing sequence and the row number updating sequence, and so on.
The process of determining the number of data blocks to be scanned and the number of records to be operated according to the statistical information of the table can refer to the implementation process of CBO. In the present embodiment, only a simple example is shown: assuming that the index condition of a certain execution plan includes a user name a, a is a high-frequency word, and the ratio corresponding to the high-frequency word is 1/1000, the total number of rows included in the second to-be-scanned table corresponding to the execution plan is assumed to be 10000 times the ratio, so that the number of records to be calculated is 10. In addition, it is assumed that each data block includes 100 records, and the linear correlation value is 1, which indicates that the 10 records to be operated are located in one data block, and the number of the data blocks to be scanned is 1. Thus, after obtaining the number of data blocks to be scanned and the number of records to be operated, the execution cost of the execution plan can be determined based on the performance index of the storage carrier and the performance index of the processor.
The data query device of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these data querying devices may each be configured using commercially available hardware components through the steps taught in this scheme.
Fig. 5 is a schematic structural diagram of a data query apparatus according to an embodiment of the present invention, and as shown in fig. 5, the apparatus includes: the device comprises a first acquisition module 11, a second acquisition module 12, a determination module 13 and an execution module 14.
The first obtaining module 11 is configured to obtain a plurality of execution plans corresponding to the query statement.
A second obtaining module 12, configured to obtain a hardware environment index corresponding to each of the multiple execution plans.
And the determining module 13 is configured to determine, according to the hardware environment index, an execution cost corresponding to each of the plurality of execution plans.
And the execution module 14 is used for responding to the query statement by using the execution plan with the lowest execution cost in the plurality of execution plans.
Optionally, the second obtaining module 12 is specifically configured to:
determining a storage carrier where a data block to be scanned corresponding to each of the plurality of execution plans is located;
and acquiring preset performance indexes of the storage carrier and/or preset performance indexes of a processor executing the multiple execution plans as the hardware environment indexes.
Optionally, the determining module 13 includes: a first determining unit 131 and a second determining unit 132.
A first determining unit 131, configured to determine, if the plurality of execution plans includes a first execution plan corresponding to a full-table scanning manner, a read-write cost of the first execution plan according to a total number of data blocks included in a first table to be scanned corresponding to the first execution plan and a preset performance index of a storage carrier in which the data blocks included in the first table to be scanned are located; and determining the operation cost of the first execution plan according to the total number of records contained in the first table to be scanned and a preset performance index of the processor.
A second determining unit 132, configured to obtain statistical information of a second table to be scanned corresponding to a second execution plan if the plurality of execution plans include the second execution plan corresponding to the index scanning manner;
and determining the execution cost corresponding to the second execution plan according to the statistical information and the hardware environment index corresponding to the second execution plan.
Optionally, the second determining unit 132 is specifically configured to:
determining the number of data blocks to be scanned and the number of records to be operated corresponding to the second execution plan according to the statistical information;
determining the read-write cost of the second execution plan according to the number of the data blocks to be scanned and the preset performance index of the storage carrier where the data blocks to be scanned are located;
and determining the operation cost of the second execution plan according to the number of the records to be operated and the preset performance index of the processor.
The apparatus shown in fig. 5 can perform the method of the embodiments shown in fig. 1, fig. 3, and fig. 4, and the related descriptions of the embodiments shown in fig. 1, fig. 3, and fig. 4 can be referred to for the parts of this embodiment not described in detail. The implementation process and technical effect of the technical solution refer to the descriptions in the embodiments shown in fig. 1, fig. 3, and fig. 4, and are not described herein again.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The internal functions and structures of the data query device are described above, and in one possible design, the structure of the data query device may be implemented as an electronic device, such as a server, as shown in fig. 6, and the electronic device may include: a processor 21 and a memory 22. Wherein the memory 22 is used for storing a program for supporting a data query device to execute the data query method provided in any one of the above embodiments, and the processor 21 is configured to execute the program stored in the memory 22.
The program comprises one or more computer instructions which, when executed by the processor 21, are capable of performing the steps of:
acquiring a plurality of execution plans corresponding to the query statement;
acquiring hardware environment indexes corresponding to the execution plans;
determining the execution cost corresponding to each of the plurality of execution plans according to the hardware environment index;
responding to the query statement with an execution plan of the plurality of execution plans that has a lowest execution cost.
Optionally, the processor 21 is further configured to perform all or part of the steps of the foregoing methods.
The data query apparatus may further include a communication interface 23, configured to communicate with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for a data query apparatus, which includes a program for executing the data query method in the foregoing method embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by adding a necessary general hardware platform, and of course, can also be implemented by a combination of hardware and software. With this understanding in mind, the above-described aspects and portions of the present technology which contribute substantially or in part to the prior art may be embodied in the form of a computer program product, which may be embodied on one or more computer-usable storage media having computer-usable program code embodied therein, including without limitation disk storage, CD-ROM, optical storage, and the like.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.