CN114443680A - Database management system, related apparatus, method and medium - Google Patents

Database management system, related apparatus, method and medium

Info

Publication number
CN114443680A
CN114443680A
Authority
CN
China
Prior art keywords
query
plan
candidate
database
query plan
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111660373.8A
Other languages
Chinese (zh)
Inventor
黄柏彤
陈唯
王博
渠大川
翁良贵
谢德军
丁博麟
林伟
周靖人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd filed Critical Alibaba Cloud Computing Ltd
Priority application: CN202111660373.8A
Publication: CN114443680A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases

Abstract

The present disclosure provides a database management system, a related apparatus, a method, and a medium. The database management system includes: a search space exploration unit that analyzes the semantics and syntax of the current query statement according to a first scheduling request, to generate a plurality of first candidate query plans; a metadata management unit that, at a predicted execution time, obtains the usage of computing resources and the first data that has already reached the database among the data queried by the current query statement, according to a second scheduling request; a rule unit that generates a second candidate query plan by expanding the plurality of first candidate query plans according to the first data; and a query optimization unit that, at the predicted execution time and when computing resources are sufficient, selects a target query plan from the plurality of first candidate query plans and the second candidate query plan based on execution cost, the database being queried by executing the target query plan. The disclosed scheme improves both database query efficiency and computing-resource utilization.

Description

Database management system, related apparatus, method and medium
Technical Field
The present disclosure relates to the field of computer technology, and more particularly, to a database management system, related apparatus, methods, and media.
Background
Advances in database technology provide ever-greater storage capacity, and users can query massive data stores over a network to obtain the data they need. As the amount of stored data grows, users' queries become more complex, often requiring continuous queries for events that satisfy a range of conditions (e.g., a time range). In a continuous-query scenario, a client may keep sending query statements to a database management server. For each statement the server generates a corresponding query plan, executes it to access the database, manipulates the retrieved data to produce a query result, and returns the result to the client. The plan generated for each statement is usually fixed, yet the data queried by the statement reaches the database gradually over time, and the server's computing-resource usage also changes over time. As a result, the server may execute the plan inefficiently or lack the computing power to execute it at all, which lowers both the query efficiency of the database and the utilization of its computing resources.
Disclosure of Invention
In view of the above, the present disclosure is directed to improving database query efficiency and computational resource utilization in a database continuous query scenario.
To achieve this object, according to one aspect of the present disclosure, there is provided a database management system including:
a search space exploration unit configured to analyze the semantics and syntax of the current query statement according to a first scheduling request, to generate a plurality of first candidate query plans;
a metadata management unit configured to obtain, at a predicted execution time and according to a second scheduling request, the usage of computing resources and first data that has reached the database among the data queried by the current query statement, where the predicted execution time is the predicted execution time of each of a plurality of query steps of one query plan among the plurality of first candidate query plans;
a rule unit configured to generate a second candidate query plan by expanding the plurality of first candidate query plans according to the first data; and
a query optimization unit configured to select, at the predicted execution time and when the computing resources are sufficient, a target query plan from the plurality of first candidate query plans and the second candidate query plan based on execution cost, where the database is queried by executing the target query plan.
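The cooperation of the four units can be sketched as follows. This is a minimal, hypothetical Python sketch; the function names, the fixed plan alternatives, and the toy cost model are illustrative, not part of the disclosure:

```python
def analyze(query_statement):
    """Search-space exploration: derive several first candidate plans from
    one statement (stand-in: two fixed operator orderings)."""
    return [("scan", "filter", "join"), ("scan", "join", "filter")]

def expand(first_candidates, first_data_arrived):
    """Rule unit: once part of the queried data has reached the database,
    derive second candidate plans (stand-in: reverse each operator order)."""
    if not first_data_arrived:
        return []
    return [tuple(reversed(plan)) for plan in first_candidates]

def optimize(candidates, cost_of, resources_sufficient):
    """Query optimization unit: pick the cheapest plan, but only when
    computing resources are sufficient at the predicted execution time."""
    if not resources_sufficient:
        return None
    return min(candidates, key=cost_of)
```

For instance, with a toy cost function that charges by the first operator, `optimize(analyze(q) + expand(analyze(q), True), cost_of, True)` returns the cheapest plan across both candidate sets, and returns `None` when resources are insufficient, so the previously chosen plan would stay in effect.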
Optionally, the database management system further includes:
an execution unit configured to execute the target query plan corresponding to the current query statement incrementally, reusing the query results of query steps already executed for the target query plan corresponding to the previous query statement, so as to generate the query result corresponding to the current query statement.
Optionally, the target query plan corresponding to the current query statement includes a plurality of query steps, among which a first query step was already executed in the target query plan corresponding to the previous query statement and a second query step was not. The database management system further includes:
a storage unit configured to store a first query result obtained by executing the first query step in the target query plan corresponding to the previous query statement.
Optionally, the query optimization unit includes a first reading module, an execution cost estimation module, and a query plan selection module, where:
the first reading module is configured to read the stored first query result when the first query step needs to be executed;
the execution cost estimation module is configured to estimate, based on the first query result, the execution costs of the plurality of first candidate query plans and the second candidate query plan, the first query step being skipped during cost estimation because its result can simply be read; and
the query plan selection module is configured to select, as the target query plan, the plan with the minimum execution cost among the plurality of first candidate query plans and the second candidate query plan.
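As a rough illustration of this cost-based selection (the names and the additive cost model below are assumptions; the disclosure does not specify a concrete cost formula), steps whose results are already stored contribute nothing to a plan's estimated cost:

```python
def estimate_cost(plan, step_costs, stored_steps):
    """Sum per-step costs, charging nothing for steps whose results were
    stored when the previous statement's plan ran: the stored first query
    result is read instead of re-executing the step."""
    return sum(0.0 if step in stored_steps else step_costs[step]
               for step in plan)

def select_target_plan(plans, step_costs, stored_steps):
    """Query plan selection: the minimum estimated execution cost wins."""
    return min(plans, key=lambda p: estimate_cost(p, step_costs, stored_steps))
```

Note how a plan that reuses an expensive stored step (here `"scan"`) can beat a plan that must run a nominally cheaper step from scratch.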
Optionally, each first candidate query plan includes a plurality of query steps, and the rule unit includes a query step selection module and a query plan expansion module, where:
the query step selection module is configured to select, from a first candidate query plan, a third query step that takes the first data as its input data; and
the query plan expansion module is configured to adjust the operators executing the third query step and their execution order, to generate the second candidate query plan.
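A minimal sketch of such an expansion (illustrative only; a real rule unit rewrites operator trees rather than flat step lists, and may change operators as well as their order):

```python
def expand_plan(steps, third_steps):
    """For each 'third query step' (a step whose input data has already
    arrived), emit a variant plan that moves the step to the front so it
    can run early on the arrived data."""
    variants = []
    for step in third_steps:
        if step in steps:
            rest = [s for s in steps if s != step]
            variants.append(tuple([step] + rest))
    return variants
```

Each variant then competes with the original candidates on estimated execution cost.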
Optionally, the execution unit includes a second reading module, a query plan execution module, and a query result generation module, where:
the second reading module is configured to read the stored first query result when the first query step needs to be executed;
the query plan execution module is configured to execute the plurality of query steps of the target query plan corresponding to the current query statement, executing the second query step to obtain a second query result and skipping the first query step by reading the first query result; and
the query result generation module is configured to generate the query result corresponding to the current query statement based on the first query result and the second query result.
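The read/execute/combine flow above might be sketched like this (hypothetical helper names; real step results would be relations, not lists):

```python
def execute_incrementally(plan_steps, stored_results, run_step):
    """Execute a plan step by step: skip a step by reading its stored result
    when one exists (first query steps), run it otherwise (second query
    steps), and store new results for the next statement's plan."""
    results = {}
    for step in plan_steps:
        if step in stored_results:
            results[step] = stored_results[step]   # read, do not re-execute
        else:
            results[step] = run_step(step)         # execute the new step
            stored_results[step] = results[step]   # keep for the next query
    return results
```

Because `stored_results` is updated in place, each statement in the continuous query leaves behind the results the next statement can reuse.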
Optionally, the database management system further includes:
an operation management unit configured to register the database query operation corresponding to the current query statement; and
a scheduling unit configured to send the first scheduling request after the database query operation corresponding to the current query statement is successfully registered.
Optionally, the predicted execution time comprises multiple predicted execution times; the scheduling unit is further configured to preset these times and to send the second scheduling request at each predicted execution time at which computing resources are sufficient.
According to an aspect of the present disclosure, there is provided a database management engine comprising:
a search space exploration unit configured to analyze the semantics and syntax of the current query statement according to a first scheduling request, to generate a plurality of first candidate query plans;
a metadata management unit configured to obtain, at a predicted execution time and according to a second scheduling request, the usage of computing resources and first data that has reached the database among the data queried by the current query statement, where the predicted execution time is the predicted execution time of each of a plurality of query steps of one query plan among the plurality of first candidate query plans;
a rule unit configured to generate a second candidate query plan by expanding the plurality of first candidate query plans according to the first data; and
a query optimization unit configured to select, at the predicted execution time and when the computing resources are sufficient, a target query plan from the plurality of first candidate query plans and the second candidate query plan based on execution cost, where the database is queried by executing the target query plan.
According to an aspect of the present disclosure, there is provided a computing apparatus including:
a memory for storing computer executable code;
a processor for executing the computer executable code; and
a database management system as in any above.
According to an aspect of the present disclosure, there is provided a system on a chip, including:
a memory for storing computer executable code;
a processor for executing the computer executable code; and
a database management system as in any above.
According to an aspect of the present disclosure, there is provided a database management method including:
analyzing the semantics and syntax of the current query statement according to a first scheduling request, to generate a plurality of first candidate query plans;
obtaining, at a predicted execution time and according to a second scheduling request, the usage of computing resources and first data that has reached the database among the data queried by the current query statement, where the predicted execution time is the predicted execution time of each of a plurality of query steps of one query plan among the plurality of first candidate query plans;
generating a second candidate query plan by expanding the plurality of first candidate query plans according to the first data; and
selecting, at the predicted execution time and when the computing resources are sufficient, a target query plan from the plurality of first candidate query plans and the second candidate query plan based on execution cost, where the database is queried by executing the target query plan.
According to an aspect of the present disclosure, there is provided a computer-readable medium comprising computer-executable code which, when executed by a processor, implements the method described above.
Embodiments of the present disclosure exploit two characteristics of the continuous-query scenario: the data queried by the current query statement reaches the database gradually over time, and the computing-resource usage of the database management system also changes over time. The semantics and syntax of the current query statement are analyzed according to a first scheduling request to generate a plurality of first candidate query plans. At the predicted execution time of each query step of one first candidate query plan (which may be the optimal plan among them), the usage of computing resources and the first data that has reached the database are obtained according to a second scheduling request. A second candidate query plan is then generated by expanding the first candidate query plans according to the first data, and, at the predicted execution time and when computing resources are sufficient, a target query plan is selected from the first candidate query plans and the second candidate query plan based on execution cost. In this way the currently best plan is searched for anew at each of multiple predicted execution times during the execution of the current query statement, which reduces the execution cost of the whole query and improves database query efficiency. Because the re-searched target plans are executed only when the database management server's current computing resources are sufficient, normal execution of the current query statement is guaranteed and the utilization of computing resources is improved.
Drawings
The foregoing and other objects, features, and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which refers to the accompanying drawings in which:
FIG. 1 illustrates an internal block diagram of a database access system to which embodiments of the present disclosure are applied;
FIG. 2 illustrates an internal block diagram of a database management server according to an embodiment of the present disclosure;
FIG. 3 illustrates an internal block diagram of a database management system according to an embodiment of the present disclosure;
FIG. 4 shows schematic diagrams of a first candidate query plan and a second candidate query plan, in accordance with embodiments of the present disclosure;
FIG. 5 illustrates a schematic diagram of executing a target query plan at multiple predicted execution times, in accordance with an embodiment of the present disclosure;
FIG. 6 shows a flow diagram of a database management method according to an embodiment of the present disclosure.
Detailed Description
The present disclosure is described below based on embodiments, but it is not limited to these embodiments. In the following detailed description, some specific details are set forth; it will be apparent to those skilled in the art that the disclosure may be practiced without them. Well-known methods and procedures have not been described in detail so as not to obscure the disclosure. The figures are not necessarily drawn to scale.
The following terms are used herein.
Database: a repository that organizes, stores, and manages data according to a data structure. It may be a relational database built on the relational model, a two-dimensional table model that organizes data into tables composed of rows and columns. As the data volume of a database grows, database partitions arise; the data in a partition is organized into partition data tables corresponding to that partition. A tuple is one data record in a data table. The tuples of a partition data table are clustered on the storage device according to a physical partitioning policy. Partition granularity is the unit of partitioning, for example partitioning by month or by year.
Query statement: a statement in a database query and programming language used to access data and to query, update, and manage a database. Besides reading data from the database, a query statement also operates on the data set it reads, so it may contain one or more operators, and different statements contain different operators. A query statement is, for example, an SQL (Structured Query Language) statement. SQL is a programming language dedicated to managing data stored in relational databases; it covers data definition and data manipulation languages, including data insertion, query, update and deletion, schema creation and modification, and data access control.
Query plan: also called an execution plan, the collection of logical operations that an execution unit actually performs for a query statement.
Optimal query plan: the query plan with the smallest execution cost among a plurality of candidate query plans. A plan's execution cost depends on, among other things, the characteristics of its operators and their execution order.
Time-varying relation (TVR) based incremental query planning (TIP) model: an algebra defined on top of time-varying relations. For example, the snapshot R_t denotes the instance of a relation R at time t, and ΔR_{t1,t2} denotes the change of R from time t1 to time t2. On this basis a series of basic operations can be defined, for example R_{t1} + ΔR_{t1,t2} = R_{t2}, where "+" may denote a simple combination (union) of R_{t1} and ΔR_{t1,t2} in some cases, or a more complex aggregation operation in others.
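Under an append-only assumption (so that "+" is a plain union), the TIP identity can be checked with a toy time-varying relation. The encoding below, a set of (row, arrival-time) pairs, is illustrative and not part of the disclosure:

```python
def snapshot(tvr, t):
    """R_t: the instance of the time-varying relation R at time t,
    i.e. every row that has arrived by t."""
    return {row for row, arrival in tvr if arrival <= t}

def delta(tvr, t1, t2):
    """Delta R_{t1,t2}: the rows arriving after t1 and up to t2."""
    return {row for row, arrival in tvr if t1 < arrival <= t2}
```

With this encoding, `snapshot(tvr, t1) | delta(tvr, t1, t2) == snapshot(tvr, t2)` holds for any t1 <= t2, which is the union reading of R_{t1} + ΔR_{t1,t2} = R_{t2}.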
Computing cluster: a computer cluster, e.g. a data center, composed of many terminals (servers). A cluster contains multiple compute nodes, and different nodes may have the same or different computing power. A node's computing power is the sum of all computing resources it can invoke; the resources may include a central processing unit (CPU), memory, and so on. In a practical scenario, one or more compute nodes (each physically a computer or a virtual computing system) are allocated to a computing task such as a database query. In theory those nodes can run the task's one or more task slices (subtasks), but in practice some slices may not land on "valid" nodes. For example, task slices may be logically assigned to nodes for which the cluster's available computing resources cannot actually be provisioned, leaving those nodes invalid; or nodes that were allocated resources in the initial assignment step may fail due to hardware or software errors and become effectively invalid. A scheduling unit is therefore needed to assign computing tasks, according to each node's current resource usage (e.g. CPU occupation, memory occupation), to compute nodes with sufficient computing power.
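Such a scheduling decision might be sketched as follows (the field names and the "least-loaded first" tie-break are assumptions for illustration):

```python
def pick_node(nodes, cpu_needed, mem_needed):
    """Scheduling sketch: choose a compute node whose free CPU and memory
    cover the task's needs; return None when no node qualifies, so the
    task waits instead of landing on an 'invalid' node."""
    candidates = [n for n in nodes
                  if n["cpu_free"] >= cpu_needed and n["mem_free"] >= mem_needed]
    if not candidates:
        return None
    # Prefer the least-loaded qualifying node.
    return max(candidates, key=lambda n: (n["cpu_free"], n["mem_free"]))["name"]
```

Returning `None` rather than over-committing mirrors the text's point that task slices must not be assigned to nodes without real resources behind them.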
In the context of a database access system as applied by the present disclosure, a database management server may be one computing node of a computing cluster.
Computing device: a device with computing or processing capability. It may be embodied as a terminal, such as an Internet of Things device, a mobile terminal, a desktop computer, or a laptop computer, or as a server or a cluster of servers. In the database access system to which the present disclosure applies, the computing device may be the database management server.
Application scenarios of the present disclosure
Embodiments of the present disclosure provide a database management scheme. The scheme is fairly general and can be used on various hardware devices that continuously query a database, such as data centers, AI (artificial intelligence) acceleration units, GPUs (graphics processing units), IoT (Internet of Things) devices capable of executing deep learning models, embedded devices, and so on. The scheme is independent of the hardware on which the processing unit that executes it is ultimately deployed.
Database access system
Fig. 1 shows an internal structural diagram of a database access system to which an embodiment of the present disclosure is applied. In some embodiments, as shown in FIG. 1, database access system 100 includes: a client 101, a database management server 102, and a database 103.
In some embodiments, database 103 is a repository built on computer storage devices that organizes, stores, and manages data according to a data structure. In the embodiment of the present disclosure, the implementation form of the computer storage device carrying the database 103 is not limited, and the type of the database 103 and the data structure adopted by the database are not limited. For example, from a data structure perspective, database 103 may be a hierarchical database, a network database, or a relational database; from the storage format, the database 103 may be a line-type database or a column-type database; from the database language, the database 103 may be an SQL database, an Oracle database, or the like. In some embodiments, the database management server 102 is a core service for storing, processing and protecting data, and its tasks may include processing transactions related to the database 103, such as designing and creating the database 103, saving various lists and documents required by the database 103, providing daily management support for the database 103 to optimize the performance of the database 103, and performing related operations on the database 103 in response to access requests of the client 101 and returning the related operation results to the client 101, thereby implementing viewing, deleting, modifying, adding, etc. of data in the database 103. The database management server 102 may be a computing node in a computing cluster (e.g., a data center) whose computing power is the sum of all the available computing resources that the database management server 102 can invoke. In some embodiments, the scheduling unit may be used to allocate the database query task to the computationally intensive database management server 102 for execution according to the usage of the computing resources (e.g., central processor occupancy, memory occupancy, etc.) of each computing node in the current computing cluster. 
In some embodiments, the communication connection between the client 101 and the database management server 102 may be a wired or wireless network connection. In some embodiments, the client 101 may be in the same local area network as the database management server 102 or in a different local area network. Additionally, in some embodiments, a communication connection is also established between the database management server 102 and the database 103, which may be a wired or wireless network connection. In some embodiments, in a deployment implementation, the database management server 102 and the database 103 may be deployed on the same physical device, or may be deployed on different physical devices. When the database management server 102 and the database 103 are deployed on different physical devices, they may be deployed in the same local area network or in different local area networks.
In some embodiments, the client 101 may be viewed as a user-oriented interactive interface for the database 103, allowing a user to access the database 103 through the client 101. In some embodiments, the client 101 may comprise any type of device or application configured to interact with the database management server 102. For example, an application refers to a computer program that may provide various specific functions, including but not limited to a billing application, an internet browser, a multimedia player, and so forth. When a user needs to access the database 103, or when the client 101 has a database access requirement, the client 101 may send a database access request to the database management server 102; the database management server 102 may perform a corresponding operation on the database 103 according to the database access request. In some embodiments, in a database query scenario, the client 101 may send a query statement to the database management server 102. The query statement may be different according to the database language supported by the database 103, and may be, for example, an SQL statement or an Oracle statement. 
In some embodiments, the client 101 continuously sends a plurality of query statements, usually identical or similar, to the database management server 102. The server generates a plurality of first candidate query plans from the current query statement and selects an intermediate target plan among them by execution cost. At the predicted execution time of each query step of the intermediate target plan, and when computing resources are sufficient, it generates a second candidate query plan by expanding the first candidate plans according to the first data that has already reached the database 103 among the data queried by the statement. It then selects the final target query plan by comparing the execution costs of the first candidate plans and the second candidate plan, executes that plan to query the database 103, manipulates the retrieved data to produce the query result, and returns the result to the client 101. Because the currently best plan is searched for anew at multiple predicted execution times during the execution of the current query statement, the execution cost of the whole query is reduced and the query efficiency of the database improves.
The target plan found at a predicted execution time is executed only when the current computing resources of the database management server 102 are sufficient; otherwise its execution is skipped. This guarantees that the current query statement executes normally and improves the utilization of computing resources. The process of continuously querying the database 103 with the database management server 102 is described in detail below and is not elaborated here.
Computing device
Fig. 2 illustrates an internal structure diagram of the database management server 102 (computing device 141 or system on a chip 142) according to an embodiment of the present disclosure. As shown in fig. 2, computing device 141 may include one or more processors 22, as well as memory 29. The memory 29 in the computing device 141 may be a main memory (referred to as a main memory or an internal memory) for storing instruction information and/or data information represented by data signals, and may also be used for data exchange between the processor 22 and an external storage device 26 (or referred to as an auxiliary memory or an external memory).
In some cases, processor 22 may need to access memory 29 to retrieve data in memory 29 or to make modifications to data in memory 29. To alleviate the speed gap between processor 22 and memory 29 due to the slow access speed of memory 29, computing device 141 further includes a cache memory 28 coupled to bus 21, cache memory 28 being used to cache some data in memory 29, such as program data or message data, that may be recalled repeatedly. The cache Memory 28 is implemented by a storage device such as a Static Random Access Memory (SRAM).
Based on this, the processor 22 may include an instruction execution unit 221, a memory management unit 222, and the like. The instruction execution unit 221 initiates a write access request when executing some instructions that need to modify the memory, where the write access request specifies write data and a corresponding physical address that need to be written into the memory; the memory management unit 222 is configured to translate the virtual addresses specified by the instructions into physical addresses mapped by the virtual addresses, and the physical addresses specified by the write access request may be consistent with the physical addresses specified by the corresponding instructions.
The information exchange between the memory 29 and the cache 28 is typically organized in blocks. In some embodiments, the cache 28 and the memory 29 may be divided into data blocks according to the same spatial size, and the data blocks may be a minimum unit (including one or more data of a preset length) of data exchange between the cache 28 and the memory 29. For the sake of brevity and clarity, each data block in the cache memory 28 is referred to as a cache block (which may be referred to as a cacheline or cache line) and different cache blocks have different cache block addresses; each data block in the memory 29 is referred to as a memory block, and different memory blocks have different memory block addresses. The cache block address comprises, for example, a physical address tag for locating the data block.
Due to space and resource constraints, cache memory 28 cannot cache the entire contents of memory 29; that is, the storage capacity of cache memory 28 is generally smaller than that of memory 29, and the cache block addresses provided by cache memory 28 cannot correspond one-to-one to the memory block addresses provided by memory 29. When the processor 22 needs to access memory, it first accesses the cache 28 through the bus 21 to determine whether the content to be accessed is stored there. If so, the cache 28 hits, and the processor 22 retrieves the content directly from the cache 28. If the content is not in the cache 28, the processor 22 must access the memory 29 through the bus 21 to look up the corresponding information. Because the access rate of the cache memory 28 is very fast, a cache hit significantly improves the efficiency of the processor 22, and thereby the performance and efficiency of the overall computing device 141.
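The hit/miss flow described above can be sketched in a few lines; the direct-mapped organization, the class name, and the block-address arithmetic below are illustrative assumptions, not the hardware design of this embodiment:

```python
# Hypothetical sketch of the cache-hit/cache-miss flow: serve a read from
# the cache when the tag matches, otherwise fetch the block from memory.

class SimpleCache:
    """A toy direct-mapped cache over a backing 'memory' dict."""

    def __init__(self, memory, num_blocks=4):
        self.memory = memory          # memory block address -> data
        self.num_blocks = num_blocks
        self.lines = {}               # slot -> (tag, data)
        self.hits = 0
        self.misses = 0

    def read(self, block_address):
        slot = block_address % self.num_blocks
        tag = block_address // self.num_blocks
        line = self.lines.get(slot)
        if line is not None and line[0] == tag:
            self.hits += 1            # cache hit: serve directly from cache
            return line[1]
        self.misses += 1              # cache miss: go to memory over the bus
        data = self.memory[block_address]
        self.lines[slot] = (tag, data)
        return data

memory = {addr: f"block-{addr}" for addr in range(16)}
cache = SimpleCache(memory)
cache.read(3)   # first access: miss, loaded from memory
cache.read(3)   # second access: hit, served from cache
```

The second read avoids the slow memory access entirely, which is the efficiency gain the passage describes.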
As shown, processor 22, cache 28, and memory 29 are packaged in a system on chip (SoC) 201. The designer may configure the SoC architecture so that communications between various elements in computing device 141 are secure.
In this example, the computing device 141 may also include various software, illustrated as an embedded operating system 203, a loader 202, and an application 204. The software may be resident in the memory 29 or stored in the external memory 26. Typically, the loader 202 and the embedded operating system 203 are fixed in the memory 29, while the application 204 may be stored in the external memory 26. In some cases, the loader 202 and the embedded operating system 203 may also be combined into one. For such software, the loader 202 may be configured to verify software and load it into the cache 28. The loader 202 itself may be software that is loaded in a secure manner. The system on chip 201 may be configured to retrieve the loader 202 from the memory 29 immediately or shortly after a system power-up or reset; the loader 202 may then determine which software to load based on configuration information and load the corresponding software into the cache 28 after verifying it, for example by checking the software's source, fingerprint, or certificate to determine whether it should be loaded.
A portion of the application 204 may be independent of the embedded operating system 203 and loaded by the loader 202, while another portion may depend on the embedded operating system 203, being loaded by it and running under its control. As an example, application 1 to application n, where n is a nonzero natural number, are shown in fig. 2. The application programs 204 may include, without limitation, programs for controlling or responding to external devices (e.g., biometric sensors, printers, microphones, speakers, flow valves, or other I/O components, sensors, actuators, or devices), programs for various I/O tasks, security programs, attestation programs, various computing modules, communication programs, communication support protocols, or other programs, or combinations thereof.
In some embodiments, computing device 141 may also include database management system 30. In some embodiments, the database management system 30 is a functional module in the database management server 102; in implementation, it may be a software program module or may be implemented in hardware, for example based on an FPGA or CPLD. In some embodiments, the database management system 30 may receive a plurality of query statements continuously sent by the client and, while executing the current query statement, search for the target query plan of the current moment at each of a plurality of predicted execution times. This reduces the execution cost of the whole query process for the current query statement and improves database query efficiency. The target query plan found at a predicted execution time is executed when the current computing resources of the database management server 102 are sufficient, and execution is skipped when the current computing resources are insufficient, which ensures that the current query statement executes normally and improves the utilization rate of the computing resources. The process of continuously querying the database using the database management system 30 is described in detail below, so it is not elaborated here.
Further, the computing device 141 may also include a storage device 26, a display device 23, an audio device 24, a mouse/keyboard 25, and the like. The storage device 26 is a device for information access, such as a hard disk, an optical disc, or a flash memory, coupled to the bus 21 via a corresponding interface. The display device 23 is coupled to the bus 21, for example via a corresponding graphics card, and displays content according to display signals provided over the bus 21.
The computing device 141 also typically includes a communication device 27 and thus may communicate with a network or other devices in a variety of ways. The communication device 27 may include one or more communication modules; by way of example, it may include a wireless communication module adapted to a particular wireless communication protocol.
Of course, the structure of different computer systems may vary depending on the motherboard, operating system, and instruction set architecture. For example, many computer systems today have an input/output control hub coupled between the bus 21 and various input/output devices, and the input/output control hub may be integrated within the processor 22 or separate from the processor 22.
Database management system
FIG. 3 illustrates an internal block diagram of a database management system 30 of one embodiment of the present disclosure. As shown in fig. 3, the database management system 30 includes a job management unit 310, a scheduling unit 320, a metadata management unit 330, a rule unit 340, a search space exploration unit 350, a query optimization unit 360, an execution unit 370, and a storage unit 380. In some embodiments, the database management system 30 may continuously query the database with the same or similar query statements.
In some embodiments, as shown in fig. 3, job management unit 310 may receive a registration request for the database query job corresponding to the current query statement submitted by a client, and register the database query job in job management unit 310. The registration information may include the current query statement, the data dependencies between this database query job and other database query jobs on the database query production line, and the termination time of the database query job. In some embodiments, the scheduling unit 320 may formulate a scheduling policy according to the registration information of the database query job corresponding to the current query statement, or formulate the scheduling policy in a user-defined manner; for example, the scheduling unit 320 may trigger the execution unit 370 to execute the target query plan corresponding to the current query statement when the relevant data queried by the current query statement stored in the database exceeds a certain threshold.
In some embodiments, the scheduling unit 320 may send a first scheduling request after the database query job corresponding to the current query statement is successfully registered. In some embodiments, as shown in fig. 3, the search space exploration unit 350 may parse the semantics and syntax of the current query statement (e.g., an SQL statement) according to the first scheduling request, generating a plurality of first candidate query plans. Each first candidate query plan comprises a plurality of query steps, and the plurality of first candidate query plans are equivalent query plans: they produce the same output for the same input but use different operators or different operator orders. In other words, the plurality of first candidate query plans are functionally identical but implemented differently. In some embodiments, the search space exploration unit 350 may perform semantic and syntax parsing on the current query statement to obtain a syntax tree, generate a logical execution plan of the current query statement based on the syntax tree, and apply certain transformations to the logical execution plan to obtain the plurality of first candidate query plans. In some embodiments, the query optimization unit 360 may select, based on execution cost, a query plan whose execution cost satisfies a numerical condition from the plurality of first candidate query plans as an intermediate target query plan. For example, the query plan with the lowest, the highest, or some particular execution cost may be selected from the plurality of first candidate query plans as the intermediate target query plan.
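The cost-based selection of an intermediate target query plan can be sketched as follows; the list-of-operators plan representation and the per-operator cost weights are placeholder assumptions, since the disclosure leaves the cost model to known techniques:

```python
# Hedged sketch of selecting an intermediate target query plan by execution
# cost. The cost model (sum of fixed per-operator weights) is a stand-in; a
# real optimizer would use cardinality estimates and table statistics.

OPERATOR_COST = {"scan": 10, "filter": 2, "join": 20, "sum": 5}  # assumed weights

def plan_cost(plan):
    """Estimate a plan's execution cost as the sum of its step costs."""
    return sum(OPERATOR_COST[step] for step in plan)

def select_target_plan(candidate_plans):
    """Pick the candidate whose estimated execution cost is lowest."""
    return min(candidate_plans, key=plan_cost)

candidates = [
    ["scan", "filter", "join", "sum"],   # first candidate query plan 1
    ["scan", "join", "filter", "sum"],   # plan 2: same operators, new order
    ["scan", "filter", "sum"],           # cheaper equivalent plan
]
target = select_target_plan(candidates)  # -> ["scan", "filter", "sum"]
```

The "numerical condition" in the text corresponds to the `key` function here; selecting the highest-cost plan instead would only change `min` to `max`.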
In some embodiments, the scheduling unit 320 may preset a plurality of predicted execution times, which are the predicted execution times of each of the plurality of query steps of one of the plurality of first candidate query plans (e.g., the intermediate target query plan). In some embodiments, the scheduling unit 320 may send a second scheduling request at each predicted execution time when computing resources are sufficient, and suspend sending the second scheduling request when computing resources are insufficient. In some embodiments, the metadata management unit 330 may obtain, at the predicted execution time and according to the second scheduling request, the usage of computing resources and the first data, i.e., the portion of the relevant data queried by the current query statement that has arrived at the database. In some embodiments, the rules unit 340 may generate a second candidate query plan by expanding the plurality of first candidate query plans according to the first data. The second candidate query plan comprises a plurality of query steps and is an equivalent query plan to the first candidate query plans: it produces the same output for the same input but uses different operators or a different operator order. The set consisting of the plurality of first candidate query plans and the second candidate query plan constitutes the search space for the current query statement; the search space is a subset of all candidate query plans for the current query statement, and one candidate query plan in the search space may be determined as the final query plan for execution by the execution unit. In some embodiments, as shown in FIG. 4, the query path of the first candidate query plan 1 corresponding to the current query statement consists of an input, filter 1, operator 1, filter 3, and an output. The query path of the first candidate query plan 2 corresponding to the current query statement consists of an input, filter 2, operator 2, and an output.
The query path of the second candidate query plan corresponding to the current query statement consists of an input, filter 1, operator 2, operator 3, filter 3, and an output. The first candidate query plan 1, the first candidate query plan 2, and the second candidate query plan produce the same output for the same input. It should be noted that different operators generally have different operating efficiencies; because the first candidate query plan 1, the first candidate query plan 2, and the second candidate query plan include different operators executed in different orders, their execution costs differ. In some embodiments, as shown in FIG. 3, the rules unit 340 may include a query step selection module 341 and a query plan expansion module 342. The query step selection module 341 may select, from the first candidate query plans, a third query step that takes the first data as input data. The query plan expansion module 342 may adjust the operators executing the third query step and their execution order to generate the second candidate query plan. In some embodiments, the client sends the same query statement to the database management server at a regular query cycle to query statistics for a specific event within a preset past time period. For example, the client sends the same query statement to the database management server every day to query the total sales over the past 5 days. For day t, the sales of each day from day t-4 to day t must be queried and then summed to obtain the query result, so the first candidate query plan corresponding to the query statement may be:
SUM(t-4)+SUM(t-3)+SUM(t-2)+SUM(t-1)+SUM(t) (1)
where SUM(t) is the sales amount on day t, corresponding to the sales data arriving at the database on day t. For example, on day t-2, the first data stored in the database includes the sales data corresponding to SUM(t-4), SUM(t-3), and SUM(t-2); the third query steps taking the first data as input are therefore SUM(t-4), SUM(t-3), and SUM(t-2). These three steps may be merged into SUM((t-4)→(t-2)), and the generated second candidate query plan may be:
SUM((t-4)→(t-2))+SUM(t-1)+SUM(t) (2)
where SUM((t-4)→(t-2)) is the total sales from day t-4 to day t-2.
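The equivalence of formulas (1) and (2) can be checked with a small numeric sketch; the daily sales figures below are invented purely for illustration:

```python
# Sketch of the rewrite from formula (1) to formula (2): once the data for
# days t-4 .. t-2 has arrived, the three per-day SUM steps can be merged
# into a single SUM((t-4)->(t-2)) step. The sales numbers are made up.

daily_sales = {1: 100, 2: 150, 3: 120, 4: 90, 5: 200}  # day -> sales

def sum_day(day):
    """Per-day step SUM(day)."""
    return daily_sales[day]

def sum_range(first_day, last_day):
    """Merged step SUM((first_day)->(last_day))."""
    return sum(daily_sales[d] for d in range(first_day, last_day + 1))

t = 5
# First candidate query plan, formula (1): five separate per-day steps.
plan1 = sum_day(t-4) + sum_day(t-3) + sum_day(t-2) + sum_day(t-1) + sum_day(t)
# Second candidate query plan, formula (2): merged step for arrived data.
plan2 = sum_range(t-4, t-2) + sum_day(t-1) + sum_day(t)
assert plan1 == plan2  # equivalent plans: same input, same output
```

Both plans return the same total; they differ only in operators and operator order, which is exactly the notion of plan equivalence used above.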
In some embodiments, at each predicted execution time when computing resources are sufficient, the query optimization unit 360 may estimate the execution costs of the plurality of first candidate query plans and the second candidate query plan, and select the target query plan for that predicted execution time based on the execution costs. In some embodiments, the query optimization unit 360 may compare the current computing resource occupancy of the database management server (e.g., processor occupancy and memory occupancy) with a preset threshold; if the current occupancy is less than the preset threshold, the current computing resources are determined to be sufficient, and otherwise insufficient. In some embodiments, the plurality of first candidate query plans and the second candidate query plan may include different query steps. For example, if the current query statement involves three tables, the join order of the three tables differs across candidate query plans and/or the actual execution cost differs because different operators are used, so the target query plan can be determined by evaluating the execution cost of each candidate query plan. In some embodiments, at a predicted execution time when computing resources are sufficient, the query optimization unit 360 may select, based on the execution costs of the plurality of first candidate query plans and the second candidate query plan, the query plan whose execution cost satisfies a certain numerical condition as the target query plan for that predicted execution time. For example, the query plan with the lowest, the highest, or some particular execution cost may be selected from the plurality of first candidate query plans and the second candidate query plan as the target query plan. Since methods for estimating the execution cost of a query plan are known in the art, they are not described in detail here.
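The resource-sufficiency check (current occupancy versus preset threshold) can be sketched as follows; the threshold values and the occupancy field names are assumptions, not figures from the disclosure:

```python
# Minimal sketch of the sufficiency check: resources are sufficient only if
# every tracked occupancy (processor, memory) is below its preset threshold.
# The thresholds below are illustrative, not values from the disclosure.

THRESHOLDS = {"cpu": 0.8, "memory": 0.9}  # assumed preset thresholds

def resources_sufficient(occupancy):
    """Return True if every tracked occupancy is below its threshold."""
    return all(occupancy[k] < THRESHOLDS[k] for k in THRESHOLDS)

assert resources_sufficient({"cpu": 0.5, "memory": 0.6}) is True
assert resources_sufficient({"cpu": 0.95, "memory": 0.6}) is False
```

When this check fails, the second scheduling request is suspended and the predicted execution time is skipped, as described above and in the fig. 5 example below.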
In some embodiments, for the target query plan at each predicted execution time, the execution unit 370 executes the target query plan corresponding to the current query statement incrementally, building on the query results of the query steps of the target query plan corresponding to the previous query statement, to generate the query result corresponding to the current query statement. In some embodiments, after receiving the target query plan corresponding to the current query statement, the execution unit 370 executes it according to the query steps in the target query plan. Each target query plan has a plurality of corresponding query steps, and each query step is completed by operators in the execution unit 370 (one operator is responsible for one piece of basic data processing logic, and a group of operators completes a group of query steps according to the target query plan; in other words, a query step is completed by at least one operator and yields a corresponding query result). In some embodiments, the primary operators in execution unit 370 include: data read operations, conditional filter operations, join operations, set operations, grouping (Group by) operations, deduplication (distinct) operations, aggregation operations, sort operations, and the like. A conditional filter operation may apply to a data table or to a sub-query, or carry a filter condition; set operations include intersection, union, difference, and the like. The execution unit 370 may generally encapsulate the results of the multiple query steps belonging to the target query plan as the query result of the target query plan; that is, the query result of the target query plan includes the query results of its multiple query steps.
Therefore, in practical applications, the query result of the target query plan corresponding to the current query statement may be obtained from the execution unit 370 in units of the target query plan. In some embodiments, as shown in fig. 5, the relevant data queried by the current query statement gradually arrives at the database over time (the arrived data being the first data), and the usage of the computing resources of the database management server also changes over time. t1, t2, t3, t4, and t5 are predicted execution times; the current computing resources are insufficient at t1 and sufficient at t2, t3, t4, and t5. Therefore, at the predicted execution time t1, the query optimization unit 360 skips generating the target query plan of the current moment, and the execution unit 370 skips executing it. At the predicted execution times t2, t3, t4, and t5, the query optimization unit 360 generates the target query plan of the current moment and the execution unit 370 executes it, so as to obtain the query result of the current query statement.
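The skip-or-execute behavior over the predicted execution times t1 through t5 in fig. 5 can be sketched as a simple loop; the resource-availability values mirror the example above:

```python
# Sketch of the scheduling behavior in fig. 5: at each predicted execution
# time, the target plan is generated and executed only when computing
# resources are sufficient; otherwise the time point is skipped entirely.

sufficient_at = {"t1": False, "t2": True, "t3": True, "t4": True, "t5": True}

executed = []
for time_point in ["t1", "t2", "t3", "t4", "t5"]:
    if not sufficient_at[time_point]:
        continue                      # skip plan generation and execution
    executed.append(time_point)       # generate and execute the target plan

# executed == ["t2", "t3", "t4", "t5"]: only t1 is skipped
```

Skipping t1 neither blocks the query nor wastes resources; the data that had arrived by t1 is simply picked up at the next sufficient time point.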
In some embodiments, the client continuously sends a plurality of query statements, usually the same or similar, to the database management server. In the process of continuously querying the database, the query steps of the target query plan corresponding to the current query statement may therefore include query steps already executed in the target query plan corresponding to the previous query statement; that is, the two target query plans may include the same query steps. It should be noted that the same query step has the same inputs and outputs and executes the same data processing logic. In some embodiments, the plurality of query steps of the target query plan corresponding to the current query statement includes a first query step and a second query step, where the first query step is a query step executed in the target query plan corresponding to the previous query statement, and the second query step is a query step not executed in that plan. The storage unit 380 may store the first query result obtained by executing the first query step in the target query plan corresponding to the previous query statement. In some embodiments, where the database management system 30 queries the database periodically and continuously using the same or similar query statements, the storage unit 380 may determine which first query results to store, and for how long, based on the query cycle and the data read by each query statement.
In some embodiments, as shown in fig. 3, the query optimization unit 360 includes a first reading module 361, an execution cost estimation module 362, and a query plan selection module 363. For the target query plan at each predicted execution time, the first reading module 361 may read the first query result when the first query step needs to be executed during execution cost estimation. The execution cost estimation module 362 may estimate, based on the first query result, the execution costs of the first candidate query plans and the second candidate query plan corresponding to the current query statement using an execution cost estimation method; when the first query step needs to be executed during cost estimation, the first query result is read and the first query step is skipped. Since execution cost estimation methods are known in the art, they are not described in detail here. The query plan selection module 363 may select the query plan with the minimum execution cost from the plurality of first candidate query plans and the second candidate query plan, as the target query plan (i.e., the optimal query plan) corresponding to the current query statement. In some embodiments, as shown in fig. 3, the execution unit 370 includes a second reading module 371, a query plan execution module 372, and a query result generation module 373. For the target query plan at each predicted execution time, the second reading module 371 may read the first query result when the first query step needs to be executed during execution of the target query plan corresponding to the current query statement.
The query plan execution module 372 is configured to execute the plurality of query steps according to the target query plan corresponding to the current query statement: the second query step is executed to obtain a second query result, and when the first query step needs to be executed during execution of the target query plan, the first query result is read and the first query step is skipped. The query result generation module 373 may generate the query result corresponding to the current query statement based on the first query result and the second query result. For example, the first query result and the second query result generated at each predicted execution time may be added or aggregated according to an incremental query plan model based on a time-varying relationship, so as to generate the query result corresponding to the current query statement. Therefore, for the target query plan at each predicted execution time, when the target query plan corresponding to the current query statement contains a first query step that was already executed in the target query plan corresponding to the previous query statement, that step need not be executed again; the query result corresponding to the current query statement can be generated from the stored first query result and the second query result obtained by executing the second query step. This improves the query efficiency of the database and reduces the computational cost of database queries.
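The reuse of stored first query results by execution unit 370 can be sketched as follows; the step identifiers, the dictionary standing in for storage unit 380, and the per-step computations are illustrative assumptions:

```python
# Hedged sketch of incremental execution: steps already executed for the
# previous query statement (first query steps) are read from a result store
# standing in for storage unit 380; only new (second) steps are executed.

stored_results = {}   # stand-in for storage unit 380: step id -> result

def run_step(step_id, compute, executed_log):
    """Read a cached result if present; otherwise execute and store it."""
    if step_id in stored_results:
        return stored_results[step_id]        # first query step: skipped
    executed_log.append(step_id)              # second query step: executed
    result = compute()
    stored_results[step_id] = result
    return result

# Previous query statement: executes and stores SUM for days 1..3.
log1 = []
prev = sum(run_step(f"SUM({d})", lambda d=d: d * 10, log1) for d in (1, 2, 3))
# Current query statement: reuses days 1..3, executes only days 4 and 5.
log2 = []
cur = sum(run_step(f"SUM({d})", lambda d=d: d * 10, log2) for d in (1, 2, 3, 4, 5))
# log2 == ["SUM(4)", "SUM(5)"]: the first query steps were not re-executed
```

The current query's result is assembled from three stored results plus two freshly executed steps, which is the efficiency gain the passage attributes to incremental execution.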
Database management method according to embodiments of the present disclosure
According to one embodiment of the present disclosure, a database management method is provided. The method may be performed by the database management system 30. Where the computing device 141 is a single computer, the database management system 30 is part of that computer, and the database management method is performed by that part. Where the computing device 141 is a set of computers, the database management system 30 is a single computer in the set, and the database management method is performed by that computer. Where the computing device 141 takes the form of a cloud, the database management system 30 is a series of computers, or portions thereof, in the cloud, and the database management method is performed by them.
As shown in fig. 6, a database management method according to one embodiment of the present disclosure includes: step S610, parsing the semantics and syntax of the current query statement according to a first scheduling request to generate a plurality of first candidate query plans; step S620, obtaining, according to a second scheduling request, the usage of computing resources at a predicted execution time and the first data that has arrived at the database among the relevant data queried by the current query statement, where the predicted execution time is the predicted execution time of each of the plurality of query steps of one of the plurality of first candidate query plans; step S630, generating a second candidate query plan by expanding the plurality of first candidate query plans according to the first data; and step S640, at a predicted execution time when computing resources are sufficient, selecting a target query plan from the plurality of first candidate query plans and the second candidate query plan based on execution cost, where the database is queried by executing the target query plan.
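Steps S610 through S640 can be tied together in a compact pipeline sketch; every function below is a hypothetical stand-in for the corresponding unit, with a plan's length used as a crude proxy for execution cost:

```python
# Hypothetical sketch of method steps S610-S640. The plan representation,
# the stand-in parser/expander, and the length-as-cost proxy are all
# assumptions for illustration, not the disclosure's actual algorithms.

def parse_candidates(query):
    # S610: parse semantics/syntax into equivalent first candidate plans.
    return [["scan", "filter", "sum"], ["scan", "sum", "filter"]]

def expand_plans(plans, first_data):
    # S630: expand to a second candidate plan once first data has arrived.
    return [["scan", "merged-sum"]] if first_data else []

def database_management_method(query, predicted_times, first_data_at, sufficient_at):
    targets = {}
    for t in predicted_times:
        first_data = first_data_at[t]                        # S620
        candidates = parse_candidates(query) + expand_plans(
            parse_candidates(query), first_data)
        if sufficient_at[t]:                                 # S640: cost-based pick
            targets[t] = min(candidates, key=len)            # length as proxy cost
    return targets

targets = database_management_method(
    "SELECT ...",
    ["t1", "t2"],
    {"t1": [], "t2": ["arrived-rows"]},
    {"t1": False, "t2": True},
)
# targets == {"t2": ["scan", "merged-sum"]}: t1 is skipped for lack of resources
```

At t2 the expanded (merged) plan wins the cost comparison, mirroring the SUM((t-4)→(t-2)) example in the embodiment above.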
Since the implementation details of the above database management method are described in the above detailed description of the embodiment of the apparatus, they are not repeated for brevity.
Commercial value of the disclosure
In the embodiments of the present disclosure, in a scenario of continuously querying a database, a database management server may receive a plurality of query statements continuously sent by a client and, while executing the current query statement, search for the target query plan of the current moment at each of a plurality of predicted execution times. This reduces the execution cost of the whole query process for the current query statement and thereby improves database query efficiency. In this scenario, improving the utilization rate of the computing resources of the computing device reduces its operating cost. By reducing the running cost of the computing device, the embodiments of the present disclosure offer good commercial and economic value.
As will be appreciated by one skilled in the art, the present disclosure may be embodied as systems, methods and computer program products. Accordingly, the present disclosure may be embodied in the form of entirely hardware, entirely software (including firmware, resident software, micro-code), or in the form of a combination of software and hardware. Furthermore, in some embodiments, the present disclosure may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied therein.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium is, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical memory, a magnetic memory, or any suitable combination of the foregoing. In this context, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with a processing unit, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic or optical forms, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., and any suitable combination of the foregoing.
Computer program code for carrying out embodiments of the present disclosure may be written in one or more programming languages or combinations thereof. The programming languages include object-oriented programming languages such as Java and C++, and may also include conventional procedural programming languages such as C. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote-computer case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (13)

1. A database management system, comprising:
the search space exploration unit is used for analyzing the semantics and grammar of the current query statement according to the first scheduling request to generate a plurality of first candidate query plans;
a metadata management unit, configured to obtain, according to a second scheduling request, usage of a computing resource and first data that reaches a database in related data queried by the current query statement at a predicted execution time, where the predicted execution time is a predicted execution time of each query step in a plurality of query steps of one query plan of the plurality of first candidate query plans;
a rule unit, configured to generate a second candidate query plan based on the plurality of first candidate query plan expansions according to the first data;
a query optimization unit, configured to select a target query plan from the plurality of first candidate query plans and the second candidate query plan based on an execution cost in a case where the execution time is predicted and the computing resources are sufficient, where a database is queried by executing the target query plan.
2. The database management system of claim 1, further comprising:
and the execution unit is used for executing the target query plan corresponding to the current query statement in an incremental manner, based on the query results of the query steps of the target query plan corresponding to the previous query statement, so as to generate the query result corresponding to the current query statement.
3. The database management system of claim 2, wherein the target query plan corresponding to the current query statement comprises a plurality of query steps, the plurality of query steps comprising a first query step and a second query step, the first query step being a query step already executed in the target query plan corresponding to the previous query statement, and the second query step being a query step not executed in the target query plan corresponding to the previous query statement, the database management system further comprising:
a storage unit, configured to store a first query result obtained by executing the first query step in the target query plan corresponding to the previous query statement.
4. The database management system of claim 3, wherein the query optimization unit comprises a first reading module, an execution cost estimation module, and a query plan selection module, wherein:
the first reading module is configured to read the first query result when the first query step needs to be executed;
the execution cost estimation module is configured to estimate the execution costs of the plurality of first candidate query plans and the second candidate query plan based on the first query result, wherein during execution cost estimation the first query step is skipped by reading the first query result;
the query plan selection module is configured to select, from the plurality of first candidate query plans and the second candidate query plan, the query plan with the minimum execution cost as the target query plan.
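Outside the claim language, the cost-based selection of claim 4 can be sketched as follows. This is an illustrative sketch only: `step_costs` and `cached_steps` are hypothetical names, and a real optimizer would derive costs from statistics rather than fixed per-step numbers.

```python
def select_target_plan(candidate_plans, step_costs, cached_steps):
    """Pick the minimum-cost plan among the first and second candidates.

    candidate_plans: list of plans, each an ordered list of step names.
    step_costs: hypothetical estimated cost per query step.
    cached_steps: steps whose results were stored while executing the
    previous statement's target plan (the "first query steps").
    """
    def plan_cost(plan):
        # A cached step is skipped by reading its stored result,
        # so it contributes no cost to the estimate.
        return sum(step_costs[s] for s in plan if s not in cached_steps)

    return min(candidate_plans, key=plan_cost)
```

Under this model, a plan that reuses more previously executed steps naturally wins the cost comparison, which is the effect claims 3 and 4 describe.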
5. The database management system of claim 1, wherein the first candidate query plan comprises a plurality of query steps, and the rule unit comprises a query step selection module and a query plan expansion module, wherein:
the query step selection module is configured to select, from the first candidate query plan, a third query step that takes the first data as input data;
the query plan expansion module is configured to adjust the operator executing the third query step and the operator execution order, to generate the second candidate query plan.
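A minimal sketch of claim 5's expansion, under the assumption that each operator has a set of logically equivalent alternatives; `alternatives` is a hypothetical mapping, and a real optimizer would also reorder operators and verify equivalence rules before emitting a candidate.

```python
def expand_plan(first_plan, third_steps, alternatives):
    """Generate second candidate plans from a first candidate plan.

    third_steps: query steps selected because their input data (the
    "first data") has already arrived at the database.
    alternatives: hypothetical map from an operator to equivalent
    operators that could execute the same step.
    """
    expanded = []
    for i, op in enumerate(first_plan):
        if op not in third_steps:
            continue
        # Swap in each equivalent operator for the selected step,
        # keeping the rest of the plan unchanged.
        for alt in alternatives.get(op, []):
            expanded.append(first_plan[:i] + [alt] + first_plan[i + 1:])
    return expanded
```

Each expanded plan then competes with the original first candidate plans in the cost comparison of claim 4.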
6. The database management system of claim 3, wherein the execution unit comprises a second reading module, a query plan execution module, and a query result generation module, wherein:
the second reading module is configured to read the first query result when the first query step needs to be executed;
the query plan execution module is configured to execute the plurality of query steps according to the target query plan corresponding to the current query statement, wherein the second query step is executed to obtain a second query result, and the first query step is skipped by reading the first query result;
the query result generation module is configured to generate the query result corresponding to the current query statement based on the first query result and the second query result.
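The incremental execution of claims 2, 3, and 6 can be sketched as follows; `run_step` stands in for actual operator execution, and `cache` plays the role of claim 3's storage unit. All names here are illustrative assumptions, not the patented implementation.

```python
def execute_incrementally(plan, cache, run_step):
    """Execute a target plan, skipping steps already run for the
    previous statement by reading their stored results.

    plan: ordered list of step names for the current statement.
    cache: step name -> stored result ("first query results").
    run_step: callable executing one step given prior step results.
    """
    results = {}
    for step in plan:
        if step in cache:
            # First query step: skip execution, read the stored result.
            results[step] = cache[step]
        else:
            # Second query step: execute it and store its result so a
            # later statement can skip it in turn.
            results[step] = run_step(step, results)
            cache[step] = results[step]
    # The final step's result is the query result for the statement.
    return results[plan[-1]]
```

The key property is that only the second query steps incur execution work; the first query steps cost a cache read.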
7. The database management system of claim 1, further comprising:
a job management unit, configured to register a database query job corresponding to the current query statement;
a scheduling unit, configured to send the first scheduling request after the database query job corresponding to the current query statement is registered successfully.
8. The database management system of claim 7, wherein the predicted execution time comprises a plurality of predicted execution times, and the scheduling unit is further configured to preset the plurality of predicted execution times and to send the second scheduling request when a predicted execution time arrives and the computing resources are sufficient.
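As an illustrative reading of claim 8 (the function names and the simple slot-count resource model are assumptions, not claim language), the scheduling unit might check each preset predicted execution time and issue the second scheduling request only when that time has arrived and resources suffice:

```python
def issue_second_requests(predicted_times, now, free_resources, needed, send):
    """Send the second scheduling request for every preset predicted
    execution time that has arrived, provided computing resources are
    sufficient; free_resources/needed model resource sufficiency."""
    issued = []
    for t in sorted(predicted_times):
        if t <= now and free_resources >= needed:
            send(t)  # the second scheduling request for this time point
            issued.append(t)
    return issued
```

Times that have not yet arrived, or moments when resources are insufficient, simply produce no request, matching the conditional wording of claims 1 and 8.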
9. A database management engine comprising:
a search space exploration unit, configured to parse the semantics and syntax of a current query statement according to a first scheduling request, to generate a plurality of first candidate query plans;
a metadata management unit, configured to acquire, according to a second scheduling request, the usage of computing resources and first data that has arrived at a database, among the data to be queried by the current query statement, at a predicted execution time, wherein the predicted execution time is a predicted execution time of each of a plurality of query steps of one of the plurality of first candidate query plans;
a rule unit, configured to expand the plurality of first candidate query plans according to the first data, to generate a second candidate query plan;
a query optimization unit, configured to select, when the predicted execution time arrives and the computing resources are sufficient, a target query plan from the plurality of first candidate query plans and the second candidate query plan based on an execution cost, wherein the database is queried by executing the target query plan.
10. A computing device, comprising:
a memory for storing computer executable code;
a processor for executing the computer executable code; and
the database management system according to any one of claims 1 to 8.
11. A system on a chip, comprising:
a memory for storing computer executable code;
a processor for executing the computer executable code; and
the database management system according to any one of claims 1 to 8.
12. A database management method, comprising:
parsing the semantics and syntax of a current query statement according to a first scheduling request, to generate a plurality of first candidate query plans;
acquiring, according to a second scheduling request, the usage of computing resources and first data that has arrived at a database, among the data to be queried by the current query statement, at a predicted execution time, wherein the predicted execution time is a predicted execution time of each of a plurality of query steps of one of the plurality of first candidate query plans;
expanding the plurality of first candidate query plans according to the first data, to generate a second candidate query plan;
selecting, when the predicted execution time arrives and the computing resources are sufficient, a target query plan from the plurality of first candidate query plans and the second candidate query plan based on an execution cost, wherein the database is queried by executing the target query plan.
13. A computer-readable medium comprising computer-executable code that, when executed by a processor, implements the method of claim 12.
CN202111660373.8A 2021-12-31 2021-12-31 Database management system, related apparatus, method and medium Pending CN114443680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111660373.8A CN114443680A (en) 2021-12-31 2021-12-31 Database management system, related apparatus, method and medium

Publications (1)

Publication Number Publication Date
CN114443680A 2022-05-06

Family

ID=81366012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111660373.8A Pending CN114443680A (en) 2021-12-31 2021-12-31 Database management system, related apparatus, method and medium

Country Status (1)

Country Link
CN (1) CN114443680A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115309777A (en) * 2022-10-10 2022-11-08 北京奥星贝斯科技有限公司 Data query method and device
CN115309777B (en) * 2022-10-10 2023-01-24 北京奥星贝斯科技有限公司 Data query method and device
CN115827930A (en) * 2023-02-15 2023-03-21 杭州悦数科技有限公司 Data query optimization method, system and device of graph database
CN115827930B (en) * 2023-02-15 2023-05-05 杭州悦数科技有限公司 Data query optimization method, system and device for graph database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination