WO2023000561A1 - Method and device for accelerating database operations - Google Patents

Method and device for accelerating database operations

Info

Publication number
WO2023000561A1
Authority
WO
WIPO (PCT)
Prior art keywords
execution
plan tree
hardware accelerator
execution plan
database
Prior art date
Application number
PCT/CN2021/131504
Inventor
向俊凌
周民
刘睿民
Original Assignee
威讯柏睿数据科技(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 威讯柏睿数据科技(北京)有限公司
Priority to US18/267,215 (US20240045868A1)
Publication of WO2023000561A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Definitions

  • The present application relates to the technical field of databases, and more specifically to a method and device for accelerating database operations.
  • Database execution is shown in Figure 1.
  • The user inputs a query request, which the parsing engine analyzes to generate a parse tree.
  • The query tree is output and rewritten into an execution plan tree, and the execution engine then executes according to the execution plan tree. The final data is read, cached, or stored through the storage engine.
  • Hardware acceleration usually offloads to hardware some of the execution nodes in the execution plan tree that the optimization engine outputs and that the execution engine would otherwise execute.
  • Hardware is limited by resources, power consumption, and complexity, and cannot implement as many complex execution functions as software.
  • It can usually implement only some basic relational algebra operations, such as selection, projection, and join.
  • Different nodes in the plan tree output by the optimization engine may have similar functions, which wastes hardware capability. As a result, the execution actions in the execution plan tree generated by current software do not match the execution functions and capabilities of the hardware well.
  • The invention provides a method for accelerating database operations, to solve the technical problem that the execution plan tree generated in the prior art does not match the execution functions and capabilities of the hardware accelerator well.
  • The method is applied to a database connected to a hardware accelerator, and includes:
  • If the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, distributing the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator;
  • The new execution cost and the initial execution cost are determined by calling a preset cost function.
  • Processing the initial execution plan tree and generating a new execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator specifically includes:
  • The acceleration mapping module performs normalized mapping on the nodes corresponding to accelerated execution actions, removes nodes whose functions have been integrated into other nodes, and preprocesses some nodes before mapping them.
  • The normalized mapping includes merging, and/or deleting, and/or adding.
  • Optimizing the nodes and output target columns in the matching execution plan tree according to the acceleration resources specifically includes:
  • According to the acceleration resources, determining the way the hardware accelerator accesses the required data, the way each node performs data access, and the columns needed by the final output and the successor nodes;
  • The way the hardware accelerator accesses the required data includes access based on its own memory, access to shared memory based on a cache coherence protocol, or access to a remote storage node through a network.
  • Distributing the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator specifically includes:
  • One or more new execution plan trees are sent to the hardware accelerator as a whole for execution, or the nodes of the new execution plan tree are sent to the hardware accelerator one at a time for execution, or a combination of multiple nodes in the new execution plan tree is sent to the hardware accelerator for execution;
  • The digital twin image includes the number of execution modules of the hardware accelerator, the number of executable function nodes, the affiliation of each executable function node, and the schedulable execution granularity.
  • The method also includes:
  • The digital twin image is generated according to the self-information.
  • Each hardware accelerator is provided with its own local storage unit, and before the result data after the hardware accelerator accelerates the database operation is determined according to the execution result returned by the hardware accelerator, the method also includes:
  • The method further includes:
  • If the new execution cost is not less than the initial execution cost, executing the initial execution plan tree based on the database, and determining the result of the database operation according to the execution result.
  • The present invention also provides a device for accelerating database operations, applied to a database connected to a hardware accelerator. The device includes:
  • An acceleration mapping module, configured to process the initial execution plan tree and generate a new execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator, when an initial execution plan tree that is sent by the database and needs to be accelerated is received;
  • An accelerated execution engine, configured to distribute the new execution plan tree to the hardware accelerator for execution;
  • An accelerated storage engine module, configured to manage the memory and external storage of the accelerated part, and to coordinate cache and final-data consistency among multiple hardware accelerators;
  • A determining module, configured to determine the result data after the hardware accelerator accelerates the database operation according to the execution result returned by the hardware accelerator;
  • The new execution cost and the initial execution cost are determined by calling a preset cost function.
  • The present invention also provides a computer-readable storage medium storing instructions which, when run on a terminal device, cause the terminal device to execute the above method for accelerating database operations.
  • When the initial execution plan tree that is sent by the database and needs to be accelerated is received, the initial execution plan tree is processed and a new execution plan tree is generated according to the basic operations and acceleration resources supported by the hardware accelerator; if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, the new execution plan tree is distributed to the hardware accelerator for execution; the result data after the hardware accelerator accelerates the database operation is determined according to the execution result returned by the hardware accelerator. In this way the execution plan tree of the database matches the execution functions and capabilities of the hardware accelerator while keeping a reasonable complexity, so that the acceleration capability of the hardware accelerator is fully used. Hardware acceleration is achieved while the existing database system architecture is retained, which ensures the reliability of the database system.
  • FIG. 1 shows a schematic diagram of a database architecture in the prior art.
  • FIG. 2 shows a schematic flowchart of a method for accelerating database operations proposed by an embodiment of the present invention.
  • FIG. 3 shows a schematic diagram of the database architecture of an embodiment of the present invention.
  • FIG. 4 shows a schematic diagram of the principle of scan node mapping in an embodiment of the present invention.
  • FIG. 5 shows a schematic diagram of the principle of join node mapping in an embodiment of the present invention.
  • FIG. 6 shows a first schematic diagram of the principle of materialization node mapping in an embodiment of the present invention.
  • FIG. 7 shows a second schematic diagram of the principle of materialization node mapping in an embodiment of the present invention.
  • FIG. 8 shows a schematic diagram of the principle of control node mapping in an embodiment of the present invention.
  • FIG. 9 shows a schematic diagram of the principle of the accelerated execution engine in an embodiment of the present invention.
  • FIG. 10 shows a schematic structural diagram of a hardware accelerator in an embodiment of the present invention.
  • FIG. 11 shows a schematic diagram of the execution plan output by PostgreSQL in an embodiment of the present invention.
  • FIG. 12 shows a schematic diagram of a mapped execution plan tree in an embodiment of the present invention.
  • FIG. 13 shows a schematic diagram of the merged execution plan tree in an embodiment of the present invention.
  • FIG. 14 shows a schematic diagram of the output column optimization code in an embodiment of the present invention.
  • FIG. 15 shows a schematic structural diagram of an apparatus for accelerating database operations proposed by an embodiment of the present invention.
  • An embodiment of the present application provides a method for accelerating database operations, applied to a database connected to a hardware accelerator. As shown in FIG. 2, the method includes the following steps:
  • Step S101: when the initial execution plan tree that is sent by the database and needs to be accelerated is received, process the initial execution plan tree and generate a new execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator.
  • The database is connected to a hardware accelerator.
  • The hardware accelerator can be an external hardware accelerator, that is, a device that achieves acceleration in hardware, such as a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), or an ASIC (Application Specific Integrated Circuit).
  • Database operations are operations on the data in the database, including reading data, writing data, updating or modifying data, deleting data, and so on.
  • The initial execution plan tree is generated by the optimization engine in the database after it performs query rewriting on the parse tree sent by the parsing engine.
  • The query request can be a query request that includes an acceleration instruction. If the query request needs to be accelerated, the parsing engine generates a parse tree from the query request and sends it to the optimization engine, which performs query rewriting on the parse tree and generates the initial execution plan tree that needs to be accelerated.
  • Hardware accelerators generally support only a limited set of basic operations, and different hardware accelerators have different acceleration resources. The initial execution plan tree is therefore processed, and a new execution plan tree generated, according to the basic operations and acceleration resources supported by the hardware accelerator.
  • Processing the initial execution plan tree and generating a new execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator specifically includes:
  • The acceleration mapping module performs normalized mapping on the nodes corresponding to accelerated execution actions, removes nodes whose functions have been integrated into other nodes, and preprocesses some nodes before mapping them.
  • The nodes that perform accelerated operations in the initial execution plan tree are subjected to normalized mapping according to the basic operations. Normalized mapping means establishing a correspondence between each node that performs an accelerated operation in the initial execution plan tree and a basic operation.
  • According to the results of the normalized mapping, a matching execution plan tree that matches the functions of the hardware accelerator is generated. Since the acceleration resources of different hardware accelerators differ, to better match the capabilities of the hardware accelerator, the nodes and output target columns in the matching execution plan tree are optimized according to the acceleration resources of the hardware accelerator, and a new execution plan tree is generated according to the optimization results.
  • The normalized mapping includes merging, and/or deleting, and/or adding.
  • The way the hardware accelerator accesses the required data includes access based on its own memory, access to shared memory based on a cache coherence protocol, or access to a remote storage node through a network.
  • The hardware accelerator can access the required data in different ways, including access based on its own memory, access to shared memory based on a cache coherence protocol, or access to remote storage nodes through the network.
  • The way the required data is accessed determines which nodes in the matching execution plan tree can be executed concurrently. When different nodes in the matching execution plan tree access the same data, the execution nodes that access the same data are merged into one execution action to avoid wasting resources. Since the final output and the successor nodes do not need all output columns, the output target columns are optimized according to what the final output and the successor nodes actually use.
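  • The merging of nodes that access the same data can be illustrated with a minimal Python sketch. The `Scan` class, its fields, and `merge_duplicate_scans` are hypothetical names introduced here for illustration; the application does not specify data structures, and the table names and predicate are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    """Hypothetical scan node: the table it reads and the filter it applies."""
    table: str
    predicate: str = ""


def merge_duplicate_scans(scans):
    """Collapse scans that read the same table under the same predicate.

    Returns (unique, remap): the distinct scans in first-seen order, and for
    each input position the index of the shared scan that now serves it.
    """
    unique = []
    index_of = {}
    remap = []
    for scan in scans:
        if scan not in index_of:
            index_of[scan] = len(unique)
            unique.append(scan)
        remap.append(index_of[scan])
    return unique, remap


# Hypothetical plan in which two branches scan table "a" with no filter.
scans = [Scan("a"), Scan("b", "x > 10"), Scan("a")]
unique, remap = merge_duplicate_scans(scans)
print(unique)
print(remap)
```

  • In this sketch two scans are considered the same only when both table and predicate are equal, so a scan of the same table under a different condition is kept as a separate action.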
  • Step S102: if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, distribute the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator.
  • The new execution cost is the computing resources consumed by executing the new execution plan tree.
  • The initial execution cost is the computing resources consumed by executing the initial execution plan tree.
  • The new execution cost and the initial execution cost can each be determined by calling the preset cost function. If the new execution cost is less than the initial execution cost, the new execution plan tree is better than the initial execution plan tree, and the new execution plan tree is distributed to the hardware accelerator for execution.
  • Distributing the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator specifically includes:
  • One or more new execution plan trees are sent to the hardware accelerator as a whole for execution, or the nodes of the new execution plan tree are sent to the hardware accelerator one at a time for execution, or a combination of multiple nodes in the new execution plan tree is sent to the hardware accelerator for execution;
  • The digital twin image includes the number of execution modules of the hardware accelerator, the number of executable function nodes, the affiliation of each executable function node, and the schedulable execution granularity.
  • The hardware accelerator may include one or more execution modules.
  • For example, if the hardware accelerator is implemented with an FPGA that provides multiple independent relational algebra operation pipelines, multiple new execution plan trees can be sent to it at once. If the hardware accelerator implements time-sharing and can schedule each basic relational algebra operator, multiple execution nodes in the new execution plan tree can be sent to it for execution according to the resources of the basic operators.
  • The method further includes:
  • The digital twin image is generated according to the self-information.
  • The hardware accelerator submits its own information, including its organization mode and operation mode, and a digital twin image can be generated from this self-information.
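  • As a rough illustration of how a digital twin image built from self-information could drive the choice of dispatch granularity, consider the following Python sketch. All names (`DigitalTwin`, `Granularity`, `twin_from_self_info`, `choose_dispatch`) and the shape of the self-information dictionary are assumptions made for this example; the affiliation of function nodes is omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    WHOLE_TREE = "whole_tree"
    NODE_GROUP = "node_group"
    SINGLE_NODE = "single_node"


@dataclass
class DigitalTwin:
    """Hypothetical host-side image of one hardware accelerator."""
    execution_modules: int      # number of independent execution modules
    function_nodes: dict        # executable function name -> count
    granularities: frozenset    # schedulable execution granularities


def twin_from_self_info(info):
    """Build the twin from the self-information an accelerator reports."""
    return DigitalTwin(
        execution_modules=info["execution_modules"],
        function_nodes=dict(info["function_nodes"]),
        granularities=frozenset(Granularity(g) for g in info["granularities"]),
    )


def choose_dispatch(twin, pending_trees):
    """Pick a granularity (coarsest first) and how many units to send at once."""
    if Granularity.WHOLE_TREE in twin.granularities:
        # Each independent module can take one whole plan tree.
        return Granularity.WHOLE_TREE, min(pending_trees, twin.execution_modules)
    if Granularity.NODE_GROUP in twin.granularities:
        return Granularity.NODE_GROUP, 1
    return Granularity.SINGLE_NODE, 1


# Hypothetical self-information reported by an accelerator with two modules.
self_info = {
    "execution_modules": 2,
    "function_nodes": {"Scan": 4, "Join": 2, "Sort": 1},
    "granularities": ["whole_tree", "single_node"],
}
twin = twin_from_self_info(self_info)
print(choose_dispatch(twin, pending_trees=5))
```

  • Preferring the coarsest available granularity is a design choice of this sketch, not something the application prescribes.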
  • The method further includes:
  • If the new execution cost is not less than the initial execution cost, execute the initial execution plan tree based on the database, and determine the result of the database operation according to the execution result.
  • If the new execution cost is not less than the initial execution cost, executing the initial execution plan tree requires no more computing resources than executing the new one.
  • In that case the initial execution plan tree is executed based on the database, and the result of the database operation is determined according to the execution result.
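  • The cost comparison in step S102 and the fallback described above can be sketched as follows. The toy cost function, the unit costs, and the flat list representation of a plan tree are all assumptions for illustration; the application only states that a preset cost function is called.

```python
def plan_cost(tree, unit_costs):
    """Toy cost function: sum a per-node-type unit cost over the tree's nodes."""
    return sum(unit_costs[node] for node in tree)


def choose_plan(initial_tree, new_tree, cost_fn):
    """Return (target, tree): use the accelerator only if the new tree is cheaper."""
    if cost_fn(new_tree) < cost_fn(initial_tree):
        return "accelerator", new_tree
    # New cost is not less than the initial cost: fall back to the database.
    return "database", initial_tree


# Hypothetical unit costs; a real cost function would come from the database.
UNIT_COSTS = {"SeqScan": 10, "HashJoin": 20, "Hash": 5, "Scan": 3, "Join": 8}
initial = ["SeqScan", "SeqScan", "Hash", "HashJoin"]  # cost 10 + 10 + 5 + 20 = 45
mapped = ["Scan", "Scan", "Join"]                     # cost 3 + 3 + 8 = 14
target, tree = choose_plan(initial, mapped, lambda t: plan_cost(t, UNIT_COSTS))
print(target, tree)
```

  • Equal costs fall through to the database path, matching the "not less than" condition in the text.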
  • Step S103: determine the result data after the hardware accelerator accelerates the database operation according to the execution result returned by the hardware accelerator.
  • The hardware accelerator returns the execution result after executing the new execution plan tree, and the result data of the accelerated database operation is determined from it.
  • step S103 the method further includes:
  • If the database executes a part of the new execution plan tree at the same time, determine the operation result of the database according to the result data and the execution result of the database itself; otherwise, take the result data as the operation result.
  • the method further includes:
  • When each hardware accelerator executes a new execution plan tree, memory and external storage need to be managed. If there are multiple hardware accelerators, each provided with its own local storage unit, the caches and final data of the hardware accelerators corresponding to the execution result need to be coordinated so that the caches are consistent with one another and the final data are consistent with one another.
  • When the initial execution plan tree that is sent by the database and needs to be accelerated is received, the initial execution plan tree is processed and a new execution plan tree is generated according to the basic operations and acceleration resources supported by the hardware accelerator; if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, the new execution plan tree is distributed to the hardware accelerator for execution; the result data after the hardware accelerator accelerates the database operation is determined according to the execution result returned by the hardware accelerator. In this way the execution plan tree of the database matches the execution functions and capabilities of the hardware accelerator while keeping a reasonable complexity, so that the acceleration capability of the hardware accelerator is fully used.
  • Hardware acceleration is achieved while the existing database system architecture is retained, which ensures the reliability of the database system.
  • the embodiment of this application provides a method for accelerating database operations. As shown in FIG.
  • the module realizes the execution node remapping and optimization processing of the execution plan tree to match the function of the hardware accelerator, and outputs the new execution plan tree after matching, and then the accelerated execution engine module delivers the new execution plan tree to the hardware accelerator for execution.
  • The hardware accelerator, which can be a GPU, FPGA, or ASIC, mainly implements accelerated database operations such as data scanning, conditional filtering, multi-table joins, grouped aggregation, sorting, and other functions that hardware can reasonably implement.
  • Some complex function processing requires an additional coprocessor. Memory and high-speed disks can also be attached to the hardware accelerator to speed up data access and storage.
  • The hardware accelerator organizes the database operations it implements so that the operation functions can run concurrently.
  • Each module has the same sub-functions, and each module can independently complete the execution of a plan tree (possibly with the assistance of a coprocessor). Its structure is shown in Figure 10.
  • The specific organization of the hardware accelerator is not limited. Before it formally performs acceleration, it needs to notify the accelerated execution engine module of how its resources and functions are organized and how it operates, that is, a digital twin image of the hardware accelerator is established at the accelerated execution engine. In this way the accelerated execution engine can make efficient use of the full capability of the hardware accelerator.
  • The method includes the following steps:
  • Step 1: the acceleration mapping module receives the initial execution plan tree that needs to be accelerated, after the query has been parsed by the parsing engine and the plan has been output by the optimization engine;
  • Step 2: the acceleration mapping module remaps the nodes in the initial execution plan tree according to the mapping rules for each execution node type, obtaining a matching execution plan tree that matches the acceleration function nodes; it then merges nodes of the matching execution plan tree, optimizes the output target columns, and outputs a new execution plan tree;
  • The acceleration mapping module compares the new execution cost of the new execution plan tree with the initial execution cost of the initial execution plan tree. If the initial execution cost is smaller, it sends the initial execution plan tree back to the execution engine for execution; otherwise step 3 is executed;
  • Step 3: according to the resources and capabilities of the hardware accelerator, the accelerated execution engine module chooses whether to send one or more entire new execution plan trees at a time, or to send one execution node or a combination of execution nodes of the new execution plan tree, to the hardware accelerator;
  • Step 4: the hardware accelerator completes the functions of the nodes it receives and returns the final result data set;
  • Step 5: the accelerated storage engine manages the memory and external storage of the accelerated part.
  • The modules in the processing flow of the acceleration part can be distributed across multiple hardware boards and implemented by different devices, or integrated on one hardware board and implemented by one physical device.
  • The interfaces between them may be conventional interfaces such as PCIe, Eth, FC (Fibre Channel), and Infiniband, cache-coherent interfaces such as CCIX (Cache Coherent Interconnect for Accelerators) and CXL (Compute Express Link), or other internal high-speed data interfaces, and can be set flexibly by those skilled in the art.
  • The execution plan tree generated by an existing optimization engine after query rewriting has many types of nodes.
  • For example, the execution plan tree output by the open source database PostgreSQL after query rewriting has 40 possible node types, divided into four categories: control nodes, scan nodes, join nodes, and materialization nodes.
  • Because of resource cost, power consumption limits, and implementation complexity, hardware acceleration cannot implement so many complex execution functions. It usually implements only basic operations, and the specific basic operations supported are reported by the hardware accelerator or determined in advance.
  • PostgreSQL currently implements a total of 16 types of scan nodes.
  • When an FPGA is used to implement the acceleration component, usually only one type of scan action is supported, such as sequential scan; depending on how indexes are implemented, an index scan may also be provided.
  • The acceleration mapping module therefore needs to normalize the up to 16 different types of scan nodes in the received plan tree into the scan actions supported by the hardware.
  • The first seven scan nodes can be mapped directly to the hardware accelerator, and the last two do not appear in the execution tree and can be ignored.
  • The remaining types of scan nodes require additional processing in cooperation with software.
  • FunctionScan and TableFuncScan are mapped to the corresponding function call if the hardware supports that function, or otherwise to a software call that returns all results at once;
  • CteScan scans the temporary table generated by a CTE (Common Table Expression). Usually it needs no processing and the generated temporary table is simply passed to the upper node; when the control node RecursiveUnion appears, however, the recursion needs to be expanded together with WorkTableScan and then remapped as a whole.
  • NamedTuplestoreScan can pass the scanned temporary table to the upper layer; ValuesScan and ForeignScan also return all results at once.
  • For join nodes, PostgreSQL has NestLoop, MergeJoin, and HashJoin, whereas the hardware accelerator has only one join action.
  • The acceleration mapping module normalizes the different types of join nodes in the received plan tree into the join action supported by the hardware.
  • The Hash node below a HashJoin can be ignored and removed in this mapping.
  • The Sort and IncrementalSort materialization nodes are mapped to the Sort of the hardware accelerator; as shown in Figure 7, the Group, Agg, and WindowAgg nodes are mapped to the GroupAgg of the hardware accelerator; SetOp, together with the Append and MergeAppend control nodes, is mapped to the set operation of the hardware accelerator.
  • The Limit and Unique materialization nodes can be mapped to a parameter of the hardware accelerator and no longer correspond to a separate action.
  • The acceleration mapping module performs normalized mapping on the nodes corresponding to accelerated execution actions, removes nodes whose functions have been integrated into other nodes, and preprocesses some nodes before mapping them.
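  • The normalized mapping described above can be sketched as a table-driven tree rewrite. This is a minimal illustration, not the mapping module itself: the `Node` class and `normalize` function are hypothetical, the left-hand names are PostgreSQL plan node types mentioned in the text, and the right-hand action names (including `SetOperation`) are labels chosen for this example.

```python
from dataclasses import dataclass, field

# PostgreSQL plan node type -> accelerator action (action names are assumed).
NODE_MAP = {
    "SeqScan": "Scan", "IndexScan": "Scan",
    "NestLoop": "Join", "MergeJoin": "Join", "HashJoin": "Join",
    "Sort": "Sort", "IncrementalSort": "Sort",
    "Group": "GroupAgg", "Agg": "GroupAgg", "WindowAgg": "GroupAgg",
    "SetOp": "SetOperation", "Append": "SetOperation",
    "MergeAppend": "SetOperation",
}
FOLDED_AWAY = {"Hash"}              # function integrated into the parent Join
AS_PARAMETER = {"Limit", "Unique"}  # become a parameter of the child action


@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)
    params: dict = field(default_factory=dict)


def normalize(node):
    """Return a new tree that uses only accelerator actions."""
    children = [normalize(child) for child in node.children]
    if node.kind in FOLDED_AWAY:
        return children[0]          # drop the node, keep its single input
    if node.kind in AS_PARAMETER:
        child = children[0]
        child.params[node.kind.lower()] = True
        return child
    if node.kind not in NODE_MAP:
        raise ValueError(f"{node.kind} needs software handling in this sketch")
    return Node(NODE_MAP[node.kind], children, dict(node.params))


plan = Node("Limit", [Node("HashJoin", [Node("SeqScan"),
                                        Node("Hash", [Node("IndexScan")])])])
mapped = normalize(plan)
print(mapped.kind, mapped.params, [c.kind for c in mapped.children])
```

  • Node types that need software cooperation, such as FunctionScan or CteScan, are deliberately left out of the table and raise an error here; a fuller version would route them to the preprocessing described in the text.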
  • After the acceleration mapping module completes the mapping process, it optimizes the nodes and output columns of the matched execution tree according to the acceleration resources of the hardware accelerator.
  • A hardware accelerator may hold large-capacity memory of its own, directly access shared memory through a cache coherence protocol (such as CCIX or CXL), or access the data of remote storage nodes over the network. The scan node actions that can run concurrently can therefore be determined according to how the required data is stored;
  • The accelerated execution engine can execute some node functions itself, for example, in PostgreSQL, the Result node and the RecursiveUnion node. It also chooses, according to how the hardware accelerator is implemented, how to distribute the received, matched new execution plan trees to one or more hardware accelerators for execution.
  • The digital twin image includes information such as the number of execution modules of the hardware accelerator, the number of executable function nodes, the module each node belongs to, and the execution granularity that can be scheduled. If the hardware accelerator adopts the architecture shown in Figure 10, the schedulable execution granularity is at most an entire execution tree, and its minimum is determined by how nodes are scheduled inside the module; the digital twin image also records the functional nodes each execution module contains, their numbers, and the scheduling method for the nodes inside an execution tree.
  • FIG. 9 is a schematic diagram of the principle of the accelerated execution engine.
  • the matched plan tree 1-n is the new execution plan tree 1-n
  • the digital twin of the execution component is the digital twin image.
  • the accelerated storage engine completes the memory and external memory management of the accelerated part. If multiple hardware accelerators have their own local storage units, it is also necessary to coordinate the cache and final data consistency between multiple hardware accelerators.
  • the execution plan output by PostgreSQL is shown in Figure 11.
  • the two scan nodes indicated by the arrows in the figure scan the same table under the same conditions and can therefore be merged.
  • after the accelerated mapping module performs the acceleration mapping, the execution plan tree is transformed into the execution plan tree shown in Figure 12.
  • both Index Scan and Seq Scan are mapped to a unified Scan node
  • Hash Join and Nested Loop are both mapped to the Join operation
  • the Hash node under Hash Join is removed; Gather Merge is also removed.
  • the entire execution plan tree is remapped to a new execution plan tree matching the hardware accelerator.
  • after the nodes and output target columns are optimized, the execution plan tree is transformed into the execution plan tree shown in Figure 13.
  • the scan nodes on nation are optimized into a single node, so that scan is executed only once.
  • when lineitem is scanned, the output columns include all columns, but the only columns actually needed by the final output and by subsequent nodes are the six columns l_orderkey, l_suppkey, l_shipdate, l_extendedprice, l_discount and l_year. The accelerated mapping module therefore further optimizes the output columns of each node; the output column optimization code is shown in Figure 14.
  • after the accelerated mapping module completes the generation of the new execution plan tree, the accelerated execution engine sends the new execution plan tree and related parameters to the hardware accelerator for execution. It may send the whole new execution plan tree, send it one node at a time, or send several nodes at a time, as determined by the digital twin image of the hardware accelerator that it holds.
  • the embodiment of the present application also proposes a device for accelerating database operations, applied to a database connected to a hardware accelerator; as shown in Figure 15, the device includes:
  • the accelerated mapping module 501 is configured to, upon receiving an initial execution plan tree that is sent by the database and needs to be accelerated, process the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generate a new execution plan tree;
  • the accelerated execution engine 502 is configured to, if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, distribute the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator;
  • a determining module 503, configured to determine the result data after the hardware accelerator accelerates the database operation according to the execution result returned by the hardware accelerator;
  • the new execution cost and the initial execution cost are determined based on calling a preset cost function.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Operations Research (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

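A method and device for accelerating database operations, applied to a database connected to a hardware accelerator. The method comprises: upon receiving an initial execution plan tree that is sent by the database and needs to be accelerated, processing the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generating a new execution plan tree (S101); if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, distributing the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator (S102); and determining, according to the execution result returned by the hardware accelerator, the result data of the database operation as accelerated by the hardware accelerator (S103). The execution plan tree of the database is thereby matched to the execution functions and capabilities of the hardware accelerator while keeping a reasonable complexity, so that the acceleration capability of the hardware accelerator is fully exploited.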

Description

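Method and device for accelerating database operations
Technical Field
The present application relates to the field of database technology and, more specifically, to a method and device for accelerating database operations.
Background
Over the past decade, the rapid development of the mobile internet and the digital transformation of every industry have produced large-scale data growth in all fields, yet the development of general-purpose computing power faces challenges: improvements in semiconductor technology have nearly reached physical limits, and general-purpose processor performance can no longer grow according to Moore's Law. With the arrival of 5G, cloud computing, the Internet of Everything and artificial intelligence, the demands on data generation, transmission, storage and analytical processing keep rising, and the gap between the computing performance that data growth requires and the development of processor performance keeps widening.
In the prior art, database applications facing the performance challenge of processing and analyzing massive data usually adopt heterogeneous computing solutions to meet performance requirements. The usual hardware acceleration scheme is to add an accelerator card with a PCIe (Peripheral Component Interconnect Express) interface to the server on which the database software is deployed, and to improve database processing performance by offloading some database operations to the accelerator card. How to use hardware to improve database processing performance as much as possible has become an important research topic in the database field.
Typical database execution is shown in Figure 1: the user enters a query request; the parsing engine analyzes it and generates a parse tree; the optimization engine processes the parse tree, outputs a query tree and rewrites it into an execution plan tree; the execution engine then actually executes according to the execution plan tree; and the storage engine reads, caches or stores the final data.
Hardware acceleration usually offloads to hardware some of the execution nodes of the execution plan tree that is output by the database optimization engine and would otherwise be executed by the execution engine. Hardware, however, is constrained by resources, power consumption and complexity, and cannot implement the large number of complex execution functions that software can; it can usually implement only some basic relational algebra operations, such as selection, projection and join. In addition, in some scenarios, such as when subqueries are present, different nodes in the plan tree output by the optimization engine have similar functions, which wastes hardware capability. As a result, the execution actions in the execution plan trees currently generated by software do not match the execution functions and capabilities of the hardware well.
Therefore, how to match the execution plan tree of a database to the execution functions and capabilities of the hardware accelerator performing the acceleration while keeping a reasonable complexity, and thereby fully exploit the acceleration capability of the hardware accelerator, is the technical problem that currently needs to be solved.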
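Summary of the Invention
The present invention provides a method for accelerating database operations, to solve the technical problem in the prior art that the generated execution plan tree does not match the execution functions and capabilities of the hardware accelerator well. The method is applied to a database connected to a hardware accelerator and comprises:
upon receiving an initial execution plan tree that is sent by the database and needs to be accelerated, processing the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generating a new execution plan tree;
if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, distributing the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator;
determining, according to the execution result returned by the hardware accelerator, the result data of the database operation as accelerated by the hardware accelerator;
wherein the new execution cost and the initial execution cost are determined by calling a preset cost function.
In some embodiments of the present application, processing the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generating a new execution plan tree specifically comprises:
performing normalized mapping, according to the basic operations, on the nodes of the initial execution plan tree that perform accelerated operations, and generating, according to the result of the normalized mapping, a matched execution plan tree that matches the hardware accelerator;
in addition, while performing normalized mapping on the nodes corresponding to accelerated execution actions, the accelerated mapping module also removes some nodes whose functions have been merged into other nodes and preprocesses some nodes before mapping them;
optimizing the nodes and output target columns of the matched execution plan tree according to the acceleration resources, and generating the new execution plan tree according to the optimization result.
In some embodiments of the present application, the normalized mapping includes merging, and/or deletion, and/or addition.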
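In some embodiments of the present application, optimizing the nodes and output target columns of the matched execution plan tree according to the acceleration resources specifically comprises:
determining, according to the acceleration resources, the manner in which the hardware accelerator accesses the required data, the manner in which each node performs data access, and the columns needed by the final output and by subsequent nodes;
determining, according to the manner in which the hardware accelerator accesses the required data, the nodes of the matched execution plan tree that can be executed concurrently; merging different execution nodes of the matched execution plan tree that access the same data into one execution action; and optimizing the output target columns according to the columns needed by the final output and by subsequent nodes;
wherein the manner in which the hardware accelerator accesses the required data includes access based on its own memory, access to shared memory based on a cache coherence protocol, or access to remote storage nodes over a network.
In some embodiments of the present application, distributing the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator specifically comprises:
sending, according to a digital twin image corresponding to the organization mode and the operation mode, one or more new execution plan trees as a whole to the hardware accelerator for execution, or sending the new execution plan tree node by node, in sequence, to the hardware accelerator for execution, or sending a combination of multiple nodes of the new execution plan tree to the hardware accelerator for execution;
wherein the digital twin image includes the number of execution modules of the hardware accelerator, the number of executable function nodes, the affiliation of each executable function node, and the schedulable execution granularity.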
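In some embodiments of the present application, the method further comprises:
upon receiving an initialization instruction and receiving the self-information, including the organization mode and operation mode, sent by the hardware accelerator, generating the digital twin image according to the self-information.
In some embodiments of the present application, there are multiple hardware accelerators, each provided with a local storage unit, and before determining, according to the execution result returned by the hardware accelerator, the result data of the database operation as accelerated by the hardware accelerator, the method further comprises:
adjusting the caches and final data of the hardware accelerators corresponding to the execution result, so that the caches are consistent with one another and the final data are consistent with one another.
In some embodiments of the present application, after processing the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generating a new execution plan tree, the method further comprises:
if the new execution cost is not less than the initial execution cost, executing the initial execution plan tree on the database, and determining the result of the database operation according to the execution result.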
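Correspondingly, the present invention also provides a device for accelerating database operations, applied to a database connected to a hardware accelerator, the device comprising:
an accelerated mapping module, configured to, upon receiving an initial execution plan tree that is sent by the database and needs to be accelerated, process the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generate a new execution plan tree;
an accelerated execution engine, configured to, if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, distribute the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator;
an accelerated storage engine module, configured to manage the memory and external storage of the accelerated part and to coordinate the consistency of caches and final data among multiple hardware accelerators;
a determining module, configured to determine, according to the execution result returned by the hardware accelerator, the result data of the database operation as accelerated by the hardware accelerator;
wherein the new execution cost and the initial execution cost are determined by calling a preset cost function.
Correspondingly, the present invention also proposes a computer-readable storage medium storing instructions which, when run on a terminal device, cause the terminal device to perform the method for accelerating database operations described above.
By applying the above technical solution, in a database connected to a hardware accelerator, when an initial execution plan tree that is sent by the database and needs to be accelerated is received, the initial execution plan tree is processed according to the basic operations and acceleration resources supported by the hardware accelerator and a new execution plan tree is generated; if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, the new execution plan tree is distributed to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator; and the result data of the database operation as accelerated by the hardware accelerator is determined according to the execution result returned by the hardware accelerator. The execution plan tree of the database is thereby matched to the execution functions and capabilities of the hardware accelerator while keeping a reasonable complexity, so that the acceleration capability of the hardware accelerator is fully exploited; at the same time, hardware acceleration is achieved while retaining the existing database system architecture, which ensures the reliability of the database system.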
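Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic diagram of a database architecture in the prior art;
Figure 2 is a schematic flowchart of a method for accelerating database operations proposed by an embodiment of the present invention;
Figure 3 is a schematic diagram of the database architecture of an embodiment of the present invention;
Figure 4 is a schematic diagram of the scan node mapping principle in an embodiment of the present invention;
Figure 5 is a schematic diagram of the join node mapping principle in an embodiment of the present invention;
Figure 6 is the first schematic diagram of the materialization node mapping principle in an embodiment of the present invention;
Figure 7 is the second schematic diagram of the materialization node mapping principle in an embodiment of the present invention;
Figure 8 is a schematic diagram of the control node mapping principle in an embodiment of the present invention;
Figure 9 is a schematic diagram of the principle of the accelerated execution engine in an embodiment of the present invention;
Figure 10 is a schematic structural diagram of the hardware accelerator in an embodiment of the present invention;
Figure 11 is a schematic diagram of the execution plan output by PostgreSQL in an embodiment of the present invention;
Figure 12 is a schematic diagram of the execution plan tree after mapping in an embodiment of the present invention;
Figure 13 is a schematic diagram of the execution plan tree after merging in an embodiment of the present invention;
Figure 14 is a schematic diagram of the output column optimization code in an embodiment of the present invention;
Figure 15 is a schematic structural diagram of a device for accelerating database operations proposed by an embodiment of the present invention.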
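Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present application without creative effort fall within the scope of protection of the present application.
An embodiment of the present application provides a method for accelerating database operations, applied to a database connected to a hardware accelerator. As shown in Figure 2, the method comprises the following steps:
Step S101: upon receiving an initial execution plan tree that is sent by the database and needs to be accelerated, process the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generate a new execution plan tree.
In this embodiment, the database is connected to a hardware accelerator. The hardware accelerator may be an external hardware accelerator, that is, a device that implements acceleration in hardware, including a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), or an ASIC (Application Specific Integrated Circuit). Database operations are operations performed on the data in the database, including reading data, writing data, updating or modifying data, and deleting data.
The initial execution plan tree is generated after the optimization engine of the database performs query rewriting on the parse tree sent by the parsing engine, and the parse tree is generated by the parsing engine from the query request entered by the user. That is, when a query request entered by the user is received (the query request may be one that includes an acceleration instruction) and the query request needs acceleration, the parsing engine generates a parse tree from the query request and sends it to the optimization engine, and the optimization engine performs query rewriting on the parse tree and generates the initial execution plan tree that needs to be accelerated. When this initial execution plan tree is received, because hardware accelerators generally support only a limited set of basic operations and different hardware accelerators have different acceleration resources, the initial execution plan tree is processed according to the basic operations and acceleration resources supported by the hardware accelerator and a new execution plan tree is generated.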
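To generate the new execution plan tree reliably, in some embodiments of the present application, processing the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generating a new execution plan tree specifically comprises:
performing normalized mapping, according to the basic operations, on the nodes of the initial execution plan tree that perform accelerated operations, and generating, according to the result of the normalized mapping, a matched execution plan tree that matches the hardware accelerator;
in addition, while performing normalized mapping on the nodes corresponding to accelerated execution actions, the accelerated mapping module also removes some nodes whose functions have been merged into other nodes and preprocesses some nodes before mapping them;
optimizing the nodes and output target columns of the matched execution plan tree according to the acceleration resources, and generating the new execution plan tree according to the optimization result.
In this embodiment, to match the new execution plan tree to the functions of the hardware accelerator, normalized mapping is performed, according to the basic operations, on the nodes of the initial execution plan tree that perform accelerated operations; normalized mapping means establishing a correspondence between those nodes and the basic operations. A matched execution plan tree that matches the functions of the hardware accelerator is generated from the result of the normalized mapping. Because the acceleration resources of different hardware accelerators differ, to better match the capabilities of the hardware accelerator, the nodes and output target columns of the matched execution plan tree are optimized according to the acceleration resources of the hardware accelerator, and the new execution plan tree is generated from the optimization result.
It should be noted that the solution of the above embodiment is only one specific implementation proposed by the present application; other ways of processing the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generating a new execution plan tree all fall within the scope of protection of the present application.
To generate the matched plan tree accurately, in some embodiments of the present application, the normalized mapping includes merging, and/or deletion, and/or addition.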
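To optimize the nodes and output target columns of the matched execution plan tree accurately, in some embodiments of the present application, optimizing the nodes and output target columns of the matched execution plan tree according to the acceleration resources specifically comprises:
determining, according to the acceleration resources, the manner in which the hardware accelerator accesses the required data, the manner in which each node performs data access, and the columns needed by the final output and by subsequent nodes;
determining, according to the manner in which the hardware accelerator accesses the required data, the nodes of the matched execution plan tree that can be executed concurrently; merging different execution nodes of the matched execution plan tree that access the same data into one execution action; and optimizing the output target columns according to the columns needed by the final output and by subsequent nodes;
wherein the manner in which the hardware accelerator accesses the required data includes access based on its own memory, access to shared memory based on a cache coherence protocol, or access to remote storage nodes over a network.
In this embodiment, the hardware accelerator can access the required data in different ways, including access based on its own memory, access to shared memory based on a cache coherence protocol, or access to remote storage nodes over a network; the nodes of the matched execution plan tree that can be executed concurrently are determined according to the way the hardware accelerator accesses the required data. When different nodes in the matched execution plan tree access the same data, the different execution nodes that access the same data are merged into one execution action to avoid wasting resources. Because the output columns needed by the final output and by subsequent nodes are not all of the columns, the output target columns are optimized according to the columns needed by the final output and by subsequent nodes.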
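The merging of execution nodes that access the same data can be sketched as follows. This is a minimal illustration, not the implementation of the application: the `PlanNode` class, its fields and the choice of "same table and same predicate" as the test for "same data" are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PlanNode:
    """Hypothetical node of a matched execution plan tree."""
    op: str                           # normalized action, e.g. "Scan" or "Join"
    table: Optional[str] = None       # table read by a Scan node
    predicate: Optional[str] = None   # filter condition of a Scan node
    children: List["PlanNode"] = field(default_factory=list)


def merge_identical_scans(root: PlanNode) -> PlanNode:
    """Let scans of the same table under the same predicate share one node."""
    seen: Dict[Tuple[Optional[str], Optional[str]], PlanNode] = {}

    def visit(node: PlanNode) -> PlanNode:
        node.children = [visit(child) for child in node.children]
        if node.op == "Scan":
            # The first scan seen for a (table, predicate) pair is reused afterwards.
            return seen.setdefault((node.table, node.predicate), node)
        return node

    return visit(root)
```

After the call the plan is a directed acyclic graph rather than a strict tree: both parents point to the same scan object, so a scheduler that executes each distinct node once performs the shared scan only once.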
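It should be noted that the solution of the above embodiment is only one specific implementation proposed by the present application; other ways of optimizing the nodes and output target columns of the matched execution plan tree according to the acceleration resources all fall within the scope of protection of the present application.
Step S102: if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, distribute the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator.
In this embodiment, the new execution cost is the computing resources consumed by executing the new execution plan tree, and the initial execution cost is the computing resources consumed by executing the initial execution plan tree; both can be determined by calling the preset cost function. If the new execution cost is less than the initial execution cost, the new execution plan tree is better than the initial execution plan tree. Because different hardware accelerators have different organization modes and operation modes, the new execution plan tree is distributed to the hardware accelerator for execution according to that organization mode and operation mode.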
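To distribute the new execution plan tree to the hardware accelerator accurately, in some embodiments of the present application, distributing the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator specifically comprises:
sending, according to a digital twin image corresponding to the organization mode and the operation mode, one or more new execution plan trees as a whole to the hardware accelerator for execution, or sending the new execution plan tree node by node, in sequence, to the hardware accelerator for execution, or sending a combination of multiple nodes of the new execution plan tree to the hardware accelerator for execution;
wherein the digital twin image includes the number of execution modules of the hardware accelerator, the number of executable function nodes, the affiliation of each executable function node, and the schedulable execution granularity.
In this embodiment, the hardware accelerator may include one or more execution modules, and the digital twin image includes the number of execution modules of the hardware accelerator, the number of executable function nodes, the affiliation of each executable function node, and the schedulable execution granularity.
According to the digital twin image, one or more new execution plan trees are sent as a whole to the hardware accelerator for execution, or the new execution plan tree is sent node by node, in sequence, to the hardware accelerator for execution, or a combination of multiple nodes of the new execution plan tree is sent to the hardware accelerator for execution. For example, if the hardware accelerator is implemented with an FPGA that implements multiple independent relational algebra operation pipelines, multiple new execution plan trees can be sent to the hardware accelerator at once; if the hardware accelerator implements time sharing and the ability to schedule each basic relational algebra operator, multiple execution nodes of the new execution plan tree can be sent to the hardware accelerator for execution according to the resources of the basic operators.
It should be noted that the solution of the above embodiment is only one specific implementation proposed by the present application; other ways of distributing the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator all fall within the scope of protection of the present application.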
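To distribute the new execution plan tree to the hardware accelerator accurately, in some embodiments of the present application, the method further comprises:
upon receiving an initialization instruction and receiving the self-information, including the organization mode and operation mode, sent by the hardware accelerator, generating the digital twin image according to the self-information.
In this embodiment, at initialization the hardware accelerator submits its self-information, including the organization mode and operation mode, and the digital twin image can be generated from this self-information.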
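One way to picture the digital twin image is as a small record built from the self-information reported at initialization. The sketch below is only an illustration: the class names, the dictionary layout of `self_info` and the granularity labels are assumptions for the example and are not defined by the application.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExecutionModule:
    """One execution module of the accelerator (hypothetical structure)."""
    module_id: int
    function_nodes: Dict[str, int]   # basic operation -> number of function nodes


@dataclass
class DigitalTwinImage:
    """Software-side description of the accelerator held by the execution engine."""
    modules: List[ExecutionModule]
    granularity: str                 # assumed labels: "tree", "node_group" or "node"

    @property
    def module_count(self) -> int:
        return len(self.modules)

    def function_node_count(self, op: str) -> int:
        # Total number of function nodes for one basic operation across all modules.
        return sum(m.function_nodes.get(op, 0) for m in self.modules)


def build_digital_twin(self_info: dict) -> DigitalTwinImage:
    """Build the image from the self-information sent by the accelerator."""
    modules = [ExecutionModule(i, dict(m)) for i, m in enumerate(self_info["modules"])]
    return DigitalTwinImage(modules, self_info["granularity"])
```

Here the affiliation of each function node is represented simply by which `ExecutionModule` lists it.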
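To ensure the reliability of database operations, in some embodiments of the present application, after processing the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generating a new execution plan tree, the method further comprises:
if the new execution cost is not less than the initial execution cost, executing the initial execution plan tree on the database, and determining the result of the database operation according to the execution result.
In this embodiment, if the new execution cost is not less than the initial execution cost, executing the initial execution plan tree requires no more computing resources than executing the new one; in this case the initial execution plan tree is executed on the database, and the result of the database operation is determined from the execution result.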
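The cost comparison can be sketched as below. The text does not specify the preset cost function, so `toy_cost` and its per-node weights are purely hypothetical; only the comparison rule (accelerate when the new cost is strictly lower, otherwise fall back to the database) comes from the description.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-node costs; the real preset cost function is not given in the text.
TOY_NODE_COST: Dict[str, int] = {
    "Seq Scan": 10, "Hash": 2, "Hash Join": 8,   # software plan nodes
    "Scan": 3, "Join": 4,                        # accelerator actions
}


def toy_cost(plan: List[str]) -> int:
    """Sum the assumed cost of every node in a flattened plan."""
    return sum(TOY_NODE_COST[op] for op in plan)


def choose_plan(
    initial_plan: List[str],
    new_plan: List[str],
    cost_fn: Callable[[List[str]], int] = toy_cost,
) -> Tuple[str, List[str]]:
    """Return where to execute and which plan to execute."""
    if cost_fn(new_plan) < cost_fn(initial_plan):
        return "accelerator", new_plan
    # Not less than the initial cost: keep the database's own plan.
    return "database", initial_plan
```

With these weights, a plan of two `Seq Scan` nodes, one `Hash` and one `Hash Join` costs 30, while the remapped plan of two `Scan` nodes and one `Join` costs 10, so the new plan would be sent to the accelerator.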
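Step S103: determine, according to the execution result returned by the hardware accelerator, the result data of the database operation as accelerated by the hardware accelerator.
In this embodiment, the hardware accelerator returns the execution result after executing the new execution plan tree, from which the result data of the database operation as accelerated by the hardware accelerator is determined.
It can be understood that, after step S103 is performed, the method further comprises:
if the database itself also executes part of the new execution plan tree, determining the operation result of the database from the result data and the database's own execution result; otherwise, taking the result data as the operation result.
To determine accurately the result data of the database operation as accelerated by the hardware accelerator, in some embodiments of the present application, there are multiple hardware accelerators, each provided with a local storage unit, and before determining, according to the execution result returned by the hardware accelerator, the result data of the database operation as accelerated by the hardware accelerator, the method further comprises:
adjusting the caches and final data of the hardware accelerators corresponding to the execution result, so that the caches are consistent with one another and the final data are consistent with one another.
In this embodiment, memory and external storage need to be managed while the hardware accelerator executes the new execution plan tree. If there are multiple hardware accelerators, each provided with a local storage unit, the caches and final data of the hardware accelerators corresponding to the execution result need to be adjusted so that the caches are consistent with one another and the final data are consistent with one another.
By applying the above technical solution, in a database connected to a hardware accelerator, when an initial execution plan tree that is sent by the database and needs to be accelerated is received, the initial execution plan tree is processed according to the basic operations and acceleration resources supported by the hardware accelerator and a new execution plan tree is generated; if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, the new execution plan tree is distributed to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator; and the result data of the database operation as accelerated by the hardware accelerator is determined according to the execution result returned by the hardware accelerator. The execution plan tree of the database is thereby matched to the execution functions and capabilities of the hardware accelerator while keeping a reasonable complexity, so that the acceleration capability of the hardware accelerator is fully exploited; at the same time, hardware acceleration is achieved while retaining the existing database system architecture, which ensures the reliability of the database system.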
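To further explain the technical idea of the present invention, the technical solution is described below with reference to a specific application scenario.
An embodiment of the present application provides a method for accelerating database operations. As shown in Figure 3, a processing flow for the accelerated part is added (the part inside the dashed box) and coexists with the existing flow (the part outside the dashed box). The accelerated mapping module remaps and optimizes the execution nodes of the execution plan tree so that they match the functions of the hardware accelerator and outputs the matched new execution plan tree; the accelerated execution engine module then hands the new execution plan tree to the hardware accelerator for execution.
The core of the accelerated part is the hardware accelerator, which may be a GPU, an FPGA or an ASIC and mainly implements the database operations that can be accelerated, such as data scanning, conditional filtering, multi-table joins, grouping and aggregation, sorting, and other functions that hardware can reasonably implement. In addition, some complex function processing requires an additional coprocessor. To access and store data at higher speed, memory and high-speed disks can also be attached to the accelerator.
The hardware accelerator organizes the database operations it implements so that the operation functions can run concurrently. In one possible implementation, considering implementation complexity, it is sufficient to implement the simplest arrangement of multiple mutually independent execution modules: every module has the same sub-functions, and each module can independently complete the execution of one plan tree (possibly with the help of the coprocessor). This structure is shown in Figure 10.
The specific organization of the hardware accelerator is not limited. Before formally performing accelerated processing, it must notify the accelerated execution engine module of the organization mode and operation mode of its resources and functions; that is, a digital twin image of the hardware accelerator is established at the accelerated execution engine. In this way, the accelerated execution engine can efficiently exploit the full capability of the hardware accelerator.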
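The method comprises the following steps:
Step 1: the accelerated mapping module receives the initial execution plan tree to be accelerated, which has been parsed by the parsing engine and output by the optimization engine;
Step 2: the accelerated mapping module remaps the nodes of the initial execution plan tree according to the mapping rules for execution node types to obtain a matched execution plan tree whose nodes match the accelerated function nodes; it then performs node merging and output target column optimization on the matched execution plan tree and outputs the new execution plan tree;
the accelerated mapping module compares the new execution cost of the new execution plan tree with the initial execution cost of the initial execution plan tree; if the initial execution cost is lower, the initial execution plan tree is sent back to the execution engine for execution; otherwise step 3 is performed;
Step 3: according to the resources and capabilities of the hardware accelerator, the accelerated execution engine module chooses whether to send one or more whole new execution plan trees at a time or to send a combination of one or more execution nodes of the new execution plan tree to the hardware accelerator;
Step 4: the hardware accelerator performs the functions corresponding to the nodes it receives and returns the final result data set;
Step 5: the accelerated storage engine manages the memory and external storage of the accelerated part.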
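The control flow of steps 1 to 4 can be summarized in a short orchestration sketch. All six parameters are hypothetical callables standing in for the modules named above; the function only shows the order of the steps and the cost-based fallback, and it follows the rule stated for step S102 (accelerate only when the new cost is strictly lower).

```python
from typing import Any, Callable


def accelerate_query(
    initial_plan: Any,
    remap: Callable[[Any], Any],            # step 2: node remapping
    optimize: Callable[[Any], Any],         # step 2: node merging + output column optimization
    cost_fn: Callable[[Any], float],        # preset cost function (unspecified in the text)
    dispatch: Callable[[Any], Any],         # steps 3-4: send to accelerator, get result
    run_in_database: Callable[[Any], Any],  # existing execution engine
) -> Any:
    """Step 1 is the arrival of initial_plan from the optimization engine."""
    matched_plan = remap(initial_plan)
    new_plan = optimize(matched_plan)
    if cost_fn(new_plan) < cost_fn(initial_plan):
        return dispatch(new_plan)
    # The new plan is not cheaper: hand the initial plan back to the execution engine.
    return run_in_database(initial_plan)
```

Passing the modules in as callables keeps the sketch independent of how any one of them is implemented, which mirrors the statement that the modules may live on one board or on several.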
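The modules in the accelerated processing flow may be distributed over multiple hardware boards and implemented by different devices, or integrated on one hardware board and implemented by one physical device. The interfaces between them may be conventional interfaces such as PCIe, Eth, FC (Fibre Channel) and Infiniband, cache-coherent interfaces such as CCIX (Cache Coherent Interconnect for Accelerators) and CXL (Compute Express Link), or other internal high-speed data interfaces; those skilled in the art can configure this flexibly.
The modules involved in the accelerated processing flow of Figure 3 are described below.
1. Accelerated mapping module
The execution plan tree that an existing optimization engine generates after query rewriting contains a great many node types. In the execution plan tree output after query rewriting by the current open-source database software PostgreSQL, there are 40 possible node types, divided into four categories: control nodes, scan nodes, join nodes and materialization nodes. Hardware acceleration, in view of resource cost, power limits and implementation complexity, cannot implement so many complex execution functions and usually implements only basic operations; the specific basic operations supported are reported by the hardware accelerator or agreed on in advance.
Take the scan nodes of current PostgreSQL as an example: it implements 16 scan node types in total, whereas an acceleration component implemented with an FPGA usually supports only one scan action, such as sequential scan, and, depending on how indexes are implemented, may also provide index scan. The accelerated mapping module therefore needs to normalize the up to 16 different scan node types in the received plan tree into the scan actions supported by the hardware.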
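As shown in Figure 4, the first seven can be mapped directly to the scan node of the hardware accelerator, and the last two do not appear in the execution tree and can be ignored. The remaining scan node types all require additional processing in cooperation with software. FunctionScan and TableFuncScan are mapped either to the corresponding function call, if the hardware supports the corresponding function, or to a software call that returns all results at once. CteScan scans the temporary table generated by a CTE (Common Table Expression); it usually needs no processing, and the generated temporary table is simply passed to the upper node, but when the control node RecursiveUnion appears, it must be remapped together with WorkTableScan after the recursion has been expanded. NamedTuplestoreScan simply passes the scanned temporary table to the upper layer; ValuesScan and ForeignScan likewise return all results at once.
As shown in Figure 5, PostgreSQL has three join nodes, NestLoop, MergeJoin and HashJoin, whereas the hardware accelerator has only one join action. The accelerated mapping module normalizes the different join node types in the received plan tree into the join action supported by the hardware. The Hash node below HashJoin can simply be ignored and removed in this mapping.
As shown in Figure 6, Sort and IncrementalSort among the materialization nodes are mapped to the Sort of the hardware accelerator. As shown in Figure 7, the Group, Agg and WindowAgg nodes are mapped to the GroupAgg of the hardware accelerator, and SetOp, together with the Append and MergeAppend nodes among the control nodes, is mapped to the set operation of the hardware accelerator. The Limit and Unique nodes among the materialization nodes are mapped to a parameter of the hardware accelerator and no longer correspond to a specific action.
As shown in Figure 8, among the control nodes, apart from the Append and MergeAppend nodes that are mapped to the set operation, Gather and GatherMerge are simply ignored and removed; BitmapAnd and BitmapOr are merged, together with BitmapIndexScan, into the Scan of the hardware accelerator; RecursiveUnion is expanded and then mapped together with CteScan and WorkTableScan; ModifyTable corresponds to the insert, delete and update operations of the hardware accelerator; and the remaining control nodes likewise return all results at once.
In summary, the accelerated mapping module performs normalized mapping on the nodes corresponding to accelerated execution actions and removes some nodes whose functions have been merged into other nodes; some nodes are preprocessed before being mapped.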
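The mapping rules above can be collected into a lookup table. The sketch below is a partial illustration: it lists only node types named explicitly in the text, the accelerator action names other than Scan, Join, Sort and GroupAgg (`"SetOp"`, `"Modify"`) are labels chosen for the example, and every node type it does not know is treated as handled by software, which is a simplifying assumption.

```python
from typing import Optional, Tuple

# PostgreSQL node type -> normalized accelerator action (partial, from the text).
ACTION_MAP = {
    "SeqScan": "Scan", "IndexScan": "Scan",
    "BitmapIndexScan": "Scan", "BitmapAnd": "Scan", "BitmapOr": "Scan",
    "NestLoop": "Join", "MergeJoin": "Join", "HashJoin": "Join",
    "Sort": "Sort", "IncrementalSort": "Sort",
    "Group": "GroupAgg", "Agg": "GroupAgg", "WindowAgg": "GroupAgg",
    "SetOp": "SetOp", "Append": "SetOp", "MergeAppend": "SetOp",
    "ModifyTable": "Modify",
}
DROPPED = {"Hash", "Gather", "GatherMerge"}   # ignored and removed from the tree
PARAMETERS = {"Limit", "Unique"}              # become accelerator parameters


def classify(node_type: str) -> Tuple[str, Optional[str]]:
    """Return (kind, target) for one plan node type."""
    if node_type in ACTION_MAP:
        return "action", ACTION_MAP[node_type]
    if node_type in DROPPED:
        return "drop", None
    if node_type in PARAMETERS:
        return "parameter", node_type
    # Everything else needs software cooperation (assumption of this sketch).
    return "software", node_type
```

For instance, `classify("HashJoin")` yields `("action", "Join")`, while `classify("ValuesScan")` falls through to the software path.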
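After completing the mapping, the accelerated mapping module optimizes and matches the nodes and output columns of the execution tree according to the acceleration resources of the hardware accelerator.
For example, the hardware accelerator may hold large-capacity memory of its own, may directly access shared memory through a cache coherence protocol (such as CCIX or CXL), or may access the data of remote storage nodes over a network; the scan node actions that can run concurrently can therefore be decided according to how the data to be accessed is stored;
At the same time, accesses to the same data by different execution nodes can be merged directly into one execution action. Likewise, for the intermediate output produced by the acceleration component, caching of output columns unrelated to subsequent processing nodes and the final result can be eliminated.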
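2. Accelerated execution engine
The accelerated execution engine can itself execute some node functions; taking postgres as an example, these include the Result node and the RecursiveUnion node. The accelerated execution engine also chooses, according to how the hardware accelerator is implemented, in what manner to distribute the matched new execution plan tree it receives to one or more hardware accelerators for execution.
At initialization, every hardware accelerator notifies the accelerated execution engine of its self-information, including its organization mode and operation mode, and the accelerated execution engine maintains a digital twin image of the hardware accelerator. Based on this self-information, it chooses how to submit the new execution plan tree to the hardware accelerator. The digital twin image includes information such as the number of execution modules of the hardware accelerator, the number of executable function nodes, the module to which each belongs, and the execution granularity that can be scheduled. If the hardware accelerator adopts the architecture shown in Figure 10, the digital twin image records that the execution granularity is at most a whole execution tree and at least whatever the scheduling of its internal nodes allows, together with the function nodes each execution module contains, the number of each, and how the internal nodes of an execution tree are scheduled.
One possible implementation is to send the new execution plan trees corresponding to the query requests of multiple user terminals to the hardware accelerator. The hardware accelerator assigns each tree to one of its execution modules and executes it node by node, completing the accelerated processing. This applies only when the hardware accelerator adopts the architecture shown in Figure 10; with other architectures the execution actions will differ. Figure 9 is a schematic diagram of the principle of the accelerated execution engine; in Figure 9, matched plan trees 1-n are the new execution plan trees 1-n, and the execution component digital twin is the digital twin image.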
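The choice of dispatch granularity can be illustrated with a small planner. This is a sketch under stated assumptions: plans are flattened lists of node names in execution order, only the whole-tree and single-node granularities are shown, and the round-robin assignment of trees to execution modules is an assumption of the example (in the text the accelerator itself assigns each tree to one of its modules).

```python
from typing import List, Tuple

Plan = List[str]   # flattened plan: node names in execution order (assumed format)


def plan_dispatch(plans: List[Plan], module_count: int, granularity: str) -> List[Tuple[int, Plan]]:
    """Return (module_index, work_unit) pairs to send to the accelerator."""
    if module_count < 1:
        raise ValueError("the accelerator must expose at least one execution module")
    if granularity == "tree":
        # One whole tree per independent execution module, assigned round-robin.
        return [(i % module_count, plan) for i, plan in enumerate(plans)]
    if granularity == "node":
        # One node at a time, in order, all on the first module.
        return [(0, [node]) for plan in plans for node in plan]
    raise ValueError(f"unsupported granularity: {granularity}")
```

With two execution modules and three query plans, the whole-tree mode places the first and third plans on module 0 and the second on module 1.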
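3. Accelerated storage engine
The accelerated storage engine manages the memory and external storage of the accelerated part. If multiple hardware accelerators each have their own local storage unit, it must also coordinate the consistency of caches and final data among them.
The solution is illustrated below using Q7 of the TPC-H benchmark as an example.
The corresponding SQL (Structured Query Language) is as follows: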
Figure PCTCN2021131504-appb-000001
Figure PCTCN2021131504-appb-000002
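The execution plan output by PostgreSQL is shown in Figure 11. The two scan nodes indicated by arrows in the figure scan the same table under the same conditions, so they can be merged.
After the accelerated mapping module performs the acceleration mapping, the execution plan tree becomes the one shown in Figure 12: Index Scan and Seq Scan are both mapped to a unified Scan node; Hash Join and Nested Loop are both mapped to the Join operation, and the Hash node below Hash Join is removed; Gather Merge is also removed. The whole execution plan tree is thus remapped into a new execution plan tree that matches the hardware accelerator. After the accelerated mapping module then optimizes and matches the nodes and output target columns, the execution plan tree becomes the one shown in Figure 13.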
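The remapping described for this example can be sketched as a recursive rewrite. The `Node` class and the small input tree are hypothetical (the actual Q7 plan is in Figure 11 and is not reproduced here); the node labels follow PostgreSQL's EXPLAIN output, where the nested-loop join is shown as `Nested Loop`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical plan tree node identified only by its operation label."""
    op: str
    children: List["Node"] = field(default_factory=list)


RENAME = {"Index Scan": "Scan", "Seq Scan": "Scan", "Hash Join": "Join", "Nested Loop": "Join"}
REMOVE = {"Hash", "Gather Merge"}


def remap(node: Node) -> List[Node]:
    """Return the node(s) that replace `node` in the matched plan tree."""
    kids = [k for child in node.children for k in remap(child)]
    if node.op in REMOVE:
        # A removed node is replaced by its (already remapped) children.
        return kids
    return [Node(RENAME.get(node.op, node.op), kids)]
```

Returning a list rather than a single node lets a removed node splice its children directly into its parent, so dropping `Hash` or `Gather Merge` needs no special case.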
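Here, the scan nodes on nation are optimized into a single node, so that scan is executed only once. Viewing the execution plan in more detail by adding verbose shows that, when lineitem is scanned, the output columns include all columns, whereas the only columns actually needed by the final output and by subsequent nodes are the six columns l_orderkey, l_suppkey, l_shipdate, l_extendedprice, l_discount and l_year. The accelerated mapping module therefore further optimizes the output columns of each node; the output column optimization code is shown in Figure 14.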
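Output column optimization can be pictured as a top-down pass that keeps, at each node, only the columns required above it plus the columns the parent itself reads. The sketch below is not the code of Figure 14, which is not reproduced here; the `Node` fields and the small two-table example are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical plan node with column bookkeeping."""
    op: str
    produces: Set[str]                             # columns the node is able to output
    uses: Set[str] = field(default_factory=set)    # columns it reads from its children
    children: List["Node"] = field(default_factory=list)
    output: Set[str] = field(default_factory=set)  # pruned output columns (result)


def prune_outputs(node: Node, required: Set[str]) -> None:
    """Keep only the columns needed by the final output and by nodes above."""
    node.output = node.produces & required
    # Children must supply what is required above plus what this node reads itself.
    needed_below = required | node.uses
    for child in node.children:
        prune_outputs(child, needed_below)
```

As a usage example, a join on `l_orderkey = o_orderkey` whose parent needs only `l_suppkey` and `o_custkey` leaves the lineitem scan outputting `l_orderkey` and `l_suppkey` instead of every column of the table.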
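After the accelerated mapping module has generated the new execution plan tree, the accelerated execution engine sends the new execution plan tree and the related parameters to the hardware accelerator for execution. It may send the whole new execution plan tree, send it one node at a time, or send several nodes at a time, as determined by the digital twin image of the hardware accelerator that it holds.
Corresponding to the method for accelerating database operations in the embodiments of the present application, an embodiment of the present application also proposes a device for accelerating database operations, applied to a database connected to a hardware accelerator. As shown in Figure 15, the device comprises:
an accelerated mapping module 501, configured to, upon receiving an initial execution plan tree that is sent by the database and needs to be accelerated, process the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generate a new execution plan tree;
an accelerated execution engine 502, configured to, if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, distribute the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator;
a determining module 503, configured to determine, according to the execution result returned by the hardware accelerator, the result data of the database operation as accelerated by the hardware accelerator;
wherein the new execution cost and the initial execution cost are determined by calling a preset cost function.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.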

Claims (10)

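  1. A method for accelerating database operations, applied to a database connected to a hardware accelerator, characterized in that the method comprises:
    upon receiving an initial execution plan tree that is sent by the database and needs to be accelerated, processing the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generating a new execution plan tree and a new execution cost;
    if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, distributing the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator;
    the hardware accelerator returning the execution result, which completes the acceleration of the database operation;
    wherein the new execution cost and the initial execution cost are determined by calling a preset cost function.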
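  2. The method according to claim 1, characterized in that processing the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generating a new execution plan tree specifically comprises:
    performing normalized mapping, according to the basic operations, on the nodes of the initial execution plan tree that perform accelerated operations, while also removing some nodes whose functions have been merged into other nodes and preprocessing some nodes before mapping them, and then generating, according to the result of the normalized mapping, a matched execution plan tree that matches the hardware accelerator;
    optimizing the nodes and output target columns of the matched execution plan tree according to the acceleration resources, and generating the new execution plan tree according to the optimization result.
  3. The method according to claim 2, characterized in that the normalized mapping includes merging, and/or deletion, and/or addition.
  4. The method according to claim 2, characterized in that optimizing the nodes and output target columns of the matched execution plan tree according to the acceleration resources specifically comprises:
    determining, according to the acceleration resources, the manner in which the hardware accelerator accesses the required data, the manner in which each node performs data access, and the columns needed by the final output and by subsequent nodes;
    determining, according to the manner in which the hardware accelerator accesses the required data, the nodes of the matched execution plan tree that can be executed concurrently; merging different execution nodes of the matched execution plan tree that access the same data into one execution action; and optimizing the output target columns according to the columns needed by the final output and by subsequent nodes;
    wherein the manner in which the hardware accelerator accesses the required data includes access based on its own memory, access to shared memory based on a cache coherence protocol, or access to remote storage nodes over a network.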
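  5. The method according to claim 1, characterized in that distributing the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator specifically comprises:
    sending, according to a digital twin image corresponding to the organization mode and the operation mode, one or more new execution plan trees as a whole to the hardware accelerator for execution, or sending the new execution plan tree node by node, in sequence, to the hardware accelerator for execution, or sending a combination of multiple nodes of the new execution plan tree to the hardware accelerator for execution;
    wherein the digital twin image includes the number of execution modules of the hardware accelerator, the number of executable function nodes, the affiliation of each executable function node, and the schedulable execution granularity.
  6. The method according to claim 5, characterized in that the method further comprises:
    upon receiving an initialization instruction and receiving the self-information, including the organization mode and operation mode, sent by the hardware accelerator, generating the digital twin image according to the self-information.
  7. The method according to claim 1, characterized in that there are multiple hardware accelerators, each provided with a local storage unit, and before determining, according to the execution result returned by the hardware accelerator, the result data of the database operation as accelerated by the hardware accelerator, the method further comprises:
    adjusting the caches and final data of the hardware accelerators corresponding to the execution result, so that the caches are consistent with one another and the final data are consistent with one another.
  8. The method according to claim 1, characterized in that, after processing the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generating a new execution plan tree, the method further comprises:
    if the new execution cost is not less than the initial execution cost, executing the initial execution plan tree on the database, and determining the result of the database operation according to the execution result.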
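  9. A hardware acceleration device for accelerating database operations, characterized in that the device comprises:
    an accelerated mapping module, configured to, upon receiving an initial execution plan tree that is sent by the database and needs to be accelerated, process the initial execution plan tree according to the basic operations and acceleration resources supported by the hardware accelerator and generate a new execution plan tree;
    an accelerated execution engine module, configured to, if the new execution cost of the new execution plan tree is less than the initial execution cost of the initial execution plan tree, distribute the new execution plan tree to the hardware accelerator for execution according to the organization mode and operation mode of the hardware accelerator;
    an accelerated storage engine module, configured to manage the memory and external storage of the accelerated part and to coordinate the consistency of caches and final data among multiple hardware accelerators.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions which, when run on a terminal device, cause the terminal device to perform the method for accelerating database operations according to any one of claims 1 to 8.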
PCT/CN2021/131504 2021-07-20 2021-11-18 Method and device for accelerating database operations WO2023000561A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/267,215 US20240045868A1 (en) 2021-07-20 2021-11-18 Method and device for accelerating database operation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110818910.0 2021-07-20
CN202110818910.0A CN113448967B (zh) 2021-07-20 2021-07-20 Method and device for accelerating database operations

Publications (1)

Publication Number Publication Date
WO2023000561A1 true WO2023000561A1 (zh) 2023-01-26

Family

ID=77816840

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131504 WO2023000561A1 (zh) 2021-07-20 2021-11-18 Method and device for accelerating database operations

Country Status (3)

Country Link
US (1) US20240045868A1 (zh)
CN (1) CN113448967B (zh)
WO (1) WO2023000561A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113448967B (zh) * 2021-07-20 2022-02-08 威讯柏睿数据科技(北京)有限公司 Method and device for accelerating database operations

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504018A (zh) * 2014-12-11 2015-04-08 浙江大学 Real-time big data query optimization method based on bushy trees and a top-down approach
US20150234895A1 (en) * 2014-02-20 2015-08-20 Citus Data Bilgi Islemleri Ticaret A.S. Apparatus and Method for Processing Distributed Relational Algebra Operators in a Distributed Database
CN107545015A (zh) * 2016-06-29 2018-01-05 华为技术有限公司 Method and device for handling query faults
CN110569257A (zh) * 2019-09-16 2019-12-13 上海达梦数据库有限公司 Data processing method, corresponding apparatus, device and storage medium
US20190392002A1 (en) * 2018-06-25 2019-12-26 BigStream Solutions, Inc. Systems and methods for accelerating data operations by utilizing dataflow subgraph templates
CN111026776A (zh) * 2019-11-06 2020-04-17 中科驭数(北京)科技有限公司 Query method and device in a relational database
CN113448967A (zh) * 2021-07-20 2021-09-28 威讯柏睿数据科技(北京)有限公司 Method and device for accelerating database operations

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8244718B2 (en) * 2006-08-25 2012-08-14 Teradata Us, Inc. Methods and systems for hardware acceleration of database operations and queries
US9424315B2 (en) * 2007-08-27 2016-08-23 Teradata Us, Inc. Methods and systems for run-time scheduling database operations that are executed in hardware
US8468151B2 (en) * 2010-06-29 2013-06-18 Teradata Us, Inc. Methods and systems for hardware acceleration of database operations and queries based on multiple hardware accelerators
CN102436494B (zh) * 2011-11-11 2013-05-01 中国工商银行股份有限公司 Device and method for execution plan optimization based on practical verification
US8838577B2 (en) * 2012-07-24 2014-09-16 International Business Machines Corporation Accelerated row decompression
US9495418B2 (en) * 2013-08-07 2016-11-15 International Business Machines Corporation Scalable acceleration of database query operations
CN105677683B (zh) * 2014-11-21 2018-12-14 华为技术有限公司 Batch data query method and device
CN112099940A (zh) * 2016-08-26 2020-12-18 华为技术有限公司 Method, device and system for implementing hardware-accelerated processing
CN106991116B (zh) * 2017-02-10 2020-04-14 阿里巴巴集团控股有限公司 Method and device for optimizing database execution plans
CN108038215A (zh) * 2017-12-22 2018-05-15 上海达梦数据库有限公司 Data processing method and system
US11144548B2 (en) * 2018-04-24 2021-10-12 Dremio Corporation Optimized data structures of a relational cache with a learning capability for accelerating query execution by a data system
US10884795B2 (en) * 2018-04-26 2021-01-05 International Business Machines Corporation Dynamic accelerator scheduling and grouping for deep learning jobs in a computing cluster
US20200301898A1 (en) * 2018-06-25 2020-09-24 BigStream Solutions, Inc. Systems and methods for accelerating data operations by utilizing dataflow subgraph templates
CN110688393B (zh) * 2019-09-29 2021-01-29 星环信息科技(上海)股份有限公司 Query statement optimization method, device, computer equipment and storage medium
US11275734B2 (en) * 2020-05-05 2022-03-15 Microsoft Technology Licensing, Llc Data lake workload optimization through index modeling and recommendation
CN111625585B (zh) * 2020-05-22 2021-08-31 中科驭数(北京)科技有限公司 Access method, device, host and storage medium for a hardware-accelerated database

Also Published As

Publication number Publication date
CN113448967B (zh) 2022-02-08
CN113448967A (zh) 2021-09-28
US20240045868A1 (en) 2024-02-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21950799

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18267215

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE