CN112328620A

CN112328620A - Distributed database query acceleration method

Info

Publication number: CN112328620A
Application number: CN202011220443.3A
Authority: CN
Inventors: 贾德星; 张晖; 苑晓龙; 张炜刚
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2020-11-05
Filing date: 2020-11-05
Publication date: 2021-02-05

Abstract

The invention relates to the field of distributed databases, and in particular to a distributed database query acceleration method. SQL (structured query language) layer operators are pushed down to the storage layer for execution. The storage layer uses FPGA (field programmable gate array) programming so that, after it pulls the data, decoding and conditional filtering of the data are executed in parallel in an FPGA circuit, and only the data meeting the conditions are copied to the SQL layer. The remaining aggregation operators are executed in parallel directly in the storage-layer FPGA module, and the result of the aggregation calculation is returned to the SQL layer. Compared with the prior art, the method pushes SQL operators down to the storage layer, which performs computation locally as soon as it reads the data and returns only the data meeting the conditions. This greatly reduces the data throughput between the SQL layer and the storage layer and thereby improves query and analysis performance on mass data.

Description

Distributed database query acceleration method

Technical Field

The invention relates to the field of distributed databases, and particularly provides a distributed database query acceleration method.

Background

CockroachDB (CRDB) is an open-source distributed database system built on the ideas of Google Spanner. It offers NoSQL-style storage management for massive data volumes while retaining features of traditional databases such as ACID transactions and SQL support.

The storage layer of CockroachDB uses the open-source RocksDB storage engine, while the SQL layer is written in Go. SQL-layer operators read and write storage-engine data through a CGO library (libroach); the detailed flow is shown in FIG. 3.

Because RocksDB only supports reading and writing data in key-value format, after a client issues an SQL query statement, all the data must be copied from the storage layer to the SQL layer, where SQL-layer operators then perform conditional filtering and aggregation. In OLAP analysis scenarios with large data volumes, this approach is very inefficient and seriously degrades query performance on the client side.

Disclosure of Invention

To address these shortcomings of the prior art, the invention provides a highly practical distributed database query acceleration method.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A distributed database query acceleration method, characterized in that SQL (structured query language) layer operators are pushed down to the storage layer for execution; the storage layer uses FPGA (field programmable gate array) programming so that, after the data are pulled, decoding and conditional filtering of the data are executed in parallel in an FPGA circuit, and only the data meeting the conditions are copied to the SQL layer;

the remaining aggregation operators are executed in parallel directly in the storage-layer FPGA module, and the result of the aggregation calculation is returned to the SQL layer.
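The difference between the prior-art flow and the push-down flow can be illustrated with a small Go sketch. It is a software model only, under simplifying assumptions: rows are already decoded into a hypothetical `Row` struct, and the functions `pullAll` and `pushDown` are illustrative names, not CockroachDB or libroach APIs. The point is the count of rows that cross the storage/SQL boundary, not the FPGA execution itself.

```go
package main

import "fmt"

// Row is a hypothetical, already-decoded table row used only for illustration.
type Row struct {
	ID int64
}

// pullAll models the prior-art flow: the storage layer copies every row to
// the SQL layer, and the SQL layer then applies the filter itself.
func pullAll(store []Row, pred func(Row) bool) (result []Row, transferred int) {
	copied := append([]Row(nil), store...) // every row crosses the layer boundary
	for _, r := range copied {
		if pred(r) {
			result = append(result, r)
		}
	}
	return result, len(copied)
}

// pushDown models the push-down flow: the storage layer evaluates the
// predicate locally and copies only the qualifying rows.
func pushDown(store []Row, pred func(Row) bool) (result []Row, transferred int) {
	for _, r := range store {
		if pred(r) {
			result = append(result, r)
		}
	}
	return result, len(result)
}

func main() {
	store := make([]Row, 0, 1000)
	for i := int64(1); i <= 1000; i++ {
		store = append(store, Row{ID: i})
	}
	pred := func(r Row) bool { return r.ID > 900 }

	a, ta := pullAll(store, pred)
	b, tb := pushDown(store, pred)
	fmt.Println(len(a), ta) // 100 1000
	fmt.Println(len(b), tb) // 100 100
}
```

Both paths return the same 100 rows, but only the push-down path limits the transfer to those rows; the more selective the condition, the larger the saving.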

Further, the method comprises the following steps:

S1, the SQL layer sends a query request to the storage layer;

S2, the storage-layer Store object calls the libroach library to execute the query;

S3, libroach invokes an iterator of the key-value database to scan the data row by row;

S4, the scanned data are transmitted to the FPGA;

S5, computation functions in the FPGA parse the data concurrently;

S6, libroach passes pointers to the qualifying data from the C++ library to the Store object in the Go-language layer;

S7, the storage-layer Store object copies the data from C++ memory to Go memory;

S8, the table reader passes the data to the SQL processor for execution;

S9, after the SQL processor finishes executing, the data are output to the client.
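The S1 to S9 flow can be sketched as a Go channel pipeline. This is a hedged software analogy, not the patented implementation: goroutines stand in for the parallel FPGA computation functions, a channel receive stands in for the S6/S7 hand-off and copy from C++ memory to Go memory, and the `KV` type and the `scan`, `filterStage` and `tableReader` functions are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// KV is a hypothetical key-value pair as produced by the storage iterator.
type KV struct {
	Key   int64 // stands in for the encoded primary key
	Value int64 // stands in for the encoded row payload
}

// scan (S3) emits KV pairs one at a time, like a storage-engine iterator.
func scan(data []KV) <-chan KV {
	out := make(chan KV)
	go func() {
		defer close(out)
		for _, kv := range data {
			out <- kv
		}
	}()
	return out
}

// filterStage (S4-S5) fans the scanned pairs out to several workers, a
// software stand-in for parallel FPGA compute units, and forwards only the
// pairs that satisfy pred. Output order is not preserved.
func filterStage(in <-chan KV, workers int, pred func(KV) bool) <-chan KV {
	out := make(chan KV)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for kv := range in {
				if pred(kv) {
					out <- kv
				}
			}
		}()
	}
	go func() { wg.Wait(); close(out) }()
	return out
}

// tableReader (S6-S8) receives the qualifying rows handed back by the
// storage layer and passes each one to the next processor.
func tableReader(in <-chan KV, next func(KV)) {
	for kv := range in {
		next(kv)
	}
}

func main() {
	data := make([]KV, 0, 200)
	for i := int64(1); i <= 200; i++ {
		data = append(data, KV{Key: i, Value: i * 10})
	}
	var sum int64
	count := 0
	// The "next SQL processor" here just counts and sums (S8-S9).
	tableReader(filterStage(scan(data), 4, func(kv KV) bool { return kv.Key > 100 }),
		func(kv KV) { count++; sum += kv.Value })
	fmt.Println(count, sum) // 100 150500
}
```

Because `next` is only ever called from the goroutine running `tableReader`, the counters need no locking even though filtering runs on four workers.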

Further, in step S1, the table reader of the SQL layer sends a filtered-scan query request to the storage layer.

Further, in step S2, the storage-layer Store object calls a range-query method of the libroach library through CGO to scan the data meeting the condition.
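A range query of the kind used in steps S2 and S3 can be modelled with a sorted in-memory key list and an iterator. The `memIterator` type below is hypothetical; its Seek/Valid/Next shape is modelled loosely on the usage pattern of RocksDB iterators, and the string keys such as `/t/1` are a simplification of real table key prefixes. The condition is evaluated during the scan rather than afterwards.

```go
package main

import (
	"fmt"
	"sort"
)

// memIterator is a hypothetical in-memory iterator over sorted keys,
// modelled loosely on the Seek/Valid/Next pattern of RocksDB iterators.
type memIterator struct {
	keys   []string
	values []string
	pos    int
}

func newMemIterator(kv map[string]string) *memIterator {
	it := &memIterator{}
	for k := range kv {
		it.keys = append(it.keys, k)
	}
	sort.Strings(it.keys)
	for _, k := range it.keys {
		it.values = append(it.values, kv[k])
	}
	return it
}

// Seek positions the iterator at the first key >= target.
func (it *memIterator) Seek(target string) {
	it.pos = sort.SearchStrings(it.keys, target)
}

func (it *memIterator) Valid() bool { return it.pos < len(it.keys) }

func (it *memIterator) Next() { it.pos++ }

func (it *memIterator) Key() string { return it.keys[it.pos] }

func (it *memIterator) Value() string { return it.values[it.pos] }

// rangeScan returns the values whose keys fall in [start, end) and that
// satisfy pred, applying the condition while the iterator advances.
func rangeScan(it *memIterator, start, end string, pred func(k, v string) bool) []string {
	var out []string
	for it.Seek(start); it.Valid() && it.Key() < end; it.Next() {
		if pred(it.Key(), it.Value()) {
			out = append(out, it.Value())
		}
	}
	return out
}

func main() {
	it := newMemIterator(map[string]string{
		"/t/1": "a", "/t/2": "b", "/t/3": "c", "/u/1": "x",
	})
	// "/t0" is the smallest key greater than every key with prefix "/t/".
	fmt.Println(rangeScan(it, "/t/", "/t0", func(k, v string) bool { return v != "b" })) // [a c]
}
```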

Further, in step S4, the scanned data are transmitted to computation functions in the FPGA, which perform conditional filtering on the data.

Further, in step S5, the computation functions in the FPGA execute concurrently: the key-value data of each row are parsed into row format, the conditional expression is parsed, and the data are filtered.
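The two parsing tasks of step S5, decoding a row from its stored value and turning the conditional expression into an executable test, might look like the following Go sketch. The value layout (consecutive little-endian int64 columns) is a hypothetical stand-in and is not CockroachDB's actual value encoding; likewise `decodeRow`, `parseCondition` and the restriction to `<column> <op> <integer>` conditions are assumptions made for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
	"strings"
)

// Row maps column names to integer values; integer-only columns are a
// simplifying assumption for this sketch.
type Row map[string]int64

// decodeRow turns a hypothetical value encoding (consecutive little-endian
// int64 columns, in the order given by cols) into a Row. This is NOT
// CockroachDB's real value encoding.
func decodeRow(value []byte, cols []string) (Row, error) {
	if len(value) != 8*len(cols) {
		return nil, fmt.Errorf("value has %d bytes, want %d", len(value), 8*len(cols))
	}
	row := make(Row, len(cols))
	for i, c := range cols {
		row[c] = int64(binary.LittleEndian.Uint64(value[8*i:]))
	}
	return row, nil
}

// parseCondition compiles a condition of the form "<column> <op> <integer>",
// e.g. "id > 100", into a predicate over decoded rows.
func parseCondition(expr string) (func(Row) bool, error) {
	f := strings.Fields(expr)
	if len(f) != 3 {
		return nil, fmt.Errorf("unsupported expression %q", expr)
	}
	col, op := f[0], f[1]
	k, err := strconv.ParseInt(f[2], 10, 64)
	if err != nil {
		return nil, err
	}
	cmp := map[string]func(a, b int64) bool{
		"=":  func(a, b int64) bool { return a == b },
		"!=": func(a, b int64) bool { return a != b },
		"<":  func(a, b int64) bool { return a < b },
		"<=": func(a, b int64) bool { return a <= b },
		">":  func(a, b int64) bool { return a > b },
		">=": func(a, b int64) bool { return a >= b },
	}[op]
	if cmp == nil {
		return nil, fmt.Errorf("unsupported operator %q", op)
	}
	return func(r Row) bool {
		v, ok := r[col]
		return ok && cmp(v, k)
	}, nil
}

// encode builds a value in the same hypothetical layout.
func encode(vals ...int64) []byte {
	b := make([]byte, 8*len(vals))
	for i, v := range vals {
		binary.LittleEndian.PutUint64(b[8*i:], uint64(v))
	}
	return b
}

func main() {
	cols := []string{"id", "amount"}
	pred, err := parseCondition("id > 100")
	if err != nil {
		panic(err)
	}
	for _, v := range [][]byte{encode(99, 5), encode(101, 7)} {
		row, err := decodeRow(v, cols)
		if err != nil {
			panic(err)
		}
		fmt.Println(row["id"], pred(row)) // 99 false, then 101 true
	}
}
```

Compiling the condition once into a closure means each row only pays for a map lookup and one comparison, which mirrors the idea of configuring the filter once and then applying it to every scanned row.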

Further, in step S8, the table reader reads the rows meeting the condition and forwards them to the next SQL processor for execution.

Further, in step S9, after the SQL processor finishes executing, the data are output to the client through the processor.

Compared with the prior art, the distributed database query acceleration method has the following outstanding beneficial effects:

(1) By pushing SQL operators down to the storage layer, the method lets the storage layer perform computation locally right after reading the data and return only the data meeting the conditions. This greatly reduces the data throughput between the SQL layer and the storage layer and improves query and analysis performance on mass data.

(2) By using FPGA technology in the storage layer, decoding, conditional filtering and aggregation of the data can be executed in parallel, improving the overall execution efficiency of SQL statements.

(3) The method accelerates the execution of conditional-filtering and aggregation SQL statements over large data volumes in a distributed database, meeting the query and analysis requirements of mass data.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flow chart of SQL conditional filter push-down in a distributed database query acceleration method;

FIG. 2 is a flow chart of SQL aggregate computation push-down in a distributed database query acceleration method;

FIG. 3 is a flowchart of the original computation in the background art of a distributed database query acceleration method.

Detailed Description

The present invention will be described in further detail with reference to specific embodiments in order to better understand the technical solutions of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

A preferred embodiment is given below:

As shown in FIG. 1, in the distributed database query acceleration method of this embodiment, SQL-layer operators are pushed down to the storage layer for execution; the storage layer uses FPGA programming so that, after the data are pulled, decoding and conditional filtering of the data are executed in parallel in an FPGA circuit, and only the data meeting the conditions are copied to the SQL layer;

the remaining aggregation operators are executed in parallel directly in the storage-layer FPGA module, and the result of the aggregation calculation is returned to the SQL layer. This accelerates the execution of aggregation operators, greatly reduces the data throughput between the SQL layer and the storage layer, and improves query and analysis performance on mass data.

The method comprises the following steps:

S1, the TableReader (table reader) of the SQL layer sends a FilterScan (filtered scan) query request to the storage layer carrying the condition "id > 100", so that only data satisfying "id > 100" are returned.

S2, the storage-layer Store object calls the range-query method of the libroach library through CGO to scan the data meeting the condition "id > 100".

S3, libroach calls an iterator of RocksDB (a key-value store) to scan the data row by row.

S4, the scanned mass data are transmitted to the computation functions in the FPGA, which perform conditional filtering on the data.

S5, the computation functions in the FPGA execute concurrently, parsing the key-value data of each row into row format, parsing the conditional expression, and filtering the data by "id > 100".

S6, libroach passes pointers to the qualifying data from the C++ library to the Store object in the Go-language layer.

S7, the storage-layer Store object copies the data from C++ memory to Go memory.

S8, the TableReader (table reader) reads the rows meeting the condition and forwards them to the next SQL processor for execution.

S9, after the SQL processor finishes executing, the data are output to the client through the processor.

As shown in FIG. 2, similar steps are used for aggregation-type SQL operators such as Count and Sum: the SQL layer issues an aggregation request to the storage-layer Store object, the Store object calls the aggregation method of the libroach library through CGO, and libroach performs the aggregation directly while scanning the data, so only the aggregated result needs to be returned.

Pushing SQL computation down to the storage layer requires the storage layer to decode the Value part of each KV pair according to CockroachDB's Value format. This decoding consumes CPU time. Because each KV pair is decoded independently of the others, Value decoding can be implemented in FPGA programming and the data can be filtered there according to the conditional expression, which accelerates decoding and conditional filtering and improves the execution efficiency of the whole SQL statement.
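Aggregation push-down for Count and Sum can be sketched in Go as follows. It assumes, for illustration only, that the scanned column is a plain slice of integers and that the filter and summed column coincide (as in `SELECT COUNT(*), SUM(v) FROM t WHERE v > 5`); `Agg` and `aggregateScan` are hypothetical names. Goroutines over disjoint chunks stand in for parallel FPGA units, and only the small `Agg` value would travel back to the SQL layer.

```go
package main

import (
	"fmt"
	"sync"
)

// Agg is the only thing returned to the SQL layer: a COUNT/SUM result.
type Agg struct {
	Count int64
	Sum   int64
}

// merge combines two partial aggregates; COUNT and SUM are decomposable,
// so partials computed independently can simply be added.
func (a Agg) merge(b Agg) Agg {
	return Agg{Count: a.Count + b.Count, Sum: a.Sum + b.Sum}
}

// aggregateScan computes COUNT(*) and SUM(v) over the values satisfying pred
// while scanning, splitting the input into `parts` chunks that are processed
// concurrently, each writing only to its own partial result.
func aggregateScan(vals []int64, parts int, pred func(int64) bool) Agg {
	if parts < 1 {
		parts = 1
	}
	partials := make([]Agg, parts)
	chunk := (len(vals) + parts - 1) / parts
	var wg sync.WaitGroup
	for p := 0; p < parts; p++ {
		lo, hi := p*chunk, (p+1)*chunk
		if lo > len(vals) {
			lo = len(vals)
		}
		if hi > len(vals) {
			hi = len(vals)
		}
		wg.Add(1)
		go func(p int, seg []int64) {
			defer wg.Done()
			for _, v := range seg {
				if pred(v) {
					partials[p].Count++
					partials[p].Sum += v
				}
			}
		}(p, vals[lo:hi])
	}
	wg.Wait()
	var total Agg
	for _, a := range partials {
		total = total.merge(a)
	}
	return total
}

func main() {
	vals := []int64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	agg := aggregateScan(vals, 3, func(v int64) bool { return v > 5 })
	fmt.Println(agg.Count, agg.Sum) // 5 40
}
```

Keeping one partial per worker avoids shared counters and locks; the final merge is cheap because it touches only `parts` small values.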
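Because each KV pair is decoded independently, decoding is data-parallel: every value can be processed by any worker and written to its own output slot. The Go sketch below shows that property under a stated assumption: the hypothetical `decodeID` reads an id stored as a little-endian int64, which is a stand-in and not CockroachDB's real Value format. Workers take strided indices, and the surviving ids are gathered afterwards in the original scan order.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sync"
)

// decodeID is a hypothetical per-KV decoder: the value is assumed to be the
// row id as a little-endian int64. Real CockroachDB values are encoded
// differently.
func decodeID(value []byte) int64 {
	return int64(binary.LittleEndian.Uint64(value))
}

// encodeID builds a value in the same hypothetical layout.
func encodeID(id int64) []byte {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, uint64(id))
	return b
}

// parallelDecodeFilter decodes every value on `workers` goroutines, each
// writing only to its own indices of ids/keep, then gathers the ids that
// satisfy pred in the original scan order.
func parallelDecodeFilter(values [][]byte, workers int, pred func(int64) bool) []int64 {
	if workers < 1 {
		workers = 1
	}
	ids := make([]int64, len(values))
	keep := make([]bool, len(values))
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// Strided assignment: worker w handles indices w, w+workers, ...
			for i := w; i < len(values); i += workers {
				ids[i] = decodeID(values[i])
				keep[i] = pred(ids[i])
			}
		}(w)
	}
	wg.Wait()
	var out []int64
	for i, k := range keep {
		if k {
			out = append(out, ids[i])
		}
	}
	return out
}

func main() {
	var values [][]byte
	for id := int64(98); id <= 103; id++ {
		values = append(values, encodeID(id))
	}
	fmt.Println(parallelDecodeFilter(values, 3, func(id int64) bool { return id > 100 })) // [101 102 103]
}
```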

The above embodiments are only specific embodiments of the present invention, and the scope of the present invention includes but is not limited to them. Any suitable changes or substitutions made by those skilled in the art that are consistent with the claims of the distributed database query acceleration method of the present invention shall fall within the scope of the present invention.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A distributed database query acceleration method, characterized in that SQL (structured query language) layer operators are pushed down to a storage layer for execution; the storage layer uses FPGA (field programmable gate array) programming so that, after the data are pulled, decoding and conditional filtering of the data are executed in parallel in an FPGA circuit, and only the data meeting the conditions are copied to the SQL layer; the remaining aggregation operators are executed in parallel directly in the storage-layer FPGA module, and the result of the aggregation calculation is returned to the SQL layer.

2. The method for accelerating the query of a distributed database according to claim 1, comprising the steps of:

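S1, the SQL layer sends a query request to the storage layer;

S2, the storage-layer Store object calls the libroach library to execute the query;

S3, libroach invokes an iterator of the key-value database to scan the data row by row;

S4, the scanned data are transmitted to the FPGA;

S5, computation functions in the FPGA parse the data concurrently;

S6, libroach passes pointers to the qualifying data from the C++ library to the Store object in the Go-language layer;

S7, the storage-layer Store object copies the data from C++ memory to Go memory;

S8, the table reader passes the data to the SQL processor for execution;

S9, after the SQL processor finishes executing, the data are output to the client.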

3. The distributed database query acceleration method according to claim 2, wherein in step S1, the table reader of the SQL layer sends a filtered-scan query request to the storage layer.

4. The distributed database query acceleration method according to claim 2, wherein in step S2, the storage-layer Store object calls a range-query method of the libroach library through CGO to scan the data meeting the condition.

5. The distributed database query acceleration method according to claim 2, characterized in that, in step S4, the scanned data are transmitted to computation functions in the FPGA, which execute conditional filtering of the data concurrently.

6. The distributed database query acceleration method according to claim 2, wherein in step S5, the computation functions in the FPGA execute concurrently, the key-value data of each row are parsed into row format, the conditional expression is parsed, and the data are filtered.

7. The distributed database query acceleration method according to claim 2, wherein in step S8, the table reader reads the rows of data meeting the condition and forwards them to the next SQL processor for execution.

8. The distributed database query acceleration method according to claim 2, wherein in step S9, after the SQL processor finishes executing, the processor outputs the data to the client.