CN112328620A - Distributed database query acceleration method - Google Patents

Info

Publication number
CN112328620A
Authority
CN
China
Prior art keywords
data
sql
layer
storage layer
fpga
Legal status
Pending
Application number
CN202011220443.3A
Other languages
Chinese (zh)
Inventor
贾德星
张晖
苑晓龙
张炜刚
Current Assignee
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date
2020-11-05
Filing date
2020-11-05
Publication date
2021-02-05
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202011220443.3A
Publication of CN112328620A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/244 Grouping and aggregation
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of distributed databases, and in particular to a distributed database query acceleration method. SQL-layer operators are pushed down to the storage layer for execution: after the storage layer pulls the data, an FPGA (field programmable gate array) programmed for the purpose decodes the data and evaluates the filter conditions in parallel in the FPGA circuit, and only the data that satisfy the conditions are copied to the SQL layer. The remaining aggregation operators are executed directly and in parallel in the storage-layer FPGA module, and only the result of the aggregation is returned to the SQL layer. Compared with the prior art, the method pushes SQL operators down to the storage layer, which computes locally on the data it has just read and returns only qualifying data. This greatly reduces the data throughput between the SQL layer and the storage layer and thereby improves query and analysis performance on massive data.

Description

Distributed database query acceleration method
Technical Field
The invention relates to the field of distributed databases, and in particular provides a distributed database query acceleration method.
Background
CockroachDB (CRDB) is an open-source distributed database system built on the ideas of Google Spanner. It provides NoSQL-style storage management for massive data while retaining the properties of a traditional database, such as ACID transactions and SQL support.
The storage layer of CockroachDB uses the open-source RocksDB storage engine, while the SQL layer is written in Go. SQL-layer operators read and write data in the storage engine through libroach, a C++ library called from Go via cgo; the detailed flow is shown in FIG. 3.
Because RocksDB only supports reading and writing data in key-value form, once a client issues an SQL query, all of the data must be copied from the storage layer to the SQL layer, where SQL-layer operators then perform condition filtering and aggregation. In OLAP analysis scenarios with large data volumes this approach is very inefficient and seriously degrades the query performance seen by the client.
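To make the cost of the prior-art flow concrete, the following Python sketch counts how many rows cross the boundary between the storage layer and the SQL layer when every row is copied up and filtered in the SQL layer, compared with filtering in the storage layer first. It is an illustrative model only: the (id, amount) row layout, the 1000-row data set, the selective condition "id > 950" and the function names are all assumptions, not CockroachDB code.

```python
# Illustrative model only: rows are (id, amount) tuples held by the storage layer.

def id_above_950(row):
    """A selective filter condition, equivalent to WHERE id > 950."""
    return row[0] > 950

def baseline_scan(rows, predicate):
    """Prior-art flow: copy every row to the SQL layer, then filter there."""
    copied = list(rows)                                  # all rows cross the boundary
    result = [r for r in copied if predicate(r)]         # filtering happens in the SQL layer
    return result, len(copied)

def pushdown_scan(rows, predicate):
    """Push-down flow: filter in the storage layer, copy only the matches."""
    copied = [r for r in rows if predicate(r)]           # only matches cross the boundary
    return copied, len(copied)

rows = [(i, i * 10) for i in range(1, 1001)]             # 1000 rows, ids 1..1000
base_result, base_copied = baseline_scan(rows, id_above_950)
push_result, push_copied = pushdown_scan(rows, id_above_950)
print(base_copied, push_copied)                          # 1000 50
assert base_result == push_result                        # same answer, far less data moved
```

Both strategies return the same 50 rows; the difference is that the baseline moves all 1000 rows across the layer boundary to get them. The more selective the condition, the larger the saving.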
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a highly practical distributed database query acceleration method.
The technical solution adopted by the invention to solve the above technical problem is as follows:
a distributed database query acceleration method, characterized in that SQL (structured query language) layer operators are pushed down to the storage layer for execution; after the storage layer pulls the data, it uses an FPGA (field programmable gate array) program to decode the data and perform condition filtering in parallel in the FPGA circuit, and only the data that satisfy the conditions are copied to the SQL layer;
and the remaining aggregation operators are executed directly and in parallel in the storage-layer FPGA module, and the result of the aggregation is returned to the SQL layer.
Further, the method comprises the following steps:
S1, the SQL layer sends a query request to the storage layer;
S2, the storage-layer Store object calls the libroach library to perform the query;
S3, libroach invokes an iterator of the key-value database to scan the data row by row;
S4, the scanned data are transmitted to the FPGA;
S5, computation functions in the FPGA parse the data concurrently;
S6, libroach passes pointers to the qualifying data from the C++ library to the Store object in the Go layer;
S7, the storage-layer Store object copies the data from C++ memory to Go memory;
S8, the table reader passes the data to an SQL processor for execution;
S9, after the SQL processor finishes execution, the data are output to the client.
Further, in step S1, the table reader of the SQL layer sends a filtering-scan query request to the storage layer.
Further, in step S2, the storage-layer Store object calls a range-query method of the libroach library via cgo to scan the data that satisfy the condition.
Further, in step S4, the scanned data are transmitted to a computation function in the FPGA, which performs conditional filtering on the data.
Further, in step S5, the computation functions in the FPGA execute concurrently: the key-value data of each row are parsed into row format, the conditional expression is parsed, and the data are filtered.
Further, in step S8, the table reader reads the rows that satisfy the condition and forwards them to the next SQL processor for execution.
Further, in step S9, after the SQL processor finishes execution, the data are output to the client through the processor.
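Step S5 parses the conditional expression before any row can be filtered, but the text does not specify the expression grammar. The Python sketch below assumes a hypothetical minimal grammar of the form `<column> <op> <integer>` (as in "id > 100") and compiles it once into a predicate that is then applied to each decoded row. The names `compile_condition`, `_OPS` and `_CONDITION` are illustrative only.

```python
import operator
import re

# Hypothetical expression grammar: "<column> <op> <integer literal>", e.g. "id > 100".
_OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "=": operator.eq, "!=": operator.ne,
}
# Two-character operators are listed first so ">=" is not read as ">".
_CONDITION = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(-?\d+)\s*$")

def compile_condition(expr):
    """Parse a simple comparison once and return a predicate over row dicts."""
    match = _CONDITION.match(expr)
    if match is None:
        raise ValueError(f"unsupported condition: {expr!r}")
    column, op, literal = match.group(1), match.group(2), int(match.group(3))
    compare = _OPS[op]

    def predicate(row):
        return compare(row[column], literal)

    return predicate

is_match = compile_condition("id > 100")
rows = [{"id": 99}, {"id": 100}, {"id": 101}]
print([r["id"] for r in rows if is_match(r)])  # [101]
```

Parsing once and reusing the resulting predicate mirrors the idea of configuring the filter before the scan starts, so that per-row work is only the comparison itself.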
Compared with the prior art, the distributed database query acceleration method of the invention has the following notable beneficial effects:
(1) SQL operators are pushed down to the storage layer for execution. The storage layer performs the computation locally as soon as it has read the data and returns only the data that satisfy the conditions, which greatly reduces the data throughput between the SQL layer and the storage layer and improves query and analysis performance on massive data.
(2) Using an FPGA in the storage layer allows decoding, conditional filtering and aggregation of the data to run in parallel, improving the overall execution efficiency of SQL statements.
(3) The method accelerates the execution of condition-filtering and aggregation-type SQL statements over large data volumes in a distributed database, meeting the query and analysis requirements of massive data.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of SQL conditional filter push-down in a distributed database query acceleration method;
FIG. 2 is a flow chart of SQL aggregate computation push-down in a distributed database query acceleration method;
FIG. 3 is a flowchart of the original computation in the background art of a distributed database query acceleration method.
Detailed Description
The present invention will be described in further detail with reference to specific embodiments in order to better understand the technical solutions of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A preferred embodiment is given below:
As shown in FIG. 1, in the distributed database query acceleration method of this embodiment, SQL-layer operators are pushed down to the storage layer for execution. After the storage layer pulls the data, it uses an FPGA program to decode the data and perform condition filtering in parallel in the FPGA circuit, and only the data that satisfy the conditions are copied to the SQL layer;
the remaining aggregation operators are executed directly and in parallel in the storage-layer FPGA module, and the result of the aggregation is returned to the SQL layer. This accelerates the execution of aggregation operators, greatly reduces the data throughput between the SQL layer and the storage layer, and improves query and analysis performance on massive data.
The method comprises the following steps:
S1, the TableReader (table reader) of the SQL layer sends a FilterScan (filtering scan) query request to the storage layer with the filter condition "id > 100".
S2, the storage-layer Store object calls the range-query method of the libroach library via cgo to scan the data that satisfy "id > 100".
S3, libroach calls an iterator of RocksDB (a key-value store) to scan the data row by row.
S4, the large volume of scanned data is transmitted to a computation function in the FPGA, which performs conditional filtering on it.
S5, the computation functions in the FPGA execute concurrently: the Key-Value data of each row are parsed into row format, the conditional expression is parsed, and the data are filtered by "id > 100".
S6, libroach passes pointers to the qualifying data from the C++ library to the Store object in the Go layer.
S7, the storage-layer Store object copies the data from C++ memory to Go memory.
S8, the TableReader (table reader) reads the rows that satisfy the condition and forwards them to the SQL processor for execution.
S9, after the SQL Processor finishes execution, the data are output to the client through the Processor.
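The S1 to S9 flow of this embodiment can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the patented implementation: a dictionary stands in for RocksDB, a thread pool stands in for the parallel computation functions in the FPGA, and the key layout `/t/<zero-padded id>` with an `amount` column in the value is a simplified, hypothetical encoding. None of the function names are libroach or CockroachDB APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Simplified stand-ins, all hypothetical (not CockroachDB or libroach APIs):
#   kv_store    - dict standing in for RocksDB; key "/t/<zero-padded id>", value = amount
#   fpga_filter - one "computation function" applied to a chunk of key-value pairs
#   lanes       - number of parallel FPGA lanes, modelled with a thread pool

def decode(key, value):
    """S5: parse one pair into a row; the id comes from the key, amount from the value."""
    row_id = int(key.rsplit("/", 1)[1])
    return row_id, int(value)

def fpga_filter(chunk, min_id):
    """S5: decode each pair in the chunk and keep rows satisfying id > min_id."""
    rows = (decode(k, v) for k, v in chunk)
    return [row for row in rows if row[0] > min_id]

def store_filter_scan(kv_store, min_id, lanes=4):
    """S2-S7: scan, filter in the 'FPGA' lanes, and copy back only the matches."""
    pairs = sorted(kv_store.items())                        # S3: ordered, iterator-style scan
    chunks = [pairs[i::lanes] for i in range(lanes)]        # S4: distribute data to the lanes
    with ThreadPoolExecutor(max_workers=lanes) as pool:     # S5: lanes run concurrently
        parts = list(pool.map(fpga_filter, chunks, [min_id] * lanes))
    return sorted(row for part in parts for row in part)    # S6/S7: gather matches for the SQL layer

def table_reader(kv_store):
    """S1, S8, S9: issue the FilterScan for "id > 100" and pass the rows on."""
    return store_filter_scan(kv_store, min_id=100)

kv_store = {f"/t/{i:06d}": str(i * 10) for i in range(1, 201)}
rows = table_reader(kv_store)
print(len(rows), rows[0], rows[-1])  # 100 (101, 1010) (200, 2000)
```

Only the 100 rows with id > 100 leave `store_filter_scan`; the other 100 are discarded inside the storage-layer model, which is the throughput saving the method relies on. Zero-padding the ids keeps the lexicographic key order equal to numeric order, loosely echoing how ordered key encodings let a range scan visit rows in key order.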
As shown in FIG. 2, essentially the same steps are used for aggregation-type SQL operators such as Count and Sum: the SQL layer issues an aggregation request to the storage-layer Store object, the Store object calls the aggregation method of the libroach library via cgo, and libroach performs the aggregation directly while scanning the data, so that only the aggregated result needs to be returned.
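For aggregation push-down, only the final result has to cross the layer boundary. The following minimal sketch assumes that each parallel lane (a hypothetical stand-in for an FPGA computation unit) keeps a running COUNT and SUM over its share of the scanned values and that the storage layer merges the partial results; the function names and lane count are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch under assumptions: each lane stands in for one FPGA computation unit that
# aggregates while it scans, so no individual rows are materialised or copied upward.

def partial_count_sum(values):
    """One lane: running COUNT(*) and SUM(col) over its share of the scan."""
    count, total = 0, 0
    for value in values:
        count += 1
        total += value
    return count, total

def store_aggregate(values, lanes=4):
    """Storage layer: merge the per-lane partials into one (count, sum) result."""
    chunks = [values[i::lanes] for i in range(lanes)]
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        partials = list(pool.map(partial_count_sum, chunks))
    return sum(c for c, _ in partials), sum(s for _, s in partials)

amounts = [i * 10 for i in range(1, 201)]
print(store_aggregate(amounts))  # (200, 201000)
```

This split works because COUNT and SUM are decomposable: partial results can be combined by addition in any order. AVG can be derived from the merged (count, sum) pair, whereas an aggregate such as an exact median cannot be merged this simply.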
Pushing SQL computation down to the storage layer requires the Value part of the KV data to be decoded in the storage layer. The Value format in CockroachDB is as follows: Decoding consumes CPU time, but because each KV pair is decoded independently of the others, Value decoding can be implemented in the FPGA program and the data filtered there according to the conditional expression. This accelerates decoding and conditional filtering and improves the execution efficiency of the whole SQL statement.
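The argument for decoding in the FPGA rests on each KV pair being decodable on its own. The sketch below illustrates that property with a hypothetical fixed-width binary layout (a big-endian int64 `amount` followed by an int32 `quantity`, packed with Python's `struct` module). This layout is an assumption made for illustration and is NOT CockroachDB's actual Value format; the condition `amount > 500` is likewise illustrative.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fixed-width value layout, NOT CockroachDB's real Value encoding:
# a big-endian int64 "amount" followed by a big-endian int32 "quantity".
VALUE_LAYOUT = struct.Struct(">qi")

def encode_value(amount, quantity):
    """Pack one value into the hypothetical 12-byte layout."""
    return VALUE_LAYOUT.pack(amount, quantity)

def decode_and_filter(value, min_amount):
    """Decode one value on its own and apply the condition amount > min_amount."""
    amount, quantity = VALUE_LAYOUT.unpack(value)
    return (amount, quantity) if amount > min_amount else None

values = [encode_value(i * 10, i % 7) for i in range(1, 101)]

# Decoding one value after another ...
serial = [decode_and_filter(v, 500) for v in values]
# ... and decoding them concurrently in a thread pool give the same result,
# because no value depends on any other.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(decode_and_filter, values, [500] * len(values)))

matches = [row for row in parallel if row is not None]
print(serial == parallel, len(matches))  # True 50
```

The thread pool only models the independence of the work; CPython threads do not speed up CPU-bound decoding, so any real speed-up of the kind the method aims for would come from the FPGA hardware rather than from this model.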
The above embodiments are only specific embodiments of the present invention. The scope of the present invention includes but is not limited to these embodiments, and any suitable changes or substitutions made by those skilled in the art that are consistent with the claims of the distributed database query acceleration method of the present invention shall fall within the scope of the present invention.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A distributed database query acceleration method, characterized in that SQL (structured query language) layer operators are pushed down to the storage layer for execution; after the storage layer pulls the data, it uses an FPGA (field programmable gate array) program to decode the data and perform condition filtering in parallel in the FPGA circuit, and only the data that satisfy the conditions are copied to the SQL layer;
and the remaining aggregation operators are executed directly and in parallel in the storage-layer FPGA module, and the result of the aggregation is returned to the SQL layer.
2. The method for accelerating the query of a distributed database according to claim 1, comprising the steps of:
S1, the SQL layer sends a query request to the storage layer;
S2, the storage-layer Store object calls the libroach library to perform the query;
S3, libroach invokes an iterator of the key-value database to scan the data row by row;
S4, the scanned data are transmitted to the FPGA;
S5, computation functions in the FPGA parse the data concurrently;
S6, libroach passes pointers to the qualifying data from the C++ library to the Store object in the Go layer;
S7, the storage-layer Store object copies the data from C++ memory to Go memory;
S8, the table reader passes the data to an SQL processor for execution;
S9, after the SQL processor finishes execution, the data are output to the client.
3. The distributed database query acceleration method of claim 2, wherein in step S1, the table reader of the SQL layer sends a filtering-scan query request to the storage layer.
4. The distributed database query acceleration method of claim 2, wherein in step S2, the storage-layer Store object calls a range-query method of the libroach library via cgo to scan the data that satisfy the condition.
5. The distributed database query acceleration method according to claim 2, characterized in that, in step S4, the scanned data are transmitted to a computation function in the FPGA, and conditional filtering of the data is executed concurrently.
6. The method according to claim 2, wherein in step S5, the computation functions in the FPGA are executed concurrently, the key value data of each row is parsed into a row format, the conditional expressions are parsed, and the data are filtered.
7. The method according to claim 2, wherein in step S8, the table reader reads a row of data meeting the condition, and forwards the row of data to the next SQL processor for execution.
8. The method of claim 2, wherein in step S9, after the SQL processor finishes executing, the processor outputs data to the client.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011220443.3A CN112328620A (en) 2020-11-05 2020-11-05 Distributed database query acceleration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011220443.3A CN112328620A (en) 2020-11-05 2020-11-05 Distributed database query acceleration method

Publications (1)

Publication Number Publication Date
CN112328620A true CN112328620A (en) 2021-02-05

Family

ID=74317032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011220443.3A Pending CN112328620A (en) 2020-11-05 2020-11-05 Distributed database query acceleration method

Country Status (1)

Country Link
CN (1) CN112328620A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677812A (en) * 2015-12-31 2016-06-15 华为技术有限公司 Method and device for querying data
CN105868388A (en) * 2016-04-14 2016-08-17 中国人民大学 Method for memory on-line analytical processing (OLAP) query optimization based on field programmable gate array (FPGA)
CN107122490A (en) * 2017-05-18 2017-09-01 郑州云海信息技术有限公司 The data processing method and system of aggregate function in a kind of Querying by group
CN111046072A (en) * 2019-11-29 2020-04-21 浪潮(北京)电子信息产业有限公司 Data query method, system, heterogeneous computing acceleration platform and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112925954A (en) * 2021-03-05 2021-06-08 北京中经惠众科技有限公司 Method and apparatus for querying data in a graph database
CN112925954B (en) * 2021-03-05 2024-05-24 北京中经惠众科技有限公司 Method and device for querying data in graph database
CN116841752A (en) * 2023-08-31 2023-10-03 杭州瞬安信息科技有限公司 Data analysis and calculation system based on distributed real-time calculation framework
CN116841752B (en) * 2023-08-31 2023-11-07 杭州瞬安信息科技有限公司 Data analysis and calculation system based on distributed real-time calculation framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210205