CN109977116B

CN109977116B - FPGA-DDR-based hash connection operator acceleration method and system

Info

Publication number: CN109977116B
Application number: CN201910192544.5A
Authority: CN
Inventors: 齐乐; 李凯一; 彭福来; 吴登勇
Original assignee: Chaoyue Technology Co Ltd
Current assignee: Chaoyue Technology Co Ltd
Priority date: 2019-03-14
Filing date: 2019-03-14
Publication date: 2023-04-21
Anticipated expiration: 2039-03-14
Also published as: CN109977116A

Abstract

The invention discloses an FPGA-DDR-based hash join operator acceleration method and system in the field of in-memory database acceleration. It addresses the technical problem of accelerating both the build phase and the probe phase of the hash join algorithm: with the DDR memory and the FPGA chip working together, the hash table is built and matched by executing parallel multithreaded operations in the build phase and the probe phase of the hash join. The system comprises a DDR memory and an FPGA chip. The DDR memory stores the tuples, the hash table, and the linked lists. The FPGA chip stores the base addresses, performs the hash calculations, updates the hash table and the linked lists, and controls its communication with the DDR memory.

Description

FPGA-DDR-based hash connection operator acceleration method and system

Technical Field

The invention relates to the field of in-memory database acceleration, and in particular to an FPGA-DDR-based hash join operator acceleration method and system.

Background

As the volume of information grows, databases keep getting larger and data-analysis tasks face ever stricter real-time requirements. For small-scale workloads with strict real-time requirements, data can be accessed and processed directly in the FPGA's high-speed on-chip Block RAM. However, because of the limits of FPGA devices, the capacity and number of Block RAMs cannot currently meet the needs of large databases. In many application scenarios, a larger in-memory database must therefore still be built on external memory attached to the FPGA (DDR3/DDR4). Compared with an on-chip Block RAM solution, a DDR-based in-memory database offers much larger capacity; compared with a traditional disk-based database, it offers lower response latency. It therefore has greater research and practical value in terms of cost-effectiveness, system overhead, and practicality.

The technical problem to be solved is how, on an FPGA platform, to use off-chip DDR3/DDR4 as main memory and efficiently accelerate both the build phase and the probe phase of the hash join operator, which databases invoke frequently.

Disclosure of Invention

The technical task of the invention is to provide an FPGA-DDR-based hash join operator acceleration method and system that use off-chip DDR memory as main memory and accelerate both the build phase and the probe phase of the hash join algorithm.

In a first aspect, the invention provides an FPGA-DDR-based hash join operator acceleration method. Tuples, a hash table, and linked lists are stored in DDR memory. Base addresses and the build database tuple parameters are stored in on-chip registers of an FPGA chip. The FPGA chip performs the hash calculations, updates the hash table and the linked lists, and controls its communication with the DDR memory. The tuples include build tuples and probe tuples. Join keys with the same hash value are assigned to the same hash bucket, the elements of each hash bucket are chained through a linked list, and the head node of each linked list is stored in the hash table. With the DDR memory and the FPGA chip working together, the hash table is built and matched by executing parallel multithreaded operations in the build phase and the probe phase of the hash join.
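The bucket-chaining layout described above can be illustrated with a minimal software sketch in Python. It models the hash table as an array of head pointers and the linked lists as a pool of nodes addressed by index, and it uses the 0xFFFFFFFF empty-bucket value given in Example 1. The hash function `hash_key` (32-bit multiplicative hashing) and the names `Node` and `ChainedHashTable` are hypothetical, since the patent specifies neither; in the actual design these structures would live in DDR memory rather than in Python lists.

```python
from dataclasses import dataclass, field
from typing import List

EMPTY = 0xFFFFFFFF  # unique empty-bucket value named in Example 1


def hash_key(key: int) -> int:
    # Hypothetical hash: Knuth multiplicative hashing on 32 bits.
    # The patent does not say which hash function the FPGA uses.
    return (key * 2654435761) & 0xFFFFFFFF


@dataclass
class Node:
    key: int        # join key value
    tuple_ptr: int  # pointer (here: index) to the build tuple
    next: int       # index of the next node, or EMPTY at the end of the chain


@dataclass
class ChainedHashTable:
    num_buckets: int
    heads: List[int] = field(init=False)             # hash table: one head pointer per bucket
    nodes: List[Node] = field(default_factory=list)  # linked lists: pool of chain nodes

    def __post_init__(self) -> None:
        self.heads = [EMPTY] * self.num_buckets

    def bucket_of(self, key: int) -> int:
        # Keys with the same hash value always land in the same bucket.
        return hash_key(key) % self.num_buckets

    def insert(self, key: int, tuple_ptr: int) -> None:
        b = self.bucket_of(key)
        # The new node points at the old head; the bucket head then points at the new node.
        self.nodes.append(Node(key, tuple_ptr, self.heads[b]))
        self.heads[b] = len(self.nodes) - 1

    def matches(self, key: int) -> List[int]:
        # Walk the bucket's chain and collect build tuple pointers with an equal key.
        out = []
        p = self.heads[self.bucket_of(key)]
        while p != EMPTY:
            node = self.nodes[p]
            if node.key == key:
                out.append(node.tuple_ptr)
            p = node.next
        return out
```

Because new nodes go to the head of the chain, inserting key 42 with tuple pointers 0 and then 2 makes `matches(42)` return `[2, 0]`, newest first.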

In this embodiment, the tuples, the hash table, and the linked lists are stored in DDR memory, where the tuples include the build tuples of the build phase and the probe tuples of the probe phase. The on-chip registers of the FPGA chip store the base addresses and the build database tuple parameters. The base addresses include those of every module in the FPGA chip, of the build database, of the hash table, and of the linked lists; the build database tuple parameters include the number of tuples, the tuple size, the position of the join key, and the hash table size. The FPGA's on-chip register resources are used to program and hold these pointer relationships at run time. The main computation is moved into the FPGA chip and coordinated with the external DDR memory, and massively parallel multithreaded operation in the two key stages (the build phase and the probe phase) overcomes problems of conventional schemes such as low single-thread efficiency and memory-access bottlenecks.
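To make the role of the on-chip registers concrete, the following sketch shows how the base addresses and build database tuple parameters could be combined into DDR addresses. The register field names, the 4-byte head pointers, and the 12-byte node layout are assumptions made for illustration; the patent lists which quantities the registers hold but not their widths or encodings.

```python
from dataclasses import dataclass

HEAD_PTR_BYTES = 4  # assumption: 32-bit head pointers (consistent with the 0xFFFFFFFF marker)
NODE_BYTES = 12     # assumption: 4-byte key + 4-byte tuple pointer + 4-byte next pointer


@dataclass(frozen=True)
class OnChipRegisters:
    # Hypothetical register set; the patent names these quantities but not their layout.
    build_db_base: int     # base address of the build database tuples in DDR
    hash_table_base: int   # base address of the hash table (bucket head pointers)
    linked_list_base: int  # base address of the linked-list node area
    num_tuples: int        # number of tuples
    tuple_size: int        # tuple size in bytes
    key_offset: int        # byte position of the join key within a tuple
    hash_table_size: int   # number of hash buckets


def tuple_addr(r: OnChipRegisters, i: int) -> int:
    # Address of the i-th tuple; tuples are assumed to be stored contiguously.
    if not 0 <= i < r.num_tuples:
        raise IndexError("tuple index out of range")
    return r.build_db_base + i * r.tuple_size


def key_addr(r: OnChipRegisters, i: int) -> int:
    # Address of the join key inside the i-th tuple.
    return tuple_addr(r, i) + r.key_offset


def bucket_addr(r: OnChipRegisters, bucket: int) -> int:
    # Address of a bucket's head pointer in the hash table.
    return r.hash_table_base + (bucket % r.hash_table_size) * HEAD_PTR_BYTES


def node_addr(r: OnChipRegisters, node: int) -> int:
    # Address of a linked-list node in the node area.
    return r.linked_list_base + node * NODE_BYTES
```

With registers like these, a thread needs to carry only small indices (tuple number, bucket number, node number), and every DDR address is derived from a base register plus an offset.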

Preferably, both the build phase and the probe phase of the hash join comprise:

creating, in the FPGA chip, a corresponding thread for each tuple and generating a request for its join key;

the FPGA chip sends the tuple join key requests to the DDR memory over multiple channels, and the hash table answers each request and returns the head node of the linked list to the FPGA chip, completing the tuple request;

after the tuple request is completed, the corresponding thread is activated;

the hash value of the join key is calculated in the FPGA chip, and the join key and its hash value are stored in the thread's state.
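The per-tuple thread model in the steps above can be sketched as follows. Each tuple gets a thread with one outstanding join key request; responses may return in any order, and each response activates only its own thread, which then stores the key and its hash value in its state. `ThreadState`, `issue_threads`, `on_key_response`, and the multiplicative `hash_key` are hypothetical names chosen for this sketch, not interfaces defined in the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def hash_key(key: int) -> int:
    # Hypothetical 32-bit multiplicative hash; the patent does not name one.
    return (key * 2654435761) & 0xFFFFFFFF


@dataclass
class ThreadState:
    tuple_id: int                     # which tuple this thread serves
    key: Optional[int] = None         # join key, filled in when the request completes
    hash_value: Optional[int] = None  # hash of the join key
    active: bool = False              # set once the key request has been answered


def issue_threads(num_tuples: int) -> Dict[int, ThreadState]:
    # One thread per tuple; each starts with a join key request outstanding.
    return {i: ThreadState(i) for i in range(num_tuples)}


def on_key_response(threads: Dict[int, ThreadState], tuple_id: int, key: int) -> ThreadState:
    # Responses can arrive in any order; each one activates only its own thread.
    t = threads[tuple_id]
    t.key = key
    t.hash_value = hash_key(key)
    t.active = True
    return t
```

Because no thread waits on another, many requests can be in flight at once, which is the property the method relies on to hide DDR latency.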

Preferably, when a tuple join key request is sent to the DDR memory and the tuple's join key is split across two memory locations, two join key requests are sent and their responses are merged.
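A minimal sketch of the split-key case, assuming the memory interface returns aligned 64-byte words (`WORD_BYTES` is an assumed width) and that a join key is never longer than one word, so at most two requests are needed. `key_requests` computes the one or two requests, and `read_key` merges the partial responses into a single key; both names and the little-endian byte order are assumptions.

```python
from typing import List, Tuple

WORD_BYTES = 64  # assumption: the DDR interface returns aligned 64-byte words


def key_requests(addr: int, key_len: int) -> List[Tuple[int, int, int]]:
    """Return one or two (word_address, offset_in_word, length) requests for a join key."""
    if not 0 < key_len <= WORD_BYTES:
        raise ValueError("key must fit in one memory word")
    first_word = addr // WORD_BYTES
    last_word = (addr + key_len - 1) // WORD_BYTES
    offset = addr % WORD_BYTES
    if first_word == last_word:
        return [(first_word * WORD_BYTES, offset, key_len)]  # key fits in one word
    head_len = WORD_BYTES - offset  # bytes of the key in the first word
    return [(first_word * WORD_BYTES, offset, head_len),
            (last_word * WORD_BYTES, 0, key_len - head_len)]


def read_key(memory: bytes, addr: int, key_len: int) -> int:
    parts = []
    for word_addr, off, length in key_requests(addr, key_len):
        word = memory[word_addr:word_addr + WORD_BYTES]  # one DDR response
        parts.append(word[off:off + length])
    return int.from_bytes(b"".join(parts), "little")  # merge the responses
```

For example, an 8-byte key at byte address 60 straddles the 64-byte boundary, so `key_requests(60, 8)` yields two 4-byte requests, one per word.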

Preferably, the build phase of the hash join further comprises: writing, by the FPGA chip, the join key value and the tuple pointer into a new node of the linked list of the corresponding hash bucket.

Preferably, the build phase of the hash join further comprises reading and updating the hash table by issuing an atomic request, which comprises the following sub-steps:

sending, by the FPGA chip, an atomic request to the DDR memory;

the atomic request is answered through the linked list, and the head pointer of the hash bucket is returned to the FPGA chip;

after the atomic request has been answered, the thread is activated;

the head pointer of the hash bucket is replaced with the pointer to the new node, thereby updating the linked list and the hash table.
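The build-phase update above can be viewed as head insertion guarded by an atomic exchange. The sketch below assumes the atomic request behaves like an exchange that returns the old bucket head and installs the new node pointer in one indivisible step. To show that concurrent threads targeting the same bucket still produce one intact chain, it deliberately completes every atomic request before any `next` pointer is written. The first exchange on an empty bucket returns 0xFFFFFFFF, which becomes the chain terminator. `DDRModel`, `atomic_exchange`, `build`, `chain`, and the `key % num_buckets` placeholder hash are hypothetical.

```python
EMPTY = 0xFFFFFFFF  # empty-bucket value from Example 1


class DDRModel:
    """Toy stand-in for DDR memory: each method call is one indivisible memory operation."""

    def __init__(self, num_buckets: int) -> None:
        self.heads = [EMPTY] * num_buckets  # hash table: one head pointer per bucket
        self.nodes = []                     # linked-list nodes: [key, tuple_ptr, next]

    def write_new_node(self, key: int, tuple_ptr: int) -> int:
        # Write linked list module: store the join key and tuple pointer in a new node.
        self.nodes.append([key, tuple_ptr, EMPTY])
        return len(self.nodes) - 1

    def atomic_exchange(self, bucket: int, new_ptr: int) -> int:
        # Hypothetical atomic request: return the old head and install new_ptr in one step.
        old_head = self.heads[bucket]
        self.heads[bucket] = new_ptr
        return old_head


def build(ddr: DDRModel, keys: list) -> None:
    n_buckets = len(ddr.heads)
    # Every build thread first writes its own node (placeholder hash: key mod bucket count).
    pending = [(key % n_buckets, ddr.write_new_node(key, ptr)) for ptr, key in enumerate(keys)]
    # Extreme interleaving: all atomic requests complete before any node is linked.
    old_heads = [ddr.atomic_exchange(b, node) for b, node in pending]
    # Update linked list module: each new node points at the head it displaced.
    for (_, node), old in zip(pending, old_heads):
        ddr.nodes[node][2] = old


def chain(ddr: DDRModel, bucket: int) -> list:
    # Return the build tuple pointers in a bucket's chain, head first.
    out, p = [], ddr.heads[bucket]
    while p != EMPTY:
        out.append(ddr.nodes[p][1])
        p = ddr.nodes[p][2]
    return out
```

Because each exchange hands back a distinct old head, no insertion is lost even when several threads hit the same bucket; building keys `[5, 9, 13, 6]` into 4 buckets leaves bucket 1 with the chain `[2, 1, 0]`.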

Preferably, when the head pointer of the hash bucket is replaced with the pointer to the new node in the FPGA chip, if the linked list corresponding to the returned head pointer contains no entry matching the new node, the atomic request returns a head pointer indicating that the hash bucket is an empty bucket.

Preferably, the probe phase of the hash join further comprises reading the hash table, which comprises the following sub-steps:

sending, by the FPGA chip, a hash table lookup request to the DDR memory; the probe tuple answers the request and returns the join key to the FPGA chip; the thread looks up the head pointer of the corresponding hash bucket using the hash value of the join key, and if the hash bucket is a storage (non-empty) bucket, the probe tuple is judged to match and the thread is written into the FIFO module of the FPGA chip;

if the hash bucket is an empty bucket, the probe tuple is judged not to match and is dropped from the data path of the FPGA chip.
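The hash table read above acts as a filter: probe tuples whose bucket holds the empty value are dropped at once, and the rest enter a FIFO carrying the bucket's head pointer. In this sketch a non-empty bucket means only a candidate match, because the actual key comparison happens during the linked-list walk in the following steps of the method. The `probe_lookup` function, the `(probe_ptr, key, node_ptr)` FIFO entry layout, and the `key % num_buckets` placeholder hash are assumptions.

```python
from collections import deque

EMPTY = 0xFFFFFFFF  # empty-bucket value from Example 1


def probe_lookup(heads, probe_keys):
    """Split probe tuples into FIFO threads (candidate matches) and dropped tuples."""
    n_buckets = len(heads)
    fifo = deque()  # stands in for the FIFO module
    dropped = []
    for probe_ptr, key in enumerate(probe_keys):
        head = heads[key % n_buckets]  # placeholder hash: key mod bucket count
        if head == EMPTY:
            dropped.append(probe_ptr)  # empty bucket: the tuple leaves the data path
        else:
            fifo.append((probe_ptr, key, head))  # thread state: probe pointer, key, first node
    return fifo, dropped
```

Dropping empty-bucket tuples early means they never generate linked-list traffic to DDR, which keeps memory bandwidth for tuples that can actually match.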

Preferably, the probe phase of the hash join further comprises checking whether each node of the linked list matches, which comprises the following sub-steps:

the FPGA chip sends linked-list node requests to the DDR memory over two channels, requesting, for each activated thread, the node of its corresponding linked list;

the linked list answers the node request and returns the node to the FPGA chip;

the FPGA chip judges whether the build tuple and the probe tuple match; if they do, it combines the probe tuple pointer with the build tuple pointer and outputs the combined result.

More preferably, after the match check between the build tuple and the probe tuple, it is determined whether the returned node is the last node of the linked list; if it is, the corresponding thread is deleted from the data path of the FPGA chip;

otherwise, the pointer to the node following the returned node is updated in the thread state, and the thread is activated again for that next node.
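The linked-list walk above can be sketched as a loop over a recirculating queue. Each activated thread fetches one node, emits a (probe pointer, build pointer) pair when the keys are equal, and is then either retired at the last node or re-queued with the next node pointer. Treating the circular FIFO module as this recirculation path is an interpretation, not something the patent states explicitly; `walk_chains` and the `(key, build_ptr, next)` node layout are hypothetical.

```python
from collections import deque

EMPTY = 0xFFFFFFFF  # end-of-chain / empty-bucket value from Example 1


def walk_chains(nodes, threads):
    """nodes[i] = (key, build_ptr, next); threads = iterable of (probe_ptr, key, node_ptr)."""
    results = []
    ring = deque(threads)  # assumed model of the recirculating (circular) FIFO
    while ring:
        probe_ptr, key, node_ptr = ring.popleft()  # activated thread requests its node
        node_key, build_ptr, nxt = nodes[node_ptr]   # node returned from DDR memory
        if node_key == key:                          # analysis module: do the tuples pair?
            results.append((probe_ptr, build_ptr))   # combine probe and build pointers
        if nxt != EMPTY:
            ring.append((probe_ptr, key, nxt))       # not the last node: update state, reactivate
        # otherwise the thread has reached the last node and is retired
    return results
```

Results come out in completion order rather than probe order, since threads with short chains retire while longer chains are still being walked.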

In a second aspect, the invention provides an FPGA-DDR-based hash join operator acceleration system comprising a DDR memory and an FPGA chip. The DDR memory stores tuples, a hash table, and linked lists; the tuples comprise build tuples and probe tuples; the hash table stores the head node of each linked list; join keys with the same hash value are assigned to the same hash bucket, and the elements of each hash bucket are chained through one linked list;

the FPGA chip is configured with:

a tuple request module, used to create a thread for each tuple and generate a request for its join key;

a hash calculation module, used to calculate the hash value of the join key;

a write linked list module, used to write the join key value and the tuple pointer into a new node of the linked list of the corresponding hash bucket;

a hash table processing module (replaced by an update hash table module), used to generate atomic requests that read and update the hash table and hash table lookup requests that read the hash table;

an update linked list module, used to update the linked list;

a probe linked list module, used to accept activated threads and request the node of the linked list corresponding to each activated thread;

an analysis module, used to determine whether a probe tuple and a build tuple match;

a join tuple data/pointer module, used to receive matched probe and build tuples, combine the probe tuple pointer with the build tuple pointer, and output the combined result;

a judging module, used to determine the next thread to activate;

a FIFO module, a synchronous FIFO module, and a circular FIFO module, used to provide first-in, first-out buffering;

on-chip registers, used to store the base address of the build database, the base address of the build database tuples, and the base addresses of the hash table and the linked lists;

where the build database tuple comprises the number of tuples, the tuple size, the join key position, and the hash table size.

In the above technical solution, the acceleration system is used to implement the FPGA-DDR-based hash join operator acceleration method of the first aspect.

The FPGA-DDR-based hash join operator acceleration method and system have the following advantages. They optimize the underlying architecture of the database and can effectively improve the execution efficiency of the hash join operator in large in-memory databases. Massively parallel multithreaded operation in the two key stages (the build phase and the probe phase) overcomes problems of conventional schemes such as low single-thread efficiency and memory-access bottlenecks. Because the FPGA supports deep pipelining and can keep thousands of memory requests outstanding at the same time, it mitigates the DDR interface bandwidth problem, increases parallelism, and reduces the execution complexity of the database algorithm without the multi-level cache overhead of a CPU, giving a clear speedup over conventional schemes.

Drawings

To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art could derive other drawings from them without inventive effort.

The invention is further described below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of the working principle of the build phase in the FPGA-DDR-based hash join operator acceleration method of Example 1;

FIG. 2 is a schematic diagram of the working principle of the probe phase in the FPGA-DDR-based hash join operator acceleration method of Example 1.

Detailed Description

The invention is further described below with reference to the accompanying drawings and specific examples so that those skilled in the art can better understand and implement it. The examples do not limit the invention, and the technical features of the embodiments and examples may be combined with one another where they do not conflict.

The embodiments of the invention provide an FPGA-DDR-based hash join operator acceleration method and system, which address the technical problem of how to use off-chip DDR memory as main memory and accelerate the build phase and the probe phase of the hash join algorithm.

Example 1:

In the FPGA-DDR-based hash join operator acceleration method of this example, the DDR memory is external memory that stores the tuples, the hash table, and the linked lists. The FPGA chip is configured with on-chip registers, a tuple request module, a hash calculation module, a write linked list module, a hash table processing module, an update linked list module, a probe linked list module, a judging module, an analysis module, a join tuple data/pointer module, a FIFO module, a synchronous FIFO module, and a circular FIFO module. The on-chip register resources hold the base address of the build database, the base address of the build database tuples, the base addresses of the hash table and the linked lists, and the base addresses of the modules. With the DDR memory and the FPGA chip working together, the hash table is built and matched by executing parallel multithreaded operations in the build phase and the probe phase of the hash join.

Join keys with the same hash value are assigned to the same hash bucket, the elements of each hash bucket are chained through a linked list, and the head node of each linked list is stored in the hash table.

A hash bucket that holds a linked list is a storage (non-empty) bucket, and a hash bucket that holds no linked list is an empty bucket; empty buckets in the hash table are represented by a unique value (0xFFFFFFFF).

The build database tuple includes the number of tuples, the tuple size, the join key position, and the hash table size.

In the build phase of the hash join, the operations performed include issuing threads, tuple requests, and hash table processing.

Specifically, issuing threads means that, in the FPGA chip, the tuple request module creates a corresponding thread for each build tuple and generates a request for its join key.

The tuple request includes the following operations:

the FPGA chip sends the tuple join key requests to the DDR memory over multiple channels, and the hash table answers each request and returns the head node of the linked list to the FPGA chip, completing the tuple request; after the tuple request is completed, the corresponding thread is activated;

in the FPGA core, the hash calculation module computes the hash value of the join key and passes the join key and its hash value to the FIFO module, through which they are stored in the thread's state;

meanwhile, in the FPGA chip, the write linked list module writes the join key value and the tuple pointer into a new node of the linked list of the corresponding storage bucket, the new node being a linked-list node newly created in that list.

In the above step, if a tuple's join key is split across two memory locations when the join key request is sent to the DDR memory, two join key requests are sent and their responses are merged.

In hash table processing, the FPGA chip reads and updates the hash table by issuing an atomic request, specifically through the following operations:

the hash table processing module sends an atomic request to the DDR memory;

the atomic request is answered through the linked list, and the head pointer of the hash bucket is returned to the synchronous FIFO module;

after the atomic request is answered, the thread is activated;

in the update linked list module, the head pointer of the hash bucket is replaced with the pointer to the new node to update the linked list and the hash table; if the linked list corresponding to the returned head pointer contains no entry matching the new node, the atomic request returns a head pointer indicating that the hash bucket is an empty bucket.

In the probe phase of the hash join, the operations performed include issuing threads, tuple requests, hash table processing, and checking whether every node of the hash bucket's linked list matches.

Specifically, issuing threads means that, in the FPGA chip, the tuple request module creates a corresponding thread for each probe tuple and generates a request for its join key.

The tuple request includes the following operations:

the FPGA chip sends the tuple join key requests to the DDR memory over multiple channels, and the hash table answers each request and returns the head node of the linked list to the FPGA chip, completing the tuple request;

after the tuple request is completed, the corresponding thread is activated;

in the FPGA core, the hash calculation module computes the hash value of the join key and passes the join key and its hash value to the FIFO module, through which they are stored in the thread's state.

Hash table processing here is a hash table read, which comprises the following sub-step:

in the FPGA chip, the hash table processing module sends a hash table lookup request to the DDR memory; the probe tuple answers and returns the join key to the synchronous FIFO module; the thread in the synchronous FIFO module looks up the head pointer of the corresponding hash bucket using the hash value of the join key. If the hash bucket is a storage (non-empty) bucket, the probe tuple is judged to match and the thread is sent to the FIFO module; if the hash bucket is an empty bucket, the probe tuple is judged not to match and is dropped from the data path of the FPGA chip.

In the probe phase of the hash join, every node of the linked list must be checked for a match. Because a thread cannot predict the length of its hash bucket's linked list, it must loop through the data path until it reaches the last node of the list.

Checking whether every node of the linked list matches comprises the following sub-steps:

in the FPGA chip, the probe linked list module receives an activated thread and sends a linked-list node request, that is, it requests the node of the linked list corresponding to the activated thread;

in the DDR memory, the linked list answers the node request and returns the node to the analysis module; the analysis module in the FPGA chip judges whether the build tuple and the probe tuple match;

once the build tuple and the probe tuple are judged to match, the matched probe tuple and build tuple are sent to the join tuple data/pointer module;

after the match check between the build tuple and the probe tuple, the judging module determines whether the returned node is the last node of the linked list; if it is, the corresponding thread is deleted from the data path of the FPGA chip; otherwise, the pointer to the node following the returned node is updated in the thread state, and the thread is activated again for that next node;

after matching is completed, the probe tuple pointer from the thread and the build tuple pointer from the linked-list node are combined, and the combined result is output.

Example 2:

The FPGA-DDR-based hash join operator acceleration system of this example comprises a DDR memory and an FPGA chip. The DDR memory stores tuples, a hash table, and linked lists; the tuples comprise build tuples and probe tuples; the hash table stores the head node of each linked list; join keys with the same hash value are assigned to the same hash bucket, and the elements of each hash bucket are chained through one linked list;

the FPGA chip is provided with the following modules:

the hash table processing module is used to generate atomic requests that read and update the hash table and hash table lookup requests that read the hash table;

the update linked list module is used for updating the linked list;

the join tuple data/pointer module is used to receive the matched probe tuple and build tuple, combine the probe tuple pointer from the thread with the build tuple pointer from the linked-list node, and output the combined result;

the judging module is used for determining the next activated thread;

the build database tuple includes the number of tuples, the tuple size, the join key position, and the hash table size.

The FPGA-DDR-based hash join operator acceleration system of this example can execute the FPGA-DDR-based hash join operator acceleration method disclosed in Example 1.

The embodiments described above are merely preferred embodiments given to fully explain the invention, and the scope of the invention is not limited to them. Equivalent substitutions and modifications made by those skilled in the art on the basis of the invention fall within the scope of the invention. The protection scope of the invention is defined by the claims.

Claims

1. An FPGA-DDR-based hash join operator acceleration method, characterized in that a DDR memory is used to store tuples, a hash table, and linked lists, wherein the tuples comprise build tuples of a build phase and probe tuples of a probe phase; base addresses and build database tuple parameters are stored in on-chip registers of an FPGA chip, and the FPGA chip performs the hash calculations, updates the hash table and the linked lists, and controls its communication with the DDR memory, wherein the base addresses comprise the base address of each module in the FPGA chip, the base address of the build database, the base address of the hash table, and the base address of the linked lists, and the build database tuple parameters comprise the number of tuples, the tuple size, the position of the join key, and the hash table size; the register resources in the FPGA chip are used to program and store the pointer relationships at run time;

the connection keys with the same hash value are assigned to the same hash bucket, elements in each hash bucket are linked through a linked list, and a head node of each linked list is stored in the hash table;

under the cooperation of the DDR memory and the FPGA chip, the construction and matching of the hash table are realized by executing parallel multi-thread operation in the construction stage and the detection stage of the hash connection.

2. The FPGA-DDR-based hash join operator acceleration method of claim 1, wherein both the build phase and the probe phase of the hash join comprise:

after the tuple request is completed, the corresponding thread is activated;

3. The FPGA-DDR-based hash join operator acceleration method of claim 2, wherein, when tuple join key requests are sent to the DDR memory, if a tuple's join key is split across two memory locations, two tuple join key requests are sent and their responses are merged.

4. The FPGA-DDR-based hash join operator acceleration method of claim 2, further comprising, in the build phase of the hash join:

writing, by the FPGA chip, the join key value and the tuple pointer into a new node of the corresponding linked list.

5. The FPGA-DDR-based hash join operator acceleration method of claim 4, further comprising, in the build phase of the hash join, reading and updating the hash table by issuing an atomic request, comprising the sub-steps of:

sending an atomic request to the DDR memory through the FPGA chip;

after the atomic request is answered, the thread is activated;

6. The method for accelerating hash join operator based on FPGA-DDR according to claim 5, wherein when the head pointer of the hash bucket is replaced by the pointer of the new node in the FPGA chip, if no matching item with the new node exists in the linked list corresponding to the returned hash bucket head pointer, the atomic request returns the head pointer of the hash bucket which is the empty bucket.

7. The FPGA-DDR based hash join operator acceleration method of claim 2, further comprising a hash table read during a probing phase of the hash join, comprising the sub-steps of:

8. The FPGA-DDR based hash join operator acceleration method of claim 2, further comprising detecting if all nodes of the linked list match during a probing phase of the hash join, comprising the sub-steps of:

9. The method for accelerating hash connection operator based on FPGA-DDR as claimed in claim 8, wherein after determining that the construction tuple and the detection tuple are paired, determining whether the returned linked list node is the last node in the linked list, if so, deleting the corresponding thread from the data path of the FPGA chip;

10. The hash connection operator acceleration system based on the FPGA-DDR is characterized by comprising a DDR memory and an FPGA chip, wherein the DDR memory is used for storing tuples, hash tables and linked lists, the tuples comprise construction tuples and detection tuples, the hash tables are stored with head nodes of each linked list, the connection keys with the same hash value are assigned to the same hash bucket, and elements in each hash bucket are linked through one linked list;

the FPGA chip is configured with:

the write linked list module is used to write the join key value and the tuple pointer into a new node of the corresponding linked list;

the update linked list module is used for updating the linked list;

the connection tuple data and pointer module is used for receiving the matched detection tuple and the construction tuple, combining the detection tuple pointer and the construction tuple pointer and outputting the combined result;

the judging module is used for determining the next activated thread;

the build database tuple includes the number of tuples, the tuple size, the join key location, and the hash table size.