WO2019165901A1

WO2019165901A1 - Data merging method, fpga-based merger and database system

Info

Publication number: WO2019165901A1
Application number: PCT/CN2019/075322
Authority: WO
Inventors: 许浩; 周军蕊
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2018-03-01
Filing date: 2019-02-18
Publication date: 2019-09-06
Also published as: CN110309138A; CN110309138B

Abstract

Provided are a data merging method, an FPGA-based merger, and a database system. The FPGA-based merger (30) is applied to a database system (100) and is responsible for merging data records in the database system (100). This reduces the share of CPU resources in the database system (100) occupied by data merging operations, reduces the impact on the write and query performance of the database system (100), improves the overall capability of the database system (100), and eliminates the problem of performance jitter.

Description

Data merge method, FPGA-based combiner and database system

The present application claims priority to Chinese Patent Application No. 20110117 245 6.4, filed on March 1, 2018, entitled "Data merging method, FPGA-based merger and database system", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of database technologies, and in particular, to a data merging method, an FPGA-based merging device, and a database system.

Background

With the rise of the Internet and big data applications, non-relational databases (Not Only SQL, NoSQL) have developed rapidly. Among non-relational databases are key-value (KV) databases built on log-based incremental storage, such as LevelDB and RocksDB, which evolved from LevelDB.

In LevelDB or RocksDB, most KV records are stored on disk using tiered storage, which reduces memory consumption and provides persistent storage. However, reading a KV record requires searching the data in memory and the data files of each level on disk in order of record freshness, which makes lookups more complicated and slower.

To speed up the reading of KV records, the prior art uses compaction to sort and compress existing KV records and remove invalid KV records, which reduces query complexity and improves query efficiency by reducing the number of files. However, the existing data merge process degrades the write and query performance of the database system.

Summary of the invention

Aspects of the present application provide a data merging method, an FPGA-based combiner, and a database system, to reduce the impact of the data merge process on the write and query performance of the database system.

An embodiment of the present application provides an FPGA-based combiner, including: a control unit, a storage unit, and a merging unit;

The control unit is configured to load, according to a data merge instruction of the database system, the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit, and to control the merging unit to perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged; wherein N is a non-negative integer and j is a non-negative integer;

The merging unit is configured to perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged, to obtain the data records of a new N+jth layer and store them in the storage unit, so that the database system can replace the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the data records of the new N+jth layer.

The embodiment of the present application further provides a data merging method, which is applicable to an FPGA-based combiner, and the method includes:

Loading, according to the data merge instruction of the database system, the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit of the FPGA-based combiner; wherein N is a non-negative integer, j is a non-negative integer;

Performing merge processing on the data records of the Nth layer to the N+jth layer that need to be merged, to obtain the data records of a new N+jth layer, and storing them in the storage unit, so that the database system can replace the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the data records of the new N+jth layer.

The embodiment of the present application further provides a database system, including: a memory, a processor, and an FPGA-based combiner;

The memory is configured to store a computer program and at least two layers of data records in the database system;

The processor is coupled to the memory and the combiner, and is configured to execute the computer program in order to:

Identify, from the at least two layers of data records, the data records of the Nth layer to the N+jth layer that need to be merged; send a data merge instruction to the FPGA-based combiner to instruct the FPGA-based combiner to perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged; and replace the data records of the Nth layer to the N+jth layer that need to be merged in the memory with the data records of the new N+jth layer output by the FPGA-based combiner; wherein N is a non-negative integer and j is a non-negative integer;

The FPGA-based combiner is configured to receive the data merge instruction, perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged according to the data merge instruction to obtain the data records of the new N+jth layer, and output them to the processor.

In the embodiments of the present application, a combiner is implemented on an FPGA and applied to the database system, where it is responsible for merging the data records of the Nth layer to the N+jth layer that need to be merged in the database system. This reduces the CPU resources of the database system occupied by the data merge operation, reduces the impact on the write and query performance of the database system, improves the overall capability of the database system, and mitigates performance jitter.

Brief description of the drawings

The drawings described herein are intended to provide a further understanding of the present application and constitute a part of this application. In the drawings:

FIG. 1a is a schematic structural diagram of a database system according to an exemplary embodiment of the present application;

FIG. 1b is a schematic diagram of the hierarchical structure formed by data files on a disk according to an exemplary embodiment of the present application;

FIG. 1c is a schematic diagram of the hierarchical structure formed by data files in a memory and on a disk according to an exemplary embodiment of the present application;

FIG. 1d is a schematic flowchart of a data merging method described from the perspective of an FPGA-based combiner according to an exemplary embodiment of the present application;

FIG. 2a is a schematic structural diagram of an FPGA-based combiner according to another exemplary embodiment of the present application;

FIG. 2b is a schematic flowchart of a data merging method based on the combiner shown in FIG. 2a according to another exemplary embodiment of the present application;

FIG. 3a is a schematic structural diagram of another FPGA-based combiner provided by another exemplary embodiment of the present application;

FIG. 3b is a schematic flowchart of a data merging method based on the combiner shown in FIG. 3a according to another exemplary embodiment of the present application;

FIG. 4a is a schematic structural diagram of an FPGA-based combiner with an on-chip cache function according to still another exemplary embodiment of the present application;

FIG. 4b is a schematic flowchart of a data merging method based on the combiner shown in FIG. 4a according to another exemplary embodiment of the present application;

FIG. 5a is a schematic structural diagram of an FPGA-based combiner with a codec function according to still another exemplary embodiment of the present application;

FIG. 5b is a schematic diagram of a data record hierarchy provided by another exemplary embodiment of the present application;

FIG. 6a is a schematic structural diagram of an FPGA-based combiner with a compression function according to still another exemplary embodiment of the present application;

FIG. 6b is an implementation structure of a database system LevelDB or RocksDB according to another exemplary embodiment of the present application.

Detailed description

The technical solutions of the present application are described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.

In a database system with tiered storage, data merging can remove invalid data records, which helps improve query efficiency but reduces the write and query performance of the database system. To address this technical problem, the embodiments of the present application provide a solution whose main idea is to implement a combiner on an FPGA and apply the FPGA-based combiner to the database system, where the combiner merges the data records of the adjacent layers that need to be merged. This reduces the CPU resources of the database system occupied by the data merge operation, reduces the impact on the write and query performance of the database system, improves the overall capability of the database system, and mitigates performance jitter.

It should be noted that, to simplify the description, the FPGA-based combiner is referred to simply as a combiner in some descriptions of the following embodiments. Those skilled in the art will understand that "combiner" in the embodiments of the present application means the same thing as "FPGA-based combiner".

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.

FIG. 1a is a schematic structural diagram of a database system according to an exemplary embodiment of the present application. As shown in FIG. 1a, the database system 100 includes a memory 10, an FPGA-based combiner 20, and a processor 30. The memory 10 is coupled to the processor 30 and to the FPGA-based combiner 20.

The memory 10 mainly serves as the storage space of the database system 100 and may include at least one storage medium; where there is more than one, the storage media may be of the same type or of different types. For example, the memory 10 may include a volatile storage medium such as RAM, and may also include a non-volatile storage medium such as read-only memory (ROM) or flash memory. As shown in FIG. 1a, the memory 10 mainly includes a memory (main memory) and a magnetic disk. The memory is generally implemented with a volatile storage medium, and the magnetic disk with a non-volatile storage medium.

The memory 10 can store various data associated with the database system 100, such as data records stored by the database system 100, an operating system (OS) of the database system 100, various computer programs running on the database system 100, program data, and the like.

In the database system 100, data records are stored in the memory 10 in a hierarchical storage manner. There are at least two layers of data records in the memory 10. Optionally, at least two levels of data files may be included in the memory 10, and each level may include at least one data file, and each data file stores some or all of the data records of the level to which the file belongs.

In the database system 100, a data record is written as follows: the data record is first written into a log file on the disk; once the log file has been written successfully, the data record is written into the memory; and after the memory space occupancy reaches a certain limit, the data records in the memory are exported to a new data file on the disk. The data files on the disk form a hierarchical structure. For example, the first layer (the layer closest to the memory) is Level_0, the second layer is Level_1, and so on, with the level number increasing layer by layer.
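The write path described above can be illustrated with a minimal Python sketch. It is not the implementation of LevelDB, RocksDB, or the database system 100; the class name `TieredStore`, the `memtable_limit` threshold, and the in-process lists standing in for the log file and the disk files are all assumptions made for the example.

```python
class TieredStore:
    """Toy model of the write path: log first, then memory, then export to Level_0."""

    def __init__(self, memtable_limit=3):
        self.log = []                # stands in for the log file on the disk
        self.memtable = {}           # data records held in the memory
        self.levels = [[]]           # levels[0] is Level_0; each entry is one data file
        self.memtable_limit = memtable_limit  # hypothetical occupancy limit

    def put(self, key, value):
        self.log.append((key, value))          # 1. write the record to the log file
        self.memtable[key] = value             # 2. then write it into the memory
        if len(self.memtable) >= self.memtable_limit:
            self._export()                     # 3. export once the limit is reached

    def _export(self):
        data_file = sorted(self.memtable.items())  # new data file, sorted by key
        self.levels[0].append(data_file)
        self.memtable = {}


store = TieredStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)   # reaches the limit and triggers an export to Level_0
store.put("a", 3)   # a newer version of "a" now lives only in the memory
```

After these three writes, Level_0 holds one data file with the older version of "a", while the newer version is still in the memory, which is exactly the kind of overlap that a later merge has to resolve.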

In an application scenario that uses tiered storage, reading a data record requires searching the memory and the data files on the disk in order of record freshness, which makes the search complicated and slow. To speed up reads, compaction can be used to sort and compress the existing data records and remove invalid ones, which reduces the number of data records and the number of data files at each level, and thus reduces query complexity and improves query efficiency.
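To make the cost of a freshness-ordered lookup concrete, the following Python sketch counts how many places are probed before a key is found. It is only an illustration: data files are modelled as dictionaries, and the assumption that later files within a level are newer is made for the example rather than taken from the application.

```python
def lookup(key, memtable, levels):
    """Search the memory first, then each disk level from newest to oldest.

    Returns (value, probes), where probes counts the places that were examined.
    """
    probes = 1
    if key in memtable:
        return memtable[key], probes
    for level in levels:                    # Level_0, Level_1, ...
        for data_file in reversed(level):   # assumed: later files in a level are newer
            probes += 1
            if key in data_file:
                return data_file[key], probes
    return None, probes


memtable = {"c": 30}
levels = [
    [{"a": 1}, {"a": 2}],   # Level_0: two files, the second is newer
    [{"a": 0, "b": 9}],     # Level_1
]
```

Looking up "b" here touches the memory and all three data files, whereas after merging the files into one the same lookup would touch far fewer places.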

In this embodiment, a computer program associated with the data merge process is also stored in the memory 10, and the processor 30 executes this computer program to implement a new data merge scheme in conjunction with the FPGA-based combiner 20.

In this embodiment, by executing the computer program related to the data merge process stored in the memory 10, the processor 30 can: identify, from the at least two layers of data records, the data records of the Nth layer to the N+jth layer that need to be merged; send a data merge instruction to the FPGA-based combiner 20 to instruct it to merge the data records of the Nth layer to the N+jth layer that need to be merged; and, after the FPGA-based combiner 20 has merged those data records and output the data records of a new N+jth layer, replace the data records of the Nth layer to the N+jth layer that need to be merged in the memory 10 with the data records of the new N+jth layer. Here N is a non-negative integer, for example 0, 1, 2, or 3, and j is a non-negative integer, for example 0, 1, 2, or 3.

In some application scenarios, only the data records on the disk included in the memory 10 need to be merged. In this case, the at least two layers of data records described in the embodiments of the present application mainly include the data records of each layer stored on the disk. As shown in FIG. 1b, the memory holds a data file a, and the data files level_0, level_1, level_2, ..., level_n are stored in a hierarchical structure on the disk. In the scenario where only the data records on the disk need to be merged, the data file level_0 is regarded as the 0th layer, the data file level_1 as the first layer, the data file level_2 as the second layer, and so on, with the data file level_n regarded as the nth layer. That is, the at least two layers of data records mainly include the data records stored in the data files level_0, level_1, level_2, ..., level_n; in other words, the processor 30 only needs to identify the data records of the Nth layer to the N+jth layer to be merged from the data files level_0, level_1, level_2, ..., level_n. Here N + j ≤ n, and n is a non-negative integer.

In other application scenarios, the data records in the memory must accumulate to a certain amount before being exported to a data file on the disk, and overlapping data records may appear during this accumulation, so the data records in the memory also need to be merged. In this case, not only the data records on the disk included in the memory 10 but also the data records in the memory included in the memory 10 need to be merged, and the at least two layers of data records mainly include the data records of each layer stored in the memory and on the disk. The data records stored in the memory form a hierarchical structure together with the data records of each layer stored on the disk. As shown in FIG. 1c, the memory holds a data file a, and the data files level_0, level_1, level_2, ..., level_n are stored in a hierarchical structure on the disk. The data file a is regarded as the 0th layer, the data file level_0 as the first layer, the data file level_1 as the second layer, the data file level_2 as the third layer, and so on, with the data file level_n regarded as the (n+1)th layer, so that the data records stored in the memory and the data records of each layer stored on the disk form a low-to-high hierarchical structure. In the scenario where the data records both in the memory and on the disk need to be merged, the at least two layers of data records mainly include the data records stored in the data file a and in the data files level_0, level_1, level_2, ..., level_n; in other words, the processor 30 needs to identify the data records of the Nth layer to the N+jth layer to be merged from the data file a and the data files level_0, level_1, level_2, ..., level_n.
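The two layer-numbering schemes (disk only, as in FIG. 1b, and memory plus disk, as in FIG. 1c) can be sketched in Python as follows. The helper names `layer_files` and `select_layers` are hypothetical and exist only for this illustration; layers are represented by their file names.

```python
def layer_files(disk_levels, memory_file=None):
    """Return the layers as a list whose index is the layer number.

    With memory_file=None only the disk levels are layers (FIG. 1b style);
    otherwise the in-memory data file becomes layer 0 and every disk level
    shifts up by one (FIG. 1c style).
    """
    layers = list(disk_levels)
    if memory_file is not None:
        layers.insert(0, memory_file)
    return layers


def select_layers(layers, n, j):
    """Pick layers N..N+j inclusive; j == 0 selects a single layer."""
    if n < 0 or j < 0 or n + j >= len(layers):
        raise ValueError("layer range out of bounds")
    return layers[n:n + j + 1]


disk = ["level_0", "level_1", "level_2"]
disk_only = layer_files(disk)                     # level_0 is layer 0
with_memory = layer_files(disk, memory_file="a")  # data file a is layer 0
```

For instance, `select_layers(with_memory, 0, 1)` yields the in-memory data file a together with level_0, while `select_layers(disk_only, 1, 1)` yields level_1 and level_2.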

In general, j is a positive integer such as 1, 2, or 3. In some application scenarios, however, j can also be 0. For example, where overlapping data records may exist within the same level, j = 0 means that the data records of that single layer are merged. For example, when the data records in the memory need to be merged, j = 0. As another example, the data records at level_0 on the disk come directly from the memory and may overlap, so the data records at level_0 need to be merged, and j = 0. In addition, the embodiments of the present application do not limit the correspondence between the value of N and the level. For example, the 0th layer may be represented by N = 0, by N = 1, or by N = 10, with the subsequent layers numbered accordingly.

In this embodiment, the processor 30 may be triggered by different events or conditions to identify, from the at least two layers of data records, the data records of the Nth layer to the N+jth layer that need to be merged.

Example 1: A data merge period can be set, and each time the data merge period arrives, the processor 30 can be triggered to identify the Nth layer to the N+jth layer data record to be merged from at least two layers of data records.

Optionally, in Example 1, the data records of the Nth layer to the N+jth layer that need to be merged may be the data records of all layers. Alternatively, in Example 1, other conditions can further determine which layers' data records need to be merged. For example, when the data merge period arrives, it may be checked whether the number of data records in each layer has reached a preset upper limit, and the data records of adjacent layers whose counts have reached the upper limit are identified as the data records of the Nth layer to the N+jth layer that need to be merged.

Example 2: When the number of data records at a certain level reaches a set upper limit, the processor 30 is triggered to identify, from the at least two layers of data records, the data records of that level and of several adjacent upper levels as the data records of the Nth layer to the N+jth layer that need to be merged.

Example 3: In the scenario where the memory 10 includes a memory and a disk, data records are continuously written into the memory; when the memory space occupancy reaches a certain limit, the processor 30 is triggered to identify the data records in the memory and the data records of the first level on the disk as the data records of the Nth layer to the N+jth layer that need to be merged.
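The three kinds of trigger in Examples 1 to 3 can be sketched as simple predicates. This is a hedged Python illustration: the function names, the single `upper_limit` shared by all layers, and the clamping of N+j to the highest layer are assumptions for the example, not details given in the application.

```python
def period_trigger(now, last_merge, period):
    """Example 1 style: fire when a full data merge period has elapsed."""
    return now - last_merge >= period


def count_trigger(record_counts, upper_limit, j=1):
    """Example 2 style: return (N, N + j) for the first layer whose record
    count has reached the upper limit, or None if no layer qualifies."""
    top = len(record_counts) - 1
    for n, count in enumerate(record_counts):
        if count >= upper_limit:
            return n, min(n + j, top)   # assumed: never reach past the last layer
    return None


def memory_trigger(memory_used, memory_limit):
    """Example 3 style: merge the memory (layer 0) with the first disk layer."""
    return (0, 1) if memory_used >= memory_limit else None
```

A processor-side scheduler could poll these predicates and, when one fires, package the returned layer range into a data merge instruction.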

Optionally, the processor 30 may carry, in the data merge instruction, identification information related to the data records of the Nth layer to the N+jth layer, for example a layer identifier and/or an identifier of the data file in which the data records are located, so that the FPGA-based combiner 20 can learn from the data merge instruction which data records of the Nth layer to the N+jth layer need to be merged.
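One possible shape for such an instruction is sketched below in Python. The application does not specify a format; the `MergeInstruction` class and its field names are hypothetical and only show how layer identifiers and data file identifiers could travel together.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MergeInstruction:
    """Hypothetical layout of a data merge instruction."""
    first_layer: int                                    # N
    span: int                                           # j
    file_ids: List[str] = field(default_factory=list)   # optional data file identifiers

    def layer_ids(self):
        """Layer identifiers N, N+1, ..., N+j covered by this instruction."""
        return list(range(self.first_layer, self.first_layer + self.span + 1))


instr = MergeInstruction(first_layer=1, span=2, file_ids=["f7", "f9"])
same_layer = MergeInstruction(first_layer=0, span=0)   # j = 0: merge within one layer
```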

The FPGA-based combiner 20 is also connected to the processor 30. It is configured to receive the data merge instruction sent by the processor 30, merge the data records of the Nth layer to the N+jth layer that need to be merged according to the data merge instruction to obtain the data records of a new N+jth layer, and output them to the processor 30, so that the processor 30 can replace the data records of the Nth layer to the N+jth layer that need to be merged in the memory 10 with the data records of the new N+jth layer. Once the data records of the Nth layer to the N+jth layer have been replaced with the data records of the new N+jth layer, the data merge is complete.

Optionally, after identifying the data records of the Nth layer to the N+jth layer that need to be merged, the processor 30 may load them into the memory included in the memory 10, so that the FPGA-based combiner 20 can read them directly from the memory of the database system 100. This improves the efficiency with which the FPGA-based combiner 20 reads data records and, in turn, the overall efficiency of the data merge process. Optionally, the FPGA-based combiner 20 can be installed in the database system 100 as a PCIe board, in which case it reads the data records of the Nth layer to the N+jth layer that need to be merged from the memory of the database system 100 over the PCIe channel.

In this embodiment, the FPGA-based combiner 20 is added to the database system 100, and the data merge process is performed mainly by the FPGA-based combiner 20. This saves the computing resources of the processor 30 and reduces its processing load, so that the processor 30 can focus on writing and querying data records. Data storage (writing and querying) is thereby separated from data merging, which reduces the impact of data merge operations on write and query performance, improves the overall capability of the database system 100, and mitigates performance jitter. In addition, this embodiment can make full use of the resource advantages of the FPGA: data records of two adjacent layers, or even of more than two layers (j≥2), can be merged without affecting write and query performance. The data merge process is therefore more flexible and more efficient, and is not limited by the application scenario.

Based on the database system 100 shown in FIG. 1a, an exemplary embodiment of the present application further provides a data merging method, described mainly from the perspective of the FPGA-based combiner 20. As shown in FIG. 1d, the method includes:

101. Load, according to the data merge instruction of the database system, the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit of the FPGA-based combiner, where N is a non-negative integer and j is a non-negative integer.

102. Perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged to obtain the data records of a new N+jth layer, and store them in the storage unit, so that the database system can replace the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the data records of the new N+jth layer.
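Steps 101 and 102, followed by the replacement on the database side, can be modelled in software as below. This Python sketch is only a functional stand-in for hardware logic: the `Combiner` class, the dictionary-based layers, the rule that a lower-numbered layer holds the fresher version of a key, and the choice to leave the merged-away layers empty are all assumptions made for the illustration.

```python
class Combiner:
    """Toy stand-in for the combiner: a storage unit plus a merge step."""

    def __init__(self):
        self.storage_unit = {}

    def load(self, layers, n, j):
        # Step 101: copy layers N..N+j into the combiner's own storage unit.
        self.storage_unit["input"] = [dict(layers[i]) for i in range(n, n + j + 1)]

    def merge(self):
        # Step 102: merge and keep the result in the storage unit.
        merged = {}
        for layer in reversed(self.storage_unit["input"]):  # oldest layer first
            merged.update(layer)                            # fresher layers overwrite
        self.storage_unit["output"] = merged
        return merged


def replace_layers(layers, n, j, new_layer):
    """Database side: layers N..N+j are replaced by the new layer N+j.

    Assumed for the sketch: layers N..N+j-1 are left empty afterwards.
    """
    return layers[:n] + [{} for _ in range(j)] + [new_layer] + layers[n + j + 1:]


layers = [{"a": 2}, {"a": 1, "b": 5}, {"c": 7}]
combiner = Combiner()
combiner.load(layers, n=0, j=1)
layers = replace_layers(layers, 0, 1, combiner.merge())
```

The original layers are copied rather than modified in place, mirroring the idea that the database system keeps serving reads and writes while the combiner works on its own storage unit.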

In this embodiment, the data merge instructions may be generated and transmitted by a processor in the database system. In this embodiment, the FPGA-based combiner includes a storage unit for storing data records of the Nth layer to the N+jth layer that need to be merged loaded from the database system.

Optionally, if the processor in the database system, after identifying the data records of the Nth layer to the N+jth layer that need to be merged, loads them into the memory of the database system, the FPGA-based combiner can read them directly from the memory of the database system and store them in its own storage unit.

Optionally, the storage unit of the FPGA-based combiner may be a double data rate synchronous dynamic random access memory (DDR SDRAM), but is not limited thereto.

In this embodiment, the FPGA-based combiner loads the data records of the Nth layer to the N+jth layer that need to be merged into its own storage unit according to the data merge instruction of the database system, and then merges the data records of the Nth layer to the N+jth layer held in the storage unit. This frees the processor from the data merge operation, saves the computing resources of the processor, and reduces its processing load, so that the processor can focus on writing and querying data records. Data storage (writing and querying) is thereby separated from data merging, which reduces the impact of data merge operations on write and query performance, improves the overall capability of the database system, and mitigates performance jitter. In addition, this embodiment can make full use of the resource advantages of the FPGA: data records of two adjacent layers, or even of more than two layers (j≥2), can be merged without affecting write and query performance. The data merge process is therefore more flexible and more efficient, and is not limited by the application scenario.

In the embodiment of the present application, the FPGA-based combiner may have multiple implementation structures, and accordingly, the process of combining the Nth to N+jth data records by the combiner having different implementation structures may also be different. The embodiment of the present application does not limit the internal implementation structure of the FPGA-based combiner. Any combiner structure that can be implemented by the FPGA and can perform the data merge method shown in FIG. 1d is applicable to the embodiment of the present application. The following embodiments of the present application provide an internal implementation structure of several FPGA-based combiners, and a detailed description of the data merge process of a combiner having different internal implementation structures.

FIG. 2a is a schematic structural diagram of an FPGA-based combiner according to another exemplary embodiment of the present application. The FPGA-based combiner provided in this embodiment can be applied to a database system and cooperates with a processor in the database system to implement a new data merge logic. As shown in FIG. 2a, the FPGA-based combiner mainly includes a storage unit 21, a control unit 22, and a merging unit 23.

The storage unit 21 is mainly used as a storage space of the FPGA-based combiner, and is responsible for storing data related to the combiner, such as a configuration file of the combiner, a data record requiring the merger processing, and the like. As shown in FIG. 2a, both the control unit 22 and the merging unit 23 can access the storage unit 21. Alternatively, storage unit 21 may include on-chip memory implemented internal to the combiner, and/or off-chip memory implemented external to the combiner. In the illustration of the embodiments of the present application, the storage unit 21 is located outside the combiner as an example, but is not limited thereto.

The control unit 22 is the control module of the FPGA-based combiner and mainly implements the control logic of the combiner. The control unit 22 can receive a data merge instruction from the database system in which the FPGA-based combiner is located and, according to the data merge instruction, load the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit 21. In addition, the control unit 22 can control the merging unit 23 to merge the data records of the Nth layer to the N+jth layer that need to be merged. Here N is a non-negative integer, for example 0, 1, 2, or 3, and j is a non-negative integer, for example 1, 2, or 3. In general, j is a positive integer such as 1, 2, or 3. In some application scenarios, however, j can also be 0. For example, where overlapping data records may exist within the same level, j = 0 means that the data records of that single layer are merged.

The merging unit 23 is a functional module in the FPGA-based combiner. Under the control of the control unit 22, it accesses the data records of the Nth layer to the N+jth layer stored in the storage unit 21, merges them to obtain the data records of a new N+jth layer, and stores these in the storage unit 21, so that the database system can replace the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the data records of the new N+jth layer, thereby achieving the purpose of data merging.

The merging performed by the merging unit 23 on the data records of the Nth layer to the N+jth layer is mainly a process of comparing those data records and removing duplicate or invalid ones, thereby obtaining the data records that need to be retained. Depending on the application scenario or service requirements, the way the merging unit 23 compares the data records of the Nth layer to the N+jth layer and removes duplicate or invalid data records may differ.
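One common way to compare records and drop duplicate or invalid ones is a sorted multi-way merge, sketched below in Python. The application leaves the exact comparison rule open, so the record layout `(key, seq, value)`, the use of a sequence number to decide which duplicate is newest, and the `TOMBSTONE` marker for deleted records are all assumptions for this illustration.

```python
import heapq

TOMBSTONE = None  # hypothetical marker for a deleted (invalid) record


def merge_records(*sorted_runs):
    """Merge runs of (key, seq, value) tuples that are each sorted by key.

    Keeps only the newest version of every key (highest seq) and drops keys
    whose newest version is a tombstone.
    """
    # Order by key, then by descending sequence number so the newest comes first.
    stream = heapq.merge(*sorted_runs, key=lambda r: (r[0], -r[1]))
    result = []
    last_key = object()   # sentinel that equals no real key
    for key, seq, value in stream:
        if key == last_key:
            continue                     # older duplicate of a key already handled
        last_key = key
        if value is not TOMBSTONE:
            result.append((key, seq, value))
    return result


layer_n = [("a", 9, "new"), ("c", 8, TOMBSTONE)]
layer_n1 = [("a", 3, "old"), ("b", 4, "keep"), ("c", 2, "gone")]
merged = merge_records(layer_n, layer_n1)
```

Here the stale copy of "a" is dropped as a duplicate, and "c" disappears entirely because its newest version marks it as deleted, leaving only the records that need to be retained.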

Based on the internal implementation structure of the FPGA-based combiner shown in FIG. 2a, another exemplary embodiment of the present application further provides a data merging method that describes the operating principle of the FPGA-based combiner shown in FIG. 2a. As shown in FIG. 2b, the method includes:

20a. The control unit receives a data merge instruction from a database system, namely the database system in which the FPGA-based combiner is located.

21a. The control unit loads, according to the data merge instruction, the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit of the FPGA-based combiner.

22a. The control unit controls the merging unit to merge the data records of the Nth layer to the N+jth layer stored in the storage unit, to obtain the data records of a new N+jth layer and store them in the storage unit, so that the database system can replace the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the data records of the new N+jth layer.

In this embodiment, the control unit receives a data merge instruction from the database system, loads the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit of the FPGA-based combiner according to the data merge instruction, and then controls the merging unit to merge the data records of the Nth layer to the N+jth layer stored in the storage unit.

This embodiment does not limit the control logic that the control unit controls the merging unit to perform the merging processing of the Nth layer to the N+jth layer data records stored in the storage unit.

For example, a processing cycle may be set according to the capabilities of the merging unit, such as the maximum number of data records that the merging unit can process in one merging pass and the time required to complete one merging pass. According to the processing cycle, the control unit may periodically control the merging unit to read part of the data records from the storage unit, or may itself periodically read part of the data records from the storage unit and send them to the merging unit; in either case, the number of data records read is less than or equal to the maximum number that the merging unit can process in one merging pass. The merging unit, under the control of the control unit, periodically reads part of the data records from the storage unit or receives the data records periodically sent by the control unit, and merges these data records based on certain attribute information of the data records, until all the data records stored in the storage unit have been merged.

For another example, the merging unit sends a merge completion notification message to the control unit each time it completes the merging of the current data records. Upon receiving the merge completion notification message, the control unit may read further data records from the storage unit and provide them to the merging unit, or may control the merging unit to read further data records from the storage unit; in either case, the number of data records read is less than or equal to the maximum number that the merging unit can process in one merging pass. The merging unit, under the control of the control unit, each time reads part of the data records from the storage unit or receives part of the data records sent by the control unit, and merges them based on certain attribute information of the data records, until all the data records stored in the storage unit have been merged.

In this embodiment, the control unit in the combiner loads the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit of the combiner, and controls the logic by which the merging unit in the combiner accesses the storage unit. In this way, the control unit controls the merging unit to merge the data records of the Nth layer to the N+jth layer stored in the storage unit and to output the merge result as the new data records of the N+jth layer.
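The notification-driven control logic described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: `run_merge`, `max_batch` and `merge_batch` are hypothetical names, records are modeled as integers, and the return of `merge_batch` stands in for the merge completion notification message.

```python
from typing import Callable, List, Sequence


def run_merge(storage: Sequence[int],
              max_batch: int,
              merge_batch: Callable[[List[int]], List[int]]) -> List[int]:
    """Feed the merging unit batches of at most max_batch records."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    merged: List[int] = []
    pos = 0
    while pos < len(storage):
        # The control unit never hands over more than the merging unit can take.
        batch = list(storage[pos:pos + max_batch])
        # Returning from merge_batch plays the role of the completion notification.
        merged.extend(merge_batch(batch))
        pos += len(batch)
    return merged
```

A periodic variant would trigger each iteration from a timer instead of from the return of `merge_batch`; the bound on the batch size is the same in both cases.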

FIG. 3a is a schematic structural diagram of another FPGA-based combiner provided by another exemplary embodiment of the present application. The FPGA-based combiner provided in this embodiment can be applied to a database system, and can cooperate with a processor in the database system to implement a new data merge logic. As shown in FIG. 3a, the FPGA-based combiner includes a storage unit 21, a control unit 22, a merging unit 23, and a transmission unit 24.

The storage unit 21 is mainly used as a storage space of the FPGA-based combiner, and is responsible for storing data related to the combiner, such as a configuration file of the combiner, a data record requiring the merger processing, and the like. As can be seen from FIG. 3a, the control unit 22 and the transmission unit 24 can directly access the storage unit 21, and the merging unit 23 no longer directly accesses the storage unit 21, but accesses the storage unit 21 through the transmission unit 24.

The control unit 22 is a control module of the FPGA-based combiner, and mainly implements the control logic of the combiner. The control unit 22 can receive a data merge instruction from the database system where the FPGA-based combiner is located, and can load the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit 21 according to the data merge instruction. In addition, the control unit 22 can control the transmission unit 24 to transfer the data records of the Nth layer to the N+jth layer stored in the storage unit 21 to the merging unit 23, thereby controlling the merging unit 23 to merge the data records of the Nth layer to the N+jth layer. N and j are non-negative integers. For the values of N and j, refer to the description of the foregoing embodiment; details are not described herein again.

The transmission unit 24 is a data channel in the FPGA-based combiner, and is mainly responsible for the data transfer logic inside the combiner. For example, under the control of the control unit 22, the transmission unit 24 can read the data records of the Nth layer to the N+jth layer that need to be merged from the storage unit 21 and transfer them to the merging unit 23.

The merging unit 23 is a functional module in the FPGA-based combiner. It can receive the data records of the Nth layer to the N+jth layer transmitted by the transmission unit 24 and merge them to obtain new data records of the N+jth layer, which are stored in the storage unit 21, so that the database system can replace the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the new data records of the N+jth layer, thereby achieving the purpose of data merging.

Based on the internal implementation structure of the FPGA-based combiner shown in Figure 3a, yet another exemplary embodiment of the present application also provides a data merging method that describes the operation of the FPGA-based combiner shown in Figure 3a. As shown in Figure 3b, the method includes:

30a. The control unit receives a data merge instruction from a database system, which is a database system in which the FPGA-based combiner is located.

31a. The control unit loads, according to the data merge instruction, the data records of the Nth layer to the N+jth layer that need to be merged in the database system to the storage unit of the FPGA-based combiner.

32a. The control unit controls the transmission unit to read the data records of the Nth layer to the N+jth layer to be merged from the storage unit and transmit the data records to the merging unit.

33a. The merging unit receives the data records of the Nth layer to the N+jth layer transmitted by the transmission unit, and merges them to obtain new data records of the N+jth layer, which are stored in the storage unit, so that the database system can replace the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the new data records of the N+jth layer.

In this embodiment, the control unit receives a data merge instruction from the database system and, according to the data merge instruction, loads the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit of the FPGA-based combiner. The control unit then controls the transmission unit to transfer the data records of the Nth layer to the N+jth layer stored in the storage unit to the merging unit, so that the merging unit can merge them.

This embodiment does not limit the control logic by which the control unit controls the transmission unit to transmit the data records of the Nth layer to the N+jth layer to the merging unit.

For example, a processing cycle may be set according to the capabilities of the merging unit, such as the maximum number of data records that the merging unit can process in one merging pass and the time required to complete one merging pass. According to the processing cycle, the control unit may periodically control the transmission unit to read part of the data records from the storage unit (less than or equal to the maximum number of data records that the merging unit can process in one merging pass) and transfer them to the merging unit, so that the merging unit merges this part of the data records.

For another example, the merging unit sends a merge completion notification message to the control unit each time it completes the merging of the current data records. Upon receiving the merge completion notification message, the control unit may control the transmission unit to read a further part of the data records from the storage unit (less than or equal to the maximum number of data records that the merging unit can process in one merging pass) and transfer them to the merging unit, so that the merging unit merges this part of the data records.

In this embodiment, a transmission unit dedicated to data transmission is added. The transmission unit is responsible for reading the data records from the storage unit and providing them to the merging unit. This simplifies the function of the merging unit, so that the merging unit can focus on data merging, and it also simplifies the control logic of the control unit. Data merging is thus completed while the implementation logic of the FPGA-based combiner is simplified and the efficiency of data merging by the FPGA-based combiner is improved.

FIG. 4a is a schematic structural diagram of an FPGA-based combiner with an on-chip cache function according to still another exemplary embodiment of the present application. The FPGA-based combiner provided in this embodiment can be applied to a database system, and can cooperate with a processor in the database system to implement a new data merge logic. As shown in FIG. 4a, the FPGA-based combiner includes a storage unit 21, a control unit 22, a merging unit 23, a transmission unit 24, and at least one input buffer 25.

The storage unit 21 is mainly used as a storage space of the FPGA-based combiner, and is responsible for storing data related to the combiner, such as a configuration file of the combiner, a data record requiring the merger processing, and the like.

The input buffer 25 is an input buffer of the FPGA-based combiner, and the data records of the Nth layer to the N+jth layer stored in the storage unit 21 can be buffered under the control of the control unit 22. As shown in FIG. 4a, the input buffer 25 may be one or more, and the control unit 22 and the transfer unit 24 may directly access the input buffer 25, and the merging unit 23 may access the input buffer 25 through the transfer unit 24.

The control unit 22 is a control module of the FPGA-based combiner, and mainly implements the control logic of the combiner. The control unit 22 can receive a data merge instruction from the database system where the FPGA-based combiner is located, and can load the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit 21 according to the data merge instruction. In addition, the control unit 22 may buffer the data records of the Nth layer to the N+jth layer stored in the storage unit 21 into the at least one input buffer 25, and control the transmission unit 24 to transfer the data records in the at least one input buffer 25 to the merging unit 23, thereby controlling the merging unit 23 to merge the data records of the Nth layer to the N+jth layer. N and j are non-negative integers. For the values of N and j, refer to the description of the foregoing embodiment; details are not described herein again.

For any input buffer 25, the control unit 22 can determine whether to cache new data records into the input buffer 25 based on whether the input buffer 25 has available space. For example, after the transmission unit 24 has transferred the data records in the input buffer 25 to the merging unit 23, the control unit 22 can read new data records from the storage unit 21 and cache them into the input buffer 25.

Optionally, the database system (mainly the processor in the database system) can learn the number of input buffers 25 included in the combiner through the API of the combiner. To simplify the control logic of the combiner, the database system may, according to the number of input buffers 25 included in the combiner, divide the data records of the Nth layer to the N+jth layer to be merged in advance into at least one data record group corresponding to the at least one input buffer 25, and carry, in the data merge instruction sent to the control unit 22, the related information of the data record group that each input buffer 25 should buffer. The related information of a data record group may differ according to the merge requirement, and may include, for example, the identifiers of the data records in the data record group, an offset address, a snapshot version number, and the like. Each data record group includes at least one data record.

Based on the above, the control unit 22 may buffer the data records of the at least one data record group in the storage unit 21 into the corresponding input buffers 25, respectively, and control the transmission unit 24 to transfer the data records in the at least one input buffer 25 to the merging unit 23, thereby controlling the merging unit 23 to merge the data records of the Nth layer to the N+jth layer.
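One way to picture the related information carried in the data merge instruction is the Python sketch below. It assumes, purely for illustration, that each data record group is a contiguous run of records in the storage unit and is described by an offset and a record count; `GroupInfo` and `build_group_info` are hypothetical names, and a real instruction could carry identifiers or snapshot version numbers instead.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class GroupInfo:
    buffer_id: int   # input buffer that should cache this group
    offset: int      # index of the group's first record in the storage unit
    count: int       # number of records in the group


def build_group_info(run_lengths: Sequence[int], num_buffers: int) -> List[GroupInfo]:
    """Describe one data record group per run, one group per input buffer."""
    if not 1 <= len(run_lengths) <= num_buffers:
        raise ValueError("need between 1 and num_buffers groups")
    if any(n < 1 for n in run_lengths):
        raise ValueError("each group needs at least one record")
    groups: List[GroupInfo] = []
    offset = 0
    for buffer_id, count in enumerate(run_lengths):
        groups.append(GroupInfo(buffer_id, offset, count))
        offset += count   # groups are laid out back to back in this sketch
    return groups
```

With such a table the control unit only has to follow the group-to-buffer correspondence rather than decide the partitioning itself.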

The transmission unit 24 is a data channel in the FPGA-based combiner, and is mainly responsible for the data transfer logic inside the combiner. For example, the transmission unit 24 may transmit the data record in the at least one input buffer 25 to the merging unit 23 under the control of the control unit 22. For example, the transmission unit 24 can read one data record from each of the at least one input buffer 25 and transmit it to the merging unit 23 each time under the control of the control unit 22.

The merging unit 23 is a functional module in the FPGA-based combiner. It can receive the data records transmitted by the transmission unit 24 and merge them to obtain new data records of the N+jth layer, which are stored in the storage unit 21, so that the database system can replace the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the new data records of the N+jth layer, thereby achieving the purpose of data merging.

Based on the internal implementation structure of the FPGA-based combiner shown in FIG. 4a, yet another exemplary embodiment of the present application further provides a data merging method that describes the operation principle of the FPGA-based combiner shown in FIG. 4a. As shown in Figure 4b, the method includes:

40a. The control unit receives a data merge instruction from a database system, which is a database system in which the FPGA-based combiner is located.

41a. The control unit loads, according to the data merge instruction, the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit of the FPGA-based combiner, where the data records of the Nth layer to the N+jth layer include at least one data record group corresponding to at least one input buffer.

42a. The control unit reads the data records in the at least one data record group from the storage unit according to the correspondence between the data record group and the input buffer, and caches the data records in the corresponding input buffer.

43a. The control unit controls the transmission unit to read the data record from the at least one input buffer and transmit the data record to the merging unit.

44a. The merging unit receives the data records transmitted by the transmission unit, and merges the data records transmitted each time to obtain new data records of the N+jth layer, which are stored in the storage unit, so that the database system can replace the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the new data records of the N+jth layer.

In this embodiment, the control unit receives a data merge instruction from the database system and, according to the data merge instruction, loads the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit of the FPGA-based combiner. On the one hand, the control unit reads the data records of each data record group from the storage unit and caches them in the corresponding input buffer. On the other hand, the control unit controls the transmission unit to read the data records from the input buffers and transmit them to the merging unit, so that the merging unit can merge the data records of the Nth layer to the N+jth layer.

Considering that the size of an input buffer is limited, the control unit may, according to whether the input buffer has available space, read new data records from the corresponding data record group in the storage unit and cache them into the corresponding input buffer.

Considering that the merging unit can likewise process only a limited number of data records at a time, the transmission unit can read the data records from the input buffers in batches and transmit them to the merging unit. For example, the transmission unit reads one data record from each input buffer each time and transmits them to the merging unit. As another example, the transmission unit reads several data records from one input buffer each time and transmits them to the merging unit. As yet another example, the transmission unit reads several data records from some of the input buffers each time and transmits them to the merging unit. Regardless of the transmission mode, the number of data records transmitted by the transmission unit to the merging unit each time is less than or equal to the maximum number of data records that the merging unit can process at a time.
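The first transmission mode (one data record from each input buffer per transfer) might look as follows in a Python sketch. The function name `next_batch`, the use of `collections.deque` to model an input buffer, and the string records are assumptions made for illustration only.

```python
from collections import deque
from typing import Deque, List


def next_batch(buffers: List[Deque[str]], max_batch: int) -> List[str]:
    """Take one record from each non-empty input buffer, up to max_batch records."""
    batch: List[str] = []
    for buf in buffers:
        if len(batch) == max_batch:
            break                        # the merging unit cannot take more this time
        if buf:
            batch.append(buf.popleft())  # oldest record of this input buffer
    return batch
```

The other modes differ only in which buffers are visited and how many records are taken from each; the `max_batch` bound applies to all of them.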

In the present embodiment, the logic for the control unit to buffer the data record in the input buffer and the logic for the transfer unit to read the data record from the input buffer are not limited. These two logics can be independent of each other and can also work together.

In an exemplary embodiment, the control unit may monitor whether the data records in an input buffer have all been transmitted to the merging unit by the transmission unit. When the data records in the input buffer have all been transmitted to the merging unit, the control unit reads new data records from the corresponding data record group in the storage unit and caches them into the input buffer.
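A minimal Python sketch of this refill rule is given below. It assumes a hypothetical helper `refill_if_drained`, models the input buffer as a `deque` with a fixed `capacity`, and models the data record group in the storage unit as an iterator; none of these names come from the embodiment.

```python
from collections import deque
from typing import Deque, Iterator


def refill_if_drained(buf: Deque[str], group: Iterator[str], capacity: int) -> int:
    """Refill buf from its data record group only once it is fully drained.

    Returns the number of records loaded (0 if buf was not empty or the
    group is exhausted).
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if buf:
        return 0          # records are still waiting for the transmission unit
    loaded = 0
    for rec in group:
        buf.append(rec)
        loaded += 1
        if loaded == capacity:
            break         # input buffer is full
    return loaded
```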

In this embodiment, an input buffer is added inside the combiner to buffer the data records in the storage unit, so that the transmission unit can read the data records directly from the input buffer. This improves the efficiency with which the transmission unit reads data records, which in turn increases data transfer efficiency and helps to further improve the overall efficiency of the data merging process.

FIG. 5a is a schematic structural diagram of an FPGA-based combiner with a codec function according to still another exemplary embodiment of the present application. The FPGA-based combiner provided in this embodiment can be applied to a database system, and can cooperate with a processor in the database system to implement a new data merge logic. As shown in FIG. 5a, the FPGA-based combiner includes: a storage unit 21, a control unit 22, a merging unit 23, a transmission unit 24, at least one input buffer 25, at least one decoding unit 26, at least one decoding buffer 27, an encoding unit 28, and an encoding buffer 29. There is a one-to-one correspondence between the decoding units 26, the input buffers 25, and the decoding buffers 27; that is, each decoding unit 26 is responsible for decoding the data records buffered by one input buffer 25 and outputting the decoding results to the corresponding decoding buffer 27.

The input buffer 25 is an on-chip buffer of the FPGA-based combiner, and the data record stored in the storage unit 21 can be cached under the control of the control unit 22. As shown in FIG. 5a, the input buffer 25 can be one or more, and the control unit 22 and the decoding unit 26 can directly access the input buffer 25, and the transmission unit 24 can directly access the decoding buffer 27.

The control unit 22 is a control module of the FPGA-based combiner, and mainly implements the control logic of the combiner. The control unit 22 can receive a data merge instruction from the database system where the FPGA-based combiner is located, and can load the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit 21 according to the data merge instruction. In addition, the control unit 22 can buffer the data records in the storage unit 21 into the at least one input buffer 25, and control each decoding unit 26 to decode the data records buffered in the corresponding input buffer 25 and output the decoding results to the corresponding decoding buffer 27. The control unit 22 can also control the transmission unit 24 to transmit the decoding results in the at least one decoding buffer 27 to the merging unit 23, thereby controlling the merging unit 23 to merge the data records of the Nth layer to the N+jth layer. N and j are non-negative integers. For the values of N and j, refer to the description of the foregoing embodiment; details are not described herein again.

The decoding unit 26 is a functional module in the FPGA-based combiner. Under the control of the control unit 22, it mainly decodes the data records in the corresponding input buffer 25 and outputs the decoding results to the corresponding decoding buffer 27.

The transmission unit 24 is a data channel in the FPGA-based combiner, and is mainly responsible for the data transfer logic inside the combiner. For example, under the control of the control unit 22, each time the merging unit 23 completes the current merging pass, the transmission unit 24 may read new decoding results from the at least one decoding buffer 27 and transmit them to the merging unit 23, so that the merging unit 23 merges the new decoding results. When the result of the current merging pass contains a decoding result that needs to be retained, the transmission unit 24 stores that decoding result in the encoding buffer 29 as data to be encoded.

The merging unit 23 is a functional module in the FPGA-based combiner. It can receive the decoding results transmitted by the transmission unit 24 and merge them. In addition, the merging unit 23 feeds the merge processing result back to the control unit 22, for example, whether the current merging pass is completed and whether there is a decoding result that needs to be retained, so that the control unit 22 can control the transmission unit 24 accordingly based on the merge processing result fed back by the merging unit 23.

The encoding unit 28 is a functional module in the FPGA-based combiner and is the counterpart of the decoding unit 26. Under the control of the control unit 22, it mainly encodes the data to be encoded in the encoding buffer 29 to obtain new data records of the N+jth layer, which are stored in the storage unit 21, so that the database system can replace the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the new data records of the N+jth layer, thereby achieving the purpose of data merging.

In an exemplary embodiment, the database system can learn the number of input buffers 25 included in the combiner through the API of the combiner. To simplify the control logic of the combiner, the database system may, according to the number of input buffers 25 included in the combiner, divide the data records of the Nth layer to the N+jth layer to be merged in advance into at least one data record group corresponding to the at least one input buffer 25, and carry, in the data merge instruction sent to the control unit 22, the related information of the data record group that each input buffer 25 should buffer. The related information of a data record group may differ according to the merge requirement, and may include, for example, the identifiers of the data records in the data record group, an offset address, a snapshot version number, and the like. Each data record group includes at least one data record.

Based on the above, one of the working principles of the FPGA-based combiner shown in Figure 5a is as follows:

The control unit receives data merge instructions from the database system, which is the database system in which the FPGA-based combiner resides. The control unit loads the data records of the Nth layer to the N+jth layer that need to be merged in the database system to the storage unit of the FPGA-based combiner according to the data merge instruction. In this embodiment, the data records of the Nth layer to the N+jth layer include at least one data record group corresponding to at least one input buffer. Based on this, the control unit reads the data records in the at least one data record group from the storage unit and caches them in the corresponding input buffer according to the corresponding relationship between the data record group and the input buffer. On the other hand, the control unit controls the decoding unit to perform a decoding operation on the data record in the corresponding input buffer according to the correspondence between the decoding unit and the input buffer, and outputs the decoding result to the corresponding decoding buffer.

In addition, the control unit further controls the transmission unit to read the decoding results from the at least one decoding buffer and transmit them to the merging unit for merging. The merging unit may return the merge processing result to the control unit when a merging pass is completed. Based on this, the control unit can learn whether the merging unit has completed the current merging pass. When it determines that the merging unit has completed the current merging pass, the control unit controls the transmission unit to read new decoding results from the at least one decoding buffer and transmit them to the merging unit, so that the merging unit continues merging; and when the result of the current merging pass contains a decoding result that needs to be retained, the control unit controls the transmission unit to store that decoding result in the encoding buffer as data to be encoded.

In addition, the control unit further controls the encoding unit to encode the data to be encoded in the encoding buffer to obtain new data records of the N+jth layer and store them in the storage unit, so that the database system can replace the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the new data records of the N+jth layer.

In this embodiment, the control unit needs to control the decoding unit, the transmission unit, the merging unit, and the encoding unit to perform corresponding operations, thereby completing the merging process of the data records of the Nth layer to the N+jth layer. The present embodiment does not limit the control logic that the control unit controls the decoding unit, the transmission unit, the merging unit, and the encoding unit to perform the corresponding operations. These control logics can be independent of each other or can cooperate with each other.

In an exemplary embodiment, the data records that need to be processed are organized into multiple levels, as shown in FIG. 5b. Each data record group includes at least one data record block, each data record block includes at least one data record interval, and each data record interval includes at least one data record. In this exemplary embodiment, the control unit works in units of data record blocks, buffering one data record block into the input buffer each time; the decoding unit works in units of data record intervals, reading one data record interval from the corresponding input buffer each time for decoding.

Take the input buffer corresponding to the first data record group as an example. After detecting that the last data record interval in this input buffer has been sent to the corresponding decoding unit, the control unit may read a new data record block from the first data record group and buffer it into the corresponding input buffer. Correspondingly, each time the decoding unit completes a decoding pass, the control unit may, according to the offset of the data record interval (the interval offset shown in FIG. 5b), read a new data record interval from the input buffer corresponding to the first data record group and send it to the corresponding decoding unit. The first data record group is any one of the at least one data record group.

Further, as shown in FIG. 5b, an encoded data record may include the following fields: key prefix length, key suffix length, key suffix, data value length, and data value. Based on this, for any data record, the decoding unit can parse the key prefix length, key suffix length, key suffix, data value length, and data value from the data record, and then splice out the key of the data record from the key prefix length, the key suffix length, the key suffix, and the key of the previous data record, so as to obtain a decoding result. The decoding result includes the key length, the data value length, the key, and the data value of the data record. If the decoding result needs to be retained, it is stored in the encoding buffer as data to be encoded. Based on this, the encoding unit can encode the key length, the data value length, the key, and the data value in the data to be encoded (that is, the decoding result to be retained) using a character stream to obtain new data records of the N+jth layer.
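The key-splicing step can be illustrated with the Python sketch below. It models an encoded record as a tuple of the five fields named above rather than as a byte stream, because the text does not specify the on-disk layout; the type aliases and function names are hypothetical. The `encode_records` function is an assumed inverse that shares the longest common prefix with the previous key, which is one plausible reading of how the encoding unit mirrors the decoding unit.

```python
from typing import Iterable, List, Tuple

# (key prefix length, key suffix length, key suffix, value length, value)
Encoded = Tuple[int, int, bytes, int, bytes]
# (key length, value length, key, value)
Decoded = Tuple[int, int, bytes, bytes]


def decode_records(encoded: Iterable[Encoded]) -> List[Decoded]:
    """Rebuild each full key from the previous key's prefix plus the stored suffix."""
    out: List[Decoded] = []
    prev_key = b""
    for prefix_len, suffix_len, suffix, value_len, value in encoded:
        if (prefix_len > len(prev_key) or suffix_len != len(suffix)
                or value_len != len(value)):
            raise ValueError("inconsistent length field")
        key = prev_key[:prefix_len] + suffix   # splice shared prefix and suffix
        out.append((len(key), value_len, key, value))
        prev_key = key
    return out


def encode_records(decoded: Iterable[Decoded]) -> List[Encoded]:
    """Assumed inverse: store only the part of each key not shared with the previous key."""
    out: List[Encoded] = []
    prev_key = b""
    for _key_len, _value_len, key, value in decoded:
        shared = 0
        while (shared < min(len(prev_key), len(key))
               and prev_key[shared] == key[shared]):
            shared += 1
        suffix = key[shared:]
        out.append((shared, len(suffix), suffix, len(value), value))
        prev_key = key
    return out
```

Because each key depends on the previous one, records within a data record interval have to be decoded in order, which fits the interval-at-a-time decoding described earlier.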

In this embodiment, the decoding unit and the encoding unit are added, so that the merging of encoded data records can be supported. The encoding operation reduces the data volume of the data records, which helps to save storage resources such as memory and disk.

FIG. 6a is a schematic structural diagram of an FPGA-based combiner with a compression function according to still another exemplary embodiment of the present application. The FPGA-based combiner provided in this embodiment can be applied to a database system, and can cooperate with a processor in a database system to implement a new data merge logic. As shown in FIG. 6a, the FPGA-based combiner includes: a storage unit 21, a control unit 22, a merging unit 23, a transmission unit 24, at least one input buffer 25, at least one decoding unit 26, at least one decoding buffer 27, The encoding unit 28, the encoding buffer 29, the output buffer 201, and the compression unit 202. There is a one-to-one correspondence between the decoding unit 26, the input buffer 25 and the decoding buffer 27, that is, each decoding unit 26 is responsible for decoding the data record buffered by one input buffer 25 and outputting the decoding result to the corresponding decoding buffer. In area 27.

The difference between the embodiment shown in Fig. 6a and the embodiment shown in Fig. 5a is that the output buffer 201 and the compression unit 202 are added. The output buffer 201 is mainly used to buffer the encoded data record output by the encoding unit 28, that is, the new N+j layer data record. After the encoded data records output by the encoding unit 28 are accumulated to a certain number, the control unit 22 can control the compression unit 202 to compress the encoded data records and output the compression results to the storage unit 21. For the compression unit 202, the data record of the new N+jth layer in the output buffer 201 can be compressed under the control of the control unit 22, and the compression result is output to the storage unit 21. The compression of the encoded data record by the compression unit 202 can reduce the occupation of the storage resources of the storage unit 21 and reduce the bandwidth resources consumed by the data transmission between the processor and the combiner.
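The accumulate-then-compress behavior of the output buffer 201 and the compression unit 202 can be sketched in Python as follows. The compression algorithm is not named in the text, so `zlib` is used here purely as a stand-in, and `compress_output` and `threshold` are hypothetical names.

```python
import zlib
from typing import Iterable, List


def compress_output(encoded_records: Iterable[bytes], threshold: int) -> List[bytes]:
    """Compress encoded records in chunks once `threshold` records have accumulated."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    compressed: List[bytes] = []
    output_buffer: List[bytes] = []       # stands in for the output buffer
    for rec in encoded_records:
        output_buffer.append(rec)
        if len(output_buffer) == threshold:
            # zlib is only a stand-in for the unspecified compression algorithm.
            compressed.append(zlib.compress(b"".join(output_buffer)))
            output_buffer.clear()
    if output_buffer:                     # flush the final partial chunk
        compressed.append(zlib.compress(b"".join(output_buffer)))
    return compressed
```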

Optionally, in the foregoing embodiments, the input buffer, the decoding buffer, the encoding buffer, and the output buffer may be implemented using dual-port RAM, with one port written sequentially and the other port read sequentially, to improve data read and write efficiency. Further, the decoding buffer and the encoding buffer may be implemented as ring buffers (Ring Buffer). In one implementation, the input buffer and the output buffer are each sized to buffer two data record blocks, with a bit width of 64 bits and a theoretical read bandwidth of 2.4 GB/s at 300 MHz.
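The quoted bandwidth figure follows directly from the bit width and the clock frequency. The short calculation below reproduces it, assuming one 64-bit word is read per clock cycle and that 1 GB is taken as 10^9 bytes; the constant names are illustrative only.

```python
BUS_WIDTH_BITS = 64        # bit width of the buffer port, from the text
CLOCK_HZ = 300_000_000     # 300 MHz, from the text

# Assumption: exactly one word is transferred per clock cycle.
bytes_per_cycle = BUS_WIDTH_BITS // 8
bandwidth_bytes_per_s = bytes_per_cycle * CLOCK_HZ
bandwidth_gb_per_s = bandwidth_bytes_per_s / 1e9  # 1 GB = 10**9 bytes

print(f"{bandwidth_gb_per_s:.1f} GB/s")  # prints "2.4 GB/s"
```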

In each of the above embodiments, the FPGA-based combiner is composed of functional modules, control modules, and storage modules. The functional modules can be implemented with DSP and LUT resources on the FPGA chip, and the storage modules can be implemented with BRAM resources on the FPGA chip. The execution status of each functional module is managed by the corresponding control module, and the functional modules can be executed in a pipelined manner, which improves the utilization efficiency of the FPGA chip.

The FPGA-based combiner provided by the embodiments of the present application can be applied to various database systems, for example to LevelDB or RocksDB. The working process of the combiner is described in detail below, taking LevelDB or RocksDB as an example.

FIG. 6b shows an implementation structure of LevelDB or RocksDB with the FPGA-based combiner. LevelDB or RocksDB is a KV database based on log delta storage, which stores a series of KV records. In LevelDB or RocksDB, when a KV record needs to be written, the KV record is first written into a log file; after the log file is successfully written, the KV record is written into the memtable file in memory; when the size of the memtable file reaches a certain value, the memtable file is converted into an immutable memtable file, and then the keys (Key) of the KV records in the immutable memtable file are traversed in ascending order and the KV records are sequentially written into an SST file at the level_0 layer on the disk. The immutable memtable file is organized as a SkipList, a multi-level queue in which the KV records are ordered by Key. Storing most of the KV records on disk using tiered storage reduces the consumption of memory resources and enables persistent storage.
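The write path described above (log, then memtable, then immutable memtable, then a level_0 SST file) can be sketched as a toy model. The class `TinyLSM` and its fields are hypothetical names introduced for illustration; a plain dict replaces the SkipList, and the memtable size limit is counted in entries rather than bytes, both as simplifying assumptions.

```python
class TinyLSM:
    """Toy model of the described write path; not LevelDB's or RocksDB's API."""

    def __init__(self, memtable_limit):
        self.memtable_limit = memtable_limit  # assumed limit, counted in entries
        self.log = []       # stands in for the log file
        self.memtable = {}  # dict used in place of a SkipList
        self.level0 = []    # each SST file is a list of (key, value) sorted by key

    def put(self, key, value):
        self.log.append((key, value))   # 1. write the log first
        self.memtable[key] = value      # 2. then write the memtable
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        immutable = self.memtable       # 3. memtable becomes immutable
        self.memtable = {}
        # 4. traverse keys in ascending order and write a level_0 SST file
        self.level0.append(sorted(immutable.items()))


db = TinyLSM(memtable_limit=2)
db.put("b", 2)
db.put("a", 1)   # memtable reaches the limit and is flushed to level_0
db.put("c", 3)   # stays in the new memtable
```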

The KV records in each SST file are stored in ascending order of Key. Except for the SST files under level_0, the Key ranges of different SST files (the range between the minimum Key and the maximum Key in an SST file) do not overlap. Because the level_0 files come directly from memory, the Key ranges of any two SST files under level_0 may overlap.

In LevelDB or RocksDB, reading a KV record requires searching the memtable file, the immutable memtable file, and the SST files of each level on the disk in order of the freshness of the KV records, which makes lookups complicated and slow. To speed up the reading of KV records, the prior art uses compaction to sort and compress existing KV records and remove invalid KV records, which reduces the number of files and thereby reduces query complexity and improves query efficiency.

The KV records in the immutable memtable file can be merged while the keys (Key) of the KV records in the immutable memtable file are traversed in ascending order and the KV records are sequentially written into a new SST file at the level_0 layer on the disk. Alternatively, when the number of SST files at a certain level (for example, level_L) on the disk exceeds a preset value, the SST files at level_L and the SST files at the higher level level_L+1 may be merged.

After a level is selected for merging, the files at level_L that participate in the merging can be selected in turn. For example, file A is selected for the first merge, and file B, whose Key range is adjacent to that of file A, is selected for the second merge, so that every file has the opportunity to be merged with the files at the higher level.

When it is determined that file A at level_L is to be merged with files at level_L+1, all files at level_L+1 whose Key range overlaps the Key range of file A, such as files B, C, and D, may be selected and merged with file A.
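The file selection step can be illustrated with a small sketch. The type `SstFile` and the function names are hypothetical; each file is reduced to its minimum and maximum Key, and two closed Key ranges are treated as overlapping when neither lies entirely before the other.

```python
from typing import List, NamedTuple


class SstFile(NamedTuple):
    name: str
    min_key: str
    max_key: str


def overlaps(a: SstFile, b: SstFile) -> bool:
    """True when the closed Key ranges [min_key, max_key] of a and b intersect."""
    return a.min_key <= b.max_key and b.min_key <= a.max_key


def pick_merge_inputs(selected: SstFile, next_level: List[SstFile]) -> List[SstFile]:
    """Return the selected level_L file plus every overlapping level_L+1 file."""
    return [selected] + [f for f in next_level if overlaps(selected, f)]


file_a = SstFile("A", "d", "m")
level_l_plus_1 = [
    SstFile("B", "a", "e"),
    SstFile("C", "f", "k"),
    SstFile("D", "l", "p"),
    SstFile("E", "q", "z"),  # does not overlap file A
]
inputs = pick_merge_inputs(file_a, level_l_plus_1)
names = [f.name for f in inputs]
```

In this example, files B, C, and D are selected together with file A, and file E is left untouched because its Key range starts after the maximum Key of file A.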

Optionally, the processor may sort the KV records in files A, B, C, and D by Key in ascending order, divide them into KV record groups according to the number of input buffers included in the FPGA-based combiner, with one group per input buffer, and then notify the FPGA-based combiner. The combiner reads each KV record group from the memory and stores it in the storage unit (DDR) of the combiner. As shown in FIG. 6b, assuming that the FPGA-based combiner includes four input buffers, the KV records in files A, B, C, and D are divided into four groups, corresponding to Way0 to Way3, respectively. Within each way, the KV records are sorted in ascending order by Key and version number, and the Key ranges of any two ways may overlap.

In one aspect, after the KV records in an input buffer have been processed, the control unit controls the input buffer to read the next pending KV block from the DDR according to the offset address of the next KV block. The control unit here can be implemented as a Load Controller.

In another aspect, after a decoding unit completes the decoding of a KV interval, the control unit reads the next KV interval from the input buffer according to the offset address of that KV interval in the input buffer and sends it to the decoding unit. The control unit here can be implemented as a Decoder Controller. The decoding unit decodes the KV records in the KV interval and outputs the decoding results to the decoding buffer.
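The decoding of a KV interval can be sketched as follows, using the record fields recited for the decoding unit in the claims: keyword prefix length, keyword suffix length, keyword suffix, data value length, and data value, with the full Key rebuilt from the previous Key. The one-byte length fields and the helper `encode_record` are simplifying assumptions made for this sketch, not the on-disk format of any particular database.

```python
def decode_interval(data: bytes):
    """Decode prefix-compressed records into (key_len, value_len, key, value)."""
    records = []
    prev_key = b""
    pos = 0
    while pos < len(data):
        prefix_len = data[pos]       # bytes shared with the previous Key
        suffix_len = data[pos + 1]   # bytes that differ from the previous Key
        pos += 2
        suffix = data[pos:pos + suffix_len]
        pos += suffix_len
        value_len = data[pos]
        pos += 1
        value = data[pos:pos + value_len]
        pos += value_len
        # Rebuild the full Key from the previous Key's prefix and the suffix.
        key = prev_key[:prefix_len] + suffix
        records.append((len(key), value_len, key, value))
        prev_key = key
    return records


def encode_record(prefix_len: int, suffix: bytes, value: bytes) -> bytes:
    """Hypothetical helper that builds one record in the assumed layout."""
    return bytes([prefix_len, len(suffix)]) + suffix + bytes([len(value)]) + value


# "apply" shares the 3-byte prefix "app" with the previous Key "apple".
interval = encode_record(0, b"apple", b"1") + encode_record(3, b"ly", b"22")
decoded = decode_interval(interval)
```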

In another aspect, according to the minimum Key fed back by the merging unit, the control unit controls the transmission unit to transmit the decoding result corresponding to the minimum Key from the corresponding decoding buffer to the encoding buffer, and controls the transmission unit to continue to read one decoding result from each of the four decoding buffers (KV0, KV1, KV2, and KV3 as shown in FIG. 6b) and supply them to the merging unit so that the merging unit can continue the merging process. The merging unit receives the four decoding results KV0, KV1, KV2, and KV3 transmitted by the transmission unit, compares the minimum Key of the previous merging process with the Keys in the four decoding results, and feeds back the minimum Key to the control unit. The control unit here can be implemented as a Compaction Controller.
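The compare-and-select loop performed by the merging unit can be approximated in software by a multi-way merge. The sketch below is a minimal illustration, assuming that each way is sorted by Key, that records are `(key, version, value)` tuples, and that only the record with the highest version number is retained for each Key; the retention rule and the function name `merge_ways` are assumptions, since the text only states that invalid records are removed.

```python
def merge_ways(ways):
    """Merge Key-sorted ways, keeping the highest-version record per Key."""
    cursors = [0] * len(ways)   # read position in each way's decoding buffer
    merged = []
    while True:
        # Keys currently at the head of each non-exhausted way (KV0..KV3).
        heads = [way[c][0] for way, c in zip(ways, cursors) if c < len(way)]
        if not heads:
            break
        min_key = min(heads)    # the minimum Key fed back to the controller
        newest = None
        for i, way in enumerate(ways):
            # Consume every record carrying the minimum Key from this way.
            while cursors[i] < len(way) and way[cursors[i]][0] == min_key:
                record = way[cursors[i]]
                if newest is None or record[1] > newest[1]:
                    newest = record
                cursors[i] += 1
        merged.append(newest)   # the retained record goes on to be encoded
    return merged


way0 = [("a", 1, "x"), ("c", 1, "y")]
way1 = [("a", 2, "z"), ("b", 1, "w")]
way2 = []
way3 = [("c", 3, "v"), ("d", 1, "u")]
result = merge_ways([way0, way1, way2, way3])
```

The hardware compares the four head records in parallel in a single step, whereas this sketch scans them sequentially; the output order and the retained records are the same under the stated assumptions.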

In yet another aspect, after the encoding unit completes the encoding of a decoding result, the control unit may read the next decoding result to be encoded from the encoding buffer and send it to the encoding unit. The control unit here can be implemented as an Encoder Controller.

In this embodiment, the entire data merging process is divided into three stages: decoding (Decoder), comparison and merging (Compaction), and encoding (Encoder). Dedicated computing resources and data buffer resources are fixed on the FPGA for each functional module, and the stages are executed in a pipeline under the control of the control unit, which greatly improves the efficiency of the data merging process. At the same time, the CPU resources that would otherwise be occupied by the data merge operation are released, the overall performance of the database is improved, and the performance jitter problem is alleviated. In addition, because the combiner cooperates with the processor in the database system, the triggering conditions of the merge operation do not need to be modified; there is therefore no special requirement on the application scenario, and the solution can be applied to different load scenarios.

It should be noted that the steps of the method provided by the foregoing embodiments may all be executed by the same device, or the method may be executed by different devices. For example, steps 20a to 22a may be executed by device A; as another example, steps 20a and 21a may be executed by device A and step 22a by device B, and so on.

In addition, some of the processes described in the above embodiments and the accompanying drawings include a plurality of operations that appear in a specific order, but it should be clearly understood that these operations may be performed out of the order in which they appear herein or in parallel. The serial numbers of the operations, such as 20a and 22a, are only used to distinguish the different operations, and the serial numbers themselves do not represent any execution order. Additionally, these processes may include more or fewer operations, and these operations may be performed sequentially or in parallel. It should be noted that the terms "first" and "second" in this document are used to distinguish different messages, devices, modules, and the like; they do not represent an order, nor do they require that the "first" and the "second" be of different types.

Correspondingly, an embodiment of the present application further provides a computer readable storage medium storing a computer program which, when executed, can implement the steps performed by the control unit in the foregoing method embodiments.

Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction apparatus that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include non-persistent memory in a computer readable medium, random access memory (RAM), and/or non-volatile memory, such as read only memory (ROM) or flash memory. Memory is an example of a computer readable medium.

Computer readable media include both persistent and non-persistent, removable and non-removable media, and information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible to a computing device. As defined herein, computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.

It is also to be understood that the terms "comprises", "comprising", or any other variations thereof are intended to encompass a non-exclusive inclusion, such that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not explicitly listed, or elements that are inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above description is only an embodiment of the present application and is not intended to limit the application. Various changes and modifications can be made to the present application by those skilled in the art. Any modifications, equivalents, improvements, etc. made within the spirit and scope of the present application are intended to be included within the scope of the appended claims.

Claims

An FPGA-based combiner, comprising: a control unit, a storage unit, and a merging unit;

The control unit is configured to load, according to a data merge instruction of the database system, the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit, and to control the merging unit to perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged; wherein N is a non-negative integer and j is a non-negative integer;

The merging unit is configured to perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged to obtain data records of a new N+jth layer and store them in the storage unit, so that the database system replaces the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the data records of the new N+jth layer.
The FPGA-based combiner according to claim 1, further comprising: a transmission unit, wherein the transmission unit is connected to the control unit and the merging unit;

The transmission unit is configured to, under the control of the control unit, transmit, to the merging unit, the data records of the Nth layer to the N+jth layer that need to be merged;

The control unit is specifically configured to: control the transmission unit to transmit, to the merging unit, the data records of the Nth layer to the N+jth layer that need to be merged, so as to control the merging unit to perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged.
The FPGA-based combiner of claim 2, further comprising: at least one input buffer; wherein the data records of the Nth layer to the N+jth layer that need to be merged comprise at least one data record group corresponding to the at least one input buffer;

The control unit is further configured to: respectively cache data records in the at least one data record group in the storage unit into a corresponding input buffer;

The transmission unit is specifically configured to: transmit, under the control of the control unit, the data records in the at least one input buffer to the merging unit.
The FPGA-based combiner according to claim 3, further comprising: at least one decoding unit corresponding to the at least one input buffer, at least one decoding buffer corresponding to the at least one decoding unit, an encoding unit, and an encoding buffer corresponding to the encoding unit;

The decoding unit is configured to perform decoding processing on the data record in the corresponding input buffer under the control of the control unit, and output the decoding result to the corresponding decoding buffer;

The encoding unit is configured to perform encoding processing on the data to be encoded in the encoding buffer under the control of the control unit to obtain the data records of the new N+jth layer and store them in the storage unit;

The transmission unit is specifically configured to: under the control of the control unit, after the merging unit completes the current merge processing, read a new decoding result from the at least one decoding buffer and transmit it to the merging unit for the merging unit to perform merge processing on the new decoding result; and, when the result of the current merge processing indicates that a decoding result needs to be retained, store the decoding result to be retained in the encoding buffer as the data to be encoded.
The FPGA-based combiner according to claim 4, wherein the transmission unit is specifically configured to: under the control of the control unit, each time the merging unit completes the current merge processing, read a new decoding result from the at least one decoding buffer and transmit it to the merging unit.
The FPGA-based combiner of claim 4, further comprising: an output buffer and a compression unit;

The output buffer is configured to buffer the data records of the new N+jth layer output by the encoding unit;

The compression unit is configured to perform compression processing on the data records of the new N+jth layer in the output buffer under the control of the control unit, and output the compression result to the storage unit.
The FPGA-based combiner according to claim 4, wherein the decoding unit is specifically configured to:

For each data record, decode a keyword prefix length, a keyword suffix length, a keyword suffix, a data value length, and a data value of the data record from the data record;

And reconstruct the keyword of the data record from the keyword prefix length, the keyword suffix length, the keyword suffix, and the previous keyword to obtain a decoding result, wherein the decoding result includes: the keyword length of the data record, the data value length, and the keyword and the data value of the data record;

The encoding unit is specifically configured to: encode a keyword length, a data value length, a keyword, and a data value in the data to be encoded by using a character stream to obtain a data record of the new N+j layer.
The FPGA-based combiner according to any one of claims 4 to 7, wherein each data record group includes at least one data record block, each data record block includes at least one data record interval, and each data record interval includes at least one data record;

The control unit is specifically configured to: after the decoding unit corresponding to a first data record group completes the current decoding processing, read a new data record interval from the input buffer corresponding to the first data record group and send it to the corresponding decoding unit; and, after the last data record interval in the corresponding input buffer has been sent to the corresponding decoding unit, read a new data record block from the first data record group and cache it into the corresponding input buffer; wherein the first data record group is any one of the at least one data record group.
A data merging method, applicable to an FPGA-based combiner, the method comprising:

Loading, according to the data merge instruction of the database system, the data records of the Nth layer to the N+jth layer that need to be merged in the database system into the storage unit of the FPGA-based combiner; wherein N is a non-negative integer, j is a non-negative integer;

Performing merge processing on the data records of the Nth layer to the N+jth layer that need to be merged to obtain data records of a new N+jth layer and storing them in the storage unit, so that the database system replaces the data records of the Nth layer to the N+jth layer that need to be merged in the database system with the data records of the new N+jth layer.
A database system, comprising: a memory, a processor, and an FPGA-based combiner;

The memory is configured to store a computer program and at least two layers of data records in the database system;

The processor is coupled to the memory and the FPGA-based combiner, and is configured to execute the computer program so as to:

Identify, from the at least two layers of data records, the data records of the Nth layer to the N+jth layer that need to be merged; send a data merge instruction to the FPGA-based combiner to instruct the FPGA-based combiner to perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged; and replace the data records of the Nth layer to the N+jth layer that need to be merged in the memory with the data records of the new N+jth layer output by the FPGA-based combiner; wherein N is a non-negative integer and j is a non-negative integer;

The FPGA-based combiner is configured to receive the data merge instruction, perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged according to the data merge instruction to obtain the data records of the new N+jth layer, and output them to the processor.
The system of claim 10, wherein the FPGA-based combiner comprises: a storage unit, a control unit, and a merging unit;

The control unit is configured to receive the data merge instruction, load the data records of the Nth layer to the N+jth layer that need to be merged from the memory into the storage unit, and control the merging unit to perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged;

The merging unit is configured to perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged to obtain the data records of the new N+jth layer and store them in the storage unit, so that the processor replaces the data records of the Nth layer to the N+jth layer that need to be merged in the memory with the data records of the new N+jth layer.
The system according to claim 11, wherein said FPGA-based combiner further comprises: a transmission unit, said transmission unit being coupled to said control unit and said merging unit;

The transmission unit is configured to, under the control of the control unit, transmit, to the merging unit, the data records of the Nth layer to the N+jth layer that need to be merged;

The control unit is specifically configured to: control the transmission unit to transmit, to the merging unit, the data records of the Nth layer to the N+jth layer that need to be merged, so as to control the merging unit to perform merge processing on the data records of the Nth layer to the N+jth layer that need to be merged.
The system of claim 12, wherein the FPGA-based combiner further comprises: at least one input buffer; the data records of the Nth layer to the N+jth layer that need to be merged include at least one data record group corresponding to the at least one input buffer;

The control unit is further configured to: respectively cache data records in the at least one data record group in the storage unit into a corresponding input buffer;

The transmission unit is specifically configured to: transmit, under the control of the control unit, the data records in the at least one input buffer to the merging unit.
The system according to claim 13, wherein the FPGA-based combiner further comprises: at least one decoding unit corresponding to the at least one input buffer, at least one decoding buffer corresponding to the at least one decoding unit, an encoding unit, and an encoding buffer corresponding to the encoding unit;

The decoding unit is configured to perform decoding processing on the data record in the corresponding input buffer under the control of the control unit, and output the decoding result to the corresponding decoding buffer;

The encoding unit is configured to perform encoding processing on the data to be encoded in the encoding buffer under the control of the control unit to obtain the data records of the new N+jth layer and store them in the storage unit;

The transmission unit is specifically configured to: under the control of the control unit, after the merging unit completes the current merge processing, read a new decoding result from the at least one decoding buffer and transmit it to the merging unit for the merging unit to perform merge processing on the new decoding result; and, when the result of the current merge processing indicates that a decoding result needs to be retained, store the decoding result to be retained in the encoding buffer as the data to be encoded.