WO2019165901A1 - Data merging method, fpga-based merger and database system - Google Patents


Info

Publication number
WO2019165901A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2019/075322
Inventor
许浩 (Xu Hao)
周军蕊 (Zhou Junrui)
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Published as WO2019165901A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Definitions

  • The present application relates to the field of database technologies, and in particular to a data merging method, an FPGA-based merger, and a database system.
  • KV: key-value pair.
  • In LevelDB or RocksDB, most KV records are stored on disk in tiered (layered) storage, which reduces memory consumption and enables persistent storage.
  • However, reading a KV record requires searching the data files of each level, in memory and on disk, in order of freshness, which makes lookups more complex and slower.
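The freshness-ordered lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not LevelDB's or RocksDB's actual code: the in-memory table and each data file are modeled as Python dicts, files within a level are assumed to be ordered newest first, and `lookup` is a hypothetical helper that also counts how many tables it probes, to show why a deep hierarchy slows reads down.

```python
def lookup(key, memtable, levels):
    """Return (value, tables_probed), searching the freshest data first.

    memtable: dict holding the newest writes (main memory).
    levels: levels[0] is Level_0; each level is a list of dict "files"
            assumed to be ordered newest first.
    In this sketch a stored value of None stands for a deletion marker.
    """
    tables = [memtable] + [f for level in levels for f in level]
    probed = 0
    for table in tables:
        probed += 1
        if key in table:
            return table[key], probed  # first hit is the freshest version
    return None, probed  # missed every table: the worst case


mem = {"a": "v3"}
levels = [[{"b": "v2"}], [{"a": "v1", "c": "v0"}]]
print(lookup("a", mem, levels))  # ('v3', 1)
print(lookup("c", mem, levels))  # ('v0', 3)
```

A miss is the worst case: every table at every level is probed, which is the cost that merging (compaction) tries to reduce.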
  • Aspects of the present application provide a data merging method, an FPGA-based combiner, and a database system that reduce the impact of the data merge process on the write and query performance of the database system.
  • An embodiment of the present application provides an FPGA-based combiner, including: a control unit, a storage unit, and a merging unit;
  • the control unit is configured to load, according to a data merge instruction from the database system, the Nth to (N+j)th-layer data records of the database system that need to be merged into the storage unit, and to control the merging unit to merge those data records, where N and j are non-negative integers;
  • the merging unit is configured to merge the Nth to (N+j)th-layer data records that need to be merged to obtain new (N+j)th-layer data records and store them in the storage unit, so that the database system can replace the Nth to (N+j)th-layer data records that need to be merged with the new (N+j)th-layer data records.
  • An embodiment of the present application further provides a data merging method applicable to an FPGA-based combiner, the method including:
  • An embodiment of the present application further provides a database system, including a memory, a processor, and an FPGA-based combiner;
  • the memory is configured to store a computer program and at least two layers of data records of the database system;
  • the FPGA-based combiner is configured to receive the data merge instruction, merge the Nth to (N+j)th-layer data records that need to be merged according to the instruction to obtain new (N+j)th-layer data records, and output them to the processor.
  • A merger is implemented on an FPGA and applied to the database system, where it is responsible for merging the Nth to (N+j)th-layer data records that need to be merged. This reduces the CPU resources that data merge operations consume in the database system, lessens the impact on write and query performance, improves the overall capacity of the database system, and alleviates performance jitter.
  • FIG. 1 is a schematic structural diagram of a database system according to an exemplary embodiment of the present application;
  • FIG. 1b is a schematic diagram of the hierarchical structure formed by data files on a disk according to an exemplary embodiment of the present application;
  • FIG. 1c is a schematic diagram of the hierarchical structure formed by data files in memory and on a disk according to an exemplary embodiment of the present application;
  • FIG. 1d is a schematic flowchart of a data merging method, described from the perspective of an FPGA-based combiner, according to an exemplary embodiment of the present application;
  • FIG. 2a is a schematic structural diagram of an FPGA-based combiner according to another exemplary embodiment of the present application;
  • FIG. 2b is a schematic flowchart of a data merging method based on the combiner shown in FIG. 2a according to another exemplary embodiment of the present application;
  • FIG. 3a is a schematic structural diagram of another FPGA-based combiner according to another exemplary embodiment of the present application;
  • FIG. 4a is a schematic structural diagram of an FPGA-based combiner with an on-chip cache function according to still another exemplary embodiment of the present application;
  • FIG. 4b is a schematic flowchart of a data merging method based on the combiner shown in FIG. 4a according to another exemplary embodiment of the present application;
  • FIG. 5a is a schematic structural diagram of an FPGA-based combiner with a codec function according to still another exemplary embodiment of the present application;
  • FIG. 5b is a schematic diagram of a data record hierarchy according to another exemplary embodiment of the present application;
  • FIG. 6a is a schematic structural diagram of an FPGA-based combiner with a compression function according to still another exemplary embodiment of the present application;
  • FIG. 6b shows an implementation structure of a LevelDB or RocksDB database system according to another exemplary embodiment of the present application.
  • Data merging removes invalid data records, which helps improve query efficiency, but the merge operation itself degrades the write and query performance of the database system.
  • The embodiments of the present application therefore provide a solution whose main idea is to implement a combiner on an FPGA and apply it to the database system, where it merges the data records of adjacent layers that need to be merged. This lowers the CPU occupancy of data merge operations, reduces the impact on the write and query performance of the database system, improves its overall capacity, and alleviates performance jitter.
  • In some of the following embodiments, the FPGA-based combiner is referred to simply as the combiner; those skilled in the art will understand that "combiner" in the embodiments of the present application denotes the same concept as "FPGA-based merger".
  • FIG. 1 is a schematic structural diagram of a database system according to an exemplary embodiment of the present application.
  • The database system 100 includes a memory 10, an FPGA-based combiner 20, and a processor 30.
  • The memory 10 is coupled to the processor 30 and to the FPGA-based combiner 20.
  • The memory 10 serves mainly as the storage space of the database system 100 and may include one or more storage media, which may be of the same type or of different types.
  • The memory 10 may include a volatile storage medium such as RAM, and may also include a non-volatile storage medium such as read-only memory (ROM) or flash memory.
  • In a typical implementation, the memory 10 mainly includes main memory and a magnetic disk:
  • the main memory is generally implemented with a volatile storage medium;
  • the magnetic disk is generally implemented with a non-volatile storage medium.
  • The memory 10 can store various data associated with the database system 100, such as the data records it stores, its operating system (OS), the computer programs running on it, program data, and the like.
  • Data records are stored in the memory 10 hierarchically, so the memory 10 holds at least two layers of data records.
  • Specifically, the memory 10 may include data files of at least two levels; each level may include at least one data file, and each data file stores some or all of the data records of its level.
  • When a data record is written, it is first written to a log file on the disk; once the log write succeeds, the record is written into main memory; when main-memory space occupancy reaches a certain limit, the data records in main memory are exported to a new data file on the disk.
  • The data files on the disk form a hierarchy: the first layer (the one closest to main memory) is Level_0, the second layer is Level_1, and so on, with the level number increasing layer by layer.
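A minimal sketch of this write path follows, assuming a record-count threshold in place of the unspecified memory-occupancy limit. `TinyLsm`, `memtable_limit`, and the choice to discard the log after a flush are illustrative assumptions, not details given in the source; flushed files are modeled as sorted dicts inserted at the front of Level_0 so that the newest file comes first.

```python
class TinyLsm:
    """Write path sketch: log first, then main memory, then flush to Level_0.

    memtable_limit is a hypothetical record-count threshold standing in for
    the "certain limit" on main-memory space occupancy.
    """

    def __init__(self, memtable_limit=3):
        self.log = []          # stands in for the log file on disk
        self.memtable = {}     # data records held in main memory
        self.levels = [[]]     # levels[0] is Level_0, a list of flushed files
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.log.append((key, value))  # 1. write the log first
        self.memtable[key] = value     # 2. then write into main memory
        if len(self.memtable) >= self.memtable_limit:
            self._flush()              # 3. export to a new data file

    def _flush(self):
        # The new file is a key-sorted snapshot of main memory, newest first.
        new_file = dict(sorted(self.memtable.items()))
        self.levels[0].insert(0, new_file)
        self.memtable = {}
        self.log.clear()  # in this sketch, flushed records no longer need the log


db = TinyLsm(memtable_limit=2)
db.put("k1", "a")
db.put("k2", "b")  # limit reached: flushed to Level_0
db.put("k1", "c")
print(db.levels[0])  # [{'k1': 'a', 'k2': 'b'}]
print(db.memtable)   # {'k1': 'c'}
```

Note that the older value of `k1` now lives in Level_0 while the newer one sits in main memory; this overlap is what later merging cleans up.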
  • When reading, the data records must be searched in order of freshness across main memory and the data files on the disk.
  • Consequently, lookups are relatively slow.
  • Compaction can be used to sort and compact the existing data records and remove invalid ones, which reduces the number of data records and the number of data files at each level, lowering query complexity and improving query efficiency.
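Compaction in this general sense is a k-way merge of sorted runs that keeps only the freshest version of each key. The sketch below assumes each record is a `(key, seq, value)` tuple in which a larger sequence number means a newer write, and uses a hypothetical `TOMBSTONE` sentinel for deletions; it illustrates the general technique, not the specific merge logic of this application.

```python
import heapq

TOMBSTONE = object()  # hypothetical deletion marker


def compact(runs, drop_tombstones=True):
    """Merge sorted runs of (key, seq, value) records into one sorted run.

    Each run must be sorted by key (and, for repeated keys, by descending
    seq). For every key only the record with the highest seq, i.e. the
    freshest, is kept; older versions are invalid and are dropped.
    Deletion markers are dropped only when drop_tombstones is True, which
    is safe only if no older layer could still hold the key.
    """
    # Key ascending, then seq descending, so the freshest version comes first.
    merged = heapq.merge(*runs, key=lambda rec: (rec[0], -rec[1]))
    out, last_key = [], object()  # sentinel that equals no real key
    for key, seq, value in merged:
        if key == last_key:
            continue  # older version of a key already decided
        last_key = key
        if value is TOMBSTONE and drop_tombstones:
            continue  # deleted key: nothing to keep
        out.append((key, seq, value))
    return out


newer = [("a", 5, "new"), ("c", 6, TOMBSTONE)]
older = [("a", 1, "old"), ("b", 2, "x"), ("c", 3, "y")]
print(compact([newer, older]))  # [('a', 5, 'new'), ('b', 2, 'x')]
```

Because `heapq.merge` consumes its inputs lazily, only one record per run needs to be held at a time, which suits a streaming implementation.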
  • A computer program associated with the data merge process is also stored in the memory 10; the processor 30 executes this program to implement a new data merge scheme together with the FPGA-based combiner 20.
  • By executing the computer program related to the data merge process stored in the memory 10, the processor 30 can identify the Nth to (N+j)th-layer data records to be merged among the at least two layers of data records,
  • and send a data merge instruction to the FPGA-based combiner 20 instructing it to merge those records. After the FPGA-based combiner 20 has merged the Nth to (N+j)th-layer data records and output new (N+j)th-layer data records, the processor 30 can use these new records to replace the Nth to (N+j)th-layer data records that needed merging.
  • N is a non-negative integer, for example 0, 1, 2, 3, and so on.
  • j is a non-negative integer, for example 0, 1, 2, 3, and so on.
  • In some embodiments, only the data records on the disk included in the memory 10 are merged.
  • In that case, the at least two layers of data records described in the embodiments of the present application mainly include the data records of the layers stored on the disk.
  • For example, main memory contains a data file a, and the disk stores data files level_0, level_1, level_2, ..., level_n in a hierarchical structure; only the data records on the disk need to be merged, so that:
  • the data file level_0 can be regarded as the 0th layer
  • the data file level_1 is regarded as the first layer
  • the data file level_2 is regarded as the second layer
  • the data file level_n is regarded as the nth layer
  • The at least two layers of data records described in the embodiments of the present application then mainly include the data records stored in data files level_0, level_1, level_2, ..., level_n; in other words, the processor 30 only needs to identify the Nth to (N+j)th-layer data records to be merged from data files level_0, level_1, level_2, ..., level_n, where N + j ≤ n and n is a non-negative integer.
  • In other embodiments, the data records in main memory are merged as well: they must accumulate to a certain amount before being exported to data files on the disk, and overlapping data records may appear during this accumulation.
  • In this case, the at least two layers of data records mainly include the data records stored in main memory and those of each layer stored on the disk, and the records in main memory form a hierarchy together with the layers on the disk, as shown in FIG. 1c.
  • For example, main memory contains a data file a, and the disk stores data files level_0, level_1, level_2, ..., level_n hierarchically. Data file a is regarded as the 0th layer, level_0 as the first layer, level_1 as the second layer, level_2 as the third layer, and so on, with level_n regarded as the (n+1)th layer, so that the data records in main memory and those of each layer on the disk form a hierarchy from low to high.
  • The at least two layers of data records described in the embodiments of the present application then mainly include the data records stored in data file a and data files level_0, level_1, level_2, ..., level_n; in other words, the processor 30 needs to identify the Nth to (N+j)th-layer data records to be merged from data file a and data files level_0, level_1, level_2, ..., level_n.
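The two layer-numbering schemes above, and the bound N + j ≤ n from the disk-only case, can be captured in a small helper. `layer_map` and `layers_to_merge` are hypothetical names; in the memory-plus-disk case the sketch assumes the analogous bound N + j ≤ n + 1, since level_n becomes layer n + 1.

```python
def layer_map(n, include_memory):
    """Map layer numbers to data file names for the two numbering schemes.

    Disk only: level_k is layer k (0 <= k <= n).
    Memory + disk: file "a" in main memory is layer 0 and level_k is layer k+1.
    """
    files = [f"level_{k}" for k in range(n + 1)]
    if include_memory:
        files = ["a"] + files
    return dict(enumerate(files))


def layers_to_merge(n, include_memory, N, j):
    """Return the files of layers N..N+j, rejecting ranges past the last layer."""
    mapping = layer_map(n, include_memory)
    top = max(mapping)  # n for disk only, n + 1 with main memory included
    if N < 0 or j < 0 or N + j > top:
        raise ValueError(f"layers {N}..{N + j} outside 0..{top}")
    return [mapping[k] for k in range(N, N + j + 1)]


print(layers_to_merge(2, True, 0, 1))   # ['a', 'level_0']
print(layers_to_merge(2, False, 1, 0))  # ['level_1']  (j = 0: one layer)
```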
  • In some embodiments, j is a positive integer such as 1, 2, or 3.
  • j can also be 0,
  • which means that data records within the same layer are merged.
  • The embodiments of the present application do not limit the correspondence between the value of N and the level.
  • The processor 30 may be triggered by different events or conditions to identify the Nth to (N+j)th-layer data records to be merged among the at least two layers of data records.
  • Example 1: A data merge period can be set; each time the period elapses, the processor 30 is triggered to identify the Nth to (N+j)th-layer data records to be merged among the at least two layers of data records.
  • In this example, the Nth to (N+j)th-layer data records to be merged may be the data records of all layers.
  • Example 2: When the number of data records in a certain level reaches a set upper limit, the processor 30 can be triggered to select, from the at least two layers of data records, the data records of that level and of several adjacent upper levels as the Nth to (N+j)th-layer data records to be merged.
  • Example 3: Where the memory 10 includes main memory and a disk, data records are continuously written into main memory; when main-memory space occupancy reaches a certain limit, the processor 30 can be triggered to take the data records in main memory and those of the first level on the disk as the Nth to (N+j)th-layer data records to be merged.
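One way to combine the three example triggers into a single decision is sketched below. The function name, its parameters, the per-layer record limits, the priority order of the checks, and the number of adjacent layers included (`extra_layers`) are all assumptions for illustration; the source does not prescribe them. Layers follow the memory-plus-disk numbering, so layer 0 is main memory and layer 1 is the first disk level.

```python
def pick_merge_range(layer_counts, layer_limits, memory_full,
                     period_elapsed, extra_layers=1):
    """Return (N, j) for the layers to merge, or None if nothing triggers.

    layer_counts[k]: current number of records in layer k.
    layer_limits[k]: hypothetical upper limit for layer k.
    memory_full: main-memory occupancy has reached its limit (Example 3).
    period_elapsed: the data merge period has elapsed (Example 1).
    """
    top = len(layer_counts) - 1
    if period_elapsed:
        return 0, top                        # Example 1: merge all layers
    for k, (count, limit) in enumerate(zip(layer_counts, layer_limits)):
        if count >= limit:                   # Example 2: layer k is full
            return k, min(extra_layers, top - k)  # plus adjacent upper layers
    if memory_full:
        return 0, min(1, top)                # Example 3: memory + first disk level
    return None


limits = [10, 10, 10]
print(pick_merge_range([1, 12, 1], limits, False, False))  # (1, 1)
print(pick_merge_range([1, 1, 12], limits, False, False))  # (2, 0)
```

When the full layer is already the top layer, the sketch falls back to j = 0, i.e. merging records within that single layer.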
  • The processor 30 may carry identification information for the Nth to (N+j)th-layer data records in the data merge instruction, such as layer identifiers and/or identifiers of the data files holding the records, so that the FPGA-based combiner 20 can determine from the instruction which Nth to (N+j)th-layer data records need to be merged.
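A data merge instruction carrying this identification information might be modeled as below. The source only says the instruction may carry layer identifiers and/or data file identifiers; the class name, field names, and validation rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MergeInstruction:
    """Hypothetical layout of a data merge instruction."""
    first_layer: int                              # N
    last_layer: int                               # N + j
    file_ids: list = field(default_factory=list)  # optional data file identifiers

    def __post_init__(self):
        # N must be non-negative and the range must not run backwards (j >= 0).
        if self.first_layer < 0 or self.last_layer < self.first_layer:
            raise ValueError("need 0 <= N <= N + j")

    @property
    def j(self):
        return self.last_layer - self.first_layer

    def layers(self):
        return list(range(self.first_layer, self.last_layer + 1))


msg = MergeInstruction(1, 3, ["f12", "f13"])
print(msg.j, msg.layers())  # 2 [1, 2, 3]
```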
  • The FPGA-based combiner 20 is also connected to the processor 30 and is configured to receive the data merge instruction sent by the processor 30, merge the Nth to (N+j)th-layer data records that need to be merged according to the instruction to obtain new (N+j)th-layer data records, and output them to the processor 30, so that the processor 30 can replace the Nth to (N+j)th-layer data records in the memory 10 that needed merging with the new (N+j)th-layer data records.
  • Replacing the Nth to (N+j)th-layer data records with the new (N+j)th-layer data records completes the data merge.
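The replacement step on the processor side can be sketched as follows, under the simplifying assumption that whole layers N..N+j take part in the merge (the source allows only the records "that need to be merged" to be replaced). `install_merge_result` is a hypothetical name, and each layer is modeled as a plain list of records.

```python
def install_merge_result(layers, N, j, new_records):
    """Replace layers N..N+j with the merged output as the new layer N+j.

    layers: list where layers[k] holds the records of layer k.
    Layers N..N+j-1 become empty because their surviving records now live
    in the new layer N+j. Returns a new list; the input is not modified.
    """
    if N < 0 or j < 0 or N + j >= len(layers):
        raise ValueError("merge range outside existing layers")
    result = [list(layer) for layer in layers]
    for k in range(N, N + j):
        result[k] = []                 # merged away into layer N+j
    result[N + j] = list(new_records)  # the new (N+j)th-layer data records
    return result


before = [["m1"], ["a1"], ["b1", "b2"], ["c1"]]
print(install_merge_result(before, 1, 1, ["ab"]))
# [['m1'], [], ['ab'], ['c1']]
```

Returning a new list rather than mutating in place mirrors the idea that the old layers stay readable until the new (N+j)th layer is installed.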
  • The processor 30 may load the Nth to (N+j)th-layer data records that need to be merged into the main memory included in the memory 10.
  • This improves the efficiency with which the FPGA-based combiner 20 reads the data records and, in turn, the overall efficiency of the data merge process.
  • The FPGA-based combiner 20 can be mounted in the database system 100 as a PCIe board, and can read the Nth to (N+j)th-layer data records that need to be merged from the main memory of the database system 100 over the PCIe channel.
  • With the FPGA-based combiner 20 added to the database system 100, the data merge process is performed mainly by the combiner. This saves computing resources of the processor 30 and reduces its processing load, so that the processor 30 can concentrate on writing and querying data records. Separating data storage (writes and queries) from data merging reduces the impact of merge operations on write and query performance, improves the overall capacity of the database system 100, and alleviates performance jitter.
  • This embodiment can make full use of the resource advantages of the FPGA to merge the data records of two adjacent layers, or even of more than two layers (j ≥ 2), without affecting data write and query performance.
  • The data merge process is therefore more flexible and more efficient, and is not restricted by the application scenario.
  • An exemplary embodiment of the present application further provides a data merging method.
  • The method is described primarily from the perspective of the FPGA-based combiner 20; as shown in FIG. 1d, the method comprises:
  • The data merge instruction may be generated and sent by a processor in the database system.
  • The FPGA-based combiner includes a storage unit for storing the Nth to (N+j)th-layer data records that need to be merged, loaded from the database system.
  • After the processor in the database system identifies the Nth to (N+j)th-layer data records that need to be merged,
  • it may load them into the main memory of the database system.
  • The FPGA-based combiner can then read these data records directly from the main memory of the database system and store them in its own storage unit.
  • The storage unit of the FPGA-based combiner may be double data rate synchronous dynamic random-access memory (DDR SDRAM), but is not limited thereto.
  • The FPGA-based combiner loads the Nth to (N+j)th-layer data records that need to be merged into its own storage unit according to the data merge instruction of the database system, and then merges the Nth to (N+j)th-layer data records in the storage unit.
  • This frees the processor from data merge operations, saving its computing resources and reducing its processing load, so that it can concentrate on writing and querying data records.
  • Separating data storage (writes and queries) from data merging reduces the impact of merge operations on write and query performance, improves the overall capacity of the database system, and alleviates performance jitter.
  • The FPGA-based combiner may have multiple implementation structures, and combiners with different structures may merge the Nth to (N+j)th-layer data records in different ways.
  • The embodiments of the present application do not limit the internal structure of the FPGA-based combiner; any combiner structure that can be implemented on an FPGA and can perform the data merging method shown in FIG. 1d is applicable.
  • The following embodiments present several internal structures of FPGA-based combiners and describe in detail the data merge process of each.
  • FIG. 2a is a schematic structural diagram of an FPGA-based combiner according to another exemplary embodiment of the present application.
  • The FPGA-based combiner provided in this embodiment can be applied to a database system and cooperates with the processor of the database system to implement new data merge logic.
  • The FPGA-based combiner mainly includes a storage unit 21, a control unit 22, and a merging unit 23.
  • The storage unit 21 serves mainly as the storage space of the FPGA-based combiner and stores data related to the combiner, such as its configuration file and the data records to be merged. As shown in FIG. 2a, both the control unit 22 and the merging unit 23 can access the storage unit 21. Optionally, the storage unit 21 may include on-chip memory implemented inside the combiner and/or off-chip memory implemented outside it. In the drawings of the embodiments, the storage unit 21 is shown outside the combiner as an example, but this is not limiting.
  • The control unit 22 is the control module of the FPGA-based combiner and mainly implements the combiner's control logic.
  • The control unit 22 can receive a data merge instruction from the database system in which the FPGA-based combiner is located and, according to that instruction, load the Nth to (N+j)th-layer data records of the database system that need to be merged into the storage unit 21.
  • The control unit 22 can also control the merging unit 23 to merge the Nth to (N+j)th-layer data records that need to be merged.
  • N is a non-negative integer, for example 0, 1, 2, 3, and so on.
  • j is a non-negative integer; in some embodiments it is a positive integer such as 1, 2, or 3,
  • but j can also be 0, which means that data records within the same layer are merged.
  • The merging unit 23 is a functional module of the FPGA-based combiner. Under the control of the control unit 22, it accesses the Nth to (N+j)th-layer data records stored in the storage unit 21 and merges them to obtain new (N+j)th-layer data records, which it stores in the storage unit 21.
  • The database system then replaces the Nth to (N+j)th-layer data records that needed merging with the new (N+j)th-layer data records, achieving the purpose of data merging.
  • The merging unit 23 merges the Nth to (N+j)th-layer data records mainly by comparing them and removing duplicate or invalid records, thereby obtaining the records to be retained. Depending on the application scenario or service requirements, the way in which the merging unit 23 compares the records and removes duplicate or invalid ones may differ.
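Because the notion of "invalid" depends on the scenario, one way to express this is a merge that always keeps the freshest version of each key and then applies a pluggable validity predicate. The record layout (dicts with `key`, `seq`, and optional `deleted` / `expires` fields) and both example policies are assumptions for illustration, not the application's own rules.

```python
def merge_with_policy(records, is_valid):
    """Keep the freshest version of each key, then apply a scenario policy.

    records: iterable of dicts with at least "key" and "seq" (higher = newer).
    is_valid: predicate deciding whether the surviving version is retained.
    """
    newest = {}
    for rec in records:
        cur = newest.get(rec["key"])
        if cur is None or rec["seq"] > cur["seq"]:
            newest[rec["key"]] = rec  # duplicate: keep the higher seq
    return [newest[k] for k in sorted(newest) if is_valid(newest[k])]


def not_deleted(rec):
    return not rec.get("deleted", False)


def not_expired_at(now):
    # Hypothetical TTL policy: drop deleted records and records expired by `now`.
    def check(rec):
        return not_deleted(rec) and rec.get("expires", float("inf")) > now
    return check


recs = [
    {"key": "a", "seq": 1}, {"key": "a", "seq": 4},
    {"key": "b", "seq": 2, "deleted": True}, {"key": "c", "seq": 3, "expires": 10},
]
print([r["key"] for r in merge_with_policy(recs, not_deleted)])         # ['a', 'c']
print([r["key"] for r in merge_with_policy(recs, not_expired_at(20))])  # ['a']
```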
  • Another exemplary embodiment of the present application further provides a data merging method that describes the operating principle of the FPGA-based combiner shown in FIG. 2a. As shown in FIG. 2b, the method includes:
  • the control unit receives a data merge instruction from the database system in which the FPGA-based combiner is located;
  • the control unit loads, according to the data merge instruction, the Nth to (N+j)th-layer data records of the database system that need to be merged into the storage unit of the FPGA-based combiner;
  • the control unit controls the merging unit to merge the Nth to (N+j)th-layer data records stored in the storage unit to obtain new (N+j)th-layer data records and store them in the storage unit,
  • so that the database system can replace the Nth to (N+j)th-layer data records that need to be merged with the new (N+j)th-layer data records.
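The three steps of FIG. 2b can be modeled in software as below. This is only a behavioral sketch of the unit boundaries in FIG. 2a, not FPGA code: the classes, the layer-number form of the instruction, the `host_layers` dict standing in for the database system's main memory, and the newest-wins `merge_fn` are all assumptions.

```python
class StorageUnit:
    """Stands in for storage unit 21 (e.g. on-board DDR SDRAM)."""
    def __init__(self):
        self.input_records = []
        self.output_records = []


class MergingUnit:
    """Stands in for merging unit 23; merge_fn is any merge routine."""
    def __init__(self, merge_fn):
        self.merge_fn = merge_fn

    def run(self, storage):
        # Reads the loaded records from storage and writes the new layer back.
        storage.output_records = self.merge_fn(storage.input_records)


class ControlUnit:
    """Stands in for control unit 22, driving the steps of FIG. 2b."""
    def __init__(self, storage, merger):
        self.storage, self.merger = storage, merger

    def handle(self, instruction, host_layers):
        # Step 1: instruction received (here simply a list of layer numbers).
        # Step 2: load the records of those layers into the storage unit.
        self.storage.input_records = [
            rec for k in instruction for rec in host_layers[k]]
        # Step 3: have the merging unit merge them and store the new N+j layer.
        self.merger.run(self.storage)
        return self.storage.output_records


def newest_wins(records):
    """Keep the highest-seq version of each key; records are (key, seq, value)."""
    latest = {}
    for key, seq, value in records:
        if key not in latest or seq > latest[key][0]:
            latest[key] = (seq, value)
    return [(k, s, v) for k, (s, v) in sorted(latest.items())]


host = {1: [("a", 5, "x"), ("b", 6, "y")], 2: [("a", 1, "old"), ("c", 2, "z")]}
cu = ControlUnit(StorageUnit(), MergingUnit(newest_wins))
print(cu.handle([1, 2], host))  # [('a', 5, 'x'), ('b', 6, 'y'), ('c', 2, 'z')]
```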
  • In this embodiment, the control unit receives a data merge instruction from the database system, loads the Nth to (N+j)th-layer data records of the database system that need to be merged into the storage unit of the FPGA-based combiner according to the instruction,
  • and then controls the merging unit to merge the Nth to (N+j)th-layer data records stored in the storage unit.
  • This embodiment does not limit the control logic by which the control unit directs the merging unit to merge the Nth to (N+j)th-layer data records stored in the storage unit.
  • For example, a processing cycle may be set according to the capabilities of the merging unit, such as the maximum number of data records it can process in one merge pass and the time it needs to complete one pass.
  • The control unit may then, once per processing cycle, either direct the merging unit to read part of the data records from the storage unit or read part of the data records from the storage unit itself and feed them into the merging unit, in both cases no more than the maximum number the merging unit can process in one pass.
  • Correspondingly, the merging unit may periodically read part of the data records from the storage unit, or receive the part periodically sent by the control unit, and merge it based on certain attribute information of the data records, until all data records stored in the storage unit have been merged.
  • In another example, the merging unit sends a merge completion notification message to the control unit after it finishes merging the current data records.
  • On receiving this notification, the control unit may read the next part of the data records from the storage unit (no more than the maximum number the merging unit can process in one pass) and provide it to the merging unit, or direct the merging unit to read the next part from the storage unit itself.
  • Correspondingly, each time the merging unit reads part of the data records from the storage unit or receives a part sent by the control unit, it merges that part based on certain attribute information of the data records, until all data records stored in the storage unit have been merged.
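The batch-by-batch control described above can be sketched as a loop that never hands the merging step more than a fixed number of records, treating each returned pass as the "merge completion" notification that releases the next batch. `batch_size`, the running `state` dict, and the sequence number used as the "attribute information" are assumptions; a real FPGA merger would not keep an unbounded dict of all keys, so this models only the control flow.

```python
def merge_in_batches(stored, batch_size, merge_batch):
    """Feed stored records to a merging step in bounded batches.

    batch_size: hypothetical maximum records the merging unit handles per pass.
    merge_batch(state, batch) merges one batch into the running state and
    returns it; returning is treated as the merge completion notification.
    """
    state, passes = {}, 0
    for start in range(0, len(stored), batch_size):
        batch = stored[start:start + batch_size]  # at most batch_size records
        state = merge_batch(state, batch)         # one merge pass
        passes += 1                               # completion received, continue
    return state, passes


def merge_batch_newest(state, batch):
    # Attribute used for merging in this sketch: the sequence number.
    for key, seq, value in batch:
        if key not in state or seq > state[key][0]:
            state[key] = (seq, value)
    return state


recs = [("a", 1, "x"), ("b", 2, "y"), ("a", 3, "z"), ("c", 4, "w"), ("b", 0, "old")]
state, passes = merge_in_batches(recs, 2, merge_batch_newest)
print(passes)  # 3
print(state)   # {'a': (3, 'z'), 'b': (2, 'y'), 'c': (4, 'w')}
```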
  • In this embodiment, the control unit of the combiner loads the Nth to (N+j)th-layer data records of the database system that need to be merged into the combiner's storage unit and controls the logic by which the merging unit accesses the storage unit,
  • thereby controlling the merging unit to merge the Nth to (N+j)th-layer data records stored in the storage unit and output the result as new (N+j)th-layer data records.
  • FIG. 3a is a schematic structural diagram of another FPGA-based combiner provided by another exemplary embodiment of the present application.
  • The FPGA-based combiner provided in this embodiment can be applied to a database system and can cooperate with the processor of the database system to implement new data merge logic.
  • The FPGA-based combiner includes a storage unit 21, a control unit 22, a merging unit 23, and a transmission unit 24.
  • The storage unit 21 serves mainly as the storage space of the FPGA-based combiner and stores data related to the combiner, such as its configuration file and the data records to be merged.
  • The control unit 22 and the transmission unit 24 can access the storage unit 21 directly, whereas the merging unit 23 no longer accesses the storage unit 21 directly but does so through the transmission unit 24.
  • The control unit 22 is the control module of the FPGA-based combiner and mainly implements the combiner's control logic.
  • The control unit 22 can receive a data merge instruction from the database system in which the FPGA-based combiner is located and, according to that instruction, load the Nth to (N+j)th-layer data records of the database system that need to be merged into the storage unit 21.
  • The control unit 22 can also control the transmission unit 24 to transfer the Nth to (N+j)th-layer data records stored in the storage unit 21 to the merging unit 23,
  • thereby controlling the merging unit 23 to merge those data records.
  • N and j are non-negative integers; for their values, refer to the description of the foregoing embodiment, which is not repeated here.
  • the transmission unit 24 is a data channel in the FPGA-based combiner, and is mainly responsible for the data transfer logic inside the combiner.
  • the transmission unit 24 can read the data records of the Nth layer to the N+jth layer to be merged from the storage unit 21 under the control of the control unit 22, and transfer the needs stored in the storage unit 21 to the merging unit 23. Data records of the merged Nth to N+jth layers.
  • the merging unit 23 is a functional module in the FPGA-based combiner. It can receive the data records of layers N to N+j transmitted by the transmission unit 24, merge them to obtain new layer-N+j data records, and store these in the storage unit 21, so that the database system can replace the data records of layers N to N+j that need to be merged with the new layer-N+j data records, thereby achieving data merging.
  • yet another exemplary embodiment of the present application also provides a data merging method that describes the operation of the FPGA-based combiner shown in Figure 3a. As shown in Figure 3b, the method includes:
  • the control unit receives a data merge instruction from a database system, which is a database system in which the FPGA-based combiner is located.
  • the control unit loads, according to the data merge instruction, the data records of the Nth layer to the N+jth layer that need to be merged in the database system to the storage unit of the FPGA-based combiner.
  • the control unit controls the transmission unit to read the data records of the Nth layer to the N+jth layer to be merged from the storage unit and transmit the data records to the merging unit.
  • the merging unit receives the data records of layers N to N+j transmitted by the transmission unit, merges them to obtain new layer-N+j data records, and stores these in the storage unit, so that the database system can replace the data records of layers N to N+j that need to be merged with the new layer-N+j data records.
  • the control unit receives a data merge instruction from the database system and, according to it, loads the data records of layers N to N+j that need to be merged into the storage unit of the FPGA-based combiner. It then controls the transmission unit to transfer the data records of layers N to N+j stored in the storage unit to the merging unit, so that the merging unit can merge them.
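The load, transfer, and merge flow above can be sketched as a small software model. This is a hedged illustration, not the combiner's implementation: the class and method names (`StorageUnit`, `ControlUnit`, `handle_merge_instruction`, and so on) are hypothetical, and the merge rule (layer N holds the newest data, so the first value seen for a key wins) is an assumption borrowed from common LSM-tree practice rather than something this embodiment specifies.

```python
from dataclasses import dataclass, field

# Hypothetical software model of the units in FIG. 3a; all names are illustrative.
Record = tuple[str, str]  # (key, value)


@dataclass
class StorageUnit:
    layers: dict[int, list[Record]] = field(default_factory=dict)


class MergingUnit:
    def merge(self, layers: list[list[Record]]) -> list[Record]:
        # Assumption: layers[0] is layer N and holds the newest data,
        # so the first value seen for a key is the one kept.
        newest: dict[str, str] = {}
        for layer in layers:
            for key, value in layer:
                newest.setdefault(key, value)
        return sorted(newest.items())


class TransmissionUnit:
    def __init__(self, storage: StorageUnit, merger: MergingUnit):
        self.storage = storage
        self.merger = merger

    def transfer(self, n: int, j: int) -> list[Record]:
        # Read layers N..N+j from the storage unit and hand them to the merging unit.
        batch = [self.storage.layers[i] for i in range(n, n + j + 1)]
        return self.merger.merge(batch)


class ControlUnit:
    def __init__(self) -> None:
        self.storage = StorageUnit()
        self.transmission = TransmissionUnit(self.storage, MergingUnit())

    def handle_merge_instruction(self, db_layers, n: int, j: int) -> list[Record]:
        # Load layers N..N+j from the database system into the storage unit.
        for i in range(n, n + j + 1):
            self.storage.layers[i] = list(db_layers[i])
        # The transmission unit feeds the merging unit; the result is the new layer N+j.
        new_layer = self.transmission.transfer(n, j)
        self.storage.layers[n + j] = new_layer
        return new_layer
```

For example, merging layer 0 `[("a", "1"), ("c", "3")]` with layer 1 `[("a", "old"), ("b", "2")]` under this assumed rule yields `[("a", "1"), ("b", "2"), ("c", "3")]`, which is then stored as the new layer 1.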
  • this embodiment does not limit the control logic by which the control unit directs the transmission unit to transfer the data records of layers N to N+j to the merging unit.
  • for example, a processing cycle may be set according to the capabilities of the merging unit, such as the maximum number of data records it can process in one merge pass and the time required to complete one merge pass.
  • the control unit may then, once per processing cycle, control the transmission unit to read part of the data records from the storage unit (no more than the maximum number the merging unit can process in one merge pass) and transfer them to the merging unit, which merges that portion.
  • alternatively, the merging unit sends a merge completion notification to the control unit each time it finishes merging the current data records; upon receiving this notification, the control unit again controls the transmission unit to read part of the data records from the storage unit (no more than the maximum number the merging unit can process in one merge pass) and transfer them to the merging unit, which merges that portion.
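Both control strategies above share one invariant: each transfer carries no more records than the merging unit can handle in a pass, and the next transfer starts only after the current pass completes. A minimal sketch of the notification-driven variant follows; `transfer_in_batches` and the `merge_pass` callback are hypothetical names, and modelling the completion notification as the callback returning is a simplifying assumption.

```python
from typing import Callable, Sequence


def transfer_in_batches(records: Sequence, max_per_pass: int,
                        merge_pass: Callable[[list], None]) -> int:
    """Send records to the merging unit in passes of at most max_per_pass records.

    merge_pass is a hypothetical stand-in for the merging unit; its return plays
    the role of the merge completion notification that triggers the next read.
    Returns the number of merge passes issued.
    """
    if max_per_pass <= 0:
        raise ValueError("max_per_pass must be positive")
    passes = 0
    for start in range(0, len(records), max_per_pass):
        batch = list(records[start:start + max_per_pass])  # never exceeds the unit's capacity
        merge_pass(batch)  # returns only when this pass has completed
        passes += 1
    return passes
```

With 10 records and a limit of 4 records per pass, this issues three passes of sizes 4, 4, and 2.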
  • in this embodiment, a transmission unit dedicated to data transfer is added. It reads data records from the storage unit and supplies them to the merging unit, which simplifies the merging unit so that it can focus on data merging, and also simplifies the control logic of the control unit. Data merging is thus accomplished with simpler implementation logic in the FPGA-based combiner and with higher merging efficiency.
  • FIG. 4a is a schematic structural diagram of an FPGA-based combiner with an on-chip cache function according to still another exemplary embodiment of the present application.
  • the FPGA-based combiner provided in this embodiment can be applied to a database system, and can cooperate with a processor in a database system to implement a new data merge logic.
  • the FPGA-based combiner includes a storage unit 21, a control unit 22, a merging unit 23, a transmission unit 24, and at least one input buffer 25.
  • the storage unit 21 serves mainly as the storage space of the FPGA-based combiner and stores combiner-related data, such as the combiner's configuration file and the data records to be merged.
  • the input buffer 25 is an on-chip buffer of the FPGA-based combiner and can, under the control of the control unit 22, cache the data records of layers N to N+j stored in the storage unit 21. As shown in FIG. 4a, there may be one or more input buffers 25; the control unit 22 and the transmission unit 24 can access the input buffers 25 directly, while the merging unit 23 accesses them through the transmission unit 24.
  • the control unit 22 is a control module of the FPGA-based combiner, and mainly implements the control logic of the combiner.
  • the control unit 22 can receive a data merge instruction from the database system in which the FPGA-based combiner is located and, according to that instruction, load the data records of layers N to N+j that need to be merged in the database system into the storage unit 21.
  • the control unit 22 can also cache the data records of layers N to N+j stored in the storage unit 21 into the at least one input buffer 25 and control the transmission unit 24 to transfer the data records in the at least one input buffer 25 to the merging unit 23, thereby having the merging unit 23 merge the data records of layers N to N+j.
  • N and j are non-negative integers; for their values, refer to the description of the foregoing embodiment, which is not repeated here.
  • the control unit 22 can decide whether to cache new data records into the input buffer 25 based on whether the input buffer 25 has free space. For example, after the transmission unit 24 has transferred data records from the input buffer 25 to the merging unit 23, the control unit 22 can read new data records from the storage unit 21 and cache them in the input buffer 25.
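The space-based refill policy above can be sketched as follows. `InputBuffer` and `refill` are hypothetical names, and measuring capacity in whole records (rather than bytes or blocks) is a simplifying assumption.

```python
from collections import deque


class InputBuffer:
    """Bounded on-chip input buffer (hypothetical model); capacity counted in records."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: deque = deque()

    def free_slots(self) -> int:
        return self.capacity - len(self.items)


def refill(buffer: InputBuffer, source: deque) -> int:
    """Control-unit policy: copy records from the storage unit only while space is free.

    Returns how many records were moved into the buffer.
    """
    moved = 0
    while buffer.free_slots() > 0 and source:
        buffer.items.append(source.popleft())
        moved += 1
    return moved
```

After the transmission unit drains one record from a full two-slot buffer, `refill` tops it up with exactly one new record.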
  • the database system (mainly the processor in the database system) can know the number of input buffers 25 included in the combiner through the API interface of the combiner.
  • according to the number of input buffers 25 in the combiner, the database system may divide the data records of layers N to N+j to be merged in advance into at least one data record group corresponding to the at least one input buffer 25, and carry, in the data merge instruction sent to the control unit 22, the information about the data record group that each input buffer 25 should cache.
  • the information about a data record group may vary with the merge requirement and may include, for example, the identifiers, offset addresses, and snapshot version numbers of the data records in the group.
  • each data record group includes at least one data record.
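A data merge instruction carrying one group descriptor per input buffer might be built as sketched below. The descriptor fields mirror the examples listed above (record identifiers, offset address, snapshot version), but `GroupInfo`, `build_merge_instruction`, the dictionary layout, and the contiguous near-equal splitting policy are all hypothetical; the source leaves the grouping policy open.

```python
from dataclasses import dataclass


@dataclass
class GroupInfo:
    """Hypothetical descriptor of one data record group (one per input buffer)."""
    buffer_index: int
    record_ids: list[int]
    start_offset: int       # byte offset of the group's first record
    snapshot_version: int


def build_merge_instruction(record_sizes: list[int], num_buffers: int,
                            snapshot_version: int) -> dict:
    """Split records (sizes in storage order) into num_buffers contiguous groups.

    Contiguous, near-equal splitting is an assumption. Each group must hold
    at least one record, so num_buffers may not exceed the record count.
    """
    if not 0 < num_buffers <= len(record_sizes):
        raise ValueError("need between 1 and len(record_sizes) input buffers")
    base, extra = divmod(len(record_sizes), num_buffers)
    groups, idx, offset = [], 0, 0
    for b in range(num_buffers):
        count = base + (1 if b < extra else 0)
        groups.append(GroupInfo(b, list(range(idx, idx + count)), offset, snapshot_version))
        offset += sum(record_sizes[idx:idx + count])
        idx += count
    return {"op": "merge", "groups": groups}
```

For five records of 10, 20, 30, 40, and 50 bytes and two input buffers, the first group holds records 0 to 2 at offset 0 and the second holds records 3 and 4 at offset 60.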
  • the control unit 22 may cache the data records of each data record group in the storage unit 21 into the corresponding input buffer 25 and control the transmission unit 24 to transfer the data records in the at least one input buffer 25 to the merging unit 23, thereby having the merging unit 23 merge the data records of layers N to N+j.
  • the transmission unit 24 is a data channel in the FPGA-based combiner, and is mainly responsible for the data transfer logic inside the combiner.
  • the transmission unit 24 may transmit the data record in the at least one input buffer 25 to the merging unit 23 under the control of the control unit 22.
  • the transmission unit 24 can read one data record from each of the at least one input buffer 25 and transmit it to the merging unit 23 each time under the control of the control unit 22.
  • the merging unit 23 is a functional module in the FPGA-based combiner. It can receive the data records transmitted by the transmission unit 24, merge them to obtain new layer-N+j data records, and store these in the storage unit 21.
  • the database system then replaces the data records of layers N to N+j that need to be merged with the new layer-N+j data records, thereby achieving data merging.
  • yet another exemplary embodiment of the present application further provides a data merging method that describes the operation principle of the FPGA-based combiner shown in FIG. 4a. As shown in Figure 4b, the method includes:
  • the control unit receives a data merge instruction from a database system, which is a database system in which the FPGA-based combiner is located.
  • the control unit loads, according to the data merge instruction, the data records of layers N to N+j that need to be merged in the database system into the storage unit of the FPGA-based combiner; the data records of layers N to N+j include at least one data record group corresponding to the at least one input buffer.
  • the control unit reads the data records in the at least one data record group from the storage unit according to the correspondence between the data record group and the input buffer, and caches the data records in the corresponding input buffer.
  • the control unit controls the transmission unit to read the data record from the at least one input buffer and transmit the data record to the merging unit.
  • the merging unit receives the data records transmitted by the transmission unit, merges the records of each transfer to obtain new layer-N+j data records, and stores them in the storage unit, so that the database system can replace the data records of layers N to N+j that need to be merged with the new layer-N+j data records.
  • the control unit receives a data merge instruction from the database system and, according to it, loads the data records of layers N to N+j that need to be merged into the storage unit of the FPGA-based combiner. It then reads the data records of each data record group from the storage unit and caches them in the corresponding input buffer, and controls the transmission unit to read data records from the input buffers and transmit them to the merging unit, so that the merging unit merges the data records of layers N to N+j.
  • the control unit may decide, based on whether an input buffer has free space, whether to read new data records from the corresponding data record group in the storage unit and cache them in that input buffer.
  • the transmission unit can read data records from the input buffers in batches and transmit them to the merging unit. For example, each time it may read one data record from every input buffer; or it may read several data records from a single input buffer; or it may read several data records from each of some of the input buffers. Whatever the mode, the number of data records transmitted to the merging unit each time is no more than the maximum number the merging unit can process at once.
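The three transfer modes above, together with the capacity check they share, can be sketched as follows. The function names are hypothetical, and input buffers are modelled as plain Python lists.

```python
def one_from_each(buffers: list[list]) -> list:
    """Mode 1: take one record from every non-empty input buffer."""
    return [buf.pop(0) for buf in buffers if buf]


def several_from_one(buffers: list[list], index: int, count: int) -> list:
    """Mode 2: take up to `count` records from a single input buffer."""
    taken = buffers[index][:count]
    del buffers[index][:count]
    return taken


def several_from_some(buffers: list[list], indices: list[int], count_each: int) -> list:
    """Mode 3: take up to `count_each` records from each buffer in a subset."""
    out = []
    for i in indices:
        out.extend(several_from_one(buffers, i, count_each))
    return out


def check_batch(batch: list, max_per_pass: int) -> list:
    """Whatever the mode, one transfer must not exceed the merging unit's capacity."""
    if len(batch) > max_per_pass:
        raise ValueError(f"batch of {len(batch)} exceeds limit {max_per_pass}")
    return batch
```

Mode 1 maps most directly onto the four-way comparison used later in the LevelDB/RocksDB example, where one head record per way is compared in each pass.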
  • this embodiment does not limit the logic by which the control unit caches data records in the input buffers or the logic by which the transmission unit reads data records from them; the two can operate independently of each other or in cooperation.
  • for example, the control unit may monitor whether all data records in an input buffer have been transferred to the merging unit by the transmission unit; once they have, the control unit reads new data records from the corresponding data record group in the storage unit and caches them in that input buffer.
  • in this embodiment, an input buffer is added inside the combiner to cache data records from the storage unit, so that the transmission unit can read data records directly from the input buffer. This improves the efficiency with which the transmission unit reads data records, which in turn raises data transfer efficiency and further improves the overall efficiency of the data merge process.
  • FIG. 5 is a schematic structural diagram of an FPGA-based combiner with a codec function according to still another exemplary embodiment of the present application.
  • the FPGA-based combiner provided in this embodiment can be applied to a database system, and can cooperate with a processor in a database system to implement a new data merge logic.
  • the FPGA-based combiner includes: a storage unit 21, a control unit 22, a merging unit 23, a transmission unit 24, at least one input buffer 25, at least one decoding unit 26, at least one decoding buffer 27, an encoding unit 28, and an encoding buffer 29.
  • there is a one-to-one correspondence between the decoding units 26, the input buffers 25, and the decoding buffers 27; that is, each decoding unit 26 is responsible for decoding the data records cached in one input buffer 25 and outputting the decoding results to the corresponding decoding buffer 27.
  • the storage unit 21 serves mainly as the storage space of the FPGA-based combiner and stores combiner-related data, such as the combiner's configuration file and the data records to be merged.
  • the input buffer 25 is an on-chip buffer of the FPGA-based combiner, and the data record stored in the storage unit 21 can be cached under the control of the control unit 22. As shown in FIG. 5a, the input buffer 25 can be one or more, and the control unit 22 and the decoding unit 26 can directly access the input buffer 25, and the transmission unit 24 can directly access the decoding buffer 27.
  • the control unit 22 is a control module of the FPGA-based combiner, and mainly implements the control logic of the combiner.
  • the control unit 22 can receive a data merge instruction from the database system in which the FPGA-based combiner is located and, according to that instruction, load the data records of layers N to N+j that need to be merged in the database system into the storage unit 21.
  • the control unit 22 can also cache the data records in the storage unit 21 into the at least one input buffer 25, and control each decoding unit 26 to decode the data records cached in its corresponding input buffer 25 and output the decoding results to the corresponding decoding buffer 27.
  • the control unit 22 can also control the transmission unit 24 to transfer the decoding results in the at least one decoding buffer 27 to the merging unit 23, thereby having the merging unit 23 merge the data records of layers N to N+j.
  • N and j are non-negative integers; for their values, refer to the description of the foregoing embodiment, which is not repeated here.
  • the decoding unit 26 is a functional module in the FPGA-based combiner. Under the control of the control unit 22, it mainly decodes the data records in its corresponding input buffer 25 and outputs the decoding results to the corresponding decoding buffer 27.
  • the transmission unit 24 is a data channel in the FPGA-based combiner, and is mainly responsible for the data transfer logic inside the combiner.
  • under the control of the control unit 22, each time the merging unit 23 completes the current merge pass, the transmission unit 24 may read new decoding results from the at least one decoding buffer 27 and transmit them to the merging unit 23 for merging; when the result of the current merge pass indicates that a decoding result needs to be retained, the transmission unit 24 stores that decoding result in the encoding buffer 29 as data to be encoded.
  • the merging unit 23 is a functional module in the FPGA-based combiner that can receive the decoding results transmitted by the transmission unit 24 and merge them. In addition, the merging unit 23 feeds back the merge processing result to the control unit 22, for example, whether the current merge pass is complete and whether any decoding result needs to be retained, so that the control unit 22 can control the transmission unit 24 accordingly.
  • the encoding unit 28 is a functional module in the FPGA-based combiner that corresponds to the decoding unit 26. Under the control of the control unit 22, it mainly encodes the data to be encoded in the encoding buffer 29 to obtain new layer-N+j data records and stores them in the storage unit 21, so that the database system can replace the data records of layers N to N+j that need to be merged with the new layer-N+j data records, thereby achieving data merging.
  • the database system can know the number of input buffers 25 included in the combiner through the API interface of the combiner.
  • according to the number of input buffers 25 in the combiner, the database system may divide the data records of layers N to N+j to be merged in advance into at least one data record group corresponding to the at least one input buffer 25, and carry, in the data merge instruction sent to the control unit 22, the information about the data record group that each input buffer 25 should cache.
  • the information about a data record group may vary with the merge requirement and may include, for example, the identifiers, offset addresses, and snapshot version numbers of the data records in the group.
  • each data record group includes at least one data record.
  • the control unit receives data merge instructions from the database system, which is the database system in which the FPGA-based combiner resides.
  • the control unit loads the data records of the Nth layer to the N+jth layer that need to be merged in the database system to the storage unit of the FPGA-based combiner according to the data merge instruction.
  • the data records of the Nth layer to the N+jth layer include at least one data record group corresponding to at least one input buffer.
  • the control unit reads the data records in the at least one data record group from the storage unit and caches them in the corresponding input buffer according to the corresponding relationship between the data record group and the input buffer.
  • the control unit controls the decoding unit to perform a decoding operation on the data record in the corresponding input buffer according to the correspondence between the decoding unit and the input buffer, and outputs the decoding result to the corresponding decoding buffer.
  • the control unit further controls the transmission unit to read decoding results from the at least one decoding buffer and transmit them to the merging unit for merging.
  • the merging unit may return the merge processing result to the control unit when a merge pass is completed, so the control unit knows whether the merging unit has completed the current merge pass. When it determines that the pass is complete, the control unit controls the transmission unit to read decoding results from the at least one decoding buffer and transmit them to the merging unit so that merging continues; and when the result of the current merge pass indicates that a decoding result needs to be retained, the control unit controls the transmission unit to store that decoding result in the encoding buffer as data to be encoded.
  • the transmission unit may, under the control of the control unit, read the new decoding result from the at least one decoding buffer and transmit the result to the merging unit after the merging unit completes the current merging process.
  • the control unit further controls the encoding unit to encode the data to be encoded in the encoding buffer to obtain new layer-N+j data records and store them in the storage unit, so that the database system can replace the data records of layers N to N+j that need to be merged with the new layer-N+j data records.
  • control unit needs to control the decoding unit, the transmission unit, the merging unit, and the encoding unit to perform corresponding operations, thereby completing the merging process of the data records of the Nth layer to the N+jth layer.
  • the present embodiment does not limit the control logic that the control unit controls the decoding unit, the transmission unit, the merging unit, and the encoding unit to perform the corresponding operations. These control logics can be independent of each other or can cooperate with each other.
  • each data record group includes at least one data record block, each data record block includes at least one data record interval, and each data record interval includes at least one data record.
  • the control unit works in units of data record blocks and caches one data record block into the input buffer at a time; the decoding unit works in units of data record intervals and reads one data record interval at a time from its corresponding input buffer for decoding.
  • the control unit may read a new data record block from the data record group and cache it in the corresponding input buffer.
  • according to the offset of the data record interval (the interval offset shown in FIG. 5b), the control unit may read a new data record interval from the input buffer corresponding to the first data record group and send it to the corresponding decoding unit.
  • the first data record group is any one of the at least one data record group.
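The group, block, and interval hierarchy above suggests a two-level read loop: one block is loaded into the input buffer per refill, and the block is then sliced into intervals by offset for the decoding unit. A minimal sketch, assuming each block is a contiguous byte string and each block's interval start offsets are known; `stream_intervals` and this layout are hypothetical.

```python
from typing import Iterator


def stream_intervals(blocks: list[bytes], interval_offsets: list[list[int]]) -> Iterator[bytes]:
    """Yield the data record intervals of one data record group, block by block.

    blocks: the group's data record blocks (hypothetical layout).
    interval_offsets: for each block, the start offset of every interval in it.
    """
    for block, offsets in zip(blocks, interval_offsets):
        input_buffer = block  # load controller: one whole block per refill
        ends = offsets[1:] + [len(input_buffer)]
        for start, end in zip(offsets, ends):
            # decoder controller: hand one interval at a time to the decoding unit
            yield input_buffer[start:end]
```

For blocks `b"aabbb"` (intervals at offsets 0 and 2) and `b"cc"` (one interval at offset 0), the decoding unit receives `b"aa"`, `b"bbb"`, and `b"cc"` in that order.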
  • an encoded data record may include the following fields: key prefix length, key suffix length, key suffix, value length, and value.
  • the decoding unit can decode the key prefix length, key suffix length, key suffix, value length, and value from the encoded data record, and then splice the prefix of the previous key (of the decoded prefix length) with the key suffix to recover the full key. The decoding result includes the key length, value length, key, and value of the data record.
  • the encoding unit can encode the key length, value length, key, and value in the data to be encoded (that is, the retained decoding results) into a character stream to obtain new layer-N+j data records.
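The record layout above is a shared-prefix key compression, similar in spirit to the one used in LevelDB data blocks. The sketch below follows the field order given here. It assumes varint-encoded length fields and that the encoder re-derives the shared prefix from the previous key; both assumptions go beyond this text, and the function names are hypothetical.

```python
def _put_varint(out: bytearray, n: int) -> None:
    # Assumed varint: 7 payload bits per byte, high bit set means "more bytes follow".
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)


def _get_varint(buf: bytes, pos: int) -> tuple[int, int]:
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_records(records: list[tuple[bytes, bytes]]) -> bytes:
    """Encode key-sorted (key, value) pairs with shared-prefix key compression."""
    out, prev = bytearray(), b""
    for key, value in records:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        suffix = key[shared:]
        _put_varint(out, shared)       # key prefix length (shared with the previous key)
        _put_varint(out, len(suffix))  # key suffix length
        out += suffix                  # key suffix
        _put_varint(out, len(value))   # value length
        out += value                   # value
        prev = key
    return bytes(out)


def decode_records(buf: bytes) -> list[tuple[bytes, bytes]]:
    """Inverse of encode_records: splice the previous key's prefix with each suffix."""
    records, prev, pos = [], b"", 0
    while pos < len(buf):
        shared, pos = _get_varint(buf, pos)
        suffix_len, pos = _get_varint(buf, pos)
        key = prev[:shared] + buf[pos:pos + suffix_len]
        pos += suffix_len
        value_len, pos = _get_varint(buf, pos)
        value = buf[pos:pos + value_len]
        pos += value_len
        records.append((key, value))
        prev = key
    return records
```

For keys `b"apple"` followed by `b"applet"`, the second record stores a prefix length of 5 and only the one-byte suffix `b"t"`, which is where the size reduction comes from.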
  • adding the decoding unit and the encoding unit allows encoded data records to be merged; because encoding reduces the size of data records, this helps save storage resources such as memory and disk.
  • FIG. 6a is a schematic structural diagram of an FPGA-based combiner with a compression function according to still another exemplary embodiment of the present application.
  • the FPGA-based combiner provided in this embodiment can be applied to a database system, and can cooperate with a processor in a database system to implement a new data merge logic.
  • the FPGA-based combiner includes: a storage unit 21, a control unit 22, a merging unit 23, a transmission unit 24, at least one input buffer 25, at least one decoding unit 26, at least one decoding buffer 27, The encoding unit 28, the encoding buffer 29, the output buffer 201, and the compression unit 202.
  • there is a one-to-one correspondence between the decoding units 26, the input buffers 25, and the decoding buffers 27; that is, each decoding unit 26 is responsible for decoding the data records cached in one input buffer 25 and outputting the decoding results to the corresponding decoding buffer 27.
  • the difference between the embodiment shown in Fig. 6a and the embodiment shown in Fig. 5a is that the output buffer 201 and the compression unit 202 are added.
  • the output buffer 201 is mainly used to buffer the encoded data record output by the encoding unit 28, that is, the new N+j layer data record.
  • the control unit 22 can control the compression unit 202 to compress the encoded data records and output the compression results to the storage unit 21.
  • under the control of the control unit 22, the compression unit 202 can compress the new layer-N+j data records in the output buffer 201 and output the compression result to the storage unit 21.
  • the compression of the encoded data record by the compression unit 202 can reduce the occupation of the storage resources of the storage unit 21 and reduce the bandwidth resources consumed by the data transmission between the processor and the combiner.
  • the input buffer, the decoding buffer, the encoding buffer, and the output buffer may be implemented by using dual-port RAM, and one end is sequentially written, and the other end is sequentially read to improve data reading and writing efficiency.
  • the decoding buffer and the encoding buffer may use a ring buffer (Ring Buffer).
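A ring buffer with a write-only side and a read-only side, analogous to the two ports of the dual-port RAM described above, can be sketched as follows. `RingBuffer` is a hypothetical software model, and returning `False`/`None` to signal a stall is a modelling choice.

```python
class RingBuffer:
    """Fixed-size ring buffer: one side only writes, the other only reads."""

    def __init__(self, capacity: int):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0    # index of the next slot to read
        self.count = 0   # number of occupied slots

    def write(self, item) -> bool:
        if self.count == self.capacity:
            return False  # full: the writer must stall
        self.slots[(self.head + self.count) % self.capacity] = item
        self.count += 1
        return True

    def read(self):
        if self.count == 0:
            return None   # empty: the reader must stall
        item = self.slots[self.head]
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return item
```

Because the write position wraps around to slots freed by the reader, the buffer streams an unbounded sequence through fixed storage, which suits on-chip BRAM.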
  • the input buffer and the output buffer are each sized to hold two data record blocks, with a data width of 64 bits, giving a theoretical read bandwidth of 2.4 GB/s at a 300 MHz clock.
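The 2.4 GB/s figure follows from reading one 64-bit word per clock cycle at 300 MHz (the one-word-per-cycle assumption is ours, with decimal gigabytes):

$$
64\ \text{bit} \times 300\ \text{MHz} = 8\ \text{B} \times 3 \times 10^{8}\ \text{s}^{-1} = 2.4 \times 10^{9}\ \text{B/s} = 2.4\ \text{GB/s}
$$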
  • the FPGA-based combiner consists of functional modules, control modules, and storage modules.
  • the functional modules can be implemented with DSP and LUT resources on the FPGA chip, and the storage modules with BRAM resources on the FPGA chip.
  • the execution status of each functional module is managed by the corresponding control module and can be executed in a pipeline mode, which is beneficial to improving the utilization efficiency of the FPGA chip.
  • the FPGA-based combiner provided by the embodiments of the present application can be applied to various database systems, for example, can be applied to LevelDB or RocksDB.
  • the working process of the combiner provided by the embodiment of the present application is described in detail by taking LevelDB or RocksDB as an example.
  • LevelDB and RocksDB are KV (key-value) databases based on log-structured incremental storage; what they actually store is a series of KV records.
  • in LevelDB or RocksDB, when a KV record needs to be written, it is first written to a log file; once the log write succeeds, the KV record is written to the in-memory memtable. When the memtable reaches a certain size, it is converted into an immutable memtable, whose KV records are then traversed in ascending key order and written sequentially to a new SST file on level_0 of the disk.
  • the immutable memtable is a multi-level skip list (SkipList) in which KV records are ordered by key. Storing most KV records on disk in tiered levels reduces memory consumption and provides persistent storage.
  • the KV records in each SST file are stored in ascending key order, and, except for SST files in level_0, the key ranges (from the minimum key to the maximum key in a file) of different SST files in the same level do not overlap. Because level_0 files come directly from memory, the key ranges of any two SST files in level_0 may overlap.
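The key-range rule above can be expressed as a simple check over the `(min_key, max_key)` pairs of one level's files. `level_is_valid` is a hypothetical helper for illustration, not part of the LevelDB or RocksDB API.

```python
def level_is_valid(level: int, files: list[tuple[str, str]]) -> bool:
    """Check the key-range rule for one level; files are (min_key, max_key) pairs."""
    if level == 0:
        return True  # level_0 files come straight from memory and may overlap
    ordered = sorted(files)
    # Above level_0, each file must end before the next file begins.
    return all(prev_max < next_min
               for (_, prev_max), (next_min, _) in zip(ordered, ordered[1:]))
```

The same pair of overlapping ranges is acceptable in level_0 but violates the rule in level_1.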
  • the KV records in the immutable memtable can be merged while they are traversed in ascending key order and written sequentially into a new SST file on level_0 of the disk.
  • an SST file in level_L may be merged with SST files in the next higher level, level_L+1.
  • for example, after file A in level_L has been merged, the file whose key range immediately follows that of file A can be selected for the next merge, so that every file gets a chance to be merged into the higher level.
  • once file A is chosen, all files in level_L+1 whose key ranges overlap the key range of file A, such as files B, C, and D, may be selected and merged together with file A.
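Selecting the level_L+1 files that overlap file A reduces to an interval-intersection test on key ranges. The sketch below is a simplified version of that test; `SstFile` and `overlapping_files` are hypothetical names, and real LevelDB/RocksDB compaction pickers apply further rules that are not shown here.

```python
from typing import NamedTuple


class SstFile(NamedTuple):
    name: str
    min_key: str
    max_key: str


def overlapping_files(file_a: SstFile, next_level: list[SstFile]) -> list[SstFile]:
    """Return the files in level_L+1 whose key range intersects that of file_a."""
    # Two closed ranges are disjoint only if one ends before the other begins.
    return [f for f in next_level
            if not (f.max_key < file_a.min_key or f.min_key > file_a.max_key)]
```

If file A spans keys `c` to `k`, and level_L+1 holds B (`a` to `d`), C (`e` to `g`), D (`j` to `m`), and E (`n` to `z`), then B, C, and D are selected and E is not.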
  • the processor may sort the KV records in files A, B, C, and D by key in ascending order, divide them into KV record groups corresponding to the input buffers according to the number of input buffers in the FPGA-based combiner, and then notify the FPGA-based combiner.
  • the combiner reads each KV record group from memory and stores it in its storage unit (DDR).
  • the KV records in files A, B, C, and D are divided into four groups, corresponding to Way0 to Way3, respectively. Within each way, the KV records are sorted in ascending order by key and version number, while the key ranges of any two ways may overlap.
  • the control unit reads the next KV block to be processed from the DDR into the input buffer according to the offset address of that KV block.
  • the control unit here can be implemented as a Load Controller.
  • the control unit reads the next KV interval from the input buffer according to the offset address of the KV interval in the input buffer, and sends it to the decoding unit.
  • the control unit here can be implemented as a Decoder Controller.
  • the decoding unit decodes the KV record in the KV interval and outputs the decoded result to the decoding buffer.
  • according to the minimum key fed back by the merging unit, the control unit controls the transmission unit to move the decoding result with that minimum key from its decoding buffer to the encoding buffer, and then to read one decoding result from each of the four decoding buffers (KV0, KV1, KV2, and KV3 in FIG. 6b) and supply them to the merging unit so that merging continues.
  • the merging unit receives the four decoding results KV0, KV1, KV2, and KV3 transmitted by the transmission unit, compares their keys with the minimum key from the previous merge pass, and feeds the resulting minimum key back to the control unit.
  • the control unit here can be implemented as a Compaction Controller.
  • after the encoding unit finishes encoding a decoding result, the control unit may read the next decoding result to be encoded from the encoding buffer and send it to the encoding unit.
  • the control unit here can be implemented as an Encoder Controller.
  • the entire data merging process is divided into three stages, decoding (Decoder), comparison and merging (Compaction), and encoding (Encoder), and fixed computing and data buffer resources are dedicated to each functional module on the FPGA.
  • the control unit executes these stages as a pipeline, which greatly improves the efficiency of the data merging process.
  • the CPU resources otherwise occupied by data merge operations are freed, the overall performance of the database is improved, and performance jitter is reduced.
  • the triggering conditions of the merge operation do not need to be modified, so there are no special requirements on the application scenario, and the approach can be applied under different load scenarios.
  • the steps of the method provided by the foregoing embodiments may all be executed by the same device, or the method may be executed by different devices.
  • for example, steps 20a to 22a may all be executed by device A; alternatively, steps 20a and 21a may be executed by device A and step 22a by device B, and so on.
  • the embodiment of the present application further provides a computer readable storage medium storing a computer program, which can implement the steps performed by the control unit in the foregoing method embodiment when the computer program is executed.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • these computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including an instruction apparatus.
  • the instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device so that a series of operational steps is performed on it to produce computer-implemented processing.
  • the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory in a computer readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.
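The minimum-Key feedback loop described in the bullets above (four Key-sorted ways Way0 to Way3, one head record per way, the merging unit selecting the minimum Key, and only the winning way advancing) behaves like a software k-way merge. The following is a minimal Python sketch of that behavior, not the FPGA design: it assumes each record is a hypothetical (key, version, value) tuple, that each way is sorted by Key, and that a higher version number means a newer record; the name `merge_ways` is illustrative.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, int, str]  # hypothetical (key, version, value); higher version = newer


def merge_ways(ways: List[List[Record]]) -> List[Record]:
    """Merge Key-sorted ways, keeping only the newest version of each Key."""
    pos = [0] * len(ways)  # read position in each way (stands in for the decoding buffers)
    out: List[Record] = []  # stands in for the encoding buffer
    pending: Optional[Record] = None  # newest record seen so far for the current minimum Key
    while True:
        # One head record per non-exhausted way (KV0..KV3 in the text).
        heads = [(ways[i][pos[i]], i) for i in range(len(ways)) if pos[i] < len(ways[i])]
        if not heads:
            break
        rec, i = min(heads, key=lambda h: h[0][0])  # "merging unit": minimum Key
        pos[i] += 1  # only the way that supplied the minimum Key advances
        if pending is None or rec[0] != pending[0]:
            if pending is not None:
                out.append(pending)  # all versions of the previous Key have been seen
            pending = rec
        elif rec[1] > pending[1]:
            pending = rec  # a newer version of the same Key replaces the older one
    if pending is not None:
        out.append(pending)
    return out
```

The sketch holds back the current Key in `pending` instead of emitting it immediately because the same Key may appear in several ways; since the selected minimum Key never decreases, every version of a Key has been seen by the time a larger Key is selected.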

Abstract

Provided are a data merging method, an FPGA-based merger, and a database system. The FPGA-based merger (30) is applied to a database system (100) and is responsible for merging data records in the database system (100), thereby reducing the share of CPU resources in the database system (100) occupied by data merging operations, reducing the impact on the write and query performance of the database system (100), improving its overall capability, and eliminating the problem of performance jitter.

Description

Data merging method, FPGA-based combiner and database system
The present application claims priority to Chinese Patent Application No. 201810172456.4, filed on March 1, 2018 and entitled "Data merging method, FPGA-based combiner and database system", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of database technologies, and in particular to a data merging method, an FPGA-based combiner, and a database system.
Background
With the rise of Internet and big data applications, non-relational databases (Not Only SQL, NoSQL) have developed rapidly. Among non-relational databases are key-value (KV) databases based on incremental log storage, such as LevelDB and RocksDB, which evolved from LevelDB.
In LevelDB or RocksDB, most KV records are stored on disk using tiered storage, which reduces memory consumption and provides persistent storage. However, reading a KV record requires searching main memory and then the data files at each level on disk in order of record freshness, which is complex and slow.
To speed up reading KV records, the prior art uses compaction to reorganize and compress existing KV records and remove invalid ones; reducing the number of files lowers query complexity and improves query efficiency. However, the existing data merging process degrades the write and query performance of the database system.
Summary
Aspects of the present application provide a data merging method, an FPGA-based combiner, and a database system to reduce the impact of the data merging process on the write and query performance of the database system.
An embodiment of the present application provides an FPGA-based combiner, including a control unit, a storage unit, and a merging unit;
the control unit is configured to load, according to a data merge instruction from the database system, the data records of the Nth to (N+j)th layers of the database system that need to be merged into the storage unit, and to control the merging unit to merge those data records, where N and j are non-negative integers;
the merging unit is configured to merge the data records of the Nth to (N+j)th layers that need to be merged to obtain new (N+j)th-layer data records and store them in the storage unit, so that the database system can replace the data records of the Nth to (N+j)th layers that need to be merged with the new (N+j)th-layer data records.
An embodiment of the present application further provides a data merging method applicable to an FPGA-based combiner, the method including:
loading, according to a data merge instruction from the database system, the data records of the Nth to (N+j)th layers of the database system that need to be merged into the storage unit of the FPGA-based combiner, where N and j are non-negative integers; and
merging the data records of the Nth to (N+j)th layers that need to be merged to obtain new (N+j)th-layer data records and storing them in the storage unit, so that the database system can replace the data records of the Nth to (N+j)th layers that need to be merged with the new (N+j)th-layer data records.
An embodiment of the present application further provides a database system, including a memory, a processor, and an FPGA-based combiner;
the memory is configured to store a computer program and at least two layers of data records of the database system;
the processor is coupled to the memory and the combiner and is configured to execute the computer program to:
identify, from the at least two layers of data records, the Nth to (N+j)th layers of data records that need to be merged; send a data merge instruction to the FPGA-based combiner to instruct it to merge those data records; and replace the data records of the Nth to (N+j)th layers that need to be merged in the memory with the new (N+j)th-layer data records output by the FPGA-based combiner, where N and j are non-negative integers;
the FPGA-based combiner is configured to receive the data merge instruction and, according to it, merge the Nth to (N+j)th layers of data records that need to be merged to obtain the new (N+j)th-layer data records and output them to the processor.
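The merge-and-replace contract between the processor and the combiner can be illustrated in software. This is a minimal Python sketch, not the FPGA implementation: each layer is a hypothetical list of (key, value) records, a lower layer number is assumed to be fresher, later entries within one layer are assumed to be newer (so the j = 0 case also works), and the function names `merge_layers` and `replace_layers` are illustrative.

```python
from typing import Dict, List, Tuple

Layer = List[Tuple[str, str]]  # one layer's data records as (key, value) pairs


def merge_layers(levels: List[Layer], n: int, j: int) -> Layer:
    """Software stand-in for the combiner: merge layers n..n+j into one sorted layer."""
    newest: Dict[str, str] = {}
    # Visit the oldest layer (n+j) first so that records from fresher
    # (lower-numbered) layers overwrite older values for the same key.
    for level in range(n + j, n - 1, -1):
        for key, value in levels[level]:
            newest[key] = value
    return sorted(newest.items())


def replace_layers(levels: List[Layer], n: int, j: int, new_layer: Layer) -> None:
    """Processor side: layers n..n+j-1 are emptied and layer n+j becomes the merged result."""
    for level in range(n, n + j):
        levels[level] = []
    levels[n + j] = new_layer
```

For example, merging layers 0 and 1 of `[[("a", "new")], [("a", "old"), ("b", "1")], [("c", "2")]]` yields `[("a", "new"), ("b", "1")]`, and after `replace_layers` layer 0 is empty, layer 1 holds the merged records, and layer 2 is untouched.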
In the embodiments of the present application, a combiner is implemented on an FPGA and applied to the database system, where it is responsible for merging the data records of the Nth to (N+j)th layers that need to be merged. This reduces the CPU resources that data merge operations occupy in the database system, reduces the impact on write and query performance, improves the overall capability of the database system, and mitigates performance jitter.
Brief Description of the Drawings
The drawings described herein are provided for a further understanding of the present application and form a part of it. The exemplary embodiments of the present application and their descriptions are used to explain the application and do not constitute an undue limitation on it. In the drawings:
FIG. 1a is a schematic structural diagram of a database system according to an exemplary embodiment of the present application;
FIG. 1b is a schematic diagram of the hierarchical structure formed by data files on a disk according to an exemplary embodiment of the present application;
FIG. 1c is a schematic diagram of the hierarchical structure formed by data files in main memory and on a disk according to an exemplary embodiment of the present application;
FIG. 1d is a schematic flowchart of a data merging method, described from the perspective of an FPGA-based combiner, according to an exemplary embodiment of the present application;
FIG. 2a is a schematic structural diagram of an FPGA-based combiner according to another exemplary embodiment of the present application;
FIG. 2b is a schematic flowchart of a data merging method based on the combiner shown in FIG. 2a according to another exemplary embodiment of the present application;
FIG. 3a is a schematic structural diagram of another FPGA-based combiner according to yet another exemplary embodiment of the present application;
FIG. 3b is a schematic flowchart of a data merging method based on the combiner shown in FIG. 3a according to yet another exemplary embodiment of the present application;
FIG. 4a is a schematic structural diagram of an FPGA-based combiner with an on-chip cache function according to yet another exemplary embodiment of the present application;
FIG. 4b is a schematic flowchart of a data merging method based on the combiner shown in FIG. 4a according to yet another exemplary embodiment of the present application;
FIG. 5a is a schematic structural diagram of an FPGA-based combiner with an encoding/decoding function according to yet another exemplary embodiment of the present application;
FIG. 5b is a schematic diagram of a data record hierarchy according to yet another exemplary embodiment of the present application;
FIG. 6a is a schematic structural diagram of an FPGA-based combiner with a compression function according to yet another exemplary embodiment of the present application;
FIG. 6b shows an implementation structure of the database system LevelDB or RocksDB according to yet another exemplary embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present application.
In a database system that uses tiered storage, data merging can be used to remove invalid data records, which helps improve query efficiency but degrades the write and query performance of the database system. To address this problem, the embodiments of the present application provide a solution whose main idea is as follows: implement a combiner on an FPGA and apply it to the database system, where it merges the data records of adjacent layers that need to be merged. This reduces the share of the database system's CPU resources occupied by data merge operations, reduces the impact on write and query performance, improves the overall capability of the database system, and mitigates performance jitter.
It should be noted that, to simplify the description, the FPGA-based combiner is sometimes referred to simply as the combiner in the following embodiments; those skilled in the art will understand that "combiner" and "FPGA-based combiner" denote the same concept in the embodiments of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
FIG. 1a is a schematic structural diagram of a database system according to an exemplary embodiment of the present application. As shown in FIG. 1a, the database system 100 includes a memory 10, an FPGA-based combiner 20, and a processor 30. The memory 10 is connected to both the processor 30 and the FPGA-based combiner 20.
The memory 10 mainly serves as the storage space of the database system 100 and may include at least one storage medium; multiple storage media may be of the same type or of different types. For example, the memory 10 may include a volatile storage medium such as RAM and may also include a non-volatile storage medium such as read-only memory (ROM) or flash memory. As shown in FIG. 1a, the memory 10 mainly includes main memory and a disk: main memory is generally implemented with a volatile storage medium, and the disk with a non-volatile storage medium.
The memory 10 can store various data related to the database system 100, such as the data records stored by the database system 100, the operating system (OS) of the database system 100, the computer programs running on the database system 100, program data, and so on.
In the database system 100, data records are stored in the memory 10 using tiered storage, and the memory 10 holds at least two layers of data records. Optionally, the memory 10 may contain data files at two or more levels; each level may include at least one data file, and each data file stores some or all of the data records of its level.
In the database system 100, a data record being written is first written to a log file on the disk; once the log write succeeds, the record is written to main memory; and once main-memory occupancy reaches a certain threshold, the data records in main memory are exported to a new data file on the disk. The data files on the disk form a hierarchy: for example, the first level (the one closest to main memory) is Level_0, the second is Level_1, and so on, with the level number increasing at each step.
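The write path just described (log first, then main memory, then a flush to a new data file in Level_0) can be modeled in a few lines. This is a minimal Python sketch under stated assumptions: the log file and data files are modeled as in-memory Python lists, main-memory occupancy is approximated by a record count (`memtable_limit`), and the class name `WritePath` is hypothetical. Because every flush creates a new Level_0 file, Level_0 files can overlap in Key range.

```python
from typing import Dict, List, Tuple

DataFile = List[Tuple[str, str]]  # one data file: (key, value) records sorted by key


class WritePath:
    """Hypothetical model of the write path: log first, then main memory, then flush."""

    def __init__(self, memtable_limit: int) -> None:
        self.log: List[Tuple[str, str]] = []      # stands in for the log file on disk
        self.memtable: Dict[str, str] = {}        # data records held in main memory
        self.levels: List[List[DataFile]] = [[]]  # levels[0] is Level_0
        self.memtable_limit = memtable_limit      # assumed occupancy threshold (record count)

    def put(self, key: str, value: str) -> None:
        self.log.append((key, value))  # 1. write the record to the log file
        self.memtable[key] = value     # 2. after the log write succeeds, write to main memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()               # 3. occupancy threshold reached

    def flush(self) -> None:
        # Export the main-memory records to a new, key-sorted data file in Level_0.
        self.levels[0].append(sorted(self.memtable.items()))
        self.memtable = {}
```

With `memtable_limit=2`, writing "b" and then "a" produces one Level_0 file `[("a", "2"), ("b", "1")]` and leaves main memory empty for the next write.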
When tiered storage is used, reading a data record requires searching main memory and then the data files at each level on the disk in order of record freshness, which is complex and slow. To speed up reads, compaction can be used to reorganize and compress existing data records and remove invalid ones; reducing the number of data records reduces the number of data files at each level, which lowers query complexity and improves query efficiency.
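Why fewer files speeds up reads can be seen by counting how many data files a lookup has to probe. This minimal Python sketch assumes main memory is a dict, each level is a list of data files, later files within a level are newer, and files are scanned linearly for brevity; the function name `lookup` is hypothetical and the sketch does not reflect how LevelDB or RocksDB index their files.

```python
from typing import Dict, List, Optional, Tuple

DataFile = List[Tuple[str, str]]  # (key, value) records of one data file


def lookup(key: str, memtable: Dict[str, str],
           levels: List[List[DataFile]]) -> Tuple[Optional[str], int]:
    """Search from freshest to oldest; return (value, number of data files probed)."""
    if key in memtable:                    # main memory holds the freshest records
        return memtable[key], 0
    probed = 0
    for level in levels:                   # Level_0, Level_1, ... in increasing age
        for data_file in reversed(level):  # assume later files in a level are newer
            probed += 1
            for k, v in data_file:         # linear scan for brevity; real files are indexed
                if k == key:
                    return v, probed
    return None, probed
```

In a toy layout with two Level_0 files and one Level_1 file, finding a Key stored only in Level_1 probes three files; after those records are merged into a single Level_1 file, the same lookup probes one.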
In this embodiment, the memory 10 also stores a computer program related to the data merging flow; by executing it, the processor 30 works with the FPGA-based combiner 20 to implement a new data merging scheme.
In this embodiment, by executing the computer program related to the data merging flow stored in the memory 10, the processor 30 can identify, from the at least two layers of data records, the Nth to (N+j)th layers of data records that need to be merged, and send a data merge instruction to the FPGA-based combiner 20 instructing it to merge them. After the FPGA-based combiner 20 has merged those records and output the new (N+j)th-layer data records, the processor 30 can replace the data records of the Nth to (N+j)th layers that need to be merged in the memory 10 with them. N is a non-negative integer, for example 0, 1, 2, or 3; j is likewise a non-negative integer, for example 0, 1, 2, or 3.
In some application scenarios, only the data records on the disk included in the memory 10 need to be merged; the at least two layers of data records described in the embodiments of the present application then mainly comprise the layers of data records stored on the disk. As shown in FIG. 1b, main memory contains data file a, and the disk stores, in hierarchical order, data files level_0, level_1, level_2, ..., level_n. When only the data records on the disk need to be merged, data file level_0 can be treated as layer 0, level_1 as layer 1, level_2 as layer 2, and so on up to level_n as layer n. That is, the at least two layers of data records mainly comprise the data records stored in data files level_0, level_1, level_2, ..., level_n; in other words, the processor 30 only needs to identify the Nth to (N+j)th layers of data records to be merged from among these files, where N+j ≤ n and n is a non-negative integer.
In other application scenarios, the data records in main memory are exported to a data file on the disk only after a certain amount has accumulated, and overlapping data records may arise during this accumulation, so the data records in main memory may also be merged. In this case, not only the data records on the disk but also those in the main memory included in the memory 10 need to be merged, and the at least two layers of data records described in the embodiments of the present application mainly comprise the layers of data records stored in main memory and on the disk, with the records in main memory and the layers on the disk together forming a single hierarchy. As shown in FIG. 1c, main memory contains data file a, and the disk stores, in hierarchical order, data files level_0, level_1, level_2, ..., level_n; data file a is treated as layer 0, level_0 as layer 1, level_1 as layer 2, level_2 as layer 3, and so on up to level_n as layer n+1, so that the records in main memory and on the disk form a hierarchy from low to high. When the data records in both main memory and on the disk need to be merged, the at least two layers of data records mainly comprise the data records stored in data files a, level_0, level_1, level_2, ..., level_n; in other words, the processor 30 needs to identify the Nth to (N+j)th layers of data records to be merged from among these files.
In general, j is a positive integer such as 1, 2, or 3. In some application scenarios, however, j can also be 0: when data records may overlap within a single level, j = 0 means that the data records of that one layer are merged among themselves. For example, j = 0 when the data records in main memory need to be merged. As another example, the data records at level_0 on the disk come directly from main memory and may overlap, so they need to be merged among themselves, again with j = 0. In addition, the embodiments of the present application do not limit how values of N correspond to levels: layer 0 may be denoted by N = 0, N = 1, N = 10, and so on, with subsequent layers numbered accordingly.
In this embodiment, different events or conditions can trigger the processor 30 to identify, from the at least two layers of data records, the Nth to (N+j)th layers of data records that need to be merged.
Example 1: A data merge period can be set; each time the period elapses, the processor 30 is triggered to identify, from the at least two layers of data records, the Nth to (N+j)th layers of data records that need to be merged.
Optionally, in Example 1, the Nth to (N+j)th layers of data records to be merged may be the data records of all layers. Alternatively, other conditions may further determine which layers need to be merged. For example, when the data merge period elapses, the processor can check whether the number of data records in each layer has reached a preset upper limit, and select the adjacent layers whose record counts have reached the limit as the Nth to (N+j)th layers to be merged.
Example 2: When the number of data records at a certain level reaches a set upper limit, the processor 30 can be triggered to identify, from the at least two layers of data records, that level together with several adjacent upper levels as the Nth to (N+j)th layers of data records to be merged.
Example 3: When the memory 10 includes main memory and a disk, data records are continuously written to main memory; once main-memory occupancy reaches a certain threshold, the processor 30 can be triggered to treat the data records in main memory and the first level of data records on the disk as the Nth to (N+j)th layers of data records to be merged.
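The trigger conditions in Examples 2 and 3 can be sketched as small selection functions that return the range (N, j) to merge. This is a hypothetical Python sketch: the per-level limits, the occupancy threshold of 0.8, the default j = 1, and the choice to pair the full level with the next j levels down the hierarchy (so the merged result is numbered N+j) are assumptions, not values from the application. Example 1 would simply call `pick_by_record_count` each time the merge period elapses.

```python
from typing import List, Optional, Tuple


def pick_by_record_count(level_counts: List[int], limits: List[int],
                         j: int = 1) -> Optional[Tuple[int, int]]:
    """Example 2: merge the first level whose record count reaches its limit
    together with up to j following levels; returns (N, j) or None."""
    for n, (count, limit) in enumerate(zip(level_counts, limits)):
        if count >= limit:
            return n, min(j, len(level_counts) - 1 - n)  # do not run past the last level
    return None


def pick_by_memory_occupancy(used: int, capacity: int,
                             threshold: float = 0.8) -> Optional[Tuple[int, int]]:
    """Example 3 with FIG. 1c numbering: main memory is layer 0 and the first disk
    level is layer 1, so the range to merge is N = 0, j = 1."""
    if used / capacity >= threshold:
        return 0, 1
    return None
```

When the full level is the last one, the sketch falls back to j = 0, which matches the same-layer merge case described above.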
Optionally, the processor 30 may include in the data merge instruction identification information related to the Nth to (N+j)th layers of data records, such as layer identifiers and/or identifiers of the data files containing the records, so that the FPGA-based combiner 20 can learn from the instruction which layers of data records it needs to merge.
The FPGA-based combiner 20 is also connected to the processor 30. It receives the data merge instruction sent by the processor 30, merges the Nth to (N+j)th layers of data records according to the instruction to obtain new (N+j)th-layer data records, and outputs them to the processor 30, which then replaces the data records of the Nth to (N+j)th layers in the memory 10 with the new (N+j)th-layer data records. Replacing the Nth to (N+j)th layers with the new (N+j)th layer is what completes the merge.
Optionally, after identifying the data records of layers N to N+j that need to be merged, the processor 30 may load them into the main memory included in the memory 10, so that the FPGA-based combiner 20 can read them directly from the main memory of the database system 100. This makes reading the records more efficient for the combiner 20 and thus improves the overall efficiency of the merge. Optionally, the FPGA-based combiner 20 may be mounted in the database system 100 as a PCIe card, in which case it reads the data records of layers N to N+j from the main memory of the database system 100 over the PCIe link.
In this embodiment, the FPGA-based combiner 20 is added to the database system 100 and performs most of the merge work. This saves computing resources on the processor 30 and reduces its processing load, so that the processor 30 can concentrate on writing and querying data records. Data storage (writes and queries) is thereby separated from data merging, which reduces the impact of merge operations on write and query performance, improves the overall capability of the database system 100, and mitigates performance jitter. In addition, this embodiment can exploit the FPGA's resources to merge data records across two adjacent layers, or even more than two layers (when j ≥ 2), with almost no effect on write and query performance. The merge process thus becomes more flexible and more efficient and is not restricted to particular application scenarios.
Based on the database system 100 shown in FIG. 1a, an exemplary embodiment of the present application further provides a data merging method, described mainly from the perspective of the FPGA-based combiner 20. As shown in FIG. 1d, the method includes:
101. According to a data merge instruction from the database system, load the data records of layers N to N+j that need to be merged into the storage unit of the FPGA-based combiner, where N and j are non-negative integers.
102. Merge the data records of layers N to N+j to obtain a new set of layer N+j data records and store them in the storage unit, so that the database system can replace its layer N to N+j data records with the new layer N+j data records.
In this embodiment, the data merge instruction may be generated and sent by a processor in the database system. The FPGA-based combiner includes a storage unit that holds the data records of layers N to N+j loaded from the database system.
Optionally, if the processor of the database system, after identifying the data records of layers N to N+j that need to be merged, loads them into the main memory of the database system, the FPGA-based combiner can read them directly from that memory and store them in its own storage unit.
Optionally, the storage unit of the FPGA-based combiner may be double data rate synchronous dynamic random-access memory (DDR SDRAM), but is not limited thereto.
In this embodiment, the FPGA-based combiner loads the data records of layers N to N+j into its own storage unit according to the database system's data merge instruction and then merges the records held there. Relieving the processor of merge work saves its computing resources and reduces its processing load, so the processor can concentrate on writing and querying data records. Data storage (writes and queries) is thereby separated from data merging, which reduces the impact of merging on write and query performance, improves the overall capability of the database system, and mitigates performance jitter. In addition, this embodiment can exploit the FPGA's resources to merge data records across two adjacent layers, or even more than two layers (when j ≥ 2), with almost no effect on write and query performance. The merge process thus becomes more flexible and more efficient and is not restricted to particular application scenarios.
In the embodiments of the present application, the FPGA-based combiner may have various implementation structures, and combiners with different structures may merge the data records of layers N to N+j in different ways. The embodiments do not limit the internal structure of the FPGA-based combiner: any combiner structure that can be implemented on an FPGA and can perform the data merging method shown in FIG. 1d is applicable. The following embodiments describe several internal structures of the FPGA-based combiner and explain in detail the data merging process of each.
FIG. 2a is a schematic structural diagram of an FPGA-based combiner according to another exemplary embodiment of the present application. The combiner provided in this embodiment can be used in a database system and works with the system's processor to implement a new data merging logic. As shown in FIG. 2a, the FPGA-based combiner mainly includes a storage unit 21, a control unit 22, and a merging unit 23.
The storage unit 21 serves as the storage space of the FPGA-based combiner and stores data related to the combiner, such as its configuration file and the data records it is to merge. As shown in FIG. 2a, both the control unit 22 and the merging unit 23 can access the storage unit 21. Optionally, the storage unit 21 may include on-chip memory inside the combiner and/or off-chip memory outside it. The figures of the embodiments show the storage unit 21 outside the combiner as an example, but the application is not limited thereto.
The control unit 22 is the control module of the FPGA-based combiner and mainly implements the combiner's control logic. It can receive a data merge instruction from the database system in which the combiner is located and, according to that instruction, load the data records of layers N to N+j that need to be merged into the storage unit 21. The control unit 22 can also control the merging unit 23 to merge those records. N is a non-negative integer, for example 0, 1, 2, or 3. j is also a non-negative integer and is usually a positive integer such as 1, 2, or 3; in some application scenarios, however, j may be 0. For example, when data records within the same layer may overlap, j is 0, meaning that the data records of a single layer are merged with one another.
The merging unit 23 is a functional module of the FPGA-based combiner. Under the control of the control unit 22, it accesses the data records of layers N to N+j stored in the storage unit 21 and merges them to obtain a new set of layer N+j data records, which it stores in the storage unit 21. The database system then replaces its layer N to N+j data records with the new layer N+j data records, completing the merge.
Merging the data records of layers N to N+j essentially means comparing them, removing duplicate or invalid records, and thereby obtaining the records that should be retained. How the merging unit 23 compares the records and removes duplicate or invalid ones may differ depending on the application scenario or service requirements.
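Since the comparison rule is left open, the following is only one common possibility, sketched in the style of a log-structured merge: records are (key, value) pairs sorted by key within each layer, a smaller layer index is assumed to hold newer data, and a value of `None` is assumed to mark a deleted (invalid) record. The function name and these conventions are assumptions, not the patent's method.

```python
import heapq


def merge_layers(runs, drop_tombstones=True):
    """Merge sorted runs from layers N..N+j into one sorted run (sketch).

    runs: dict mapping layer index -> list of (key, value) sorted by key.
    Assumptions: a smaller layer index holds newer data, and value None marks
    a deleted (invalid) record.
    """
    # Tag each record with its layer so that equal keys come out newest-first.
    tagged = [
        [(key, layer, value) for key, value in records]
        for layer, records in runs.items()
    ]
    merged = []
    last_key = object()  # sentinel that equals no real key
    # Order by (key, layer) only, so values are never compared.
    for key, _layer, value in heapq.merge(*tagged, key=lambda r: (r[0], r[1])):
        if key == last_key:
            continue              # older duplicate of a key already handled
        last_key = key
        if value is None and drop_tombstones:
            continue              # invalid (deleted) record: drop it
        merged.append((key, value))
    return merged
```

Dropping delete markers is only safe when no layer below N+j can still hold an older version of the same key. This is why the sketch exposes `drop_tombstones` as a parameter.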
Based on the internal structure of the FPGA-based combiner shown in FIG. 2a, another exemplary embodiment of the present application further provides a data merging method that describes how that combiner works. As shown in FIG. 2b, the method includes:
20a. The control unit receives a data merge instruction from the database system in which the FPGA-based combiner is located.
21a. According to the data merge instruction, the control unit loads the data records of layers N to N+j that need to be merged in the database system into the storage unit of the FPGA-based combiner.
22a. The control unit controls the merging unit to merge the data records of layers N to N+j stored in the storage unit. This produces a new set of layer N+j data records, which are stored in the storage unit so that the database system can replace its layer N to N+j data records with them.
In this embodiment, the control unit receives a data merge instruction from the database system, loads the data records of layers N to N+j into the combiner's storage unit according to that instruction, and then controls the merging unit to merge the records stored there.
This embodiment does not limit the logic by which the control unit controls the merging unit to merge the data records of layers N to N+j stored in the storage unit.
For example, a processing cycle may be set according to the capability of the merging unit, such as the maximum number of data records it can handle in one merge pass and the approximate time one pass takes. In each cycle, the control unit may either direct the merging unit to read a batch of data records from the storage unit or read such a batch itself and feed it to the merging unit. In both cases, the batch contains no more records than the merging unit can handle in one pass. Under the control of the control unit, the merging unit thus periodically reads a batch from the storage unit, or receives one from the control unit, and merges each batch based on certain attribute information of the data records. This continues until all the data records stored in the storage unit have been merged.
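The periodic scheme can be sketched as a planner that cuts the stored records into batches no larger than the merging unit's per-pass capacity and assigns each batch a start time one processing cycle apart. The parameters `capacity`, `pass_time`, and the safety factor `margin` are hypothetical; the source says only that the cycle is set from the unit's capability.

```python
def periodic_schedule(records, capacity, pass_time, margin=1.0):
    """Plan periodic merge passes (sketch).

    capacity:  max records the merging unit handles per pass (assumed known)
    pass_time: approximate time of one pass, in arbitrary time units
    margin:    hypothetical safety factor that stretches the processing cycle
    Returns a list of (start_time, batch) pairs, one per processing cycle.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    period = pass_time * margin  # processing cycle derived from the unit's capability
    schedule = []
    for i, start in enumerate(range(0, len(records), capacity)):
        batch = records[start:start + capacity]  # never more than `capacity` records
        schedule.append((i * period, batch))
    return schedule
```

With ten records, a capacity of 4, and a pass time of 2.0, this yields three cycles starting at 0.0, 2.0, and 4.0, carrying 4, 4, and 2 records.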
As another example, each time the merging unit finishes merging the current data records, it sends a merge completion notification to the control unit. On receiving this notification, the control unit may either read the next batch of data records from the storage unit and provide it to the merging unit or direct the merging unit to read the next batch from the storage unit itself. Again, each batch contains no more records than the merging unit can handle in one pass. Under the control of the control unit, the merging unit thus repeatedly reads a batch from the storage unit, or receives one from the control unit, and merges each batch based on certain attribute information of the data records. This continues until all the data records stored in the storage unit have been merged.
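The notification-driven alternative can be simulated sequentially. In the sketch below, the return value of `merge_batch` stands in for the merge completion notification, and the control loop issues the next batch only after receiving it. `MergingUnitModel`, the `"MERGE_DONE"` token, and the per-batch sort used as a stand-in for real merging are all hypothetical.

```python
from collections import deque


class MergingUnitModel:
    """Toy stand-in for the merging unit: merges one batch per call and then
    returns a completion notification (names are hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.passes = 0
        self.output = []

    def merge_batch(self, batch):
        assert len(batch) <= self.capacity, "batch exceeds per-pass capacity"
        self.output.extend(sorted(batch))  # placeholder for the real merge logic
        self.passes += 1
        return "MERGE_DONE"                 # completion notification


def notification_driven(records, unit):
    """Control loop: issue the next batch only after a completion notice arrives."""
    pending = deque(records)
    while pending:
        batch = [pending.popleft() for _ in range(min(unit.capacity, len(pending)))]
        notice = unit.merge_batch(batch)
        if notice != "MERGE_DONE":
            raise RuntimeError("unexpected notification: %r" % notice)
    return unit.passes
```

Compared with a fixed cycle, this scheme adapts automatically when a pass takes longer or shorter than expected, at the cost of a round trip per batch.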
In this embodiment, the control unit of the combiner loads the data records of layers N to N+j that need to be merged into the combiner's storage unit and controls how the merging unit accesses the storage unit. In this way, it directs the merging unit to merge the data records of layers N to N+j stored in the storage unit and to output the result as the new layer N+j data records.
FIG. 3a is a schematic structural diagram of another FPGA-based combiner according to yet another exemplary embodiment of the present application. The combiner provided in this embodiment can be used in a database system and works with the system's processor to implement a new data merging logic. As shown in FIG. 3a, the FPGA-based combiner includes a storage unit 21, a control unit 22, a merging unit 23, and a transmission unit 24.
The storage unit 21 serves as the storage space of the FPGA-based combiner and stores data related to the combiner, such as its configuration file and the data records it is to merge. As FIG. 3a shows, the control unit 22 and the transmission unit 24 can access the storage unit 21 directly. The merging unit 23 no longer accesses the storage unit 21 directly; it does so through the transmission unit 24.
The control unit 22 is the control module of the FPGA-based combiner and mainly implements the combiner's control logic. It can receive a data merge instruction from the database system in which the combiner is located and, according to that instruction, load the data records of layers N to N+j that need to be merged into the storage unit 21. The control unit 22 can also control the transmission unit 24 to transfer those records from the storage unit 21 to the merging unit 23, and in this way it controls the merging unit 23 to merge them. N and j are non-negative integers; for their possible values, refer to the description of the foregoing embodiments, which is not repeated here.
The transmission unit 24 is the data path of the FPGA-based combiner and is mainly responsible for data transfer inside the combiner. For example, under the control of the control unit 22, the transmission unit 24 can read the data records of layers N to N+j to be merged from the storage unit 21 and transfer them to the merging unit 23.
The merging unit 23 is a functional module of the FPGA-based combiner. It receives the data records of layers N to N+j transferred by the transmission unit 24 and merges them to obtain a new set of layer N+j data records, which are stored in the storage unit 21. The database system then replaces its layer N to N+j data records with the new layer N+j data records, completing the merge.
Based on the internal structure of the FPGA-based combiner shown in FIG. 3a, yet another exemplary embodiment of the present application further provides a data merging method that describes how that combiner works. As shown in FIG. 3b, the method includes:
30a. The control unit receives a data merge instruction from the database system in which the FPGA-based combiner is located.
31a. According to the data merge instruction, the control unit loads the data records of layers N to N+j that need to be merged in the database system into the storage unit of the FPGA-based combiner.
32a. The control unit controls the transmission unit to read the data records of layers N to N+j to be merged from the storage unit and transfer them to the merging unit.
33a. The merging unit receives the data records of layers N to N+j from the transmission unit and merges them. This produces a new set of layer N+j data records, which are stored in the storage unit so that the database system can replace its layer N to N+j data records with them.
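The division of labour in steps 32a and 33a resembles a producer/consumer pipeline. In the sketch below, two threads stand in for the hardware units: the transmission unit reads batches from the storage unit and pushes them into a bounded channel, and the merging unit consumes them without ever touching the storage unit directly. The thread model, the channel `depth`, and the placeholder merge step are illustrative assumptions, not a description of the FPGA logic.

```python
import queue
import threading

_END = object()  # end-of-stream marker sent by the transmission unit


def transmission_unit(storage, channel, batch_size):
    """Reads batches from the storage unit and pushes them to the merging unit."""
    for start in range(0, len(storage), batch_size):
        channel.put(storage[start:start + batch_size])  # blocks while the channel is full
    channel.put(_END)


def merging_unit(channel, result):
    """Consumes batches; it never accesses the storage unit directly."""
    while True:
        batch = channel.get()
        if batch is _END:
            break
        result.extend(batch)  # placeholder for the actual merge step


def run_pipeline(storage, batch_size=2, depth=1):
    channel = queue.Queue(maxsize=depth)  # bounded hand-off between the two units
    result = []
    producer = threading.Thread(target=transmission_unit, args=(storage, channel, batch_size))
    consumer = threading.Thread(target=merging_unit, args=(channel, result))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return result
```

Because the channel is bounded, the transmission unit naturally stalls when the merging unit falls behind, which mirrors the flow control the control unit otherwise has to provide.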
In this embodiment, the control unit receives a data merge instruction from the database system and loads the data records of layers N to N+j into the combiner's storage unit according to that instruction. It then controls the transmission unit to transfer those records from the storage unit to the merging unit, so that the merging unit can merge them.
This embodiment does not limit the logic by which the control unit controls the transmission unit to transfer the data records of layers N to N+j to the merging unit.
For example, a processing cycle may be set according to the capability of the merging unit, such as the maximum number of data records it can handle in one merge pass and the approximate time one pass takes. In each cycle, the control unit may direct the transmission unit to read a batch of data records from the storage unit (no more than the merging unit can handle in one pass) and transfer it to the merging unit, which then merges that batch.
As another example, each time the merging unit finishes merging the current data records, it sends a merge completion notification to the control unit. On receiving this notification, the control unit may direct the transmission unit to read the next batch of data records from the storage unit (no more than the merging unit can handle in one pass) and transfer it to the merging unit, which then merges that batch.
In this embodiment, a transmission unit dedicated to data transfer is added and is responsible for reading data records from the storage unit and supplying them to the merging unit. This simplifies the merging unit, which can concentrate on merging, and also simplifies the control logic of the control unit. As a result, the FPGA-based combiner is simpler to implement and merges data more efficiently.
FIG. 4a is a schematic structural diagram of an FPGA-based combiner with on-chip caching according to yet another exemplary embodiment of the present application. The combiner provided in this embodiment can be used in a database system and works with the system's processor to implement a new data merging logic. As shown in FIG. 4a, the FPGA-based combiner includes a storage unit 21, a control unit 22, a merging unit 23, a transmission unit 24, and at least one input buffer 25.
The storage unit 21 serves as the storage space of the FPGA-based combiner and stores data related to the combiner, such as its configuration file and the data records it is to merge.
The input buffer 25 is an input buffer of the FPGA-based combiner and, under the control of the control unit 22, can cache data records of layers N to N+j stored in the storage unit 21. As shown in FIG. 4a, there may be one or more input buffers 25. The control unit 22 and the transmission unit 24 can access the input buffers 25 directly, while the merging unit 23 accesses them through the transmission unit 24.
The control unit 22 is the control module of the FPGA-based combiner and mainly implements the combiner's control logic. It can receive a data merge instruction from the database system in which the combiner is located and, according to that instruction, load the data records of layers N to N+j that need to be merged into the storage unit 21. The control unit 22 can also cache those records from the storage unit 21 into the at least one input buffer 25 and control the transmission unit 24 to transfer the data records in the input buffer(s) 25 to the merging unit 23. In this way, it controls the merging unit 23 to merge the data records of layers N to N+j. N and j are non-negative integers; for their possible values, refer to the description of the foregoing embodiments, which is not repeated here.
For each input buffer 25, the control unit 22 may decide whether to cache new data records into it based on whether it has free space. For example, after the transmission unit 24 has transferred the data records in an input buffer 25 to the merging unit 23, the control unit 22 can read new data records from the storage unit 21 and cache them in that input buffer 25.
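The refill rule can be sketched with a fixed-capacity FIFO standing in for one input buffer 25 and an iterator standing in for the records in the storage unit 21 assigned to it. `InputBuffer` and `refill` are hypothetical names; real on-chip buffers would typically track free space with hardware read/write pointers rather than a Python deque.

```python
from collections import deque


class InputBuffer:
    """Fixed-capacity FIFO standing in for one on-chip input buffer (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def free_space(self):
        return self.capacity - len(self.items)


def refill(buffer, source):
    """Control-unit rule: copy records from `source` (an iterator over the
    storage unit) into `buffer` only while the buffer has free space.
    Returns the number of records cached."""
    cached = 0
    while buffer.free_space() > 0:
        try:
            buffer.items.append(next(source))
        except StopIteration:
            break  # storage unit exhausted for this buffer
        cached += 1
    return cached
```

Usage: a full buffer caches nothing. Once the transmission unit drains one record, the next call to `refill` caches exactly one new record.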
Optionally, the database system (mainly its processor) can learn the number of input buffers 25 in the combiner through the combiner's API. To simplify the combiner's control logic, the database system may use that number to divide the data records of layers N to N+j in advance into at least one data record group, one per input buffer 25. It then includes in the data merge instruction, for each input buffer 25, information about the data record group that buffer should cache. This information may vary with the merge requirements; for example, it may be the identifiers, offset addresses, or snapshot version numbers of the data records in the group. Each data record group includes at least one data record.
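One simple grouping policy, sketched below under stated assumptions, assigns one group per layer, treating each layer as an already sorted run, and describes each group by an offset and a record count. The `RecordGroup` fields and the one-group-per-layer rule are hypothetical; the source leaves both the grouping rule and the group descriptor open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordGroup:
    """Hypothetical descriptor of one data record group carried in the merge instruction."""
    buffer_id: int   # input buffer 25 that should cache this group
    layer: int       # source layer of the records
    offset: int      # offset of the first record within that layer's data
    count: int       # number of records in the group


def plan_groups(layer_sizes, num_buffers):
    """Assign one group per layer (each layer is assumed to be a sorted run).

    layer_sizes: dict mapping layer index -> number of records to merge.
    num_buffers: input-buffer count, e.g. as reported by the combiner's API.
    """
    if len(layer_sizes) > num_buffers:
        raise ValueError("more layers than input buffers; split the merge first")
    return [
        RecordGroup(buffer_id=i, layer=layer, offset=0, count=size)
        for i, (layer, size) in enumerate(sorted(layer_sizes.items()))
    ]
```

Keeping each buffer fed from a single sorted run means the merging unit only ever needs to compare the front record of each buffer.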
On this basis, the control unit 22 can cache the data records of each data record group in the storage unit 21 into the corresponding input buffer 25 and control the transmission unit 24 to transfer the data records in the input buffer(s) 25 to the merging unit 23. In this way, it controls the merging unit 23 to merge the data records of layers N to N+j.
The transmission unit 24 is the data path of the FPGA-based combiner and is mainly responsible for data transfer inside the combiner. For example, under the control of the control unit 22, the transmission unit 24 can transfer the data records in the at least one input buffer 25 to the merging unit 23. Specifically, under the control of the control unit 22, it may read one data record from each input buffer 25 at a time and transfer those records to the merging unit 23.
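Reading one record from each buffer at a time fits a head-comparison merge, sketched below as one possible consumer of those records. Each input buffer is modelled as a deque of (key, value) records sorted by key. The smallest head key wins, only the winning buffer advances, and stale copies of that key at other buffers' heads are discarded. The assumptions that a lower buffer index holds newer data and that keys are unique within a buffer apart from older duplicates are hypothetical. This step-by-step form, unlike a heap over whole lists, maps naturally onto a comparator circuit that only sees buffer heads.

```python
from collections import deque


def merge_heads(buffers):
    """Merge sorted input buffers by repeatedly comparing their head records (sketch).

    buffers: list of deques of (key, value), each sorted by key. The list index
    doubles as priority: a lower index is assumed to hold newer data, so it wins ties.
    """
    out = []
    while any(buffers):
        # Look at the current head (front record) of every non-empty buffer.
        heads = [(buf[0][0], i) for i, buf in enumerate(buffers) if buf]
        key, winner = min(heads)          # smallest key; lowest index on ties
        out.append(buffers[winner].popleft())
        # Discard older versions of the same key sitting at any buffer's head.
        for buf in buffers:
            while buf and buf[0][0] == key:
                buf.popleft()
    return out
```

For example, merging `[("a", 1), ("d", 4)]` (newer) with `[("a", 0), ("b", 2), ("c", 3)]` keeps the newer `("a", 1)` and yields keys a, b, c, d in order.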
The merging unit 23 is a functional module of the FPGA-based combiner. It receives the data records transferred by the transmission unit 24 and merges them to obtain a new set of layer N+j data records, which are stored in the storage unit 21. The database system then replaces its layer N to N+j data records with the new layer N+j data records, completing the merge.
Based on the internal structure of the FPGA-based merger shown in FIG. 4a, another exemplary embodiment of the present application provides a data merging method that describes how that merger operates. As shown in FIG. 4b, the method includes the following steps:
40a. The control unit receives a data merge instruction from the database system in which the FPGA-based merger resides.
41a. Following the data merge instruction, the control unit loads the layer-N to layer-(N+j) data records to be merged from the database system into the storage unit of the FPGA-based merger. These data records comprise at least one data record group, and each group corresponds to one input buffer.
42a. Using the correspondence between data record groups and input buffers, the control unit reads the data records of each data record group from the storage unit and caches them in the corresponding input buffer.
43a. The control unit directs the transmission unit to read data records from the input buffers and transfer them to the merging unit.
44a. The merging unit receives the data records from the transmission unit. It merges the records delivered in each transfer to obtain new layer-(N+j) data records and stores them in the storage unit. The database system can then replace the layer-N to layer-(N+j) data records to be merged with the new layer-(N+j) data records.
In this embodiment, the control unit receives a data merge instruction from the database system. Following that instruction, it loads the layer-N to layer-(N+j) data records to be merged into the storage unit of the FPGA-based merger. The control unit then does two things. First, it reads the data records of each data record group from the storage unit and caches them in the corresponding input buffer. Second, it directs the transmission unit to read data records from the input buffers and transfer them to the merging unit, which merges the layer-N to layer-(N+j) data records.
Each input buffer has limited capacity, so the control unit may load data records in batches. Whenever an input buffer has free space, the control unit reads new data records from the corresponding data record group in the storage unit and caches them in that buffer.
Likewise, the merging unit can process only a limited number of data records at a time. The transmission unit may therefore read data records from the input buffers and transfer them to the merging unit in batches. Possible modes include the following. In each transfer, the transmission unit may read one data record from each input buffer, several data records from a single input buffer, or several data records from a subset of the input buffers. In every mode, the number of data records sent in one transfer must not exceed the maximum number that the merging unit can process at a time.
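The one-record-per-buffer transfer mode, capped by the merging unit's capacity, can be sketched as follows. `next_batch` is a hypothetical name. The sketch uses `deque` objects as stand-ins for the hardware input buffers, which is also an assumption made for illustration.

```python
from collections import deque
from typing import List


def next_batch(buffers: List[deque], max_records: int) -> list:
    """Take at most one record from the head of each non-empty buffer,
    stopping once max_records (the merging unit's capacity) is reached."""
    if max_records < 1:
        raise ValueError("max_records must be positive")
    batch = []
    for buf in buffers:
        if len(batch) == max_records:
            break                      # merging unit cannot accept more
        if buf:
            batch.append(buf.popleft())
    return batch
```

With four buffers `[1, 2]`, `[]`, `[3]`, `[4]` and a capacity of 2, the first call returns `[1, 3]`. Buffer 3 keeps its record for the next transfer.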
This embodiment does not restrict either the logic by which the control unit caches data records into the input buffers or the logic by which the transmission unit reads them out. The two can run independently of each other or in coordination.
In one exemplary embodiment, the control unit monitors whether the transmission unit has passed every data record in an input buffer to the merging unit. Once it has, the control unit reads new data records from the corresponding data record group in the storage unit and caches them in that input buffer.
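A minimal model of this drain-then-refill policy is shown below. `InputBuffer`, its `capacity` parameter, and the `refills` counter are hypothetical. The sketch only shows that new records are fetched from the group after every cached record has been handed on.

```python
from collections import deque
from typing import Iterable

_END = object()  # sentinel marking an exhausted data record group


class InputBuffer:
    """Hypothetical model of one input buffer fed from one data record group."""

    def __init__(self, group: Iterable, capacity: int):
        self._source = iter(group)   # remaining records of the group in storage
        self._capacity = capacity    # records the buffer can hold at once
        self._slots = deque()
        self.refills = 0             # number of non-empty loads performed

    def _refill(self) -> None:
        # Load up to `capacity` new records from the group.
        for _ in range(self._capacity):
            record = next(self._source, _END)
            if record is _END:
                break
            self._slots.append(record)
        if self._slots:
            self.refills += 1

    def pop(self):
        """Hand one record to the transmission unit, or None once drained."""
        if not self._slots:          # refill only after the buffer is empty
            self._refill()
        return self._slots.popleft() if self._slots else None
```

A group of five records with a capacity of 2 is loaded in three batches (2, 2, 1).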
In this embodiment, input buffers are added inside the merger to cache data records from the storage unit, so the transmission unit can read data records directly from the input buffers. This makes reads faster for the transmission unit, which improves data transfer efficiency and, in turn, the overall efficiency of the data merging process.
FIG. 5a is a schematic structural diagram of an FPGA-based merger with encoding and decoding functions, according to another exemplary embodiment of the present application. This merger can be used in a database system and works with the database system's processor to implement a new data merging logic. As shown in FIG. 5a, it includes the following components:
- a storage unit 21
- a control unit 22
- a merging unit 23
- a transmission unit 24
- at least one input buffer 25
- at least one decoding unit 26
- at least one decoding buffer 27
- an encoding unit 28
- an encoding buffer 29

The decoding units 26, input buffers 25, and decoding buffers 27 correspond one to one. Each decoding unit 26 decodes the data records cached in one input buffer 25 and outputs the decoding results to the corresponding decoding buffer 27.
The storage unit 21 serves as the storage space of the FPGA-based merger. It stores data related to the merger, such as the merger's configuration file and the data records that the merger is to merge.
The input buffers 25 are on-chip buffers of the FPGA-based merger. They cache data records from the storage unit 21 under the control of the control unit 22. As shown in FIG. 5a, there may be one or more input buffers 25. The control unit 22 and the decoding units 26 can access the input buffers 25 directly, and the transmission unit 24 can access the decoding buffers 27 directly.
The control unit 22 is the control module of the FPGA-based merger and implements its control logic. It has the following responsibilities:
- It receives a data merge instruction from the database system in which the merger resides.
- Following that instruction, it loads the layer-N to layer-(N+j) data records to be merged into the storage unit 21.
- It caches data records from the storage unit 21 into the input buffers 25.
- It directs each decoding unit 26 to decode the data records in its input buffer 25 and output the results to its decoding buffer 27.
- It directs the transmission unit 24 to transfer the decoding results in the decoding buffers 27 to the merging unit 23. In this way it controls how the merging unit 23 merges the layer-N to layer-(N+j) data records.

Here N and j are non-negative integers. Their possible values are described in the foregoing embodiments and are not repeated here.
The decoding unit 26 is a functional module of the FPGA-based merger. Under the control of the control unit 22, it decodes the data records in its input buffer 25 and outputs the decoding results to its decoding buffer 27.
The transmission unit 24 is the data path of the FPGA-based merger and handles data transfer inside the merger. It operates under the control of the control unit 22. Each time the merging unit 23 finishes the current merge step, the transmission unit reads new decoding results from the decoding buffers and passes them to the merging unit 23 for merging. When the result of the current merge step requires a decoding result to be kept, the transmission unit stores that decoding result in the encoding buffer 29 as data to be encoded.
The merging unit 23 is a functional module of the FPGA-based merger. It receives the decoding results from the transmission unit 24 and merges them. It also reports the merge result to the control unit 22, for example whether the current merge step has finished and whether any decoding result needs to be kept. The control unit 22 uses this report to control the transmission unit.
The encoding unit 28 is a functional module of the FPGA-based merger and is the counterpart of the decoding unit 26. Under the control of the control unit 22, it encodes the data to be encoded in the encoding buffer 29 to produce new layer-(N+j) data records and stores them in the storage unit 21. The database system then replaces the merged layer-N to layer-(N+j) data records with these new records, which completes the data merge.
In one exemplary embodiment, the database system can learn how many input buffers 25 the merger contains through the merger's API. To simplify the merger's control logic, the database system may use that number to partition the layer-N to layer-(N+j) data records to be merged, in advance, into at least one data record group, each corresponding to one input buffer 25. It then includes, in the data merge instruction sent to the control unit 22, information about the group that each input buffer 25 should cache. This group information varies with the merge requirement. For example, it may be the identifiers, offset addresses, or snapshot version numbers of the data records in the group. Each data record group contains at least one data record.
On this basis, the FPGA-based merger shown in FIG. 5a can work as follows:
The control unit receives a data merge instruction from the database system in which the FPGA-based merger resides. Following that instruction, it loads the layer-N to layer-(N+j) data records to be merged into the merger's storage unit. In this embodiment, these data records comprise at least one data record group, and each group corresponds to one input buffer. The control unit then does two things. First, using the group-to-buffer correspondence, it reads the data records of each data record group from the storage unit and caches them in the corresponding input buffer. Second, using the decoder-to-buffer correspondence, it directs each decoding unit to decode the data records in its input buffer and output the results to the corresponding decoding buffer.
In addition, the control unit directs the transmission unit to read decoding results from the decoding buffers and pass them to the merging unit for merging. When the merging unit finishes a merge step, it returns the merge result to the control unit, which tells the control unit whether the current merge step has completed. Once it has, the control unit directs the transmission unit to read further decoding results from the decoding buffers and pass them to the merging unit, so that merging can continue. When the current merge result requires a decoding result to be kept, the control unit also directs the transmission unit to store that decoding result in the encoding buffer as data to be encoded. Optionally, under the control of the control unit, the transmission unit reads a new decoding result from each decoding buffer each time the merging unit finishes the current merge step, and passes these results to the merging unit.
Finally, the control unit directs the encoding unit to encode the data to be encoded in the encoding buffer. This produces new layer-(N+j) data records, which are stored in the storage unit. The database system then replaces the layer-N to layer-(N+j) data records to be merged with these new records.
In this embodiment, the control unit coordinates the decoding unit, transmission unit, merging unit, and encoding unit to merge the layer-N to layer-(N+j) data records. The embodiment does not restrict how the control unit controls each of these units. The individual pieces of control logic may run independently of one another or in coordination.
In one exemplary embodiment, the data records to be processed are organized hierarchically, as shown in FIG. 5b:
- Each data record group contains at least one data record block.
- Each block contains at least one data record interval.
- Each interval contains at least one data record.

In this embodiment, the control unit works in units of blocks and caches one data record block into an input buffer at a time. The decoding unit works in units of intervals and reads one data record interval at a time from its input buffer for decoding.
Take the input buffer that corresponds to a first data record group as an example, where the first data record group is any one of the data record groups. When the control unit detects that the last data record interval in that input buffer has been sent to the corresponding decoding unit, it reads a new data record block from the first data record group and caches it in that input buffer. Likewise, each time the decoding unit finishes decoding, the control unit uses the interval offset (shown in FIG. 5b) to read a new data record interval from that input buffer and sends it to the corresponding decoding unit.
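A generator can approximate the block/interval hierarchy and the two refill rules. The sketch assumes a hypothetical block layout: a record list plus ascending interval start offsets that begin at 0. The name `dispatch_intervals` is also hypothetical. FIG. 5b may define the offsets differently.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical block layout: (records, ascending interval start offsets from 0).
Block = Tuple[Sequence, Sequence[int]]


def dispatch_intervals(group: List[Block]) -> Iterator[Tuple[int, Sequence]]:
    """Yield (block_index, interval) in the order a decoding unit would see them."""
    for block_index, (records, offsets) in enumerate(group):
        # Load control: this block now occupies the input buffer; the next
        # block is loaded only after its last interval has been dispatched.
        bounds = list(offsets) + [len(records)]
        for start, end in zip(bounds, bounds[1:]):
            # Decode control: the interval offset locates the next interval.
            yield block_index, records[start:end]
```

For a group with blocks `["a", "b", "c"]` (intervals starting at 0 and 2) and `["d"]`, the decoder receives `["a", "b"]` and `["c"]`, and only then `["d"]`.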
Further, as shown in FIG. 5b, an encoded data record may contain five fields: key prefix length, key suffix length, key suffix, value length, and value. For any data record, the decoding unit first decodes these five fields. It then rebuilds the record's full key from the key prefix length, the key suffix length, the key suffix, and the previous record's key. The resulting decoding result consists of the record's key length, value length, key, and value. If this decoding result must be kept, it is stored in the encoding buffer as data to be encoded. The encoding unit then encodes the key length, value length, key, and value of the data to be encoded (that is, the retained decoding results) as a character stream to produce new layer-(N+j) data records.
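The following sketch shows the decode/encode round trip for this record layout. The field order follows the description above. The 4-byte little-endian length fields and the function names are assumptions. For comparison, LevelDB's block entries store the shared-prefix length, unshared length, and value length as varint32 values, placed before the key delta and value.

```python
import struct
from typing import Iterable, List, Tuple

_LEN = struct.Struct("<I")  # assumed 4-byte little-endian length fields


def _shared_prefix_len(a: bytes, b: bytes) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_records(pairs: Iterable[Tuple[bytes, bytes]]) -> bytes:
    """Encode key-sorted (key, value) pairs as
    prefix_len | suffix_len | suffix | value_len | value."""
    out = bytearray()
    prev_key = b""
    for key, value in pairs:
        shared = _shared_prefix_len(prev_key, key)  # bytes reused from previous key
        suffix = key[shared:]
        out += _LEN.pack(shared) + _LEN.pack(len(suffix)) + suffix
        out += _LEN.pack(len(value)) + value
        prev_key = key
    return bytes(out)


def decode_records(data: bytes) -> List[Tuple[int, int, bytes, bytes]]:
    """Decode to (key_len, value_len, key, value) decoding results."""
    results = []
    prev_key = b""
    pos = 0
    while pos < len(data):
        (shared,) = _LEN.unpack_from(data, pos)
        (suffix_len,) = _LEN.unpack_from(data, pos + 4)
        pos += 8
        suffix = data[pos:pos + suffix_len]
        pos += suffix_len
        (value_len,) = _LEN.unpack_from(data, pos)
        pos += 4
        value = data[pos:pos + value_len]
        pos += value_len
        key = prev_key[:shared] + suffix  # splice prefix of previous key + suffix
        results.append((len(key), value_len, key, value))
        prev_key = key
    return results
```

When `b"apple"` is encoded followed by `b"applet"`, only the one-byte suffix `b"t"` is stored for the second key.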
This embodiment adds decoding and encoding units so that encoded data records can be merged. Encoding reduces the size of the data records, which saves storage resources such as memory and disk.
FIG. 6a is a schematic structural diagram of an FPGA-based merger with a compression function, according to another exemplary embodiment of the present application. This merger can be used in a database system and works with the database system's processor to implement a new data merging logic. As shown in FIG. 6a, it includes the following components:
- a storage unit 21
- a control unit 22
- a merging unit 23
- a transmission unit 24
- at least one input buffer 25
- at least one decoding unit 26
- at least one decoding buffer 27
- an encoding unit 28
- an encoding buffer 29
- an output buffer 201
- a compression unit 202

The decoding units 26, input buffers 25, and decoding buffers 27 correspond one to one. Each decoding unit 26 decodes the data records cached in one input buffer 25 and outputs the decoding results to the corresponding decoding buffer 27.
The embodiment in FIG. 6a differs from the embodiment in FIG. 5a by adding the output buffer 201 and the compression unit 202. The output buffer 201 caches the encoded data records output by the encoding unit 28, that is, the new layer-(N+j) data records. Once a certain number of these records have accumulated, the control unit 22 directs the compression unit 202 to compress the records in the output buffer 201 and output the compressed result to the storage unit 21. Compressing the encoded data records reduces the space they occupy in the storage unit 21. It also reduces the bandwidth consumed by data transfer between the processor and the merger.
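The standard `zlib` module can illustrate how the output buffer accumulates records and then compresses them. zlib is only a stand-in, because the application does not name a codec. `OutputBuffer` and its record-count `flush_threshold` are hypothetical.

```python
import zlib
from typing import List


class OutputBuffer:
    """Hypothetical output buffer: accumulate encoded records, then compress."""

    def __init__(self, flush_threshold: int):
        self._threshold = flush_threshold   # records to accumulate per chunk
        self._pending: List[bytes] = []
        self.storage: List[bytes] = []      # models compressed chunks in storage

    def push(self, encoded_record: bytes) -> None:
        self._pending.append(encoded_record)
        if len(self._pending) >= self._threshold:
            self.flush()

    def flush(self) -> None:
        # Compress everything accumulated so far as one chunk.
        if self._pending:
            self.storage.append(zlib.compress(b"".join(self._pending)))
            self._pending.clear()
```

The caller must call `flush()` once at the end so that a final partial chunk is not left behind.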
Optionally, in each of the above embodiments, the input buffers, decoding buffers, encoding buffer, and output buffer may be implemented as dual-port RAM. One port performs sequential writes and the other performs sequential reads, which improves read/write efficiency. The decoding buffers and the encoding buffer may further be organized as ring buffers. In one implementation, the input and output buffers are each sized to hold two data record blocks and are 64 bits wide. At 300 MHz this gives a theoretical read bandwidth of 2.4 GB/s (8 bytes per cycle × 300 × 10^6 cycles per second).
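The sketch below shows a ring buffer with separate read and write positions, as suggested for the decoding and encoding buffers. It is a single-threaded software model. It represents the dual-port RAM's simultaneous reads and writes only by keeping the write side (`write`) and the read side (`read`) independent. `RingBuffer` is a hypothetical name.

```python
class RingBuffer:
    """Fixed-capacity FIFO: one side only appends, the other only consumes."""

    def __init__(self, capacity: int):
        self._slots = [None] * capacity
        self._head = 0    # next slot to read
        self._count = 0   # number of occupied slots

    def write(self, item) -> bool:
        """Write port: return False instead of overwriting when full."""
        if self._count == len(self._slots):
            return False
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = item
        self._count += 1
        return True

    def read(self):
        """Read port: return None when empty."""
        if self._count == 0:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return item
```

Because the write position wraps around, the slot freed by a read is reused at once, and no data has to be moved.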
In each of the above embodiments, the FPGA-based merger consists of three kinds of modules: functional modules, control modules, and storage modules. The functional modules can be implemented with the DSP and LUT resources of the FPGA chip, and the storage modules with the chip's BRAM resources, among others. Each functional module's execution state is managed by the corresponding control module, so the modules can run as a pipeline. This improves utilization of the FPGA chip's computing resources.
The FPGA-based merger provided by the embodiments of the present application can be used in a variety of database systems, such as LevelDB or RocksDB. The following example uses LevelDB/RocksDB to describe in detail how the merger works.
FIG. 6b shows an implementation of LevelDB or RocksDB that includes the FPGA-based merger. LevelDB and RocksDB are key-value (KV) databases based on log-structured incremental storage, and what they actually store is a series of KV records. A KV record is written as follows:
1. The record is first appended to a log file.
2. After the log write succeeds, the record is inserted into the in-memory memtable.
3. When the memtable reaches a certain size, it is converted into an immutable memtable.
4. The KV records of the immutable memtable are traversed in ascending key order and written sequentially into a new SST file at level_0 on disk.

The immutable memtable is a multi-level SkipList in which KV records are ordered by key. Storing most KV records on disk in this tiered manner reduces memory consumption and provides persistent storage.
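A few lines of code can model the write path just described. `MiniLSM` is a hypothetical toy with three simplifications. A Python dict, sorted at flush time, stands in for the SkipList. The memtable limit is counted in records rather than bytes. The flush runs inline.

```python
from typing import Dict, List, Optional, Tuple


class MiniLSM:
    """Toy write path: log -> memtable -> immutable memtable -> level_0 SST."""

    def __init__(self, memtable_limit: int):
        self.limit = memtable_limit
        self.log: List[Tuple[bytes, bytes]] = []
        self.memtable: Dict[bytes, bytes] = {}
        self.immutable: Optional[Dict[bytes, bytes]] = None
        self.level0: List[List[Tuple[bytes, bytes]]] = []  # one list per SST

    def put(self, key: bytes, value: bytes) -> None:
        self.log.append((key, value))      # 1. append to the log first
        self.memtable[key] = value         # 2. then insert into the memtable
        if len(self.memtable) >= self.limit:
            # 3. Freeze the full memtable and start a fresh one.
            self.immutable, self.memtable = self.memtable, {}
            self._flush()

    def _flush(self) -> None:
        # 4. Traverse the immutable memtable in ascending key order and write
        #    the records sequentially into a new level_0 SST.
        self.level0.append(sorted(self.immutable.items()))
        self.immutable = None
```

Two separate flushes can produce level_0 SSTs with overlapping key ranges, because each flush covers whatever keys were written since the previous one.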
Within each SST file, KV records are stored in ascending key order. At every level except level_0, the key ranges of different SST files in the same level do not overlap, where a file's key range runs from its smallest key to its largest key. Level_0 files are flushed directly from memory, so the key ranges of any two level_0 SST files may overlap.
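The level invariant can be checked as follows. The helper names are hypothetical, and each file is reduced to its (smallest key, largest key) pair. Once the files are sorted by smallest key, only neighbouring files need to be compared.

```python
from typing import List, Tuple

KeyRange = Tuple[bytes, bytes]  # (smallest key, largest key) of one SST file


def ranges_overlap(a: KeyRange, b: KeyRange) -> bool:
    """Closed ranges overlap unless one ends before the other starts."""
    return not (a[1] < b[0] or b[1] < a[0])


def level_is_disjoint(files: List[KeyRange]) -> bool:
    """True if no two files in the level overlap (expected for levels >= 1)."""
    ordered = sorted(files)
    return all(prev[1] < cur[0] for prev, cur in zip(ordered, ordered[1:]))
```

The adjacent-pair check is sufficient. Suppose two files sorted by start key overlap while every neighbouring pair is disjoint. The start keys would then have to strictly increase past the end of the earlier file, which is a contradiction.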
In LevelDB or RocksDB, reading a KV record may require searching several places in turn, from the freshest data to the oldest: the memtable, the immutable memtable, and the SST files at each level on disk. This lookup is complex and slow. To speed up reads, the existing approach uses compaction to reorganize existing KV records and discard invalid ones. Compaction reduces the number of files, which lowers query complexity and improves query efficiency.
Merging can be triggered in two cases. First, the KV records in the immutable memtable can be merged while they are traversed in ascending key order and written into the new level_0 SST file. Second, when the number of SST files at some level on disk (for example, level_L) exceeds a preset value, the SST files at level_L can be merged with those at the next level, level_L+1.
Once a level has been chosen for merging, the level_L files that take part can be selected in rotation. For example, file A is selected the first time. The second time, the selection is file B, whose key range immediately follows that of file A. In this way every file in turn gets a chance to be merged with the files of the next level.
When file A at level_L is to be merged with files at level_L+1, all level_L+1 files whose key ranges overlap file A's key range are selected, for example files B, C, and D. These files are then merged together with file A.
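The sketch below combines the three selection rules above: a count-based trigger, rotation within level_L, and overlap-based selection at level_L+1. The per-level `cursor` remembers the largest key of the previous pick. It is loosely modelled on LevelDB's per-level compaction pointer. `SstFile`, `pick_compaction`, and `max_files` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SstFile:
    name: str
    smallest: bytes
    largest: bytes


def overlaps(a: SstFile, b: SstFile) -> bool:
    return not (a.largest < b.smallest or b.largest < a.smallest)


def pick_compaction(levels: Dict[int, List[SstFile]], level: int, max_files: int,
                    cursor: Dict[int, bytes]) -> Optional[Tuple[SstFile, List[SstFile]]]:
    """Return (file at `level`, overlapping files at `level + 1`), or None."""
    files = sorted(levels.get(level, []), key=lambda f: f.smallest)
    if len(files) <= max_files:
        return None  # trigger condition (file count above preset) not met
    last = cursor.get(level)
    # Rotation: first file ending after the previous pick, else wrap around.
    chosen = next((f for f in files if last is None or f.largest > last), files[0])
    cursor[level] = chosen.largest
    inputs = [f for f in levels.get(level + 1, []) if overlaps(chosen, f)]
    return chosen, inputs
```

If level 1 holds three files and the preset is 2, the first call picks the file with the smallest keys together with every overlapping level-2 file. The next call moves on to the following file in key order.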
Optionally, the processor may sort the KV records in files A, B, C, and D in ascending key order and divide them into KV record groups based on how many input buffers the FPGA-based merger contains. It then notifies the merger, which reads each KV record group from memory and stores it in its storage unit (DDR). As shown in FIG. 6b, if the FPGA-based merger has four input buffers, the KV records of files A, B, C, and D are divided into four groups, corresponding to Way0 through Way3. Within each way, KV records are sorted in ascending order of key and version number. The key ranges of any two ways may overlap.
On the load side, the control unit acts after the KV records in an input buffer have been processed. It directs the input buffer to read the next pending KV block from DDR, using that block's offset address. In this role the control unit can be implemented as a load controller.
On the decode side, the control unit acts after the decoding unit finishes decoding a KV interval. It reads the next KV interval from the input buffer, using the interval's offset address in the input buffer, and sends it to the decoding unit. In this role the control unit can be implemented as a decoder controller. The decoding unit decodes the KV records in the KV interval and outputs the decoding results to the decoding buffer.
On the compaction side, the control unit acts on the minimum key reported by the merging unit. It directs the transmission unit to move the decoding result with that minimum key from its decoding buffer to the encoding buffer. It also directs the transmission unit to keep reading one decoding result from each of the four decoding buffers (KV0, KV1, KV2, and KV3 in FIG. 6b) and supplying them to the merging unit, so that merging continues. The merging unit receives the four decoding results KV0 through KV3 and compares the minimum key from the previous merge step with the keys of these four results. It then reports the minimum key to the control unit. In this role the control unit can be implemented as a compaction controller.
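The compare-and-select loop of the compaction controller can be sketched in software as follows. The sketch assumes that each decoding buffer is ordered by key ascending and, for equal keys, by version descending (newest first). Under that assumption, a record whose key equals the previous minimum key is an older version and is dropped. This ordering is an assumption that mirrors LevelDB's internal-key order (user key ascending, sequence number descending), and it may differ from the version order used in FIG. 6b. `compaction_controller` is a hypothetical name.

```python
from typing import List, Tuple

Entry = Tuple[bytes, int, bytes]  # hypothetical decoded (key, version, value)


def compaction_controller(decode_buffers: List[List[Entry]]) -> List[Entry]:
    """Repeatedly pick the minimum head among the ways; keep the first record
    per key and drop later records that repeat the previous minimum key."""
    heads = [0] * len(decode_buffers)   # read position in each way
    encode_buffer: List[Entry] = []     # retained results awaiting encoding
    last_key = None                     # minimum key of the previous round
    while True:
        # Current candidate from each non-exhausted way (KV0..KV3 for four ways).
        candidates = [(buf[i][0], -buf[i][1], way)
                      for way, (buf, i) in enumerate(zip(decode_buffers, heads))
                      if i < len(buf)]
        if not candidates:
            return encode_buffer
        key, _, way = min(candidates)   # merging unit reports the minimum key
        entry = decode_buffers[way][heads[way]]
        heads[way] += 1                 # only the winning way advances
        if key != last_key:             # a repeat of last_key is an older version
            encode_buffer.append(entry)
            last_key = key
```

In this model only the way that supplied the minimum advances. The other three candidates are compared again in the next round, which corresponds to the transmission unit refilling one slot per step.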
In another aspect, after the encoding unit finishes encoding one decoded result, the control unit may read the next decoded result to be encoded from the encoding buffer and send it to the encoding unit. Here, the control unit may be implemented as an Encoder Controller.
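The encoding stage can be sketched as re-applying prefix compression to the merged stream. This assumes the output uses a LevelDB-style layout (varint lengths, each key sharing its prefix with the previous key); claim 7 only says that the key length, value length, key and value are encoded as a character stream. `write_varint` and `encode_interval` are hypothetical helpers.

```python
from typing import Iterable, Tuple


def write_varint(n: int) -> bytes:
    """Encode a non-negative int as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_interval(records: Iterable[Tuple[bytes, bytes]]) -> bytes:
    """Encode key-sorted (key, value) pairs as prefix-compressed entries.

    Assumed entry layout: varint prefix_len, varint suffix_len,
    varint value_len, suffix bytes, value bytes.
    """
    out = bytearray()
    prev_key = b""
    for key, value in records:
        # Length of the prefix shared with the previous key.
        shared = 0
        limit = min(len(prev_key), len(key))
        while shared < limit and prev_key[shared] == key[shared]:
            shared += 1
        suffix = key[shared:]
        out += write_varint(shared) + write_varint(len(suffix)) + write_varint(len(value))
        out += suffix + value
        prev_key = key
    return bytes(out)
```

Under these assumptions the encoder is the inverse of a decoder that rebuilds each key from the previous one, so merged output can be read back by the same decoding logic.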
In this embodiment, the entire data merge process is divided into three stages: decoding (Decoder), comparison and merging (Compaction), and encoding (Encoder). Dedicated computing resources and data buffer resources are fixed on the FPGA for each functional module, and the control unit makes the stages execute as a pipeline, which substantially improves the efficiency of the data merge process. At the same time, the CPU resources otherwise occupied by data merge operations are freed, which improves overall database performance and mitigates performance jitter. In addition, because the merger works together with the processor of the database system, the trigger conditions of the merge operation need not be modified; there are therefore no special requirements on the application scenario, and the scheme is applicable to different workloads.
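The pipelining idea can be approximated in software with one thread per stage and bounded queues standing in for the on-chip buffers between stages. This is an illustrative sketch, not the FPGA design: `run_pipeline`, the generator-based stage signature and the queue depth are all assumptions.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List

_DONE = object()  # end-of-stream marker passed between stages

Stage = Callable[[Iterator], Iterable]


def run_pipeline(source: Iterable, stages: List[Stage], depth: int = 4) -> list:
    """Run each stage in its own thread, linked by bounded queues.

    The bounded queues play the role of the decoding and encoding
    buffers: a stage blocks when its downstream buffer is full, so all
    stages make progress concurrently instead of one after another.
    """
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]

    def drain(q: queue.Queue) -> Iterator:
        while True:
            item = q.get()
            if item is _DONE:
                return
            yield item

    def feed() -> None:
        for item in source:
            queues[0].put(item)
        queues[0].put(_DONE)

    def worker(stage: Stage, q_in: queue.Queue, q_out: queue.Queue) -> None:
        for item in stage(drain(q_in)):
            q_out.put(item)
        q_out.put(_DONE)

    threads = [threading.Thread(target=feed)]
    for i, stage in enumerate(stages):
        threads.append(threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1])))
    for t in threads:
        t.start()
    results = list(drain(queues[-1]))
    for t in threads:
        t.join()
    return results
```

Each stage is a generator that consumes an iterator, so a decode stage, a merge stage and an encode stage can be chained as `run_pipeline(records, [decode, merge, encode])`. The sketch assumes every stage consumes its whole input; a stage that stopped early would leave its upstream blocked.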
It should be noted that all steps of the method provided by the foregoing embodiments may be executed by the same device, or the method may be executed by different devices. For example, steps 20a to 22a may all be executed by device A; alternatively, steps 20a and 21a may be executed by device A and step 22a by device B; and so on.
In addition, some of the processes described in the above embodiments and the accompanying drawings include multiple operations that appear in a particular order. It should be clearly understood, however, that these operations may be executed out of the order in which they appear herein, or in parallel. Operation numbers such as 20a and 22a merely distinguish the different operations; the numbers themselves do not imply any execution order. These processes may also include more or fewer operations, which may be executed sequentially or in parallel. Note also that the terms "first" and "second" herein are used to distinguish different messages, devices, modules, etc.; they do not denote an order, nor do they require that the "first" and "second" items be of different types.
Correspondingly, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed, implements the steps performed by the control unit in the foregoing method embodiments.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by that processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include, among computer-readable media, non-persistent memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element preceded by "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The foregoing are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, etc., made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (14)

  1. An FPGA-based merger, comprising: a control unit, a storage unit, and a merging unit;
    the control unit is configured to load, according to a data merge instruction of a database system, the data records of layers N to N+j that need to be merged in the database system into the storage unit, and to control the merging unit to merge the data records of layers N to N+j that need to be merged, where N is a non-negative integer and j is a non-negative integer;
    the merging unit is configured to merge the data records of layers N to N+j that need to be merged, to obtain new data records of layer N+j and store them in the storage unit, so that the database system replaces the data records of layers N to N+j that need to be merged in the database system with the new data records of layer N+j.
  2. The FPGA-based merger according to claim 1, further comprising a transmission unit connected to the control unit and the merging unit;
    the transmission unit is configured to transmit, under the control of the control unit, the data records of layers N to N+j that need to be merged to the merging unit;
    the control unit is specifically configured to control the transmission unit to transmit the data records of layers N to N+j that need to be merged to the merging unit, thereby controlling the merging unit to merge those data records.
  3. The FPGA-based merger according to claim 2, further comprising at least one input buffer, wherein the data records of layers N to N+j that need to be merged comprise at least one data record group corresponding to the at least one input buffer;
    the control unit is further configured to cache the data records of each of the at least one data record group in the storage unit into the corresponding input buffer;
    the transmission unit is specifically configured to transmit, under the control of the control unit, the data records in the at least one input buffer to the merging unit.
  4. The FPGA-based merger according to claim 3, further comprising: at least one decoding unit corresponding to the at least one input buffer, at least one decoding buffer corresponding to the at least one decoding unit, an encoding unit, and an encoding buffer corresponding to the encoding unit;
    the decoding unit is configured to decode, under the control of the control unit, the data records in the corresponding input buffer and to output the decoded results to the corresponding decoding buffer;
    the encoding unit is configured to encode, under the control of the control unit, the data to be encoded in the encoding buffer, to obtain the new data records of layer N+j and store them in the storage unit;
    the transmission unit is specifically configured to: under the control of the control unit, each time the merging unit completes the current merge operation, read new decoded results from the at least one decoding buffer and transmit them to the merging unit for merging; and, when the result of the current merge operation requires a decoded result to be retained, store the decoded result to be retained in the encoding buffer as the data to be encoded.
  5. The FPGA-based merger according to claim 4, wherein the transmission unit is specifically configured to: under the control of the control unit, each time the merging unit completes the current merge operation, read a new decoded result from each of the at least one decoding buffer and transmit them to the merging unit.
  6. The FPGA-based merger according to claim 4, further comprising an output buffer and a compression unit;
    the output buffer is configured to cache the new data records of layer N+j output by the encoding unit;
    the compression unit is configured to compress, under the control of the control unit, the new data records of layer N+j in the output buffer and to output the compression result to the storage unit.
  7. The FPGA-based merger according to claim 4, wherein the decoding unit is specifically configured to:
    for each data record, decode from the data record a key prefix length, a key suffix length, a key suffix, a value length, and the value of the data record;
    reconstruct the key of the data record from the key prefix length, the key suffix length, the key suffix, and the previous key, to obtain a decoded result comprising the length of the key of the data record, the value length, and the key and value of the data record;
    the encoding unit is specifically configured to encode the key length, value length, key, and value of the data to be encoded as a character stream, to obtain the new data records of layer N+j.
  8. The FPGA-based merger according to any one of claims 4 to 7, wherein each data record group comprises at least one data record block, each data record block comprises at least one data record interval, and each data record interval comprises at least one data record;
    the control unit is specifically configured to: for a first data record group, after the decoding unit corresponding to the first data record group completes the current decoding operation, read a new data record interval from the input buffer corresponding to the first data record group and send it to the corresponding decoding unit; and, after the last data record interval in the corresponding input buffer has been sent to the corresponding decoding unit, read a new data record block from the first data record group and cache it in the corresponding input buffer; wherein the first data record group is any one of the at least one data record group.
  9. A data merging method, applicable to an FPGA-based merger, the method comprising:
    loading, according to a data merge instruction of a database system, the data records of layers N to N+j that need to be merged in the database system into a storage unit of the FPGA-based merger, where N is a non-negative integer and j is a non-negative integer;
    merging the data records of layers N to N+j that need to be merged, to obtain new data records of layer N+j and store them in the storage unit, so that the database system replaces the data records of layers N to N+j that need to be merged in the database system with the new data records of layer N+j.
  10. A database system, comprising: a memory, a processor, and an FPGA-based merger;
    the memory is configured to store a computer program and at least two layers of data records of the database system;
    the processor is coupled to the memory and the merger and is configured to execute the computer program to:
    identify, from the at least two layers of data records, the data records of layers N to N+j that need to be merged; send a data merge instruction to the FPGA-based merger, instructing the FPGA-based merger to merge the data records of layers N to N+j that need to be merged; and replace the data records of layers N to N+j that need to be merged in the memory with the new data records of layer N+j output by the FPGA-based merger; where N is a non-negative integer and j is a non-negative integer;
    the FPGA-based merger is configured to receive the data merge instruction and, according to the data merge instruction, merge the data records of layers N to N+j that need to be merged, to obtain the new data records of layer N+j and output them to the processor.
  11. The system according to claim 10, wherein the FPGA-based merger comprises: a storage unit, a control unit, and a merging unit;
    the control unit is configured to receive the data merge instruction, load the data records of layers N to N+j that need to be merged from the memory into the storage unit, and control the merging unit to merge those data records;
    the merging unit is configured to merge the data records of layers N to N+j that need to be merged, to obtain the new data records of layer N+j and store them in the storage unit, so that the processor replaces the data records of layers N to N+j that need to be merged in the memory with the new data records of layer N+j.
  12. The system according to claim 11, wherein the FPGA-based merger further comprises a transmission unit connected to the control unit and the merging unit;
    the transmission unit is configured to transmit, under the control of the control unit, the data records of layers N to N+j that need to be merged to the merging unit;
    the control unit is specifically configured to control the transmission unit to transmit the data records of layers N to N+j that need to be merged to the merging unit, thereby controlling the merging unit to merge those data records.
  13. The system according to claim 12, wherein the FPGA-based merger further comprises at least one input buffer, and the data records of layers N to N+j that need to be merged comprise at least one data record group corresponding to the at least one input buffer;
    the control unit is further configured to cache the data records of each of the at least one data record group in the storage unit into the corresponding input buffer;
    the transmission unit is specifically configured to transmit, under the control of the control unit, the data records in the at least one input buffer to the merging unit.
  14. The system according to claim 13, wherein the FPGA-based merger further comprises: at least one decoding unit corresponding to the at least one input buffer, at least one decoding buffer corresponding to the at least one decoding unit, an encoding unit, and an encoding buffer corresponding to the encoding unit;
    the decoding unit is configured to decode, under the control of the control unit, the data records in the corresponding input buffer and to output the decoded results to the corresponding decoding buffer;
    the encoding unit is configured to encode, under the control of the control unit, the data to be encoded in the encoding buffer, to obtain the new data records of layer N+j and store them in the storage unit;
    the transmission unit is specifically configured to: under the control of the control unit, each time the merging unit completes the current merge operation, read new decoded results from the at least one decoding buffer and transmit them to the merging unit for merging; and, when the result of the current merge operation requires a decoded result to be retained, store the decoded result to be retained in the encoding buffer as the data to be encoded.
PCT/CN2019/075322 2018-03-01 2019-02-18 Data merging method, fpga-based merger and database system WO2019165901A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810172456.4 2018-03-01
CN201810172456.4A CN110309138B (en) 2018-03-01 2018-03-01 Data merging method, merger based on FPGA and database system

Publications (1)

Publication Number Publication Date
WO2019165901A1 true WO2019165901A1 (en) 2019-09-06

Family

ID=67805950

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/075322 WO2019165901A1 (en) 2018-03-01 2019-02-18 Data merging method, fpga-based merger and database system

Country Status (2)

Country Link
CN (1) CN110309138B (en)
WO (1) WO2019165901A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130232296A1 (en) * 2012-02-23 2013-09-05 Kabushiki Kaisha Toshiba Memory system and control method of memory system
CN103744617A (en) * 2013-12-20 2014-04-23 北京奇虎科技有限公司 Merging and compressing method and device for data files in key-value storage system
CN103761276A (en) * 2014-01-09 2014-04-30 大唐移动通信设备有限公司 Tree-structure data comparison displaying method and device
CN103812877A (en) * 2014-03-12 2014-05-21 西安电子科技大学 Data compression method based on Bigtable distributed storage system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4319225A (en) * 1974-05-17 1982-03-09 The United States Of America As Represented By The Secretary Of The Army Methods and apparatus for compacting digital data
CN103353891B (en) * 2013-07-05 2017-03-29 北京人大金仓信息技术股份有限公司 Data base management system and its data processing method
CN105989129B (en) * 2015-02-15 2019-03-26 腾讯科技(深圳)有限公司 Real time data statistical method and device


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117033396A (en) * 2023-10-08 2023-11-10 北京凌云雀科技有限公司 Redis-based large Key processing method and device
CN117033396B (en) * 2023-10-08 2024-01-19 北京凌云雀科技有限公司 Redis-based large Key processing method and device

Also Published As

Publication number Publication date
CN110309138A (en) 2019-10-08
CN110309138B (en) 2023-04-07


Legal Events

121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 19760253; country of ref document: EP; kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: PCT application non-entry in European phase
Ref document number: 19760253; country of ref document: EP; kind code of ref document: A1