CN114741028A - OCSSD-based persistent key-value storage method, device and system

Publication number
CN114741028A
Authority
CN
China
Prior art keywords
key value
data block
sstable
data blocks
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210272087.2A
Other languages
Chinese (zh)
Inventor
童薇
冯丹
詹天奇
黄栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202210272087.2A
Publication of CN114741028A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0611 Improving I/O performance in relation to response time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0616 Improving the reliability of storage systems in relation to life time, e.g. increasing Mean Time Between Failures [MTBF]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects
    • G06F3/0665 Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes

Abstract

The invention discloses an OCSSD-based persistent key-value storage method, device and system, belonging to the field of key-value storage, wherein the compaction operation comprises the following steps: reading the SSTable selected from layer L and the SSTables in layer L+1 whose key ranges overlap that of the selected SSTable into memory, and sorting the key-value pairs to obtain an ordered key-value pair sequence; identifying the key-value pair subsequences in the ordered sequence that are identical to original data blocks that were read as reusable data blocks, and organizing the remaining key-value pairs into data blocks to be written back; for each reusable data block, allocating a first logical address and mapping it to the first physical address of the corresponding original data block in the OCSSD; and for each data block to be written back, writing the data block to a free second physical address in layer L+1, allocating a second logical address to it, and establishing a mapping from the second logical address to the second physical address. The invention can effectively reduce the amount of data written during the compaction operation, thereby reducing the impact on write performance and SSD lifetime.

Description

OCSSD-based persistent key-value storage method, device and system
Technical Field
The invention belongs to the field of key value storage, and particularly relates to an OCSSD-based persistent key value storage method, device and system.
Background
Persistent key-value (KV) storage is an integral part of modern large-scale storage infrastructure for storing large amounts of unstructured data. The log-structured merge tree (LSM-tree) is one of the most popular data structures in key-value store implementations because it converts random writes into sequential writes while supporting efficient point and range queries. To speed up read operations, an LSM-tree key-value storage system maintains key-value pairs in a multi-level structure. When processing write and update operations, the system first buffers incoming updates in an in-memory buffer and flushes the entire buffer to persistent storage in bulk when the buffer is full. Writes and updates are appended in a log-structured manner, so newly written data never overwrites old data in place. LSM-tree key-value storage systems are therefore widely used in large-scale production environments, including BigTable and LevelDB at Google, Cassandra and RocksDB at Facebook, and HBase at Twitter.
An LSM-tree key-value store uses the SSTable as the node of its tree structure. An SSTable is a storage file on disk that contains a plurality of data blocks, and each data block contains a plurality of key-value pairs; the key-value pairs within an SSTable are stored in sorted order. In the tree structure of the LSM-tree, the SSTables at level 0 are produced directly by persisting an Immutable MemTable from memory; because they are not merged with other files at that level, the key ranges of level-0 SSTables may overlap. The SSTable files at every other level are produced by merging SSTable files of that level with those of the level above, so their key ranges do not overlap. In the LSM-tree, when the amount of key-value data in one level reaches its capacity limit, that data is merged with the key-value data of the next level; this is called the compaction operation. As shown in FIG. 1, the operation is divided into three steps: first, an SSTable in layer Li (for example, layer L2) is selected, and it is read into memory together with the SSTables in layer Li+1 whose key ranges overlap it; second, the key-value pairs in the SSTables read into memory are merged and sorted, during which pairs with the same key are merged, stale pairs are discarded and only the latest pair is kept, invalid pairs are discarded according to their deletion markers, and the valid pairs are rewritten into new SSTable files in a given key order (for example, lexicographic order); finally, all the resulting SSTables are written back to layer Li+1. It can be seen that the compaction operation in an LSM-tree key-value store introduces a large amount of extra I/O, resulting in the I/O amplification problem.
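The merge-and-sort step of the conventional compaction described above can be illustrated with a minimal Python sketch. The in-memory representation is an assumption made purely for illustration: each SSTable is a list of hypothetical `(key, sequence_number, value)` entries, and a value of `None` stands in for the deletion marker.

```python
# Illustrative sketch only: the tuple layout and TOMBSTONE marker are assumptions.
TOMBSTONE = None  # assumed stand-in for a deletion marker

def merge_sstables(sstables):
    """Merge entries from several SSTables, keep the newest version of each
    key, drop deleted keys, and return the survivors in key order."""
    latest = {}
    for table in sstables:
        for key, seq, value in table:
            # A higher sequence number means a more recent write.
            if key not in latest or seq > latest[key][0]:
                latest[key] = (seq, value)
    return [(k, v) for k, (_, v) in sorted(latest.items()) if v is not TOMBSTONE]

# One SSTable from layer Li and one overlapping SSTable from layer Li+1.
upper = [("b", 7, "B-new"), ("d", 8, TOMBSTONE)]
lower = [("a", 1, "A"), ("b", 2, "B-old"), ("c", 3, "C"), ("d", 4, "D")]
merged = merge_sstables([upper, lower])
print(merged)
```

In this sketch the pairs for keys "a" and "c" are unchanged by the merge, yet a conventional compaction would still write them back to layer Li+1 as part of the new SSTable, which is the extra I/O the paragraph above refers to.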
With the rapid development of NAND flash technology, solid-state drives (SSDs), with their low access latency, high bandwidth and low cost, are gradually replacing HDDs in LSM-tree key-value stores. However, because an LSM-tree key-value store triggers frequent compaction operations when it receives a large number of write/update operations, an LSM-tree key-value store on an SSD suffers from high I/O amplification under write-intensive workloads. This degrades the write performance of the key-value storage engine, and the additional data writes also shorten the lifetime of the SSD.
Disclosure of Invention
In view of the defects of the prior art and the need for improvement, the invention provides an OCSSD-based persistent key-value storage method, device and system, with the aim of solving the problem that the high I/O amplification caused by the compaction operation of an LSM-tree key-value store degrades the write performance of the key-value storage engine and shortens the lifetime of the SSD.
In order to achieve the above object, according to one aspect of the present invention, there is provided an OCSSD-based persistent key-value storage method that organizes data using an LSM-tree, wherein the compaction operation includes the following steps:
key-value pair sorting step: reading, from the OCSSD into memory, the SSTable selected from layer L and the SSTables in layer L+1 whose key ranges overlap that of the selected SSTable, and sorting the key-value pairs in the SSTables read into memory to obtain an ordered key-value pair sequence; the data blocks in the SSTables that were read are denoted original data blocks; L and L+1 are level numbers in the LSM-tree;
a data detection step: identifying the key-value pair subsequences in the ordered key-value pair sequence that are identical to original data blocks as reusable data blocks, and organizing the remaining key-value pairs into data blocks to be written back;
a remapping step: for each reusable data block, allocating a new first logical address to it, obtaining the first physical address in the OCSSD of the original data block with the same content, and establishing a mapping from the first logical address to the first physical address;
a write-back step: for each data block to be written back, allocating a new second logical address to it, writing the data block to a free second physical address in layer L+1, and establishing a mapping from the second logical address to the second physical address.
After the compaction operation has sorted the key-value pairs of the SSTables that were read, the method performs data detection: data blocks whose content is identical to an original data block are identified in the sorted key-value pair sequence as reusable data blocks. For each reusable data block, the newly allocated logical address is mapped to the physical address of the original data block by data remapping, and the reusable data block is not written to the OCSSD. This effectively reduces the amount of data written during the compaction operation, which reduces I/O amplification and, in turn, the impact on the write performance of the key-value storage engine and on the lifetime of the SSD.
Further, identifying a key-value pair subsequence in the ordered key-value pair sequence that is identical to an original data block as a reusable data block comprises:
sliding a window of the same size as a data block over the ordered key-value pair sequence and, each time the window reaches a position, determining whether there is an original data block identical to the key-value pair subsequence in the window; if so, identifying the subsequence in the window as a reusable data block and sliding the window to the next position using the data block size as the step length; if not, sliding the window to the next position using a step length of 1.
The invention uses the sliding window to identify reusable data blocks, that is, data blocks whose content is identical to an original data block, in the sorted key-value pair sequence. Compared with conventional deduplication-based methods, the identification process requires no complex hash computation and avoids the overhead of managing hash values, so reusable data blocks can be identified efficiently and at low computational cost.
Further, determining whether the key-value pair subsequence in the sliding window is identical to an original data block comprises:
comparing the keys of the first and last key-value pairs in the sliding window with the keys of the first and last key-value pairs of the original data block, respectively, and, if both pairs of keys are equal, determining that the key-value pair subsequence in the sliding window is identical to the original data block.
Because the key-value pairs in an SSTable are ordered, when the keys of the first and last key-value pairs of the ordered sequence in the sliding window are equal to the keys of the first and last key-value pairs of some original data block read from an SSTable, the ordered sequence in the sliding window is identical to the key-value pairs in that original data block.
Further, the OCSSD-based persistent key-value storage method provided by the present invention further includes:
maintaining a reference count for each physical address in the OCSSD, which records the number of logical addresses pointing to that physical address;
when the reference count of a physical address is 0, marking that physical address as invalid.
During the compaction operation, the original data block corresponding to each identified reusable data block remains valid. The invention therefore maintains a reference count for each physical address and marks a physical address as invalid only when its reference count is 0, which guarantees that referenced duplicate data is not deleted or invalidated by other delete operations.
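The reference-counting rule above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the class name `RefCountedMap`, its fields, and the integer addresses are all hypothetical.

```python
class RefCountedMap:
    """Toy logical-to-physical map with a per-physical-address reference count.
    All names here are hypothetical and chosen only for illustration."""

    def __init__(self):
        self.l2p = {}         # logical address -> physical address
        self.refcount = {}    # physical address -> number of logical addresses
        self.invalid = set()  # physical addresses whose count has dropped to 0

    def map(self, lba, ppa):
        """Point a logical address at a physical address."""
        self.l2p[lba] = ppa
        self.refcount[ppa] = self.refcount.get(ppa, 0) + 1

    def unmap(self, lba):
        """Drop a logical address; invalidate the page only at count 0."""
        ppa = self.l2p.pop(lba)
        self.refcount[ppa] -= 1
        if self.refcount[ppa] == 0:
            self.invalid.add(ppa)

ftl = RefCountedMap()
ftl.map(10, 500)                      # block of the old SSTable
ftl.map(20, 500)                      # reusable block remapped to the same page
ftl.unmap(10)                         # the old SSTable is deleted after compaction
still_valid = 500 not in ftl.invalid  # the page is still referenced by address 20
ftl.unmap(20)
now_invalid = 500 in ftl.invalid      # no references remain
```

The point of the sketch is the middle step: deleting the old SSTable after compaction does not invalidate the physical page, because the remapped logical address still references it.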
Further, in the data detection step, after the reusable data blocks are identified and before the remaining key-value pairs are organized into data blocks to be written back, the method further includes:
aligning the reusable data blocks and the remaining key-value pairs according to the flash page size.
By aligning the reusable data blocks and the remaining key-value pairs of the sorted sequence according to the flash page size, the invention prevents a logical page from spanning physical pages when written and avoids the extra read and write overhead caused by misalignment.
Further, the data block size in an SSTable is the same as the physical page size in the OCSSD.
Setting the data block size in an SSTable to the physical page size of the OCSSD reduces the number of extra flash reads and writes.
Further, the flash physical blocks in the OCSSD are managed using superblocks.
The invention manages the entire flash memory in superblocks, and write requests are sent to the parallel units in turn, so that the parallelism of the OCSSD can be exploited to the maximum.
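The text does not specify how pages are laid out inside a superblock. As one possible reading of "sent to the parallel units in turn", the following sketch assumes a simple round-robin placement; the function name and the `(parallel_unit, page_offset)` representation are assumptions for illustration only.

```python
def stripe_pages(num_pages, num_parallel_units):
    """Assign sequentially written pages to parallel units round-robin.
    Returns one (parallel_unit, page_offset_within_unit) pair per page.
    The layout is an assumed example, not taken from the patent text."""
    return [(i % num_parallel_units, i // num_parallel_units)
            for i in range(num_pages)]

# Six consecutive page writes over a superblock spanning four parallel units.
placement = stripe_pages(6, 4)
print(placement)
```

Under this assumed layout, any four consecutive page writes land on four different parallel units and can therefore proceed concurrently.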
According to another aspect of the present invention, there is provided an OCSSD-based persistent key-value storage device that organizes data using an LSM-tree, including: a key-value pair sorting module and a data detection module implemented in the LSM-tree KV storage engine, and a data remapping module and a data write-back module implemented in the host FTL;
the key-value pair sorting module is used for reading, during the compaction operation, the SSTable selected from layer L and the SSTables in layer L+1 whose key ranges overlap that of the selected SSTable from the OCSSD into memory, and sorting the key-value pairs in the SSTables read into memory to obtain an ordered key-value pair sequence; the data blocks in the SSTables that were read are denoted original data blocks; L and L+1 are level numbers in the LSM-tree;
the data detection module is used for identifying the key-value pair subsequences in the ordered key-value pair sequence that are identical to original data blocks as reusable data blocks, and organizing the remaining key-value pairs into data blocks to be written back;
the remapping module is used for allocating a new first logical address to each reusable data block, obtaining the first physical address in the OCSSD of the original data block with the same content as the reusable data block, and establishing a mapping from the first logical address to the first physical address;
and the write-back module is used for allocating a new second logical address to each data block to be written back, writing the data block to a free second physical address in layer L+1, and establishing a mapping from the second logical address to the second physical address.
Further, the LSM-tree KV storage engine uses the user-mode file system BlobFS, and the host FTL uses the user-mode driver SPDK.
By using the user-mode file system BlobFS and managing the host FTL with the user-mode driver SPDK, the invention shortens the software I/O stack and reduces host-side latency.
According to another aspect of the present invention, a persistent key-value storage system is provided, which includes an OCSSD and the OCSSD-based persistent key-value storage device provided by the present invention.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
(1) During the compaction operation, reusable data blocks are identified in the ordered key-value pair sequence, their logical addresses are mapped to the physical addresses of the original data blocks in the OCSSD by address remapping, and the reusable data blocks are not written to the OCSSD. This effectively reduces the amount of data written during the compaction operation, which reduces I/O amplification and the impact on the write performance of the key-value storage engine and on the lifetime of the SSD.
(2) Based on the OCSSD, the invention implements the remapping and management of duplicate data in the host FTL, eliminating the semantic gap between deduplication and the KV storage engine.
(3) By using the user-mode file system BlobFS and managing the host FTL with the user-mode driver SPDK, the invention shortens the software I/O stack and reduces host-side latency.
Drawings
FIG. 1 is a diagram illustrating a process of executing a compact operation in a conventional LSM-tree key-value store;
FIG. 2 is a flowchart of the OCSSD-based persistent key-value storage method according to embodiment 1 of the present invention;
FIG. 3 is a logic diagram of the OCSSD-based persistent key-value storage method according to embodiment 1 of the present invention;
FIG. 4 is a schematic diagram of the reusable data block identification method according to embodiment 1 of the present invention;
FIG. 5 is a schematic diagram of data alignment provided in embodiment 1 of the present invention;
FIG. 6 is a schematic diagram of the address remapping process for a reusable data block according to embodiment 1 of the present invention;
FIG. 7 shows the I/O stack for address remapping according to embodiment 1 of the present invention;
FIG. 8 is a schematic diagram of the OCSSD-based persistent key-value storage device provided in embodiment 2 and the persistent key-value storage system provided in embodiment 3.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In the present application, the terms "first," "second," and the like (if any) in the description and the drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The compaction operation in an LSM-tree key-value store introduces a large amount of additional I/O, which causes I/O amplification and affects both the write performance of the key-value storage engine and the lifetime of the SSD. To address this problem, the invention provides an OCSSD-based persistent key-value storage method, device and system. The overall idea is as follows: during the compaction operation, reusable data blocks are identified, that is, data blocks in the ordered key-value pair sequence whose key-value content is identical to that of an original data block that was read. For these data blocks, the original data blocks are kept valid in the SSD and the data blocks are not written back to the SSD again. Compared with the conventional compaction operation, this effectively reduces the amount of data written, which mitigates write amplification and also reduces the impact on SSD lifetime.
Before explaining the technical scheme of the invention in detail, related technical terms related to the invention are briefly introduced as follows:
FTL (Flash Translation Layer): connects the flash storage medium to the device's main controller and is used for managing the flash storage medium;
OCSSD (Open-Channel SSD): an SSD that exposes its internal channels and has the FTL implemented on the host side;
SPDK (Storage Performance Development Kit): a user-mode NVMe driver released by Intel; SPDK supports the OCSSD and a host flash translation layer;
BlobFS: a simple, flat file system that can be mounted and unmounted, and through which a blob can be accessed by operations such as "open", "read", "stat" and "mmap".
The following are examples.
Example 1:
An OCSSD-based persistent key-value storage method that organizes data using an LSM-tree.
Referring to FIG. 2 and FIG. 3, in the present embodiment, the compaction operation includes the following steps:
key-value pair sorting step: reading, from the OCSSD into memory, the SSTable selected from layer L and the SSTables in layer L+1 whose key ranges overlap that of the selected SSTable, and sorting the key-value pairs in the SSTables read into memory to obtain an ordered key-value pair sequence; the data blocks in the SSTables that were read are denoted original data blocks; L and L+1 are level numbers in the LSM-tree;
a data detection step: identifying the key-value pair subsequences in the ordered key-value pair sequence that are identical to original data blocks as reusable data blocks, and organizing the remaining key-value pairs into data blocks to be written back;
a remapping step: for each reusable data block, allocating a new first logical address to it, obtaining the first physical address in the OCSSD of the original data block with the same content, and establishing a mapping from the first logical address to the first physical address;
a write-back step: for each data block to be written back, allocating a new second logical address to it, writing the data block to a free second physical address in layer L+1, and establishing a mapping from the second logical address to the second physical address.
In the conventional compaction operation shown in FIG. 1, the SSTables of layer Li+1 whose key ranges overlap that of the selected layer-Li SSTable are chosen for merging at SSTable granularity. This coarse-grained selection means that the selected layer-Li+1 SSTables contain a large number of key-value pairs that do not overlap the key range of the layer-Li SSTable, referred to as irrelevant key-value pairs. These irrelevant key-value pairs merely take part in the sort and are then written straight back to layer Li+1, so a large amount of data is written repeatedly; such cross-level rewriting produces a large amount of additional I/O, harming the write performance of the key-value storage engine and the lifetime of the SSD.
FIG. 3 shows an example of performing the compaction operation using the steps provided in this embodiment, in which the SSTable is selected at layer 2 of the LSM-tree. As can be seen from FIG. 3, through the data detection step and the remapping step, this embodiment identifies in the sorted key-value pair sequence the data blocks whose content is identical to an original data block that was read, and treats them as reusable data blocks; the key-value pairs in the reusable data blocks are irrelevant key-value pairs. For each identified reusable data block, the newly allocated logical address is mapped directly to the physical address of the original data block by data remapping, without writing the reusable data block to the OCSSD;
comparing the processes shown in FIG. 1 and FIG. 3, this embodiment effectively reduces the amount of data written during the compaction operation and thus reduces I/O amplification, thereby reducing the impact on the write performance of the key-value storage engine and on SSD lifetime.
In order to identify reusable data blocks efficiently, in this embodiment, identifying a key-value pair subsequence in the ordered key-value pair sequence that is identical to an original data block as a reusable data block includes:
sliding a window of the same size as a data block over the ordered key-value pair sequence and, each time the window reaches a position, determining whether there is an original data block identical to the key-value pair subsequence in the window; if so, identifying the subsequence in the window as a reusable data block and sliding the window to the next position using the data block size as the step length; if not, sliding the window to the next position using a step length of 1;
because the key-value pairs in an SSTable are ordered, when the keys of the first and last key-value pairs of the ordered sequence in the sliding window are equal to the keys of the first and last key-value pairs of some original data block read from an SSTable, the ordered sequence in the sliding window is identical to the key-value pairs in that original data block; this embodiment uses this as the criterion for judging whether the key-value pair subsequence in the sliding window is identical to an original data block, specifically as follows:
comparing the keys of the first and last key-value pairs in the sliding window with the keys of the first and last key-value pairs of the original data block, respectively, and, if both pairs of keys are equal, determining that the key-value pair subsequence in the sliding window is identical to the original data block.
The identification process of the reusable data blocks is further described taking FIG. 4 as an example. As shown in FIG. 4, each data block includes four key-value pairs; in the compaction operation, the keys of the four original data blocks read out are D1 = (A0, A1, A2, A3), D2 = (B0, B1, B2, B3), D3 = (A1, A2, A3, A4) and D4 = (A5, A6, A7, A8), respectively, and after the keys are merged and sorted, the resulting ordered key-value pair sequence is:
(A0, A1, A2, A3, A4, A5, A6, A7, A8, B0, B1, B2, B3);
a sliding window of length 4 is slid over the ordered key-value pair sequence;
at the initial moment, the key-value pair subsequence in the sliding window is (A0, A1, A2, A3); the keys of its first and last key-value pairs are equal to the keys of the first and last key-value pairs of original data block D1, so the subsequence in the window is identified as a reusable data block;
the sliding window is moved forward by 4 key-value pairs; the subsequence in the window is now (A4, A5, A6, A7), and no original data block is identical to it;
the sliding window is moved forward by 1 key-value pair; the subsequence in the window is now (A5, A6, A7, A8); the keys of its first and last key-value pairs are equal to the keys of the first and last key-value pairs of original data block D4, so the subsequence in the window is identified as a reusable data block;
the sliding window is moved forward by 4 key-value pairs; the subsequence in the window is now (B0, B1, B2, B3); the keys of its first and last key-value pairs are equal to the keys of the first and last key-value pairs of original data block D2, so the subsequence in the window is identified as a reusable data block.
In this embodiment, a sliding window is used to identify reusable data blocks, i.e. data blocks whose content is the same as that of an original data block, from the sorted key-value pair sequence. Compared with conventional deduplication-based methods, this embodiment needs no complex hash computation during identification and avoids the overhead of managing hash values, so reusable data blocks can be identified efficiently and at low computational cost.
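The sliding-window identification described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `find_reusable_blocks` and the reduction of each data block to a tuple of its keys are assumptions made for the example, and only the first and last keys are compared, as in the FIG. 4 walk-through.

```python
from typing import List, Sequence, Tuple

Block = Tuple[str, ...]  # hypothetical model: a data block reduced to its ordered keys


def find_reusable_blocks(
    sorted_keys: Sequence[str],
    original_blocks: Sequence[Block],
    block_size: int,
) -> Tuple[List[Block], List[str]]:
    """Return (reusable blocks, keys left over for write-back)."""
    # Index original blocks by (first key, last key); only these two boundary
    # keys are compared, and no fingerprint of the block contents is computed.
    by_ends = {(b[0], b[-1]): b for b in original_blocks if len(b) == block_size}

    reusable: List[Block] = []
    remaining: List[str] = []
    pos = 0
    while pos + block_size <= len(sorted_keys):
        window = tuple(sorted_keys[pos:pos + block_size])
        if (window[0], window[-1]) in by_ends:
            reusable.append(window)
            pos += block_size              # match: slide by one full block
        else:
            remaining.append(sorted_keys[pos])
            pos += 1                       # no match: slide by one key-value pair
    remaining.extend(sorted_keys[pos:])    # tail shorter than one window
    return reusable, remaining


# The FIG. 4 example: four original blocks of four key-value pairs each.
d1 = ("A0", "A1", "A2", "A3")
d2 = ("B0", "B1", "B2", "B3")
d3 = ("A1", "A2", "A3", "A4")
d4 = ("A5", "A6", "A7", "A8")
ordered = ["A0", "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8",
           "B0", "B1", "B2", "B3"]
reusable, remaining = find_reusable_blocks(ordered, [d1, d2, d3, d4], 4)
# reusable  -> [d1, d4, d2]
# remaining -> ["A4"]
```

In this sketch the leftover key A4 is what would be organized into a data block to be written back, while D1, D4 and D2 are the blocks that need no rewrite.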
After the reusable data blocks are detected, this embodiment further performs data alignment, so that the start offsets and lengths of the reusable data blocks and of the data blocks to be written back are aligned to the flash page size in both the new and the old SSTables;
taking FIG. 5 as an example, the original data blocks read out are (A, B, C, E), (G, H, I, J), (R, T, X, Z), (A, C, D, E), (L, M, N, P) and (Q, S, U, Y), and after the key-value pairs are merged and sorted, the resulting ordered key-value pair sequence is:
(A, B, C, D, E, G, H, I, J, L, M, N, P, Q, R, S, T, U, X, Y, Z);
the identified reusable data blocks are (G, H, I, J) and (L, M, N, P); the result of data alignment by flash page size is shown in FIG. 5, where the blank positions are filled with 0s;
aligning the reusable data blocks and the remaining key-value pairs of the ordered sequence to the flash page size prevents a logical page from straddling physical pages when it is written, and thus avoids the extra read and write overhead caused by misalignment.
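The zero-filled alignment described above can be illustrated with a short sketch. The 4096-byte page size and the helper names `align_up`, `pad_to_page` and `layout` are assumptions made for this example; the source does not specify a page size or a serialization format.

```python
from typing import List, Sequence, Tuple

PAGE_SIZE = 4096  # assumed flash page size in bytes (not given in the source)


def align_up(n: int, page_size: int = PAGE_SIZE) -> int:
    """Round n up to the next multiple of page_size."""
    return (n + page_size - 1) // page_size * page_size


def pad_to_page(payload: bytes, page_size: int = PAGE_SIZE) -> bytes:
    """Zero-fill payload so that its length is a whole number of pages."""
    return payload + b"\x00" * (align_up(len(payload), page_size) - len(payload))


def layout(segments: Sequence[bytes], page_size: int = PAGE_SIZE) -> List[Tuple[int, int]]:
    """Place segments back to back, each one starting on a page boundary.

    Returns one (start_offset, padded_length) tuple per segment.
    """
    placed, offset = [], 0
    for segment in segments:
        padded = pad_to_page(segment, page_size)
        placed.append((offset, len(padded)))
        offset += len(padded)
    return placed


# Three segments of 5, 8 and 9 bytes laid out on toy 8-byte pages.
placement = layout([b"x" * 5, b"y" * 8, b"z" * 9], page_size=8)
# placement -> [(0, 8), (8, 8), (16, 16)]
```

Because every segment starts and ends on a page boundary, a reusable block placed between two rewritten segments never shares a physical page with them.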
After data alignment, the logical address of each reusable data block is mapped directly to the physical address of the corresponding original data block in the OCSSD; for each data block to be written back, a new physical address is allocated in the OCSSD, the data block is written to that physical address, and its logical address is mapped to the newly allocated physical address, as shown in FIG. 6;
in FIG. 6, at the logical level the compaction operation of this embodiment is the same as a conventional compaction operation, and the original file, i.e. the original SSTable, is invalidated; at the physical level, however, no write-back of the reusable data blocks takes place.
Because the original SSTable is invalidated after the compaction operation completes, in order to prevent the reusable data blocks it contains from being invalidated by other delete operations, this embodiment further maintains a reference count for each physical address in the OCSSD, which records the number of logical addresses pointing to that physical address;
when the reference count of a physical address is 0, the physical address is set to invalid.
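The remapping and reference counting described above can be sketched with a toy logical-to-physical table. The class `MappingTable` and its method names are hypothetical and only illustrate the bookkeeping; they are not the host FTL interface of the embodiment.

```python
from typing import Dict, Set


class MappingTable:
    """Toy logical-to-physical map with per-physical-address reference counts."""

    def __init__(self) -> None:
        self.l2p: Dict[int, int] = {}       # logical address -> physical address
        self.refcount: Dict[int, int] = {}  # physical address -> number of referrers
        self.invalid: Set[int] = set()      # physical addresses with no referrer left

    def map(self, lba: int, ppa: int) -> None:
        """Point lba at ppa, dropping any mapping lba had before."""
        if lba in self.l2p:
            self.unmap(lba)
        self.l2p[lba] = ppa
        self.refcount[ppa] = self.refcount.get(ppa, 0) + 1

    def remap(self, new_lba: int, old_lba: int) -> None:
        """Reuse a block: new_lba shares old_lba's physical address, no flash write."""
        self.map(new_lba, self.l2p[old_lba])

    def unmap(self, lba: int) -> None:
        """Drop one reference; mark the physical address invalid at zero."""
        ppa = self.l2p.pop(lba)
        self.refcount[ppa] -= 1
        if self.refcount[ppa] == 0:
            del self.refcount[ppa]
            self.invalid.add(ppa)


table = MappingTable()
table.map(10, 100)     # block of the old SSTable stored at physical address 100
table.remap(20, 10)    # the new SSTable reuses it under logical address 20
table.unmap(10)        # the old SSTable is invalidated after compaction
still_valid = 100 not in table.invalid   # True: logical address 20 still refers to it
table.unmap(20)        # the last reference disappears
now_invalid = 100 in table.invalid       # True
```

The count is what keeps the physical page alive between the invalidation of the old SSTable and the eventual deletion of the new one.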
As a preferred implementation, in this embodiment the data block size in the SSTable is set equal to the physical page size in the OCSSD, which reduces extra flash reads and writes;
in addition, the flash physical blocks in the OCSSD are managed using superblocks;
in this embodiment, the whole flash memory is managed in units of superblocks: multiple flash physical blocks are organized into one superblock, and write requests are sent to the parallel units in turn, so that the parallelism of the OCSSD is exploited to the greatest extent;
optionally, in this embodiment, garbage collection is started when the proportion of free superblocks falls below 5%; specifically, a greedy garbage collection policy is adopted, which preferentially reclaims the superblock containing the most invalid data.
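The superblock management described above can be illustrated with a small sketch. The `Superblock` record, the round-robin helper and the function names are assumptions made for this example; only the 5% threshold and the greedy choice of the superblock with the most invalid data come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

GC_FREE_PERCENT = 5  # start garbage collection below 5% free superblocks


@dataclass
class Superblock:
    sb_id: int
    invalid_pages: int = 0
    free: bool = True


def next_parallel_unit(write_index: int, num_units: int) -> int:
    """Round-robin striping: consecutive writes go to consecutive parallel units."""
    return write_index % num_units


def needs_gc(superblocks: List[Superblock]) -> bool:
    """True when fewer than GC_FREE_PERCENT percent of the superblocks are free."""
    free = sum(1 for sb in superblocks if sb.free)
    return free * 100 < GC_FREE_PERCENT * len(superblocks)


def pick_victim(superblocks: List[Superblock]) -> Optional[Superblock]:
    """Greedy policy: the in-use superblock holding the most invalid data."""
    used = [sb for sb in superblocks if not sb.free]
    return max(used, key=lambda sb: sb.invalid_pages, default=None)


# 100 superblocks of which only 4 are free, so collection is triggered.
pool = [Superblock(i, invalid_pages=i % 7, free=(i < 4)) for i in range(100)]
victim = pick_victim(pool) if needs_gc(pool) else None
```

Integer arithmetic is used for the threshold test so that exactly 5% free superblocks does not trigger collection through floating-point rounding.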
As an optional implementation, in this embodiment the host FTL is managed using the user-mode file system BlobFS and the user-mode driver SPDK to implement the above operations; as shown in FIG. 7, the IO stack that performs the data remapping step is built by combining BlobFS and SPDK, which effectively shortens the software IO stack and reduces host latency.
Example 2:
An OCSSD-based persistent key value storage device organizes data using an LSM-tree.
Referring to FIG. 8, the key value storage device includes an LSM-tree KV storage engine and a host FTL; the LSM-tree KV storage engine implements a key-value pair sorting module and a data detection module, and the host FTL implements a data remapping module and a data write-back module; wherein:
the key-value pair sorting module is configured to, in the compaction operation, read the SSTable selected in layer L and the SSTables in layer L+1 whose key ranges overlap that of the selected SSTable from the OCSSD into memory, and to sort the key-value pairs of the SSTables read into memory to obtain an ordered key-value pair sequence; the data blocks in the SSTables read are denoted original data blocks; L and L+1 are layer numbers in the LSM-tree;
the data detection module is configured to identify, in the ordered key-value pair sequence, key-value pair subsequences that are identical to original data blocks as reusable data blocks, and to organize the remaining key-value pairs into data blocks to be written back;
the data remapping module is configured to allocate a new first logical address to each reusable data block, obtain the first physical address in the OCSSD of the original data block whose content is the same as that of the reusable data block, and establish a mapping from the first logical address to the first physical address;
the data write-back module is configured to allocate a new second logical address to each data block to be written back, write the data block to be written back into a free second physical address in layer L+1, and establish a mapping from the second logical address to the second physical address;
as a preferred implementation, in this embodiment the LSM-tree KV storage engine uses the user-mode file system BlobFS, and the host FTL uses the user-mode driver SPDK;
in this embodiment, the specific implementation of each module can refer to the description in embodiment 1, and will not be repeated here.
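As an illustration of what the key-value pair sorting module produces, the following sketch merges two key-sorted runs into one ordered sequence. It is not the engine's actual code: the name `merge_sstables` is hypothetical, and the rule that the layer L pair overrides the layer L+1 pair with the same key is the usual LSM-tree convention, assumed here rather than quoted from the text.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]  # (key, value)


def merge_sstables(upper: Iterable[Pair], lower: Iterable[Pair]) -> Iterator[Pair]:
    """Merge two key-sorted runs; on equal keys the layer L (upper) pair wins."""
    # Tag each pair with its source so that, for equal keys, the layer L
    # entry (tag 0) sorts ahead of the layer L+1 entry (tag 1).
    tagged = heapq.merge(
        ((key, 0, value) for key, value in upper),
        ((key, 1, value) for key, value in lower),
    )
    last_key = None
    for key, _, value in tagged:
        if key != last_key:   # the first occurrence is the version that is kept
            yield key, value
            last_key = key


layer_l: List[Pair] = [("A1", "new"), ("A3", "x")]
layer_l1: List[Pair] = [("A0", "a"), ("A1", "old"), ("A2", "b")]
ordered_pairs = list(merge_sstables(layer_l, layer_l1))
# ordered_pairs -> [("A0", "a"), ("A1", "new"), ("A2", "b"), ("A3", "x")]
```

The resulting ordered sequence is the input on which the data detection module would look for reusable data blocks.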
Example 3:
A persistent key value storage system, as shown in FIG. 8, includes an OCSSD and the OCSSD-based persistent key value storage device provided in embodiment 2 above.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. An OCSSD-based persistent key value storage method, which organizes data using an LSM-tree, characterized in that the compaction operation comprises the following steps:
a key-value pair sorting step: reading the SSTable selected in layer L and the SSTables in layer L+1 whose key ranges overlap that of the selected SSTable from the OCSSD into memory, and sorting the key-value pairs of the SSTables read into memory to obtain an ordered key-value pair sequence; the data blocks in the SSTables read are denoted original data blocks; L and L+1 are layer numbers in the LSM-tree;
a data detection step: identifying, in the ordered key-value pair sequence, key-value pair subsequences that are identical to original data blocks as reusable data blocks, and organizing the remaining key-value pairs into data blocks to be written back;
a remapping step: for each reusable data block, allocating a new first logical address to the reusable data block, obtaining the first physical address in the OCSSD of the original data block with the same content, and establishing a mapping from the first logical address to the first physical address;
a write-back step: for each data block to be written back, allocating a new second logical address to the data block to be written back, writing the data block into a free second physical address in layer L+1, and establishing a mapping from the second logical address to the second physical address.
2. The OCSSD-based persistent key value storage method of claim 1, wherein identifying a key-value pair subsequence in the ordered key-value pair sequence that is identical to an original data block as a reusable data block comprises:
sliding a sliding window of the same size as a data block over the ordered key-value pair sequence and, each time the sliding window reaches a position, determining whether there exists an original data block that is the same as the key-value pair subsequence in the sliding window; if so, identifying the key-value pair subsequence in the sliding window as a reusable data block and sliding the sliding window to the next position with the data block size as the sliding length; if not, sliding the sliding window to the next position with 1 as the sliding length.
3. The OCSSD-based persistent key value storage method of claim 2, wherein determining whether the key-value pair subsequence in the sliding window is the same as an original data block comprises:
comparing the keys of the first and last key-value pairs in the sliding window with the keys of the first and last key-value pairs of the original data block, respectively, and if both are equal, determining that the key-value pair subsequence in the sliding window is the same as the original data block.
4. The OCSSD-based persistent key value storage method of claim 1, further comprising:
maintaining a reference count for each physical address in the OCSSD for recording a count of logical addresses pointing to the physical address;
when the reference count of a certain physical address is 0, the physical address is set to be invalid.
5. The OCSSD-based persistent key value storage method according to any one of claims 1 to 4, wherein the data detection step, after identifying the reusable data blocks and before organizing the remaining key-value pairs into data blocks to be written back, further comprises:
performing data alignment on the reusable data blocks and the remaining key-value pairs according to the flash page size.
6. The OCSSD-based persistent key value storage method according to any one of claims 1 to 4, wherein a data block size in SSTable is consistent with a physical page size in the OCSSD.
7. The OCSSD-based persistent key value storage method according to any one of claims 1 to 4, wherein a flash memory physical block in the OCSSD is managed by using a super block.
8. An OCSSD-based persistent key value storage device that organizes data using an LSM-tree, comprising: a key-value pair sorting module and a data detection module implemented in an LSM-tree KV storage engine, and a data remapping module and a data write-back module implemented in a host FTL;
the key-value pair sorting module is configured to, in the compaction operation, read the SSTable selected in layer L and the SSTables in layer L+1 whose key ranges overlap that of the selected SSTable from the OCSSD into memory, and to sort the key-value pairs of the SSTables read into memory to obtain an ordered key-value pair sequence; the data blocks in the SSTables read are denoted original data blocks; L and L+1 are layer numbers in the LSM-tree;
the data detection module is configured to identify, in the ordered key-value pair sequence, key-value pair subsequences that are identical to original data blocks as reusable data blocks, and to organize the remaining key-value pairs into data blocks to be written back;
the data remapping module is configured to allocate a new first logical address to each reusable data block, obtain the first physical address in the OCSSD of the original data block whose content is the same as that of the reusable data block, and establish a mapping from the first logical address to the first physical address;
the data write-back module is configured to allocate a new second logical address to each data block to be written back, write the data block to be written back into a free second physical address in layer L+1, and establish a mapping from the second logical address to the second physical address.
9. The OCSSD-based persistent key value storage device of claim 8, wherein the LSM-tree KV storage engine uses the user-mode file system BlobFS, and the host FTL uses the user-mode driver SPDK.
10. A persistent key value storage system comprising an OCSSD and the OCSSD-based persistent key value storage device of claim 8 or 9.
CN202210272087.2A 2022-03-18 2022-03-18 OCSD-based persistent key value storage method, device and system Pending CN114741028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210272087.2A CN114741028A (en) 2022-03-18 2022-03-18 OCSD-based persistent key value storage method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210272087.2A CN114741028A (en) 2022-03-18 2022-03-18 OCSD-based persistent key value storage method, device and system

Publications (1)

Publication Number Publication Date
CN114741028A true CN114741028A (en) 2022-07-12

Family

ID=82277696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210272087.2A Pending CN114741028A (en) 2022-03-18 2022-03-18 OCSD-based persistent key value storage method, device and system

Country Status (1)

Country Link
CN (1) CN114741028A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116795296A (en) * 2023-08-16 2023-09-22 中移(苏州)软件技术有限公司 Data storage method, storage device and computer readable storage medium
CN116795296B (en) * 2023-08-16 2023-11-21 中移(苏州)软件技术有限公司 Data storage method, storage device and computer readable storage medium

Similar Documents

Publication Publication Date Title
US20220100377A1 (en) Memory system and method of controlling memory system
US9146877B2 (en) Storage system capable of managing a plurality of snapshot families and method of snapshot family based read
CN103186350B (en) The moving method of mixing storage system and hot spot data block
TWI551989B (en) Method for managing a flash storage system
US8898371B2 (en) Accessing logical-to-physical address translation data for solid state disks
CN107391774B (en) The rubbish recovering method of log file system based on data de-duplication
CN112395212B (en) Method and system for reducing garbage recovery and write amplification of key value separation storage system
US20060218347A1 (en) Memory card
CN109800185B (en) Data caching method in data storage system
WO2014015828A1 (en) Data storage space processing method and processing system, and data storage server
CN107766374B (en) Optimization method and system for storage and reading of massive small files
WO2017149592A1 (en) Storage device
Yao et al. Building efficient key-value stores via a lightweight compaction tree
US20210311877A1 (en) Key-value store architecture for key-value devices
CN110968269A (en) SCM and SSD-based key value storage system and read-write request processing method
CN111722797B (en) SSD and HA-SMR hybrid storage system oriented data management method, storage medium and device
CN114741028A (en) OCSD-based persistent key value storage method, device and system
CN111443874B (en) Solid-state disk memory cache management method and device based on content awareness and solid-state disk
CN115203079A (en) Method for writing data into solid state disk
Doekemeijer et al. Key-Value Stores on Flash Storage Devices: A Survey
EP4307129A1 (en) Method for writing data into solid-state hard disk
US20140359228A1 (en) Cache allocation in a computerized system
CN115344201A (en) Data storage method, data query method and device
CN116364148A (en) Wear balancing method and system for distributed full flash memory system
Shu et al. Towards unaligned writes optimization in cloud storage with high-performance ssds

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination