CN112131140B

CN112131140B - SSD-based key value separation storage method supporting efficient storage space management

Info

Publication number: CN112131140B
Application number: CN202011018307.6A
Authority: CN
Inventors: 王冲; 刘莉; 张扬; 周可; 牛中盈; 滕海; 李春花; 张洲; 王颖
Original assignee: Aerospace Science And Technology Network Information Development Co ltd; Huazhong University of Science and Technology; Beijing Institute of Computer Technology and Applications
Current assignee: Aerospace Science And Technology Network Information Development Co ltd; Huazhong University of Science and Technology; Beijing Institute of Computer Technology and Applications
Priority date: 2020-09-24
Filing date: 2020-09-24
Publication date: 2023-07-14
Anticipated expiration: 2040-09-24
Also published as: CN112131140A

Abstract

The invention relates to an SSD-based key-value separation storage method supporting efficient storage space management. The method comprises: dividing the value storage space into equal-length segments; constructing a segment manager to manage the valid and invalid states of all data segments; establishing a value-store invalid offset set and a key-store invalid offset set for each segment; and maintaining an available segment cache and a half-invalid segment cache. The value-store invalid offset set records the metadata of invalid values discarded by compaction operations in the key store, so as to assist space reclamation in the value store. The key-store invalid offset set records the offsets that still remain in the key store for a data segment that has been reclaimed by passive garbage collection; these positions must not be reclaimed again, so they are discarded directly when they are later collected from the key store. By collecting, in the key store, the invalid key-value pairs discarded by downward compaction, the invention builds an efficient value-store space manager, achieves lightweight garbage collection (GC), and thereby reduces the impact of GC in the value store on the system's foreground writes.

Description

SSD-based key value separation storage method supporting efficient storage space management

Technical Field

The invention belongs to the field of computer storage systems, and particularly relates to an SSD-based key-value separation storage method supporting efficient storage space management.

Background

Persistent key-value storage plays a vital role in modern data-intensive storage systems and applications, such as messaging, e-commerce, search indexing, and advertising. The log-structured merge tree (LSM-tree) is a write-optimized, disk-based data structure proposed in 1996. It achieves high write performance by converting random writes into sequential writes, provides reliable query performance by keeping the data on disk ordered, and is one of the data structures adopted in current persistent key-value stores. In 2006 Google published the Bigtable paper on its distributed key-value storage system and later open-sourced the single-node key-value storage engine LevelDB. Facebook optimized LevelDB and released the open-source single-node key-value storage engine RocksDB, and HBase was implemented in the Hadoop ecosystem as an open-source counterpart of Bigtable.

A key-value storage system based on the LSM-tree consists of a memory component and a disk component, and the data in both components is ordered. Written data is first cached in the memory component. When the amount of data in the memory component reaches a threshold, the data is persisted in batches to the disk component and merge-sorted with the data in the disk component whose key range overlaps it; this operation is called compaction. Compaction merge-sorts the data and reclaims the space occupied by invalid data while keeping the data ordered. As the data volume grows, an LSM-tree based key-value storage system needs a large number of compaction operations to keep the data organized and thus provide good query performance. To amortize the overhead of compaction, modern LSM-tree based key-value storage systems design the disk component as a multi-level structure. However, a large number of compaction operations severely degrades the write performance of the system and causes significant write amplification. Write amplification is defined as the ratio of the total amount of data written by the system to the amount of data actually written into the key-value store by the user. Consequently, a number of optimized LSM-tree based key-value storage systems have emerged, such as HyperLevelDB, bLSM, LSM-trie, WiscKey, TRIAD, PebblesDB, and SILK.

WiscKey is a persistent key-value storage system with key-value separation, comprising a value store and a key store. The value store is a reusable log that stores the key-value pair data; the key store is an LSM-tree based key-value store that holds the data index. WiscKey uses an SSD-optimized data layout, is suited to workloads with large values, and, by separating keys from values, avoids the severe write performance degradation and write amplification caused by compaction. When the value storage space is used up, WiscKey must reclaim space in the value store by eliminating invalid data, an operation called garbage collection. Garbage collection proceeds as follows: 1) sequentially read a batch of data from the value store; 2) look up the keys of the data one by one in the key store to determine whether the data is valid; 3) rewrite the valid data into the value store; and 4) update the index data in the key store. Garbage collection is very costly; in write-intensive environments, frequent garbage collection causes severe write performance degradation and a large amount of data rewriting.

Disclosure of Invention

The invention aims to provide an SSD-based key-value separation storage method supporting efficient storage space management, in order to solve the problem in the prior art that garbage collection adversely affects system write performance.

The invention discloses an SSD-based key-value separation storage method supporting efficient storage space management, which comprises: dividing the value storage space into equal-length segments, constructing a segment manager to manage the valid and invalid states of all data segments, and establishing a value-store invalid offset set and a key-store invalid offset set for each segment. The value storage space contains three types of segments: fully invalid segments, partially invalid segments, and valid segments. A fully invalid segment is one in which all data is invalid; a partially invalid segment is one in which part of the data is invalid; a valid segment is one in which all data is valid. The method further comprises maintaining an available segment cache and a half-invalid segment cache. The available segment cache holds the starting offsets of fully invalid segments and the starting offsets of segments whose space has been reclaimed by passive garbage collection. The half-invalid segment cache holds partially invalid segments in which invalid data accounts for half or more of the total data in the segment. A value-store offset validity bitmap marks the validity of the data within a data segment of the value store. A key-store offset validity bitmap marks the positions in a data segment that were still marked as valid data when the segment underwent passive garbage collection, and is used to determine, when an invalid offset discarded by the key store is collected, whether that position has already been reclaimed.

According to one embodiment of the SSD-based key-value separate storage method supporting efficient storage space management of the present invention, the segment size is aligned with the page size of the file system and is larger than the single key-value pair size.

According to one embodiment of the SSD-based key-value separation storage method supporting efficient storage space management of the present invention, the invalid offsets, i.e. the offsets of invalid data in the value store, are continuously collected from the compaction operations of the key store.

According to one embodiment of the SSD-based key-value separation storage method supporting efficient storage space management, garbage collection is divided into active garbage collection and passive garbage collection according to the segment states and the usage state of the value storage space. Active garbage collection comprises: when the data in the cache is flushed to the SSD, if a fully invalid segment exists, writing the data in the cache into the fully invalid segment. Passive garbage collection comprises: when the value storage space is used up, triggering a space reclamation operation, selecting a batch of data segments, determining the validity of the data by matching the keys and offsets in the segments against the key-value pairs in the key store, discarding invalid data, rewriting valid data into the value store, writing the keys and new offsets of the rewritten data into the key store, and putting the starting offsets of the reclaimed segments into the available segment cache.

According to one embodiment of the SSD-based key value separation storage method supporting efficient storage space management, a value storage space is divided into a main data area and a reserved data area, and effective data generated in the passive garbage collection is rewritten into the reserved data area; the main data area is used for storing new data; the reserved data area is used for storing effective data generated in passive garbage collection.

According to an embodiment of the SSD-based key value separation storage method supporting efficient storage space management of the present invention, the main data area needs to be larger than the reserved data area.

According to the embodiment of the SSD-based key-value separation storage method supporting efficient storage space management, validity state metadata of data segments in value storage are recorded in a segment manager, so that the states of the segments can be acquired when a storage system is restored, and the metadata in the segment manager are periodically persisted to a disk.

According to one embodiment of the SSD-based key-value separation storage method supporting efficient storage space management of the present invention, a segment status log records the offsets of fully invalid segments, the offsets of half-invalid segments together with their value-store invalid offset sets, and, after each passive garbage collection, the offsets of the reclaimed segments together with their key-store invalid offset sets.

According to one embodiment of the SSD-based key-value separation storage method supporting efficient storage space management of the present invention, discarded invalid key-value pairs are collected from the compaction operations in the key store, and the offset and length of the invalid data in the value store are extracted from them; the starting offset of the containing segment and the intra-segment offset are obtained by computing offset / segment granularity and offset % segment granularity; and the intra-segment offset and the length are combined into an invalid offset entry that is inserted into the value-store invalid offset set of that segment.

According to an embodiment of the SSD-based key-value separation storage method supporting efficient storage space management of the present invention, when the data in the write cache is flushed back to the value store, the write position is determined as follows: if the total capacity of the segments in the available segment cache is larger than the write cache, a candidate segment is selected, its starting offset is obtained, and the write cache data is flushed back into the candidate segment; if no segment is available, a write position is obtained in the main data area, and if that position is not at the tail of the main data area, the data is written there; otherwise, passive garbage collection is triggered, and after space reclamation completes a candidate segment is selected, its starting offset is obtained, and the write cache data is flushed back into the candidate segment.

The invention provides an SSD-based key-value separation storage method supporting efficient storage space management. By collecting, in the key store, the invalid key-value pairs discarded by downward compaction operations, it builds an efficient value-store space manager, achieves lightweight garbage collection (GC), and thereby reduces the impact of GC in the value store on the system's foreground writes. In the key store, the value of a key-value pair discarded by a compaction operation is the offset of invalid data in the value store. To make effective use of the collected invalid offsets, the value storage space is divided into equal-length segments, the invalid offsets are managed per segment, and GC of the value storage space is performed per segment. The invention improves write performance and reduces write amplification under workloads that trigger intensive GC.

Drawings

FIG. 1 is a system architecture and read/write flow diagram of WiscKey;

FIG. 2 is a prototype construction diagram of SRKV;

FIG. 3 is an organizational chart of a value storage space;

FIG. 4 is a diagram showing the passive garbage collection operation;

FIG. 5 is a diagram of an example of the change in the set of value store invalidation offsets and the set of key store invalidation offsets for a single segment in the segment manager after performing two garbage collection operations;

FIG. 6 is a diagram showing the process of determining the write position of write cache data.

Detailed Description

To make the purpose, content, and advantages of the present invention clearer, embodiments of the present invention are described in detail below with reference to the drawings and examples.

WiscKey is an SSD-optimized persistent key-value storage system with key-value separation. It was proposed to address the serious read/write amplification and write performance degradation caused by the compaction operations of LSM-tree based persistent key-value stores such as LevelDB, and is suited to scenarios with large key-value pairs. WiscKey comprises two parts, a value store and a key store. The value store is a reusable log that stores the key-value pair data; the key store is an LSM-tree based key-value store that holds the data index. FIG. 1 is a system architecture and read/write flow diagram of WiscKey. As shown in FIG. 1,

when writing data, the data is first written into the value store and the index data is then written into the key store. The specific steps are as follows:

(1) write the data into the write cache; if the write cache is full, first flush the data in the write cache back to disk by executing steps (2)-(3), and then write the data into the write cache;

(2) flush the data in the write cache back to disk, extract the key, offset, and length of each data item, and generate the index data <key, <offset, length>>;

(3) write the index data in batches into the write cache of the key store (called the Memtable); if the Memtable is full, execute steps (4)-(5); the steps from here on are background operations of the system triggered by writing data;

(4) if the Memtable is full, convert it into an immutable Memtable and create a new Memtable to receive data;

(5) flush the immutable Memtable back to the first level (called level L0) of the LSM-tree disk component; if the number of files in L0 reaches a threshold, trigger a compaction operation and execute step (6);

(6) select all files in L0 whose key ranges overlap, together with the files in L1 whose key ranges overlap them, merge-sort them, and write the result back to L1; if the size of L1 reaches its threshold, trigger compaction of L1, so that data is continually compacted downward.
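The foreground part of the write path described above (buffer the data, flush it to the value log, then write the index entries in a batch) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not WiscKey's actual code: the class name `SeparatedStore`, the `bytearray` standing in for the value log on SSD, and the dictionary standing in for the Memtable are all hypothetical, and for brevity the toy log holds only the values rather than full key-value pairs.

```python
class SeparatedStore:
    """Toy key-value separated store: values go to a log, keys to an index."""

    def __init__(self, cache_limit):
        self.cache_limit = cache_limit   # write cache capacity in bytes
        self.cache = []                  # pending (key, value) pairs
        self.cached_bytes = 0
        self.value_log = bytearray()     # stand-in for the value store on SSD
        self.memtable = {}               # stand-in for the key store's Memtable

    def put(self, key, value):
        # If the write cache cannot take the value, flush it first.
        if self.cached_bytes + len(value) > self.cache_limit:
            self.flush()
        self.cache.append((key, value))
        self.cached_bytes += len(value)

    def flush(self):
        # Append cached values to the log and collect <key, <offset, length>>.
        batch = {}
        for key, value in self.cache:
            batch[key] = (len(self.value_log), len(value))
            self.value_log += value
        # Write the index entries to the key store as one batch.
        self.memtable.update(batch)
        self.cache.clear()
        self.cached_bytes = 0
```

The index is updated only after the values have been appended, mirroring the order "value store first, key store second" given in the text. Memtable rotation and compaction are background work and are left out of the sketch.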

When reading data, the key is first looked up in the key store; after the offset and length of the data corresponding to the key are obtained, the data is read from the value store. The lookup order in the key store is: 1) the Memtable; 2) the immutable Memtable; 3) all files in L0 whose key range covers the key, searched from newest to oldest; 4) in L1 and each level below, the at most one file whose key range covers the key. The key is searched in this order; if it is found, its value, comprising the data offset and data length, is returned; otherwise, a not-found result is returned.
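The lookup order can be illustrated with a short sketch. It assumes, purely for illustration, that every table and file is modeled as a plain dictionary mapping a key to an (offset, length) pair; the function names `lookup` and `read_value` are hypothetical, and deletion markers are ignored.

```python
def lookup(key, memtable, immutable_memtable, l0_files, lower_levels):
    """Return (offset, length) for key, or None if the key is not found.

    memtable / immutable_memtable: dicts (the latter may be None).
    l0_files: list of dicts ordered newest first; key ranges may overlap.
    lower_levels: list of levels (L1, L2, ...), each a list of dicts with
    disjoint key ranges, so at most one file per level can hold the key.
    """
    for table in (memtable, immutable_memtable):
        if table is not None and key in table:
            return table[key]
    for sstable in l0_files:
        if key in sstable:
            return sstable[key]
    for level in lower_levels:
        for sstable in level:
            if key in sstable:
                return sstable[key]
    return None


def read_value(key, value_log, *index_args):
    """Look the key up in the index, then read the bytes from the value log."""
    entry = lookup(key, *index_args)
    if entry is None:
        return None
    offset, length = entry
    return bytes(value_log[offset:offset + length])
```

Because newer structures are searched first, the first hit is the most recent version of the key, so the search can stop there.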

When the value storage space is used up, WiscKey must reclaim space in the value store by eliminating invalid data, an operation called garbage collection. The specific steps of garbage collection are as follows:

(1) sequentially read a batch of data (e.g., 64 MB) from the starting position of the garbage collection operation in the value store log;

(2) look up the keys of the data one by one in the key store and compare the offsets recorded for each key in the two stores; if the offsets are the same, the data is valid, and if they differ, the data is invalid;

(3) rewrite the valid data at the starting position for writing data in the value store log;

(4) generate index data for the rewritten data and update it in the key store in batches.
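A rough sketch of this garbage collection loop is given below. It is an illustration only: the function name `collect_garbage`, the tuple layout of the records, and the dictionary used as the key store are assumptions, not part of the original system.

```python
def collect_garbage(chunk, key_store, write_offset):
    """Validate one batch read from the value log and relocate live records.

    chunk: list of (key, offset, value) records read from the log.
    key_store: dict mapping key -> (offset, length); updated in place.
    write_offset: log position where rewritten data starts.
    Returns (rewritten_records, new_write_offset).
    """
    rewritten = []
    index_updates = {}
    for key, offset, value in chunk:
        # A record is live only if the index still points at this offset.
        if key_store.get(key) != (offset, len(value)):
            continue
        rewritten.append((key, write_offset, value))
        index_updates[key] = (write_offset, len(value))
        write_offset += len(value)
    key_store.update(index_updates)   # batch index update
    return rewritten, write_offset
```

Every record in the batch costs one key store lookup, whether or not it turns out to be live, which is the overhead that the segment-based scheme described later tries to avoid.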

According to the above method, an embodiment of the SSD-based key value separation storage method supporting efficient storage space management of the present invention includes:

the segment granularity includes:

the storage space of the value store log is divided into equal-length segments. The segment size should be aligned with the page size of the file system (i.e., 4 KB) and must be larger than the size of a single key-value pair. To accommodate application environments with variable-length key-value pairs, the segment granularity is aligned with the write cache size (e.g., 1 MB).

The segment metadata includes:

maintaining metadata information of all segments in a memory, wherein the metadata information comprises initial offset of each segment, key value pair number in the segment and failure offset set in the segment;

collecting the invalid key-value pairs discarded by compaction operations in the LSM-tree based key store, extracting their values to obtain the offset and length of the invalid data in the value store, deriving the starting offset of the containing segment and the intra-segment offset from the data offset, and inserting the intra-segment offset and the length, as one invalid offset entry, into the invalid offset set of that segment.

Active garbage collection includes:

when all data within a segment has become invalid, new data is written into that segment during the next operation that flushes the write cache back to disk.

Passive garbage collection includes:

when the value store log space is used up, passive garbage collection is triggered. Given a reclamation space threshold (e.g., 64 MB), a set of partially invalid segments whose capacity meets the threshold is selected, the validity of their data is determined by matching against the key-value pairs in the key store, and the valid data is rewritten. This raises two problems: 1) the rewritten data is not aligned to the segment size, so when implementing SRKV the value storage space should be divided into two parts, one for storing new data (the main data area) and the other for storing the data rewritten by passive garbage collection (the reserved data area); 2) the index data that these data have in the key store has become invalid but has not yet been discarded, and will be collected during subsequent compaction, so when implementing SRKV the segment metadata must additionally contain a set of the invalid offsets of the segment that still exist in the key store. Finally, index data for the rewritten data is generated and written into the key store.

FIG. 2 is a prototype architecture diagram of SRKV. As shown in FIG. 2, the key-value separation storage architecture of SRKV: 1) adds a segment manager between the key store and the value store to maintain the state information of all segments and to manage and use the segments, achieving efficient storage space management; 2) divides the original reusable log into a main data area and a reserved data area; 3) adds a segment status log that periodically persists segment state changes, used to restore the segment states when the database is recovered.

The specific implementation of SRKV is as follows:

the segment manager is built to maintain and manage the metadata and states of all data segments within the main data area, and comprises the segment granularity, the segment metadata array, the available segment cache, and the half-invalid segment cache. The number of segments is obtained from the main data area capacity and the segment granularity. The metadata of each segment comprises the total amount of data within the segment, the value-store invalid offset set, and the key-store invalid offset set.
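One possible in-memory layout for the segment manager is sketched below. The class and field names (`SegmentMeta`, `SegmentManager`, `value_invalid`, `key_invalid`) are hypothetical, the invalid offset sets are modeled as Python sets of (intra-segment offset, length) pairs, and treating an empty segment as valid is an assumption of the sketch rather than something the text specifies.

```python
from dataclasses import dataclass, field
from enum import Enum


class SegmentState(Enum):
    VALID = "valid"
    PARTIALLY_INVALID = "partially invalid"
    FULLY_INVALID = "fully invalid"


@dataclass
class SegmentMeta:
    start_offset: int
    total_items: int = 0                              # data items in the segment
    value_invalid: set = field(default_factory=set)   # {(intra_offset, length)}
    key_invalid: set = field(default_factory=set)     # {(intra_offset, length)}

    def state(self):
        invalid = len(self.value_invalid)
        if self.total_items > 0 and invalid == self.total_items:
            return SegmentState.FULLY_INVALID
        if invalid > 0:
            return SegmentState.PARTIALLY_INVALID
        return SegmentState.VALID

    def is_half_invalid(self):
        # Candidate for the half-invalid segment cache.
        return (self.state() is SegmentState.PARTIALLY_INVALID
                and 2 * len(self.value_invalid) >= self.total_items)


@dataclass
class SegmentManager:
    segment_size: int
    main_area_bytes: int
    segments: list = field(init=False)
    available: list = field(default_factory=list)      # segment start offsets
    half_invalid: list = field(default_factory=list)   # segment start offsets

    def __post_init__(self):
        # Number of segments = main data area capacity / segment granularity.
        count = self.main_area_bytes // self.segment_size
        self.segments = [SegmentMeta(i * self.segment_size) for i in range(count)]
```

Keeping the two caches as lists of start offsets means the flush path and the passive collector can pick candidates without scanning the whole metadata array.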

FIG. 3 is an organizational chart of the value storage space. As shown in FIG. 3, the value storage space is divided into a main data area and a reserved data area. The main data area stores new data and is reclaimed and reused in units of segments; the reserved data area stores the valid data produced by passive garbage collection, maintains a head pointer and a tail pointer, and appends data and reclaims space in the manner of a traditional log. To allow enough time for invalid offsets to accumulate, the main data area needs to be larger than the reserved data area; for example, the storage space is divided in a 7:3 ratio.
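As a small worked example of the 7:3 division, the sketch below splits a value store into the two areas. The function name `split_value_space` and the choice to round the main data area down to a whole number of segments are assumptions made for illustration.

```python
def split_value_space(total_bytes, segment_size, main_parts=7, reserved_parts=3):
    """Split the value store into (main_bytes, reserved_bytes, main_segments).

    The main area is rounded down to a whole number of segments; the
    remainder goes to the reserved area.
    """
    if main_parts <= reserved_parts:
        raise ValueError("main data area must be larger than reserved data area")
    share = total_bytes * main_parts // (main_parts + reserved_parts)
    main_segments = share // segment_size
    main_bytes = main_segments * segment_size
    return main_bytes, total_bytes - main_bytes, main_segments
```

For a 100 MiB value store with 1 MiB segments, this yields a 70 MiB main data area of 70 segments and a 30 MiB reserved data area.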

The segment status log records the offsets of fully invalid segments, the offsets of half-invalid segments together with their value-store invalid offset sets, and, after each passive garbage collection, the offsets of the reclaimed segments together with their key-store invalid offset sets.

Discarded invalid key-value pairs are collected from the compaction operations in the key store, and the offset and length of the invalid data in the value store are extracted from them. A hash-like mapping with two computations, 1) offset / segment granularity (integer division) and 2) offset % segment granularity, gives the segment to which the data belongs (and hence its starting offset) and the intra-segment offset, respectively. The intra-segment offset and the length are combined into one invalid offset entry, which is inserted into the value-store invalid offset set of that segment.
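The mapping from a discarded offset to its segment can be sketched in a few lines. The names `locate` and `record_invalid` are hypothetical, offsets are assumed to be measured from the start of the main data area, and the 1 MB segment size is the example value mentioned earlier.

```python
SEGMENT_SIZE = 1 << 20  # 1 MB, assumed to match the write cache size


def locate(offset, segment_size=SEGMENT_SIZE):
    """Map a value-store offset to (segment_start_offset, intra_segment_offset)."""
    index, intra_offset = divmod(offset, segment_size)
    return index * segment_size, intra_offset


def record_invalid(invalid_sets, offset, length, segment_size=SEGMENT_SIZE):
    """Insert one discarded <offset, length> into its segment's invalid set.

    invalid_sets: dict mapping segment start offset -> set of
    (intra_segment_offset, length) entries.
    """
    start, intra_offset = locate(offset, segment_size)
    invalid_sets.setdefault(start, set()).add((intra_offset, length))
    return start
```

For example, an offset of 3 MB + 4096 bytes maps to the segment starting at 3 MB with an intra-segment offset of 4096.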

FIG. 4 is a diagram illustrating the passive garbage collection operation. As shown in FIG. 4, as invalid offsets accumulate, segments in the segment manager take one of three states: 1) fully invalid segment, 2) partially invalid segment, 3) valid segment. When the number of invalid offsets in a segment's value-store invalid offset set equals the total amount of data in the segment, the segment is judged to be a fully invalid segment and its starting offset is put into the available segment cache; active garbage collection is then triggered the next time cached data is flushed.

When cached data is flushed and the main data area of the value store is found to be used up, passive garbage collection is triggered. Partially invalid segments meeting the reclamation space threshold (e.g., 64 MB) are selected from the half-invalid segment cache; if the total capacity of the segments obtained is smaller than the passive garbage collection threshold, the remaining partially invalid segments are selected by traversing the segment metadata array. All data in each selected segment is read and traversed, skipping the positions that appear in the value-store invalid offset set. For each remaining item, the key, intra-segment offset, and data length are obtained, the intra-segment offset and length are put into the key-store invalid offset set, and the key is looked up in the key store; if the offset and length in the value of the key-value pair found are the same as those in the value store, the data is judged to be valid. Valid data is rewritten at the tail pointer of the reserved data area, and new index data is generated and written into the key store. The starting offset of the segment whose space has been reclaimed is put into the available segment cache and its metadata is reset: the total amount of data in the segment is cleared, and the value-store invalid offset set is cleared.

FIG. 5 is a diagram of an example of the change in the value-store invalid offset set and the key-store invalid offset set of a single segment in the segment manager after two garbage collection operations are performed.
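The per-segment work of passive garbage collection can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the function name `reclaim_segment`, the record tuples, and the dictionary used as the key store are hypothetical, and moving the actual bytes to the reserved data area is reduced to advancing a tail offset.

```python
def reclaim_segment(seg_start, records, value_invalid, key_store, reserved_tail):
    """Reclaim one segment; return (rewrites, key_invalid, new_reserved_tail).

    seg_start: starting offset of the segment in the value store.
    records: list of (key, intra_offset, length) for every item in the segment.
    value_invalid: set of (intra_offset, length) already known to be invalid.
    key_store: dict mapping key -> (offset, length); updated in place.
    reserved_tail: tail pointer of the reserved data area.
    """
    key_invalid = set()
    rewrites = []
    for key, intra_offset, length in records:
        # Positions already reported by compaction need no key store lookup.
        if (intra_offset, length) in value_invalid:
            continue
        # The old index entry for this position may still surface in a later
        # compaction; remember it so it is not counted against the new data.
        key_invalid.add((intra_offset, length))
        if key_store.get(key) == (seg_start + intra_offset, length):
            # Still valid: relocate to the reserved area and update the index.
            key_store[key] = (reserved_tail, length)
            rewrites.append((key, reserved_tail))
            reserved_tail += length
    return rewrites, key_invalid, reserved_tail
```

After the call, the caller would put `seg_start` into the available segment cache, clear the segment's item count and value-store invalid offset set, and keep `key_invalid` as the segment's key-store invalid offset set.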

FIG. 6 is a diagram illustrating the process of determining the write position of write cache data. The two garbage collection policies change the flush operation of the write cache. As shown in FIG. 6, if the available segment cache is not empty, the starting offset of a segment is selected and the write cache data is flushed back to that segment; if no segment is available, a write position is obtained in the main data area, and if that position is not at the tail of the main data area, the data is written there; otherwise, passive garbage collection is triggered, and after space reclamation completes, the starting offset of a segment is selected and the data is flushed back to that position.
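The decision procedure can be sketched as a single function. The name `choose_write_position`, its parameters, and the callback `run_passive_gc` are hypothetical; "not at the tail of the main data area" is interpreted here as "the flush still fits before the end of the main data area", which is an assumption.

```python
from collections import deque


def choose_write_position(available, main_cursor, main_end, flush_bytes,
                          run_passive_gc):
    """Return (write_offset, new_main_cursor) for one write cache flush.

    available: deque of reusable segment start offsets.
    main_cursor: next unused offset in the main data area.
    main_end: end offset of the main data area.
    run_passive_gc: callable that reclaims space and appends the reclaimed
    segment start offsets to `available`.
    """
    if available:
        # Active garbage collection: overwrite a reusable segment.
        return available.popleft(), main_cursor
    if main_cursor + flush_bytes <= main_end:
        # Fresh space remains in the main data area.
        return main_cursor, main_cursor + flush_bytes
    # Main data area exhausted: reclaim space, then reuse a segment.
    run_passive_gc(available)
    return available.popleft(), main_cursor
```

If passive garbage collection reclaims nothing, `popleft` raises `IndexError`; a real system would need an explicit out-of-space path at that point.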

The database instance recovery process of the SRKV comprises the following steps:

the key store is an LSM-tree based key-value store and has its own recovery function;

Recovery of the value store after a normal system exit:

all segment states are read from the segment status log, keeping the latest state of each segment; the head and tail pointers of the reserved data area are obtained from the key store.

Recovery of the value store after a system crash caused by a system or device failure:

all segment states are read from the segment status log, keeping the latest state of each segment, and the last segment operation is obtained. If a write cache flush was not completed, the keys in the segment are read and matched against the data in the key store: if a key is not found and the segment was a fully invalid segment, the data is considered not to have been written back and is lost; if the offsets differ, the data in the value store is new data whose index data has not yet been updated, and the index update is performed. If a passive garbage collection operation was not completed, all candidate segments are obtained and the GC operation is performed again; the head pointer of the reserved data area is obtained from the key store, and the valid data is written at that position.
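The first step of both recovery paths, keeping only the latest logged state of each segment, amounts to a simple replay. The sketch below assumes, for illustration, that each log entry is a (segment start offset, state) pair and that entries are read in the order they were appended; the function name `replay_segment_log` is hypothetical.

```python
def replay_segment_log(entries):
    """Return the latest logged state of every segment.

    entries: iterable of (segment_start_offset, state) tuples in the order
    they were appended to the segment status log.
    """
    latest = {}
    for start_offset, state in entries:
        latest[start_offset] = state   # later entries overwrite earlier ones
    return latest
```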

The invention targets write-intensive environments, in particular update-intensive applications, with an efficient storage space management technique for key-value separated storage, and implements a new key-value separated store, SRKV, based on this technique. On top of the existing key-value separation architecture comprising a key store and a value store, SRKV adds a segment manager between the key store and the value store, which collects invalid offsets from the key store to assist GC in the value store. Based on the segment states, two space reclamation strategies are designed: active GC and passive GC. Because the data rewritten during passive GC cannot be aligned to the segment size, the value store is divided into a main data area, which stores new data, and a reserved data area, which stores the data rewritten by passive GC.

The invention constructs a key-value separated store that reclaims and reuses the value storage space at data-segment granularity (Segment-conscious Recycled Key-value separation store, SRKV). On top of the existing key-value separation architecture comprising a key store and a value store, a segment manager is added between the key store and the value store to collect invalid offsets from the key store and assist GC in the value store; the value storage space is divided into a main data area and a reserved data area to separate hot data from data rewritten by garbage collection. The invention improves write performance and reduces write amplification under workloads that trigger intensive garbage collection.

The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims

1. An SSD-based key-value separation storage method supporting efficient storage space management, characterized by comprising: dividing the value storage space into equal-length segments, constructing a segment manager to manage the valid and invalid states of all data segments, and establishing a value-store invalid offset set and a key-store invalid offset set for each segment, wherein the value storage space contains three types of segments: fully invalid segments, partially invalid segments, and valid segments; a fully invalid segment is one in which all data is invalid; a partially invalid segment is one in which part of the data is invalid; a valid segment is one in which all data is valid;

maintaining an available segment cache and a half-invalid segment cache, wherein the available segment cache holds the starting offsets of fully invalid segments and the starting offsets of segments whose space has been reclaimed by passive garbage collection, and the half-invalid segment cache holds partially invalid segments in which invalid data accounts for half or more of the total data in the segment; a value-store offset validity bitmap marks the validity of the data within a data segment of the value store; a key-store offset validity bitmap marks the positions in a data segment that were still marked as valid data when the segment underwent passive garbage collection, and is used to determine, when an invalid offset discarded by the key store is collected, whether that position has already been reclaimed;

wherein,

according to the segment state and the value storage space use state, the method is divided into active garbage collection and passive garbage collection;

the active garbage collection includes: when the data in the cache is flushed to the SSD, if a fully invalid segment exists, writing the data in the cache into the fully invalid segment;

the passive garbage collection includes: when the value storage space is used up, triggering a space reclamation operation, selecting a batch of data segments, determining the validity of the data by matching the keys and offsets in the segments against the key-value pairs in the key store, discarding invalid data, rewriting valid data into the value store, writing the keys and new offsets of the rewritten data into the key store, and putting the starting offsets of the reclaimed segments into the available segment cache;

dividing a value storage space into a main data area and a reserved data area, and rewriting effective data generated in the passive garbage collection into the reserved data area; the main data area is used for storing new data; the reserved data area is used for storing effective data generated in passive garbage collection.

2. The SSD-based key value split storage method supporting efficient storage space management of claim 1, wherein a segment size is aligned with a page size of a file system and is larger than a single key value pair size.

3. The SSD-based key value separation storage method supporting efficient storage space management of claim 1, wherein the invalid offsets, i.e. the offsets of invalid data in the value store, are continuously collected from compaction operations of the key store.

4. The SSD-based key value separation storage method supporting efficient storage space management of claim 1, wherein the main data area is larger than the reserved data area.

5. The SSD-based key value separation storage method supporting efficient storage space management of claim 1, wherein validity state metadata of the data segments in the value store is recorded in the segment manager so that the state of the segments can be acquired when the storage system is restored, the metadata in the segment manager being periodically persisted to the disk.

6. The SSD-based key value separation storage method supporting efficient storage space management of claim 1, wherein a segment status log records the offsets of fully invalid segments, the offsets of half-invalid segments together with their value-store invalid offset sets, and, after each passive garbage collection, the offsets of the reclaimed segments together with their key-store invalid offset sets.

7. The SSD-based key value separation storage method supporting efficient storage space management of claim 6, wherein discarded invalid key-value pairs are collected from compaction operations in the key store, and the offset and length of the invalid data in the value store are extracted from them; the starting offset of the containing segment and the intra-segment offset are obtained by computing offset / segment granularity and offset % segment granularity; and the intra-segment offset and the length are combined into an invalid offset entry that is inserted into the value-store invalid offset set of that segment.

8. The SSD-based key value separation storage method supporting efficient storage space management of claim 1, wherein when the data in the write cache is flushed back to the value store, determining the write position includes: if the total capacity of the segments in the available segment cache is larger than the write cache, selecting a candidate segment, obtaining its starting offset, and flushing the write cache data back into the candidate segment; if no segment is available, obtaining a write position in the main data area, and if that position is not at the tail of the main data area, writing the data there; otherwise, triggering passive garbage collection, and after space reclamation completes, selecting a candidate segment, obtaining its starting offset, and flushing the write cache data back into the candidate segment.