CN112148225A - NVMe SSD-based block storage caching system and method thereof

NVMe SSD-based block storage caching system and method thereof

Info

Publication number
CN112148225A
Authority
CN
China
Prior art keywords
ssd
block
data
module
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011010192.6A
Other languages
Chinese (zh)
Other versions
CN112148225B (en)
Inventor
鲍苏宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eisoo Information Technology Co Ltd
Original Assignee
Shanghai Eisoo Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eisoo Information Technology Co Ltd filed Critical Shanghai Eisoo Information Technology Co Ltd
Priority to CN202011010192.6A priority Critical patent/CN112148225B/en
Publication of CN112148225A publication Critical patent/CN112148225A/en
Application granted granted Critical
Publication of CN112148225B publication Critical patent/CN112148225B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 - Organizing or formatting or addressing of data
    • G06F 3/064 - Management of blocks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 - Improving I/O performance
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 - Improving I/O performance
    • G06F 3/0611 - Improving I/O performance in relation to response time
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/0614 - Improving the reliability of storage systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 - Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656 - Data buffering arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 - In-line storage system
    • G06F 3/0673 - Single storage device
    • G06F 3/0674 - Disk device
    • G06F 3/0676 - Magnetic disk device
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to an NVMe SSD-based block storage cache system and a method thereof. The system comprises a cache pool and a block storage. The cache pool comprises a control module, a cache pool allocation module, an SSD block management module, an elimination module, a read-write module and an NVMe SSD cache module, the NVMe SSD cache module comprising a plurality of SSD blocks of consistent capacity. The bottom physical space of the block storage is composed of mechanical hard disks, and a plurality of block storage data blocks in the block storage are logically integrated into corresponding LUNs, each LUN corresponding to its own SSD block management module and SSD block set. The cache pool allocation module is used for allocating SSD blocks to the LUNs; the SSD block management module is used for executing the operation of a LUN applying for an SSD block and for scheduling the SSD block set corresponding to the LUN; the elimination module is used for screening out SSD blocks whose read-write heat is below a preset threshold from the SSD block set corresponding to a LUN and recycling them to the NVMe SSD cache module. Compared with the prior art, the invention effectively improves the read-write performance of block storage while ensuring low cost and large capacity.

Description

NVMe SSD-based block storage caching system and method thereof
Technical Field
The invention relates to the technical field of block storage caching, in particular to a block storage caching system and a block storage caching method based on NVMe SSD.
Background
With the rapid development of computer technology, most enterprises now run their core services on computers, so business data is growing explosively. To ensure that business data remains traceable, many enterprises currently adopt storage systems to store it, and block storage is one of the most widely used storage modes.
Mechanical hard disks (HDDs) are characterized by large capacity and low price and have long been the most popular data storage medium; many block storage systems use mechanical hard disks as their data storage medium. However, due to the limitations of mechanical hard disks, the read-write performance required by business workloads cannot be met. In the prior art, an all-flash system is adopted to replace mechanical hard disks so as to improve read-write performance, but all-flash systems have higher cost and limited capacity. It is therefore necessary to design a cache system that combines the low cost and large capacity of mechanical hard disks with the high performance of flash memory, so as to improve the read-write performance of conventional block storage at a lower cost.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a block storage cache system based on NVMe SSD and a method thereof, so as to improve the read-write performance of block storage at lower cost and ensure the capacity of the block storage.
The purpose of the invention can be realized by the following technical scheme: an NVMe SSD-based block storage cache system comprises a cache pool and a block storage. The cache pool comprises a control module, a cache pool allocation module, an SSD block management module, an elimination module, a read-write module and an NVMe SSD cache module. The control module is connected to the cache pool allocation module, the SSD block management module and the elimination module respectively; the SSD block management module is bidirectionally connected with the cache pool allocation module and the read-write module respectively; the cache pool allocation module is bidirectionally connected with the NVMe SSD cache module; and the NVMe SSD cache module is bidirectionally connected with the read-write module. The NVMe SSD cache module comprises a plurality of SSD blocks of consistent capacity. The bottom physical space of the block storage is composed of mechanical hard disks; a plurality of block storage data blocks in the block storage are logically integrated into corresponding LUNs (Logical Unit Numbers) to provide an external virtual disk function; the LUNs are bidirectionally connected with the read-write module, and each LUN corresponds to its own SSD block management module and SSD block set. The control module is used for creating the cache pool and initializing the NVMe SSD cache module to obtain a plurality of SSD blocks of consistent capacity;
the cache pool allocation module is used for allocating SSD blocks in the NVMe SSD cache module to the LUNs;
the SSD block management module is used for executing the operation of applying for the SSD block by the LUN and scheduling the SSD block set corresponding to the LUN;
the elimination module is used for executing an elimination algorithm so as to screen and recover the SSD blocks with the read-write heat lower than a preset threshold value in the SSD block set corresponding to the LUN to the NVMe SSD cache module;
the read-write module is used for executing the operation of reading or writing IO data.
Further, the elimination algorithm is specifically an ARC (Adaptive Replacement Cache) elimination algorithm.
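The ARC policy named above can be illustrated with a compact, simplified sketch. This is illustrative only: the patent does not disclose its ARC implementation, and every class and attribute name here is invented. The cache keeps a recency list `t1`, a frequency list `t2`, and two ghost lists `b1`/`b2` whose hits steer the adaptive split `p` between recency and frequency:

```python
from collections import OrderedDict

class ARCCache:
    """Simplified Adaptive Replacement Cache sketch (not the patent's code).
    t1: blocks seen once recently; t2: blocks seen at least twice;
    b1/b2: ghost lists remembering recent evictions from t1/t2."""
    def __init__(self, capacity):
        self.c = capacity
        self.p = 0                   # target size of t1, adapted on ghost hits
        self.t1, self.t2 = OrderedDict(), OrderedDict()
        self.b1, self.b2 = OrderedDict(), OrderedDict()

    def _replace(self, key):
        # Evict from t1 or t2 depending on the adaptive target p.
        if self.t1 and (len(self.t1) > self.p or (key in self.b2 and len(self.t1) == self.p)):
            old, _ = self.t1.popitem(last=False)
            self.b1[old] = None      # remember as ghost
        else:
            old, _ = self.t2.popitem(last=False)
            self.b2[old] = None

    def access(self, key, value=None):
        if key in self.t1:           # hit in recency list: promote to frequency list
            v = self.t1.pop(key)
            self.t2[key] = v if value is None else value
            return v
        if key in self.t2:           # hit in frequency list: refresh position
            v = self.t2.pop(key)
            self.t2[key] = v if value is None else value
            return v
        if key in self.b1:           # ghost hit: recency was undervalued, grow p
            self.p = min(self.c, self.p + max(len(self.b2) // max(len(self.b1), 1), 1))
            self._replace(key)
            del self.b1[key]
            self.t2[key] = value
            return None
        if key in self.b2:           # ghost hit: frequency was undervalued, shrink p
            self.p = max(0, self.p - max(len(self.b1) // max(len(self.b2), 1), 1))
            self._replace(key)
            del self.b2[key]
            self.t2[key] = value
            return None
        # complete miss
        if len(self.t1) + len(self.b1) == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(key)
            else:
                self.t1.popitem(last=False)   # t1 full, b1 empty: drop LRU outright
        elif len(self.t1) + len(self.t2) + len(self.b1) + len(self.b2) >= self.c:
            if len(self.t1) + len(self.t2) + len(self.b1) + len(self.b2) >= 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(key)
        self.t1[key] = value
        return None
```

As the description notes, the policy manages blocks both by time (t1) and by frequency (t2), adjusting between the two automatically.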
Further, the NVMe SSD cache module is specifically a Raid1 disk array formed by a main NVMe SSD and a standby NVMe SSD.
A block storage caching method based on NVMe SSD comprises the following steps:
S1, constructing a Raid1 disk array based on the main NVMe SSD and the standby NVMe SSD to serve as the cache pool device;
S2, initializing the cache pool, and establishing a global cache pool allocation module;
S3, logically integrating a plurality of block storage data blocks in the block storage to create a corresponding LUN, and establishing the SSD block management module corresponding to the LUN, wherein each LUN is allocated a plurality of SSD blocks which together form the SSD block set of the LUN; the LUNs can be mapped out through iSCSI or FC for use as block devices, and the data reads and writes of each LUN are mutually independent;
S4, the read-write module receives an IO data write or IO data read instruction; if a write instruction is received, step S5 is executed, and if a read instruction is received, step S6 is executed;
S5, searching whether a corresponding data segment or a combinable data segment exists in the SSD block set of the LUN to which the IO data belongs according to the offset and the length; if found, judging whether the IO data to be written is larger than a preset penetration threshold: if not, updating the corresponding data segment, marking the dirty bit of the data segment as 1, and waiting for a background flush thread to flush the data from the cache pool into the block storage; if larger than the penetration threshold, merging the IO data, writing the merged IO data into the block storage, and marking the dirty bit of the originally merged data segment as 0;
if not found, judging whether the length of the IO data to be written is larger than the preset penetration threshold: if larger, directly writing the data into the block storage; if not, the SSD block management module applies for a new SSD block from the cache pool allocation module, the IO data is written into the newly applied SSD block to form a data segment, and the dirty bit of the data segment is marked as 1, wherein if no allocatable SSD block exists in the current NVMe SSD cache module, the elimination module screens out an allocatable SSD block from the SSD block set of the LUN currently occupying the largest number of SSD blocks;
S6, searching whether a corresponding data segment exists in the SSD block set of the LUN to which the IO data belongs according to the offset and the length, and if found, reading the corresponding data from the SSD block;
if not found, reading the data from the block storage while the SSD block management module applies for a new SSD block from the cache pool allocation module and caches the read data into the newly applied SSD block; if no allocatable SSD block exists in the current NVMe SSD cache module, the elimination module screens out a reallocatable SSD block from the SSD block set of the LUN currently occupying the largest number of SSD blocks.
Further, the specific process of initializing the cache pool in step S2 is as follows:
S21, logically partitioning the Raid1 disk array into a plurality of SSD blocks according to a preset capacity, and writing the label information and detail information of the SSD cache device in the first sector of the first SSD block, including: the total size of the Raid1 disk array, the block size, and the number of blocks;
S22, starting from byte 4096 of the first SSD block, initializing a bitmap according to the size of the Raid1 disk array device, wherein each SSD block corresponds to one bit; the bits corresponding to the SSD blocks occupied by the bitmap space are set to 1, and the bits of the other unused SSD blocks are set to 0.
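The layout arithmetic of steps S21-S22 can be sketched as follows. This is a hypothetical calculation, not the patent's code; the 4096-byte bitmap offset and the 1MB default block size follow the description, while the function and variable names are invented:

```python
def init_bitmap(device_bytes, block_bytes=1 << 20, bitmap_offset=4096):
    """Sketch of the cache-pool bitmap layout of S21/S22: one bit per SSD
    block, with the bits covering the header+bitmap region set to 1
    (occupied) and all other blocks left at 0 (free)."""
    n_blocks = device_bytes // block_bytes
    bitmap = bytearray((n_blocks + 7) // 8)          # one bit per block
    # Bytes consumed at the front of the device: header area plus the bitmap itself.
    meta_bytes = bitmap_offset + len(bitmap)
    meta_blocks = (meta_bytes + block_bytes - 1) // block_bytes
    for i in range(meta_blocks):                     # mark metadata blocks occupied
        bitmap[i // 8] |= 1 << (i % 8)
    return bitmap, n_blocks, meta_blocks
```

For a hypothetical 1 GiB Raid1 device this yields 1024 blocks of 1MB, a 128-byte bitmap, and a single metadata block whose bit is pre-set to 1.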
Further, the preset capacity is specifically 1 MB.
Further, the background flush thread in step S5 specifically consists of multiple data read-write threads that synchronize cache pool data to the block storage. Its specific working process is as follows:
when data is written, the dirty bit of the corresponding data segment is marked as 1; a suitable data synchronization thread is then selected according to the current load of each data synchronization thread, and the information of the SSD block containing the data segment is added to that thread's data synchronization queue; when the SSD block is later processed by the data synchronization thread, the data of all data segments needing synchronization is synchronized into the block storage;
when the data of all data segments in the SSD block has been synchronized to the block storage, the dirty bit of each data segment is marked as 0.
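The background flush mechanism just described can be sketched as a minimal queue-draining worker. This is an illustrative Python sketch under assumed data structures (dict-based blocks and segments, and a `write_back` callback standing in for the real block-storage write path), not the patent's implementation:

```python
import queue
import threading

class FlushWorker:
    """Minimal sketch of one background flush thread: drain a synchronization
    queue of SSD blocks, write every dirty segment back to block storage,
    then clear the dirty bits (as in the description)."""
    def __init__(self, write_back):
        self.q = queue.Queue()
        self.write_back = write_back          # assumed callback: (offset, data) -> None
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def submit(self, ssd_block):
        self.q.put(ssd_block)

    def _run(self):
        while True:
            block = self.q.get()
            if block is None:                 # shutdown sentinel
                break
            for seg in block['segments']:
                if seg['dirty']:
                    self.write_back(seg['offset'], seg['data'])
                    seg['dirty'] = 0          # synced: reset dirty bit to 0
            self.q.task_done()
```

A real system would run several such workers and pick the least-loaded one per block, as the description states.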
Further, the step S5 specifically includes the following steps:
S51, querying the data segment corresponding to the IO data from the SSD block set corresponding to the LUN according to the offset and length of the IO data to be written; if a data segment is found, executing step S52, otherwise executing step S55;
S52, judging whether the IO data is larger than the penetration threshold; if so, executing step S53, otherwise executing step S54;
S53, reading the cached data segment in the SSD block, merging it with the current IO data, writing the merged IO data into the block storage, finally updating the cached data of the SSD block, and marking the dirty bit of the merged data segment cached in the SSD block as 0;
S54, directly updating the data cached in the SSD block, marking the dirty bit of the corresponding data segment in the SSD block as 1, and then adding the SSD block into the data synchronization queue of a synchronization thread;
S55, judging whether the IO data is larger than the penetration threshold; if so, directly writing the IO data into the block storage, otherwise executing step S56;
S56, the SSD block management module applies for a new SSD block from the cache pool allocation module, and the cache pool allocation module searches whether an allocatable SSD block exists in the NVMe SSD cache module; if not, step S57 is executed, and if so, step S58 is executed;
S57, the elimination module screens out SSD blocks whose read-write heat is below a preset threshold from the SSD block set of the LUN currently occupying the largest number of SSD blocks, and the cache pool allocation module recycles the screened-out SSD blocks back into the NVMe SSD cache module; then step S58 is executed;
S58, the cache pool allocation module allocates a new SSD block to the LUN, the IO data is written into the new SSD block, the dirty bit of the corresponding data segment is marked as 1, and the new SSD block is then added into the data synchronization queue of the synchronization thread.
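The write-path decision flow of steps S51-S58 can be condensed into a short sketch. All helper methods on `lun`, `seg`, and `block` are assumed interfaces invented for illustration, and the 256KB default penetration threshold is an arbitrary example value (the patent does not fix one):

```python
def handle_write(io, lun, penetration_threshold=1 << 18):
    """Sketch of steps S51-S58: small writes are absorbed by the SSD cache
    and marked dirty; writes above the penetration threshold bypass the
    cache or are merged and written through."""
    seg = lun.find_segment(io.offset, io.length)          # S51
    if seg is not None:
        if io.length > penetration_threshold:             # S52 -> S53
            merged = seg.merge(io)                        # merge cached + new data
            lun.write_through(merged)                     # write merged IO to block storage
            seg.dirty = 0                                 # already synchronized
        else:                                             # S52 -> S54
            seg.update(io)
            seg.dirty = 1
            lun.enqueue_sync(seg.block)                   # hand off to flush thread
    else:
        if io.length > penetration_threshold:             # S55: big IO bypasses cache
            lun.write_through(io)
        else:                                             # S56-S58
            block = lun.allocate_block()                  # may trigger elimination
            seg = block.add_segment(io)
            seg.dirty = 1
            lun.enqueue_sync(block)
```

Note how the threshold check appears on both the hit and miss paths, matching S52 and S55.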
Further, the step S6 specifically includes the following steps:
S61, querying the data segment corresponding to the IO data from the SSD block set corresponding to the LUN according to the offset and length of the IO data to be read; if a data segment is found, executing step S62, otherwise executing step S65;
S62, judging whether the data cached in the SSD block can fill the current IO request; if so, directly reading the data from the SSD block, otherwise executing step S63;
S63, reading the data not cached in the SSD block from the block storage, then updating the SSD block, and filling the read data into the memory of the read IO;
S64, updating the read-write heat of the SSD block;
S65, reading the data from the block storage and filling it into the memory of the read IO, while the SSD block management module applies for a new SSD block from the cache pool allocation module and caches the read data into the newly applied SSD block; then updating the SSD block set of the LUN and adjusting the read-write heat of the SSD block.
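The read path of steps S61-S65 can be sketched in the same style as the write path. Again, every helper method here is an assumed interface invented for illustration, not the patent's actual API:

```python
def handle_read(io, lun):
    """Sketch of steps S61-S65: serve from the SSD cache on a hit, fall back
    to block storage on a miss and populate a newly allocated SSD block;
    the block's read-write heat is bumped either way."""
    seg = lun.find_segment(io.offset, io.length)              # S61
    if seg is not None:
        if seg.covers(io):                                    # S62: full hit
            data = seg.read(io)
        else:                                                 # S63: partial hit
            missing = lun.read_from_storage(seg.missing_range(io))
            seg.fill(missing)                                 # update the SSD block
            data = seg.read(io)
        seg.block.heat += 1                                   # S64: bump heat
        return data
    data = lun.read_from_storage(io)                          # S65: miss
    block = lun.allocate_block()                              # may trigger elimination
    block.cache(io, data)                                     # populate the new block
    block.heat += 1
    return data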
Further, the step S65 of the SSD block management module applying for a new SSD block from the cache pool allocation module specifically includes the following steps:
S651, the SSD block management module applies for a new SSD block from the cache pool allocation module, and the cache pool allocation module searches whether an allocatable SSD block exists in the NVMe SSD cache module; if not, step S652 is executed, and if so, step S653 is executed;
S652, the elimination module screens out SSD blocks whose read-write heat is below a preset threshold from the SSD block set of the LUN currently occupying the largest number of SSD blocks, and the cache pool allocation module recycles the screened-out SSD blocks back into the NVMe SSD cache module; then step S653 is executed;
S653, the cache pool allocation module allocates a new SSD block to the LUN.
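The allocate-with-elimination flow of steps S651-S653 can be sketched as one function. The data structures are assumptions for illustration (`pool` as a list of free blocks, each LUN owning a list of blocks carrying a `heat` counter); a real implementation would fall back to the ARC policy if no block falls below the heat threshold:

```python
def allocate_ssd_block(pool, luns, requesting_lun, heat_threshold=1):
    """Sketch of S651-S653: hand out a free SSD block; if the cache pool is
    exhausted, first recycle low-heat blocks from the LUN that currently
    occupies the most SSD blocks, then allocate."""
    if not pool:                                           # S652: no free block
        victim_lun = max(luns, key=lambda l: len(l.blocks))
        cold = [b for b in victim_lun.blocks if b.heat < heat_threshold]
        for b in cold:                                     # recycle cold blocks
            victim_lun.blocks.remove(b)
            b.heat = 0
            pool.append(b)
    block = pool.pop()                                     # S653: allocate to requester
    requesting_lun.blocks.append(block)
    return block
```

Choosing the LUN with the most blocks as the eviction victim is exactly the tie-breaking rule the description gives in S57 and S652.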
Compared with the prior art, the invention has the following advantages:
the method combines the characteristics of low cost and large capacity of the mechanical hard disk and the high performance of the NVMe SSD, adopts the mechanical hard disk as the bottom physical space of block storage, constructs a cache pool based on the NVMe SSD, and adopts the LUNs to logically integrate the database of the block storage, so that the block storage can externally provide the function of a virtual disk, and simultaneously ensures the independence of data reading and writing among the LUNs.
In addition, the invention is based on a elimination algorithm mechanism to eliminate and screen out SSD blocks with lower read-write heat, so that new SSD blocks can be subsequently redistributed to needed LUNs, the cache pool can be ensured to continuously update and adjust the cache space to adapt to the cache requirement of new data, thereby fully and reasonably utilizing system resources and ensuring that the performance of the whole block storage cache system is continuously optimized and improved.
Third, the invention pre-filters the larger IO data in the process of writing the IO data by setting the penetration threshold value, thereby preventing the cache pool from being quickly filled with the large IO data and ensuring the normal and reliable operation of the cache pool.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a schematic flow diagram of the process of the present invention;
FIG. 3 is a flow chart illustrating data writing in an embodiment;
FIG. 4 is a schematic flow chart of data reading in the embodiment;
The notation in the figures is: 1: cache pool; 11: control module; 12: cache pool allocation module; 13: SSD block management module; 14: elimination module; 15: read-write module; 16: NVMe SSD cache module; 2: block storage; 21: mechanical hard disk.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
Examples
As shown in fig. 1, a block storage cache system based on an NVMe SSD is composed of a cache pool 1 and a block storage 2. The cache pool 1 includes a control module 11, a cache pool allocation module 12, an SSD block management module 13, an elimination module 14, a read-write module 15, and an NVMe SSD cache module 16; the NVMe SSD cache module 16 is specifically a Raid1 disk array composed of a main NVMe SSD and a standby NVMe SSD.
The cache pool 1 performs data interaction with the block storage 2 through the read-write module 15, and the bottom physical space of the block storage 2 is composed of mechanical hard disks 21 (HDD in fig. 1). Specifically, the control module 11 is connected to the cache pool allocation module 12, the SSD block management module 13 and the elimination module 14 respectively; the SSD block management module 13 is bidirectionally connected to the cache pool allocation module 12 and the read-write module 15 respectively; the cache pool allocation module 12 is bidirectionally connected to the NVMe SSD cache module 16; and the NVMe SSD cache module 16 is bidirectionally connected to the read-write module 15. The NVMe SSD cache module 16 includes a plurality of SSD blocks of consistent capacity. A plurality of block storage data blocks in the block storage 2 are logically integrated into corresponding LUNs to provide a virtual disk function to the outside; the LUNs are bidirectionally connected to the read-write module 15, and each LUN corresponds to its own SSD block management module 13 and SSD block set. The control module 11 is used for creating the cache pool and initializing the NVMe SSD cache module 16 to obtain a plurality of SSD blocks of consistent capacity;
the cache pool allocation module 12 is configured to allocate SSD blocks in the NVMe SSD cache module 16 to LUNs;
the SSD block management module 13 is configured to execute an operation of applying for an SSD block by a LUN, and schedule an SSD block set corresponding to the LUN;
the elimination module 14 is configured to execute an elimination algorithm so as to screen out SSD blocks whose read-write heat is below a preset threshold from the SSD block set corresponding to a LUN and recycle them to the NVMe SSD cache module 16. In this embodiment, the elimination algorithm adopted by the elimination module 14 is the ARC (Adaptive Replacement Cache) algorithm, a commonly used page replacement algorithm that manages cache blocks (i.e., the SSD blocks of the present invention) both by recency and by access frequency, automatically adjusting its optimization strategy between the two; SSD blocks are screened for elimination using their read-write accesses as the measure of heat;
the read/write module 15 is used for performing an operation of reading or writing IO data.
The system is applied in practice; its specific working process, shown in fig. 2, follows steps S1 to S6 as described above.
In this embodiment, when the method is applied, two NVMe SSDs are first used as main and standby devices respectively to form the Raid1 device /dev/dm0. The /dev/dm0 device is then logically divided into a plurality of blocks of 1MB each, and the label information and detail information of the SSD cache device are written in the first sector of the first block, including: the total size of /dev/dm0, the block size, and the number of blocks. Finally, starting from byte 4096 of the first block, a bitmap is initialized according to the size of the /dev/dm0 device, where each 1MB block corresponds to one bit; the bits corresponding to the blocks occupied by the bitmap space are set to 1, and the bits of the other unused blocks are set to 0, completing the initialization of the cache pool.
In this embodiment, when executing the read-write workflow, as shown in fig. 3 and fig. 4, the following steps are mainly included:
Firstly, the NVMe SSD cache pool creation stage: two NVMe SSD disks are configured on the storage node to form the Raid1 device /dev/dm0. After the cache device is selected, an initialization operation is performed on /dev/dm0 and the corresponding information is filled into the SSDCache manager (i.e., the cache pool allocation module 12) for global cache block management; after initialization completes, the cache pool is ready for use.
Secondly, the LUN creation stage: the block storage back end creates virtual LUNs; the LUNs can be mapped out through iSCSI or FC and used as block devices, and the data reads and writes of each LUN are independent. While creating a LUN at the back end of the block storage, the corresponding SSD block management module 13 is created.
Thirdly, the data writing stage (as shown in fig. 3): when IO data is written, whether a corresponding data segment or a combinable data segment exists in the SSD block set of the LUN to which the IO belongs is searched according to the offset and length. If found, the corresponding data segment is updated, the dirty bit of the data segment is marked as 1, and a background flush thread is waited for to flush the data from the cache pool to the storage pool of the block storage. If not found, whether the length is larger than the set penetration threshold is judged: if so, the data is directly written into the storage pool of the block storage; if not, a new SSD block is applied for from the SSDCache manager, the IO data is written into the SSD block to form a data segment, the dirty bit of the data segment is marked as 1, and the corresponding cache block information is placed into the synchronization queue of a data synchronization thread.
Fourth, the data reading phase (as shown in fig. 4): when data is read from the block storage system, the SSD block set of the LUN to which the IO belongs is searched, according to the offset and length, for a corresponding data segment. If one is found, the corresponding data is read from the SSD block; if not, the data is read from the storage pool of the block storage, a new SSD block is applied for from the SSDCache manager at the same time, and the read data is cached into that SSD block.
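The write and read paths above can be sketched as a toy in-memory model. Everything here is illustrative: the class and attribute names are invented, the threshold value is an assumed example, and the model simplifies the patent's flow by applying the penetration-threshold check only on a cache miss.

```python
PENETRATION_THRESHOLD = 256 * 1024  # assumed example value; configurable in practice

class Lun:
    """Toy model of one LUN's cache path: 'segments' stands in for the SSD
    block set, 'backing' for the HDD storage pool of the block storage."""
    def __init__(self):
        self.segments = {}   # offset -> cached bytes (SSD side)
        self.dirty = set()   # offsets awaiting the background flush thread
        self.backing = {}    # stand-in for the HDD storage pool

    def write(self, offset, data):
        if offset in self.segments:              # hit: update segment, mark dirty
            self.segments[offset] = bytes(data)
            self.dirty.add(offset)
        elif len(data) > PENETRATION_THRESHOLD:  # large IO: bypass the cache
            self.backing[offset] = bytes(data)
        else:                                    # miss: claim a new SSD block
            self.segments[offset] = bytes(data)
            self.dirty.add(offset)

    def read(self, offset):
        if offset in self.segments:              # hit: serve from SSD
            return self.segments[offset]
        data = self.backing.get(offset, b"")     # miss: read HDD, then cache
        self.segments[offset] = data
        return data
```

A small write lands in the cache and is marked dirty; a write above the threshold goes straight to the backing pool; a read miss populates the cache on the way back.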
In the above steps, a data segment refers to a continuous data area within an SSD block. An SSD block is 1MB in size, but the length of a write IO may be smaller than 1MB, in which case a data segment is generated. Merging of data segments means that if two data segments in the same SSD block are contiguous or overlap, they can be merged into a single new data segment.
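The merge condition can be stated compactly as follows; this is a sketch of the rule just described, with segments represented as hypothetical (offset, length) pairs.

```python
def merge_segments(a, b):
    """Merge two (offset, length) data segments in the same SSD block if they
    are contiguous or overlap; return None when a gap remains between them."""
    (ao, al), (bo, bl) = sorted([a, b])
    if ao + al < bo:          # disjoint with a gap: cannot merge
        return None
    end = max(ao + al, bo + bl)
    return (ao, end - ao)
```

Contiguous or overlapping segments collapse into one; segments separated by a gap are left as two.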
The dirty bit is a flag marking whether new data has been written into a data segment of an SSD block. If new data is written into the SSD block, the dirty bit of the data segment is marked as 1, indicating that the data in the segment needs to be synchronized to the storage pool of the block storage; after synchronization is completed, the dirty bit is reset to 0.
It should be noted that: 1. When reading and writing data, because the SSD cache pool is much smaller than the storage pool of the block storage, a data synchronization policy is needed to synchronize data from the SSD cache pool to the storage pool of the block storage. The present invention therefore adopts a real-time synchronization policy, synchronizing data from the SSD cache pool to the storage pool in real time so as to fully utilize both the SSD cache pool and the block storage pool. Specifically, when data is written, the background process of writing cache pool data to the storage pool of the block storage mainly includes:
1) After the NVMe SSD cache pool is brought into service, a plurality of data synchronization threads are started and wait to synchronize data.
2) When data is written, the dirty bit of the corresponding data segment is marked as 1. A suitable data synchronization thread is then selected according to the current load of each thread, and information on the SSD block containing the data segment is added to that thread's data synchronization queue. When the thread later processes the SSD block, it synchronizes the data of all segments requiring synchronization to the storage pool of the block storage.
3) Once the data in all data segments of the SSD block has been synchronized to the storage pool of the block storage, the dirty bit of each data segment is marked as 0.
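The three steps above can be sketched with worker threads draining per-thread queues. The class and dictionary shapes are invented for illustration; the load-based thread selection is approximated by picking the shortest queue.

```python
import queue
import threading

class Syncer:
    """Sketch of the real-time flush path: writers enqueue dirty SSD blocks,
    worker threads copy their segments to the backing pool and clear dirty bits."""
    def __init__(self, backing, n_threads=2):
        self.backing = backing
        self.queues = [queue.Queue() for _ in range(n_threads)]
        for q in self.queues:
            threading.Thread(target=self._run, args=(q,), daemon=True).start()

    def submit(self, block):
        # approximate "select a suitable thread by current load": shortest queue
        min(self.queues, key=lambda q: q.qsize()).put(block)

    def _run(self, q):
        while True:
            block = q.get()
            for seg in block["segments"]:
                if seg["dirty"]:
                    self.backing[seg["offset"]] = seg["data"]
                    seg["dirty"] = 0          # step 3: clear after synchronization
            q.task_done()

    def drain(self):
        # wait for all queued blocks to be flushed (for testing/shutdown)
        for q in self.queues:
            q.join()
```

After `drain()` returns, every submitted segment has been copied to the backing pool and its dirty bit reset.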
2. When applying for a new SSD block fails, a corresponding elimination mechanism is needed to evict infrequently accessed SSD blocks from the SSD cache pool to the storage pool of the block storage, freeing space to cache new data. The invention therefore uses the ARC elimination algorithm independently for each LUN to manage the adjustment and elimination of its SSD blocks, evicting infrequently accessed SSD blocks according to the read-write heat of each SSD block in the SSD cache pool. The ARC algorithm (Adaptive Replacement Cache) is a commonly used page replacement algorithm that can manage cache blocks by recency, manage them by frequency, and automatically adjust and optimize between the two strategies. Here, read-write accesses to an SSD cache pool block are taken as its heat. In the specific implementation, the SSDCache manager is responsible for the allocation and recovery of all SSD blocks in the SSD cache pool; after applying for an SSD block, each LUN adds it to its own SSD block set, and subsequent adjustment and elimination are scheduled by the LUN.
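A heavily simplified stand-in for this policy is shown below: it reclaims the lowest-heat block from the LUN holding the most SSD blocks, as in the claims' eviction steps. A full ARC implementation additionally maintains recency and frequency lists with an adaptive target size, which is omitted here; all names are illustrative.

```python
def evict_coldest(block_sets):
    """block_sets maps LUN name -> {block_id: heat}. Pick the LUN currently
    occupying the most SSD blocks, evict its lowest-heat block, and return
    (lun, block_id) so the allocator can reuse the block."""
    lun = max(block_sets, key=lambda name: len(block_sets[name]))
    blocks = block_sets[lun]
    victim = min(blocks, key=lambda blk: blocks[blk])  # lowest read-write heat
    del blocks[victim]
    return lun, victim
```

This captures only the heat-based victim selection; the adaptive balancing between recency and frequency is the part ARC adds on top.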
3. Because the performance difference between an HDD mechanical hard disk and an NVMe SSD is small when writing large IOs, a penetration threshold is set to filter out large IOs and prevent the SSD cache pool from being quickly filled by them.
In summary, the invention combines the low cost and large capacity of the HDD mechanical hard disk with the high performance of the NVMe SSD to design a cache system suitable for block storage. Write IO performance is improved by writing IOs that do not exceed the penetration threshold into the NVMe SSD, while read IOs are accelerated using the data already written in the NVMe SSD, effectively improving the read-write performance of the block storage. Meanwhile, by flushing data in the NVMe SSD cache pool to the storage pool of the block storage in real time and combining this with the ARC elimination mechanism, system resources can be fully and reasonably utilized, and sustained performance improvement is guaranteed.

Claims (10)

1. A block storage caching system based on an NVMe SSD, comprising a cache pool (1) and a block storage (2), wherein the cache pool (1) comprises a control module (11), a cache pool allocation module (12), an SSD block management module (13), an elimination module (14), a read-write module (15) and an NVMe SSD cache module (16); the control module (11) is respectively connected to the cache pool allocation module (12), the SSD block management module (13) and the elimination module (14); the SSD block management module (13) is respectively bidirectionally connected with the cache pool allocation module (12) and the read-write module (15); the cache pool allocation module (12) is bidirectionally connected with the NVMe SSD cache module (16); the NVMe SSD cache module (16) is bidirectionally connected with the read-write module (15), and the NVMe SSD cache module (16) comprises a plurality of SSD blocks with consistent capacity; the underlying physical space of the block storage (2) is composed of a mechanical hard disk (21); a plurality of data blocks in the block storage (2) are logically integrated into corresponding LUNs to provide a virtual disk function externally; the LUNs are bidirectionally connected with the read-write module (15), and each LUN corresponds to its own SSD block management module (13) and its own SSD block set; the control module (11) is used for creating the cache pool (1) and initializing the NVMe SSD cache module (16) to obtain a plurality of SSD blocks with consistent capacity and size;
the cache pool allocation module (12) is used for allocating SSD blocks in the NVMe SSD cache module (16) to LUNs;
the SSD block management module (13) is used for executing the operation of applying for the SSD block by the LUN and scheduling the SSD block set corresponding to the LUN;
the elimination module (14) is used for executing an elimination algorithm so as to screen and recover the SSD blocks with the read-write heat lower than a preset threshold value in the SSD block set corresponding to the LUN to the NVMe SSD cache module (16);
the read-write module (15) is used for executing the operation of reading or writing IO data.
2. The block storage caching system based on the NVMe SSD according to claim 1, wherein the elimination algorithm is specifically the ARC elimination algorithm.
3. The block storage caching system based on the NVMe SSD according to claim 1, wherein the NVMe SSD cache module (16) is a Raid1 disk array formed by a main NVMe SSD and a standby NVMe SSD.
4. A block storage caching method using the block storage caching system according to claim 1, comprising the steps of:
S1, constructing a Raid1 disk array based on the main NVMe SSD and the standby NVMe SSD to serve as the cache pool device;
s2, initializing a cache pool, and establishing a global cache pool distribution module;
S3, logically integrating a plurality of data blocks in the block storage to create corresponding LUNs, and establishing the SSD block management module corresponding to each LUN, wherein each LUN is allocated a plurality of SSD blocks which together form the SSD block set of the LUN; the LUNs can be mapped out through iSCSI or FC to be used as block devices, and data reading and writing of each LUN are independent of each other;
S4, the read-write module receives an IO data write instruction or an IO data read instruction; if it is a write instruction, step S5 is executed, and if it is a read instruction, step S6 is executed;
S5, searching, according to the offset and the length, whether a corresponding data segment or a mergeable data segment exists in the SSD block set of the LUN to which the IO data belongs; if found, judging whether the IO data to be written is larger than a preset penetration threshold: if not larger than the penetration threshold, updating the corresponding data segment, marking the dirty bit of the data segment as 1, and waiting for a background flushing thread to flush the data from the cache pool into the block storage; if larger than the penetration threshold, merging the IO data, writing the merged IO data into the block storage, and marking the dirty bit of the originally merged data segment as 0;
if not found, judging whether the length of the IO data to be written is larger than the preset penetration threshold: if larger, directly writing the data into the block storage; if not, the SSD block management module applies for a new SSD block from the cache pool allocation module, the IO data is written into the newly applied SSD block to form one data segment, and the dirty bit of the data segment is marked as 1, wherein if no allocatable SSD block exists in the current NVMe SSD cache module, the elimination module screens out an allocatable SSD block from the SSD block set of the LUN currently occupying the largest number of SSD blocks;
s6, searching whether a corresponding data segment exists in the SSD block set of the LUN to which the IO data belongs according to the offset and the length, and if so, reading corresponding data from the SSD block;
and if the data is not found, reading the data from the block storage while the SSD block management module applies for a new SSD block from the cache pool allocation module, and caching the read data into the newly applied SSD block, wherein if no allocatable SSD block exists in the current NVMe SSD cache module, the elimination module screens out a reallocatable SSD block from the SSD block set of the LUN currently occupying the largest number of SSD blocks.
5. The block storage caching method according to claim 4, wherein the specific process of initializing the cache pool in step S2 is:
s21, logically partitioning the Raid1 disk array into a plurality of SSD blocks according to a preset capacity size, and writing the tag information and the detailed information of the SSD cache device in the first sector of the first SSD block, including: raid1 disk array total size, block size, and number of blocks;
and S22, initializing a bitmap starting from byte 4096 of the first SSD block according to the size of the Raid1 disk array device, wherein each SSD block corresponds to one bit, the bits corresponding to the SSD blocks occupied by the bitmap space are set to 1, and the bits of the other unused SSD blocks are set to 0.
6. The block storage caching method according to claim 5, wherein the predetermined capacity is 1 MB.
7. The block storage caching method according to claim 4, wherein the background flushing thread in step S5 is specifically a plurality of data synchronization threads for synchronizing cache pool data to the block storage, and the specific working process of the background flushing thread is:
when data is written, marking the dirty bit of the corresponding data segment as 1, then selecting a proper data synchronization thread according to the current load condition of each data synchronization thread, adding the related information of the SSD block containing the data segment into the data synchronization queue of the data synchronization thread, and synchronizing the data of all the data segments needing to be synchronized into the block storage when the SSD block is processed by the subsequent data synchronization thread;
when the data in all the data segments in the SSD block are synchronized to the block store, the dirty bit of each data segment is marked as 0.
8. The block storage caching method according to claim 4, wherein the step S5 specifically comprises the following steps:
s51, inquiring a data segment corresponding to the IO data from the SSD block set corresponding to the LUN according to the offset and the length of the IO data to be written, if the data segment is inquired, executing a step S52, and if the data segment is not inquired, executing a step S55;
s52, judging whether the IO data is larger than the penetration threshold value, if so, executing a step S53, otherwise, executing a step S54;
S53, reading the cached data segment in the SSD block, merging it with the current IO data, writing the merged IO data into the block storage, finally updating the data cached in the SSD block, and marking the dirty bit of the merged data segment cached in the SSD block as 0;
s54, directly updating the data cached by the SSD block, marking the dirty bit of the corresponding data segment in the SSD block as 1, and then adding the SSD block into the data synchronization queue of the synchronization thread;
S55, judging whether the IO data is larger than the penetration threshold; if so, directly writing the IO data into the block storage, otherwise, executing step S56;
s56, the SSD block management module applies for a new SSD block to the cache pool allocation module, the cache pool allocation module searches whether a distributable SSD block exists in the NVMe SSD cache module, if not, the step S57 is executed, and if yes, the step S58 is executed;
S57, the elimination module screens out SSD blocks whose read-write heat is lower than a preset threshold from the SSD block set of the LUN currently occupying the largest number of SSD blocks; these SSD blocks are reclaimed by the cache pool allocation module and placed back into the NVMe SSD cache module, and then step S58 is executed;
and S58, the cache pool allocation module allocates a new SSD block to the LUN, writes the IO data into the new SSD block, marks the dirty bit of the corresponding data segment as 1, and then adds the new SSD block into the data synchronization queue of the synchronization thread.
9. The block storage caching method according to claim 4, wherein the step S6 specifically comprises the following steps:
s61, inquiring a data segment corresponding to the IO data from the SSD block set corresponding to the LUN according to the offset and the length of the IO data to be read, if the data segment is inquired, executing a step S62, and if the data segment is not inquired, executing a step S65;
s62, judging whether the data cached in the SSD block can fill the current IO request, if so, directly reading the data from the SSD block, otherwise, executing the step S63;
S63, reading the data not cached in the SSD block from the block storage, then updating the SSD block, and filling the read data into the memory of the read IO;
s64, updating the read-write heat of the SSD block;
and S65, reading the data from the block storage and filling the read data into the memory of the read IO, while the SSD block management module applies for a new SSD block from the cache pool allocation module; the read data is cached into the newly applied SSD block, the SSD block set of the LUN is then updated, and the read-write heat of the SSD block is adjusted.
10. The block storage caching method according to claim 9, wherein the step S65 of applying for a new SSD block from the cache pool allocation module by the SSD block management module specifically includes the steps of:
s651, the SSD block management module applies for a new SSD block to the cache pool allocation module, the cache pool allocation module searches whether a distributable SSD block exists in the NVMe SSD cache module, if not, step S652 is executed, and if yes, step S653 is executed;
S652, the elimination module screens out SSD blocks whose read-write heat is lower than a preset threshold from the SSD block set of the LUN currently occupying the largest number of SSD blocks; these SSD blocks are reclaimed by the cache pool allocation module and placed back into the NVMe SSD cache module, and then step S653 is executed;
s653, the cache pool allocation module allocates a new SSD block to the LUN.
CN202011010192.6A 2020-09-23 2020-09-23 NVMe SSD-based block storage caching system and method thereof Active CN112148225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011010192.6A CN112148225B (en) 2020-09-23 2020-09-23 NVMe SSD-based block storage caching system and method thereof

Publications (2)

Publication Number Publication Date
CN112148225A true CN112148225A (en) 2020-12-29
CN112148225B CN112148225B (en) 2023-04-25

Family

ID=73896166


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant