WO2022213736A1 - Method for writing data into a solid state drive - Google Patents

Method for writing data into a solid state drive

Info

Publication number
WO2022213736A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
page
scm
address
chip
Prior art date
Application number
PCT/CN2022/077686
Other languages
English (en)
French (fr)
Inventor
周文
程桢
苏毅
蒋泓峰
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202110662875.8A (CN115203079A)
Application filed by 华为技术有限公司
Priority to EP22783806.7A (EP4307129A1)
Publication of WO2022213736A1
Priority to US18/477,160 (US20240020014A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0613Improving I/O performance in relation to throughput
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • G06F12/0238Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • G06F12/0238Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G06F12/0246Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/04Addressing variable-length words or parts of words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0873Mapping of cache memory to specific storage devices or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • G06F12/0886Variable-length word access
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0616Improving the reliability of storage systems in relation to life time, e.g. increasing Mean Time Between Failures [MTBF]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/068Hybrid storage device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0284Multiple user address space allocation, e.g. using different base addresses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1016Performance improvement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1041Resource optimization
    • G06F2212/1044Space efficiency improvement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/21Employing a record carrier using a specific recording technology
    • G06F2212/214Solid state disk
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/31Providing disk cache in a specific location of a storage system
    • G06F2212/313In storage device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/72Details relating to flash memory management
    • G06F2212/7201Logical to physical mapping or translation of blocks or pages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/72Details relating to flash memory management
    • G06F2212/7203Temporary buffering, e.g. using volatile buffer or dedicated buffer blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/72Details relating to flash memory management
    • G06F2212/7208Multiple device management, e.g. distributing data over multiple flash devices

Definitions

  • The present invention relates to the field of storage systems, and more particularly to a method for writing data into a solid state drive.
  • A flash-based solid state drive (SSD) reads and writes data in units of pages, and the default page size is typically 4 KB, 8 KB, or 16 KB. Upper-layer systems that communicate with SSDs (such as file systems and block systems) therefore widely use the page as the unit of data writing, which simplifies space management and improves metadata storage efficiency.
  • When the upper-layer system needs to write data smaller than the page size (i.e., small IO data) to the SSD, it usually first reads from the SSD the data belonging to the same page as the small IO data, merges the two into new page data, and finally writes the merged data into the SSD as a new page. Executing one small IO write request thus causes a read operation and a write operation on a full page of data, which results in read/write amplification and shortens the lifespan of the SSD.
  • The present application provides a method, a device, a solid state drive, and a system for writing data to a solid state drive, which can mitigate the poor write performance of solid state drives under small IO.
  • An embodiment of the present application provides a method for writing data into a solid state drive. The method provides two write interfaces, i.e., a byte-level write interface and a page-level write interface.
  • The specific write method is: receiving a first write request through the byte-level write interface, where the first write request carries first data to be written and the length of the first data is less than the size of one flash memory page; and receiving a second write request through the page-level write interface, where the second write request carries second data to be written and the length of the second data is equal to the size of one flash memory page.
  • This method provides both byte-level and page-level write interfaces. Whereas a traditional solid state drive can only write data in units of one flash page, this method provides different write interfaces according to the size of the data to be written. Data shorter than one flash memory page can still be accepted for writing, which effectively mitigates the low write performance for small IO data.
  • In a possible design, the first data is written into the storage class memory (SCM) chip of the solid state drive for persistent storage, and the second data is written into the flash memory chip of the solid state drive for persistent storage.
  • This method writes full-page data into the flash memory chip to achieve high performance and space utilization efficiency.
  • It writes small IO data into the SCM chip to reduce wasted space and eliminate write amplification.
  • The method exploits the byte-level addressing and persistence of the SCM chip, which improves the efficiency of small IO writes, alleviates the write amplification they cause, and extends the service life of the solid state drive.
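The routing described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the class name `HybridSSD`, the method names, the 16 KB page size, and the use of dictionaries to stand in for the SCM and flash chips are all assumptions.

```python
# Hypothetical sketch: all names and sizes here are illustrative assumptions.
PAGE_SIZE = 16 * 1024  # assumed flash page size in bytes


class HybridSSD:
    def __init__(self):
        self.scm = {}    # logical address -> small IO data (stands in for the SCM chip)
        self.flash = {}  # logical address -> full page (stands in for the flash chip)

    def write_bytes(self, logical_addr: int, data: bytes) -> None:
        """Byte-level write interface: data shorter than one flash page goes to SCM."""
        if not 0 < len(data) < PAGE_SIZE:
            raise ValueError("byte-level writes must be smaller than one flash page")
        self.scm[logical_addr] = data

    def write_page(self, logical_addr: int, data: bytes) -> None:
        """Page-level write interface: exactly one flash page goes to flash."""
        if len(data) != PAGE_SIZE:
            raise ValueError("page-level writes must be exactly one flash page")
        self.flash[logical_addr] = data
```

A caller holding 100 bytes would use `write_bytes`, so no full-page read-modify-write is needed; a caller holding a complete page would use `write_page`.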
  • In a possible design, the first write request also carries the logical address of the first data, and the physical address at which the first data is persisted in the SCM chip is the first physical address. The root page address and sub-page address of the first data are obtained according to the logical address of the first data; the index relationship between the root page address and the sub-page address is saved, as is the index relationship between the sub-page address and the first physical address.
  • Through multi-level index management, the present application establishes a translation layer for page-level data and byte-level data, which effectively improves the management efficiency of the storage system.
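One way to picture the two-level index is the sketch below. The text does not say how the root page address and sub-page address are derived from the logical address; the integer division used here, the 16 KB page size, the 512-byte sub-page size, and all names are assumptions made for illustration.

```python
# Hypothetical sketch of a two-level index; sizes and derivation are assumptions.
PAGE_SIZE = 16 * 1024  # assumed flash page size in bytes
SUB_PAGE_SIZE = 512    # assumed sub-page size in bytes


def split_address(logical_addr: int) -> tuple:
    """Derive (root page address, sub-page address) from a logical byte address."""
    root_page = logical_addr // PAGE_SIZE
    sub_page = (logical_addr % PAGE_SIZE) // SUB_PAGE_SIZE
    return root_page, sub_page


class TranslationLayer:
    def __init__(self):
        self.root_to_subs = {}  # root page address -> set of sub-page addresses
        self.sub_to_phys = {}   # (root page, sub-page) -> SCM physical address

    def record_write(self, logical_addr: int, scm_phys_addr: int) -> None:
        """Save both index relationships after data is persisted in SCM."""
        root, sub = split_address(logical_addr)
        self.root_to_subs.setdefault(root, set()).add(sub)
        self.sub_to_phys[(root, sub)] = scm_phys_addr

    def lookup(self, logical_addr: int):
        """Return the SCM physical address, or None if the data is not in SCM."""
        root, sub = split_address(logical_addr)
        if sub not in self.root_to_subs.get(root, ()):
            return None
        return self.sub_to_phys[(root, sub)]
```

Keeping the root-page level separate lets a page-level operation find every sub-page of a root page that currently lives in SCM with a single lookup.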
  • In a possible design, after the second data is written into the flash memory chip for persistent storage: when the data stored in the SCM chip contains data that is the same as at least a part of the second data, that data is deleted from the SCM chip.
  • The method promptly deletes invalid data and invalid indexes in the SCM chip, which improves the utilization of the SCM chip.
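The cleanup step can be sketched as below. The sketch assumes that a full-page write supersedes every sub-page of the same root page held in SCM; the function name and the dictionary layout are hypothetical.

```python
# Hypothetical sketch; container names and layout are assumptions.
def write_full_page(root_page, page, flash, scm_index, scm):
    """Persist a full page to flash, then drop the SCM data it supersedes.

    flash:     root page address -> full page bytes
    scm_index: root page address -> {sub-page address: SCM physical address}
    scm:       SCM physical address -> bytes
    Returns the number of SCM entries that were invalidated.
    """
    flash[root_page] = page
    stale = scm_index.pop(root_page, {})  # remove the now-invalid index entries
    for phys in stale.values():
        scm.pop(phys, None)               # remove the now-invalid SCM data
    return len(stale)
```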
  • In a possible design, a page-level read interface is provided. A first read request is received through this interface to read the second data, and the first read request carries the logical address of the second data. According to the logical address of the second data, it is determined whether third data having the same root page address as the second data is stored in the SCM chip. When the third data is stored in the SCM chip, the third data is read from the SCM chip according to the logical address of the second data, and the second data is read from the flash memory chip; the third data and the second data are then merged into a full page of data and returned.
  • Reading the third data from the SCM chip according to the logical address of the second data includes: obtaining the root page address of the second data according to the logical address of the second data; obtaining the sub-page address of the third data according to the root page address of the second data; obtaining the physical address of the third data according to the sub-page address of the third data; and reading the third data according to its physical address.
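A minimal sketch of the merged read path follows. It assumes that sub-pages found in SCM are newer than the flash copy and overlay it, that every SCM entry is exactly one sub-page long, and that a root page absent from flash reads as zeros; the sizes and names are illustrative.

```python
# Hypothetical sketch; sizes, names, and the overlay rule are assumptions.
PAGE_SIZE = 4096     # assumed flash page size in bytes
SUB_PAGE_SIZE = 512  # assumed sub-page size in bytes


def read_page(root_page, flash, scm_index, scm):
    """Return one full page, overlaying SCM sub-pages on the flash copy.

    flash:     root page address -> PAGE_SIZE bytes
    scm_index: root page address -> {sub-page address: SCM physical address}
    scm:       SCM physical address -> SUB_PAGE_SIZE bytes
    """
    # Start from the flash copy (zeros if the page was never written to flash).
    page = bytearray(flash.get(root_page, bytes(PAGE_SIZE)))
    # Root page -> sub-page addresses -> physical addresses -> data.
    for sub_page, phys in scm_index.get(root_page, {}).items():
        start = sub_page * SUB_PAGE_SIZE
        page[start:start + SUB_PAGE_SIZE] = scm[phys]
    return bytes(page)
```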
  • In a possible design, after the second data is written into the flash memory chip for persistent storage, a copy of the second data is cached in the DRAM chip of the solid state drive.
  • In a possible design, after the first data is written into the SCM chip of the solid state drive for persistent storage, a copy of the first data is cached in the DRAM chip.
  • When the first data is to be read from the SCM chip: determine whether a copy of the first data exists in the DRAM chip; if so, read the first data from the DRAM chip and return it; otherwise, read the first data from the SCM chip.
  • When the second data is to be read from the flash memory chip: determine whether a copy of the second data exists in the DRAM chip; if so, read the second data from the DRAM chip and return it; otherwise, read the second data from the flash memory chip.
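The cache behaviour in the four paragraphs above can be sketched as follows. The class and attribute names are hypothetical, and one dictionary stands in for whichever persistent chip (SCM or flash) holds the data.

```python
# Hypothetical sketch; names are assumptions for illustration.
class CachedStore:
    def __init__(self):
        self.persistent = {}  # stands in for the SCM chip or the flash chip
        self.dram = {}        # volatile copies held in the DRAM chip

    def write(self, key, data):
        self.persistent[key] = data  # persist first
        self.dram[key] = data        # then cache a copy in DRAM

    def read(self, key):
        if key in self.dram:         # a copy exists: serve it from DRAM
            return self.dram[key]
        return self.persistent[key]  # otherwise read the persistent chip
```

Because the DRAM copy is only a cache, losing it (for example on power loss) does not lose data; reads simply fall back to the persistent chip.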
  • In a possible design, the first data and other data are migrated to the flash memory chip at page granularity. The other data and the first data belong to the same root page, and before the migration the other data is located in the SCM chip or the flash memory chip.
  • The first aspect of the embodiments of the present application further provides a data migration method, which can be applied to a solid state drive.
  • The method includes: determining a first root page to be migrated in an SCM chip, where the SCM chip is located in the solid state drive; reading the data corresponding to the first root page from the SCM chip and/or the flash memory chip, where the flash memory chip is located in the solid state drive; and writing the data corresponding to the first root page into the flash memory chip, either directly or after merging.
  • In a possible design, when the data corresponding to the first root page read from the SCM chip is a whole page of data, the data is read only from the SCM chip and written into the flash memory chip.
  • In a possible design, when the data corresponding to the first root page read from the SCM chip is not a whole page of data, the data corresponding to the first root page is read from both the SCM chip and the flash memory chip, merged, and then written into the flash memory chip.
  • In a possible design, the data corresponding to the first root page in the SCM chip is deleted.
  • In a possible design, the index corresponding to the first root page is deleted and a new index is created.
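The two migration cases can be sketched as below. The sketch models only the logical outcome (a real flash chip would program a fresh physical page and update its mapping); the sizes, names, and dictionary layout are assumptions.

```python
# Hypothetical sketch; sizes and container layout are assumptions.
PAGE_SIZE = 4096     # assumed flash page size in bytes
SUB_PAGE_SIZE = 512  # assumed sub-page size in bytes
SUBS_PER_PAGE = PAGE_SIZE // SUB_PAGE_SIZE


def migrate_root_page(root_page, scm_subs, flash):
    """Move one root page from SCM to flash at page granularity.

    scm_subs: root page address -> {sub-page index: SUB_PAGE_SIZE bytes}
    flash:    root page address -> PAGE_SIZE bytes
    """
    subs = scm_subs[root_page]
    if len(subs) == SUBS_PER_PAGE:
        # Whole page is in SCM: no flash read is needed.
        page = b"".join(subs[i] for i in range(SUBS_PER_PAGE))
    else:
        # Partial page: read the flash copy and merge the SCM sub-pages into it.
        merged = bytearray(flash[root_page])
        for i, data in subs.items():
            merged[i * SUB_PAGE_SIZE:(i + 1) * SUB_PAGE_SIZE] = data
        page = bytes(merged)
    flash[root_page] = page
    del scm_subs[root_page]  # delete the migrated data and its index from SCM
    return page
```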
  • The root page to be migrated in the SCM chip is determined according to the page aggregation degree, which is the number of sub-pages of a root page that are held in the SCM chip; a root page is composed of multiple sub-pages.
  • A root page with a higher page aggregation degree, or a root page for which more time has passed since its last update, is preferentially selected as the root page to be migrated.
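A possible selection rule is sketched below. The text names two criteria but not how they are combined; ranking by aggregation degree first and using age only as a tie-breaker is an assumption, as are the type and field names.

```python
# Hypothetical sketch; the way the two criteria are combined is an assumption.
from dataclasses import dataclass


@dataclass
class RootPageStats:
    root_page: int
    sub_pages_in_scm: int  # page aggregation degree
    last_update: float     # timestamp of the most recent write to this root page


def pick_migration_candidate(stats, now):
    """Prefer the highest aggregation degree; break ties by longest idle time."""
    best = max(stats, key=lambda s: (s.sub_pages_in_scm, now - s.last_update))
    return best.root_page
```

Migrating the most aggregated root pages first means fewer bytes must be read back from flash to complete each page.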
  • The first aspect of the embodiments of the present application further provides a garbage collection method. The method includes: determining a first block to be reclaimed in the flash memory chip, where the first block includes at least a first part of data and a second part of data, the first part of data is located in the SCM chip, the second part of data is located in the flash memory chip, and both parts are valid data; reading the first part of data from the SCM chip and the second part of data from the flash memory chip; writing the first part of data and the second part of data into a second block in the flash memory chip; and erasing the first block.
  • An embodiment of the present application provides a solid state drive that includes a main controller, multiple flash memory chips, and one or more SCM chips. The main controller executes computer instructions to perform the method in the first aspect and any of its possible designs.
  • The SCM chip and the flash memory chip are connected to different channel controllers, and the channel controllers are located in the main controller.
  • An embodiment of the present application provides an apparatus for writing data into a solid state drive. The apparatus includes a plurality of modules and is used to implement the functions of the method in the first aspect and any of its possible designs.
  • The apparatus includes:
  • A byte-level write module, used to receive a first write request through the byte-level write interface, where the first write request carries the first data to be written and the length of the first data is less than the size of one flash memory page.
  • The first data is written into the SCM chip of the solid state drive for persistent storage.
  • A copy of the first data is cached in the DRAM chip.
  • A page-level write module, configured to receive a second write request through the page-level write interface, where the second write request carries the second data to be written and the length of the second data is equal to the size of one flash memory page.
  • The second data is written into the flash memory chip of the solid state drive for persistent storage.
  • When the data stored in the SCM chip contains data that is the same as at least a part of the second data, that data is deleted from the SCM chip.
  • The method further includes caching a copy of the first data in the DRAM chip.
  • The apparatus also includes:
  • A page-level read module, which receives a first read request through the page-level read interface to read the second data, where the first read request carries the logical address of the second data. According to the logical address of the second data, the module determines whether third data having the same root page address as the second data is stored in the SCM chip. When the third data is stored in the SCM chip, the third data is read from the SCM chip according to the logical address of the second data, and the second data is read from the flash memory chip; the third data and the second data are then merged into a whole page of data and returned.
  • Reading the third data from the SCM chip according to the logical address of the second data includes: obtaining the root page address of the second data according to the logical address of the second data; obtaining the sub-page address of the third data according to the root page address of the second data; obtaining the physical address of the third data according to the sub-page address of the third data; and reading the third data according to its physical address.
  • When the first data is to be read from the SCM chip: determine whether a copy of the first data exists in the DRAM chip; if so, read the first data from the DRAM chip and return it; otherwise, read the first data from the SCM chip.
  • When the second data is to be read from the flash memory chip: determine whether a copy of the second data exists in the DRAM chip; if so, read the second data from the DRAM chip and return it; otherwise, read the second data from the flash memory chip.
  • The apparatus further includes a data migration module, used to migrate the first data and other data to the flash memory chip at page granularity, where the other data and the first data belong to the same root page, and before the migration the other data is located in the SCM chip or the flash memory chip.
  • When the data corresponding to the first root page read from the SCM chip is a whole page of data, the data corresponding to the first root page is read from the SCM chip and written into the flash memory chip.
  • When the data corresponding to the first root page read from the SCM chip is not a whole page of data, the data corresponding to the first root page is read from both the SCM chip and the flash memory chip, merged, and then written into the flash memory chip.
  • The root page to be migrated in the SCM chip is determined according to the page aggregation degree, which is the number of sub-pages of a root page that are held in the SCM chip; a root page is composed of multiple sub-pages.
  • A root page with a higher page aggregation degree, or a root page for which more time has passed since its last update, is preferentially selected as the root page to be migrated.
  • The apparatus further includes a garbage collection module, used to: determine a first block to be reclaimed in the flash memory chip, where the first block includes at least a first part of data and a second part of data, the first part of data is located in the SCM chip, the second part of data is located in the flash memory chip, and both parts are valid data; read the first part of data from the SCM chip and the second part of data from the flash memory chip; write the first part of data and the second part of data into a second block in the flash memory chip; and erase the first block.
  • The apparatus further includes a translation layer module, configured to: obtain the root page address and sub-page address of the first data according to the logical address of the first data carried in the first write request; and save the index relationship between the root page address and the sub-page address, as well as the index relationship between the sub-page address and the first physical address, where the first physical address is the physical address at which the first data is persisted in the SCM chip.
  • When the third data is read from the SCM chip according to the logical address of the second data, this module can be used to: obtain the root page address of the second data according to the logical address of the second data; obtain the sub-page address of the third data according to the root page address of the second data; obtain the physical address of the third data according to the sub-page address of the third data; and read the third data according to its physical address.
  • An embodiment of the present application provides a method for writing data into a storage device. The method includes: a controller sends a first write request to an SCM storage device, where the first write request carries first data to be written and the length of the first data is less than the size of one flash memory page; the SCM storage device receives the first write request through a byte-level write interface; the controller sends a second write request to a flash memory storage device, where the second write request carries second data to be written and the length of the second data is equal to the size of one flash memory page; and the flash memory storage device receives the second write request through a page-level write interface.
  • A write request for one flash page is divided into one or more first write operations.
  • Different priorities are set for the first write request and the second write request.
  • The SCM storage device persistently stores the first data.
  • The flash memory storage device persistently stores the second data.
  • The length of the first data is greater than or equal to the minimum management unit of the SCM storage device; the SCM storage device persistently storing the first data includes: storing the first data in units of the minimum management unit.
  • The first write request also carries the logical address of the first data, and the physical address at which the first data is persisted in the SCM storage device is the first physical address. The controller obtains the root page address and sub-page address of the first data according to the logical address of the first data; the controller saves the index relationship between the root page address and the sub-page address, as well as the index relationship between the sub-page address and the first physical address.
  • In a possible design, after the flash memory storage device persistently stores the second data: when the data stored in the SCM storage device contains data that is the same as at least a part of the second data, the controller deletes that data from the SCM storage device.
  • The controller sends a first read request to read the second data, and the first read request carries the logical address of the second data. According to the logical address of the second data, it is determined whether third data having the same root page address as the second data is stored in the SCM storage device. When the third data is stored in the SCM storage device, the third data is read from the SCM storage device according to the logical address of the second data, and the second data is read from the flash memory storage device; the third data and the second data are then merged into a whole page of data and returned.
  • In a possible design, reading the third data from the SCM storage device according to the logical address of the second data includes: obtaining the root page address of the second data according to the logical address of the second data; obtaining the sub-page address of the third data according to the root page address of the second data; obtaining the physical address of the third data according to the sub-page address of the third data; and reading the third data according to its physical address.
  • The controller migrates the first data and other data to the flash memory storage device at page granularity.
  • The other data and the first data belong to the same root page, and before the migration the other data is located in the SCM storage device or the flash memory storage device.
  • an embodiment of the present application provides a storage system, the storage system includes: a controller, an SCM storage device, and a flash storage device; the controller sends a first write request to the SCM storage device, the first write request carries first data to be written, and the length of the first data is less than the size of a flash memory page; the SCM storage device receives the first write request through a byte-level write interface; the controller sends a second write request to the flash storage device, the second write request carries second data to be written, and the length of the second data is equal to the size of a flash memory page; the flash storage device receives the second write request through a page-level write interface.
  • the controller divides a write request of a flash memory page into one or more of the first write operations.
  • the controller sets different priorities for the first write request and the second write request.
  • the SCM storage device persistently stores the first data
  • the flash memory storage device persistently stores the second data
  • the length of the first data is greater than or equal to the minimum management unit of the SCM storage device; the SCM storage device persistently storing the first data includes: storing the aforementioned first data with the minimum management unit.
  • the first write request also carries the logical address of the first data, and the physical address where the first data is written into the SCM storage device for persistence is the first physical address; the controller Obtain the root page address and sub-page address of the first data according to the logical address of the first data; the controller saves the index relationship between the root page address and the sub-page address, as well as the sub-page address and the first physical address index relationship between.
  • the controller is further configured to: when the data stored in the SCM storage device contains data identical to at least a part of the second data, delete, from the SCM storage device, the data identical to at least a part of the second data.
  • the controller sends a first read request to read the second data, and the first read request carries the logical address of the second data; according to the logical address of the second data, it is determined whether third data with the same root page address as the second data has been stored in the SCM storage device; when the third data has been stored in the SCM storage device, the third data is read from the SCM storage device according to the logical address of the second data, and the second data is read from the flash storage device; the third data and the second data that have been read are merged into a whole page of data, which is returned.
  • the controller is further configured to read the third data from the SCM storage device according to the logical address of the second data, which includes: obtaining the root page address of the second data according to the logical address of the second data; obtaining the sub-page address of the third data according to the root page address of the second data; obtaining the physical address of the third data according to the sub-page address of the third data; and reading the third data according to the physical address of the third data.
  • In a possible design, the controller migrates the first data and other data to the flash storage device at page granularity; the other data and the first data belong to the same root page, and before the migration the other data is located in the SCM storage device or the flash storage device.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium includes program instructions, and when the program instructions are executed on a computer or a processor, the computer or the processor is caused to execute the method executed by the controller in the first aspect and any of its possible designs.
  • Embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium includes program instructions, and when the program instructions are executed on a computer or a processor, the computer or the processor is caused to execute the method executed by the controller in the second aspect and any of its possible designs.
  • the present application provides a computer program product comprising instructions stored in a computer-readable storage medium.
  • the processor of the controller can read the instructions from the computer-readable storage medium; the processor executes the instructions, so that the storage system or storage device implements the first aspect or the method provided by various possible designs of the first aspect.
  • the present application provides a computer program product comprising instructions stored in a computer-readable storage medium.
  • the processor of the controller can read the instructions from the computer-readable storage medium; the processor executes the instructions, so that the storage system or storage device implements the method provided by the second aspect or various possible designs of the second aspect.
  • FIG. 1 is an example diagram of a storage network architecture provided by an embodiment of the present application.
  • FIG. 2 is an exemplary diagram of another storage network architecture provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of the interior of a storage system 120 provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a solid-state hard disk SSD 300 provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a flash memory chip 212 provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an SCM chip 213 provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a software solution of a two-level translation layer provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a method for writing data to a solid-state hard disk provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a hard disk 134-1 provided by an embodiment of the present application.
  • FIG. 10( a ) is a schematic flowchart of a method for byte-level writing provided by an embodiment of the present application.
  • FIG. 10(b) is a schematic flowchart of a method for page-level writing provided by an embodiment of the present application
  • FIG. 11 is a schematic flowchart of a method for page-level reading provided by an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of a method for page-level reading provided by an embodiment of the present application.
  • FIG. 13( a ) is a flowchart of a method for garbage collection provided by an embodiment of the present application.
  • FIG. 13(b) is a flowchart of a data migration method provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of the interior of a storage system 120 provided by an embodiment of the present application.
  • FIG. 15 is a schematic flowchart of a method for writing data into a storage device according to an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of an apparatus 1600 provided by an embodiment of the present application.
  • FIG. 1 is an example diagram of a storage network architecture provided by an embodiment of the present application.
  • the storage network architecture includes an application server 100, a switch 110, and a storage system 120 (or storage node).
  • the method provided by the present application can be applied to the storage network architecture shown in FIG. 1.
  • the application server 100 may be a physical machine or a virtual machine. Physical machines include, but are not limited to, desktop computers, servers, laptops, and mobile devices. In the application scenario shown in FIG. 1 , the user accesses data through the application program running on the application server 100 .
  • the switch 110 is an optional device, and the application server 100 can access the storage system 120 through the fiber switch 110 to access data.
  • the application server 100 may also communicate with the storage system 120 directly through the network.
  • the fiber switch 110 can also be replaced with an Ethernet switch, an InfiniBand switch, a RoCE (RDMA over Converged Ethernet) switch, or the like.
  • the storage system 120 includes an engine 121 and one or more hard disks 134.
  • the engine 121 is the core component of the centralized storage system, and many advanced functions of the storage system are implemented in it.
  • the storage system 120 shown in FIG. 1 is a storage system with integrated disk control.
  • the engine 121 has a hard disk slot, and the hard disk 134 can be directly deployed in the engine 121 , that is, the hard disk 134 and the engine 121 are deployed on the same device.
  • the engine 121 includes one or more controllers. Taking the two controllers 122 in FIG. 1 as an example, there is a mirror channel between the two controllers, so that the two controllers 122 can back each other up, thereby preventing a hardware failure from making the entire storage system 120 unavailable.
  • the engine 121 also includes a front-end interface 125 and a back-end interface 126 , wherein the front-end interface 125 is used for communicating with the application server 100 to provide storage services for the application server 100 .
  • the back-end interface 126 is used to communicate with the hard disk 134 to expand the capacity of the storage system.
  • the controller 122 includes at least a processor 123 and a memory 124 .
  • the processor 123 is a central processing unit (central processing unit, CPU), used for processing data access requests from outside the storage system (server or other storage systems), and also used for processing requests generated inside the storage system.
  • Memory 124 refers to internal memory that exchanges data directly with the processor.
  • the memory 124 includes at least two types of memory.
  • the memory 124 may be either a random access memory (Random Access Memory, RAM) or a read only memory (ROM).
  • random access memory includes dynamic random access memory (DRAM) and static random access memory (SRAM).
  • the present application does not limit the specific type of the memory 124, and any memory that can be used as the memory 124 to interact with the processor is applicable to the embodiment of the present application.
  • the memory 124 stores software programs, and the processor 123 runs the software programs in the memory 124 to manage the hard disk.
  • the hard disk is abstracted into a storage resource pool, and then divided into LUNs for use by the file server, etc.
  • the LUNs here are actually the hard disks seen on the application server.
  • Some centralized storage systems are also file servers themselves, which can provide shared file services for application servers.
  • the hard disk 134 is used to provide storage resources, such as storing data.
  • the hard disk 134 may be a Serial Attached SCSI (SAS) hard disk, a Non-Volatile Memory Express (NVMe) hard disk, a Peripheral Component Interconnect Express (PCIe) hard disk, a Serial Advanced Technology Attachment (SATA) hard disk, or another type of hard disk.
  • the hard disk 134 may be a flash memory SSD.
  • the hard disk 134 may also be a magnetic disk or other types of storage media, such as a solid state disk or a shingled magnetic recording hard disk, or the like.
  • the storage system 120 may also be a storage system with separate disk control, and the engine 121 may not have a hard disk slot, that is, the engine 121 and the hard disk 134 are deployed on two devices.
  • the system 120 further includes a hard disk enclosure 130, and the hard disk enclosure 130 includes a control unit 131, several hard disks 134 and a network card (not shown in the figure).
  • the hard disk enclosure 130 includes a control unit 131 and several hard disks 134 .
  • the control unit 131 may have various forms.
  • in one form, the hard disk enclosure 130 is a smart disk enclosure, and the control unit 131 includes a CPU and a memory. The CPU is used to perform operations such as address translation and reading and writing data.
  • the network card is used to communicate with other servers 110 .
  • the functions of the control unit 131 may be offloaded to the network card.
  • the hard disk enclosure 130 may not have the control unit 131, and the network card may perform data reading and writing, address translation, and other computing functions.
  • the network card is a smart network card.
  • the above-mentioned system 120 can be applied not only to the centralized storage architecture as shown in FIG. 1 and FIG. 2 , but also to various distributed storage systems.
  • FIG. 3 is a schematic structural diagram of the interior of the storage system 120 shown in FIG. 1 or FIG. 2 .
  • the storage system 120 includes a hard disk 134 - 1 , and the hard disk 134 - 1 may be an embodiment of the aforementioned hard disk 134 .
  • the storage system 120 includes mixed storage media, that is, the hard disk 134 includes two types of storage media, a flash memory (Flash) and a storage class memory (Storage Class Memory, SCM).
  • SCM, also known as persistent memory, is a composite storage technology that combines the characteristics of traditional storage devices and memory. SCM offers persistence and fast byte-level access.
  • the hard disk 134 - 1 internally includes a flash memory chip 212 and an SCM chip 213 , and a main controller 211 .
  • the SCM chip is connected to the SSD main controller 211 by an independent line.
  • the main controller 211 is an embedded microchip, which also includes a processor, a memory, and a communication interface (not shown in FIG. 3). It functions like a command center and executes operation requests related to the hard disk. For details, refer to the description of FIG. 4.
  • the SCM chip 213 is mainly used for small IO (byte-level) writing, and the flash memory chip 212 can be used for page-level writing.
  • the related content of the SCM chip and the flash memory chip will be described in detail in FIG. 5 and FIG. 6 later.
  • small IO refers to a request for less than one full page of data.
  • when the full page has a fixed length of 8KB, a data write of less than 8KB is called a small IO write; for example, writes of 64B, 2KB, and 4KB of data are small IO writes.
  • when the full page has a fixed length of 4KB, a data write of less than 4KB is called a small IO write.
  • the size of a full page may also be 16KB, 32KB, or 64KB, etc. This application does not limit the size of the page.
  • FIG. 4 is a schematic structural diagram of a solid-state hard disk SSD 300 provided by an embodiment of the present application, and the solid-state hard disk 300 may be the hard disk 134-1 in FIG. 3 .
  • the SSD 300 is a storage device that mainly uses flash memory (eg, NAND Flash) and SCM chips as permanent storage.
  • the solid-state drive 300 includes a NAND flash memory 213, a main controller (main control for short) 211, and SCM chips 214; the NAND flash memory 213 and the SCM chips 214 can be used to persistently write page-level data and byte-level data, respectively.
  • the SCM chip 213 may include a phase-change memory (PCM), a resistive random access memory (ReRAM), a non-volatile magnetic random access memory (MRAM), a carbon nanotube random access memory (Nantero's CNT Random Access Memory, NRAM), or any other SCM chip that has both byte-level addressing and persistence.
  • the main controller 211 is the control center of the SSD and is responsible for complex tasks such as managing data storage and maintaining the performance and service life of the SSD. It includes the processor 102 and issues all operation requests of the SSD; functions mentioned later, such as translation layer management, data merging, garbage collection, and data migration, can also be executed by the main controller 211.
  • the processor 102 in the main control 211 can perform functions such as reading/writing data, garbage collection, and wear leveling through the firmware in the buffer.
  • the main controller 211 of the SSD also includes the host interface 104 and several channel controllers. The host interface 104 is used to communicate with the host.
  • a host here may refer to a server, a personal computer, or any device such as the controller 122.
  • the main controller 211 can operate the flash memory chip 212 of channel 0 and the SCM chip 213 of channel 1 in parallel, thereby increasing the bandwidth of the bottom layer.
  • FIG. 5 is a schematic structural diagram of the flash memory chip (or flash chip) 212 in FIG. 3 or FIG. 4 .
  • a die is a package of one or more flash memory chips.
  • a die can contain multiple regions (Planes).
  • Multi-Plane NAND is a design that can effectively improve performance.
  • a die is internally divided into two Planes, and one Plane contains multiple blocks.
  • a block consists of several pages.
  • a whole flash memory chip consists of two regions, and the two regions can operate in parallel. This is just an example, and the size of a page, the capacity of a block, and the capacity of a flash memory chip may have different specifications, which are not limited in this embodiment.
  • a page is the smallest unit of data writing in the flash memory chip.
  • the main control 211 writes data into a block with page granularity.
  • a page is the smallest unit of data read in a flash chip.
  • a block is the smallest unit of data erasing and garbage collection.
  • when the main controller 211 erases data, it can only erase an entire block at a time.
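  • The page-granularity write and block-granularity erase described above can be illustrated with a toy model. This is only a sketch: the class name `FlashBlock`, the default of 4 pages per block, and the rule that a programmed page must be erased before it is written again (general NAND behaviour, not stated in this passage) are assumptions of the example.

```python
from typing import List, Optional


class FlashBlock:
    """Toy model of one flash block: write per page, erase per block."""

    def __init__(self, pages_per_block: int = 4, page_size: int = 8 * 1024):
        self.page_size = page_size
        # None marks an erased page that can still be programmed.
        self.pages: List[Optional[bytes]] = [None] * pages_per_block

    def program(self, page_no: int, data: bytes) -> None:
        """Write exactly one whole page."""
        if len(data) != self.page_size:
            raise ValueError("flash is written one whole page at a time")
        if self.pages[page_no] is not None:
            # Assumption: no in-place overwrite without a prior erase.
            raise RuntimeError("page must be erased before it is rewritten")
        self.pages[page_no] = data

    def read(self, page_no: int) -> Optional[bytes]:
        """Read one whole page (None if the page is erased)."""
        return self.pages[page_no]

    def erase(self) -> None:
        """Erase every page of the block; single pages cannot be erased."""
        self.pages = [None] * len(self.pages)


block = FlashBlock()
block.program(0, b"\x01" * block.page_size)
```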
  • FIG. 6 is a schematic structural diagram of the SCM chip 213 shown in FIG. 3 or FIG. 4 .
  • the SCM chip 213 includes a plurality of memory banks (banks), and a bank is a component in the chip for storing data.
  • an SCM chip with a capacity of 32GB includes 32 memory banks, so each memory bank has a capacity of 1GB.
  • the bank is further divided into multiple partitions to provide parallelism among multiple partitions within the same memory bank.
  • the stored data is managed by row (row) and column (col) to realize the byte-level addressing function. A memory cell (such as a phase-change memory cell or a floating-gate transistor) can be determined at the intersection of each word line and bit line, where the word line and the bit line are two mutually perpendicular data lines used to connect multiple memory cells.
  • a word line connects multiple memory cells in a row. Multiple memory cells in a row can be selected by applying a level value different from that of other word lines on the selected word line, and operations such as data reading and writing can be performed.
  • the SCM chip 213 may further include modules such as a row decoder, a column decoder, a command compiler, a driving circuit and a digital controller (not shown in FIG. 6 ).
  • the SCM chip of FIG. 6 also has a "row cache" module for caching the data of the aforementioned row. It can be seen that the SCM chip 213 can be read and written according to the length of one row. In other words, row can be used as the smallest management unit of SCM chip.
  • after receiving a read command from the controller 211, the SCM chip first reads 64 bytes of data from a row of the memory bank into the row cache, and then sends the 64-byte data from the row cache to the controller 211. Similarly, 64 bytes of data to be written are first transferred from the controller 211 to the designated row cache, and then written into a row of the memory bank.
  • the size of the row cache is not limited in this embodiment.
  • the data in the "sub-page” in the following examples of this application may refer to the data of a row in the SCM here.
  • the size of a row is much smaller than the size of a "page” in the flash memory 212, in other words, "subpage" data is data that is smaller than a "page”. Therefore, a page can be composed of multiple subpages, for example: an 8KB page can include 128 subpages of 64 bytes.
  • the flash translation layer (Flash Translation Layer, FTL) records the mapping from logical block addresses (Logical Block Address, LBA) to physical block addresses (Physical Block Address, PBA).
  • the Flash chip is read in units of pages.
  • the main controller 211 can find the physical address in the flash memory chip according to the logical address of the page sent by the controller 211, and read the required data therefrom.
  • the FTL can be implemented by a Hash table.
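  • A hash-table FTL of this kind can be sketched as follows. The sketch is illustrative only: the class `PageLevelFTL`, its methods, and the simple "always allocate the next free physical page" policy are assumptions of the example and are not taken from this application.

```python
from typing import Dict, Optional


class PageLevelFTL:
    """Page-level mapping table kept in a hash table (a Python dict)."""

    def __init__(self) -> None:
        self.l2p: Dict[int, int] = {}   # logical page number -> physical page number
        self.next_free = 0              # hypothetical allocator: next unused physical page

    def write(self, lpn: int) -> int:
        """Map a logical page to a fresh physical page and return it."""
        ppn = self.next_free
        self.next_free += 1
        self.l2p[lpn] = ppn             # any older mapping for lpn is replaced
        return ppn

    def lookup(self, lpn: int) -> Optional[int]:
        """Return the physical page for lpn, or None if it was never written."""
        return self.l2p.get(lpn)


ftl = PageLevelFTL()
first = ftl.write(7)
second = ftl.write(7)   # rewriting the same logical page moves it to a new physical page
```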
  • FIG. 7 is a schematic diagram of a software solution of a two-level translation layer proposed by this application.
  • the byte-level data written into the SCM chip, as described above, is abstracted into a "sub-page" for management, which can be applied to the system 120 of the embodiments of this application.
  • in a possible implementation, the above-mentioned two-level translation layer method is executed in the SSD main controller (for example, the main controller 211).
  • the two-level translation layer in FIG. 7 includes a two-level index: (1) find the corresponding root page according to the root page address; (2) find the physical address of the corresponding sub-page data in the sub-page table of the root page according to the sub-page address. The corresponding data can then be accessed from the SCM chip according to the physical address. Specifically:
  • the master 211 receives a byte-level write request req, which carries the data to be written, the logical address (lba) of the data to be written, and the length of the data (length). It is assumed that the size of the flash memory page in the hard disk 134-1 is 8KB, and the data length length in the request req is 64 bytes.
  • the main controller 211 may obtain the corresponding root page address and sub-page address according to the logical address lba in the request (the divisions are integer divisions):
  • root page address = lba / 8KB
  • sub-page address = (lba % 8KB) / 64B
  • the SCM storage space pointed to by Paddr12 is then accessed; for example, the to-be-written data carried by req is written into that storage space.
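  • The address split and the two-level lookup can be sketched as follows. This is a minimal illustration that assumes lba is a byte address, an 8KB flash page and a 64B sub-page as in the example above; the names `split_lba` and `TwoLevelTranslation` and the nested-dictionary layout are hypothetical.

```python
from typing import Dict, Optional, Tuple

PAGE_SIZE = 8 * 1024   # flash page size used in the example (8KB)
SUBPAGE_SIZE = 64      # sub-page size used in the example (64B)


def split_lba(lba: int) -> Tuple[int, int]:
    """Return (root page address, sub-page address) for a byte address."""
    root = lba // PAGE_SIZE
    sub = (lba % PAGE_SIZE) // SUBPAGE_SIZE
    return root, sub


class TwoLevelTranslation:
    """First level: root page address; second level: sub-page address."""

    def __init__(self) -> None:
        # root page address -> {sub-page address -> physical address in the SCM}
        self.root_pages: Dict[int, Dict[int, int]] = {}

    def map(self, lba: int, paddr: int) -> None:
        root, sub = split_lba(lba)
        self.root_pages.setdefault(root, {})[sub] = paddr

    def lookup(self, lba: int) -> Optional[int]:
        root, sub = split_lba(lba)
        return self.root_pages.get(root, {}).get(sub)


table = TwoLevelTranslation()
lba = 3 * PAGE_SIZE + 2 * SUBPAGE_SIZE   # third sub-page of root page 3
table.map(lba, paddr=0x1000)
```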
  • the two-level indexes in the two-level translation layer are stored in an isolated storage space in the SCM chip.
  • the above-mentioned two-level translation layer method can also be executed by a processor in the upper-layer system of the solid-state drive (for example, the controller 122 in the control box); the specific method is similar to that executed by the main controller 211 and is not repeated here.
  • the present application also provides a method for writing data to a solid-state disk, which provides both byte-level and page-level write interfaces, and can solve the problem of low performance in writing small IO data.
  • a traditional page-level interface is used to write into the Flash chip to achieve high performance and space utilization efficiency.
  • write data is written into the SCM chip using a byte-level interface, reducing space waste and eliminating write amplification problems.
  • the advantages of byte-level addressing and persistence characteristics of the SCM chip are utilized to effectively improve the small IO write performance.
  • FIG. 8 is a schematic flowchart of a method for writing data to a solid-state drive provided in Embodiment 1.
  • the SCM chip is located inside the flash SSD (hard disk 134-1).
  • the SCM acts as a persistent cache and is transparent to the software in the controller 122; that is, the software in the controller 122 can only see the available space of the Flash chip.
  • the SSD driver software provides a byte-level write interface and a page-level write interface to write data into the flash memory chip or the SCM chip in the hard disk 134-1.
  • the garbage collection operation of Flash and the data migration operation of SCM in this embodiment 1 can be performed by the main controller 211 of the hard disk 134-1 without consuming the CPU resources of the controller 122.
  • Step 810 The controller 122 receives the service request from the application server.
  • the controller 122 in the storage system 120 receives service requests from outside the storage system (the application server 100 or other storage systems), and the service requests can be used to access stored data in the storage system.
  • the controller 122 may also receive a request generated by an internal application of the storage system.
  • Step 820 The controller 122 sends a byte-level or page-level write request to the hard disk 134-1 according to the service request.
  • the processor 123 of the controller 122 sends the byte-level or page-level data request to the hard disk 134-1 through the two interfaces provided by the driver layer (driver software) of the hard disk 134-1.
  • the processor 123 calls the file system in the system kernel, and the file system decides whether to initiate a byte-level or page-level request. Then, through the I/O scheduler in the system kernel, scheduling optimization is performed on these two types of requests. Finally, the two types of requests are sent to the hard disk 134-1 through the byte-level and page-level interfaces provided by the driver software.
  • the software stack used by existing file/block systems evicts data to the SSD at memory-page granularity (4KB or 8KB) and therefore cannot use the byte-level interface.
  • the present application also provides an adaptation solution for a file system.
  • the file/block system can decide whether to split an 8KB page into one or more 64B writes, which are written to the main controller 211 by the device driver.
  • the smallest access unit of DRAM memory (such as the memory 124 in FIG. 1) is a cache line, whereas the file/block system of the controller 122 manages memory at page granularity. A cache line (for example, 64B) is smaller than one page, and its size is similar to that of the data management unit in the SCM chip.
  • the file system can decide whether to use byte writes or a page write according to the amount of cache line data to be modified (that is, the number of sub-pages). More sub-pages can bring a higher benefit. Specifically:
  • the controller 122 adds a 1-bit flag to each sub-page (the same size as a cache line) in the memory page, recording whether each cache line in the page has been modified; a modified cache line is marked as a dirty sub-page.
  • the file system can adapt the byte-level write interface of the hard disk 134-1 by issuing multiple 64B small IO write requests to write the dirty sub-page data into the SCM chip.
  • the clean sub-page data is consistent with the data in the flash memory chip and does not need to be written.
  • in an optional implementation, when the total amount of dirty sub-page data in a dirty memory page is lower than a certain threshold (for example, 4KB), the byte-level write interface is called to issue multiple byte-level write requests, and the data of the dirty sub-pages is written into the SCM chip; when the total amount of dirty sub-page data is greater than the threshold, the page-level (for example, 8KB) write interface is used to issue one page-level write request, and the dirty memory page is written into the flash memory chip.
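  • The choice between the two write interfaces can be sketched as follows. This is only an illustration under stated assumptions: the function `plan_writes`, the tuple layout of a request, and the decision to treat a dirty amount exactly equal to the threshold as a page write are hypothetical; the text above only covers the cases below and above the threshold.

```python
from typing import List, Sequence, Tuple

PAGE_SIZE = 8 * 1024    # memory / flash page size (8KB)
SUBPAGE_SIZE = 64       # cache line / sub-page size (64B)
THRESHOLD = 4 * 1024    # example threshold from the text (4KB)

Request = Tuple[str, int, bytes]   # (interface, logical address, data)


def plan_writes(page_lba: int, page: bytes,
                dirty_flags: Sequence[bool]) -> List[Request]:
    """Turn one dirty memory page into byte-level or page-level requests."""
    dirty = [i for i, flag in enumerate(dirty_flags) if flag]
    if len(dirty) * SUBPAGE_SIZE < THRESHOLD:
        # Few dirty sub-pages: one 64B byte-level write per dirty sub-page.
        return [("byte",
                 page_lba + i * SUBPAGE_SIZE,
                 page[i * SUBPAGE_SIZE:(i + 1) * SUBPAGE_SIZE])
                for i in dirty]
    # Many dirty sub-pages (assumption: the equal case lands here too).
    return [("page", page_lba, page)]


page = bytes(PAGE_SIZE)
flags = [False] * (PAGE_SIZE // SUBPAGE_SIZE)
flags[1] = flags[5] = True
requests = plan_writes(0, page, flags)
```

  • With two dirty sub-pages, the sketch issues two 64B requests for the SCM chip; clean sub-pages produce no request at all.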
  • the embodiment of the present application also provides a small IO write priority scheduling method, which can be executed by the processor of the controller 122.
  • the IO scheduler in the system kernel can also set a higher priority for byte-level write requests (small IO writes) to improve small IO write performance. Since small IO writes have lower write latency, the 64B small IO write queue is assigned a higher priority in I/O scheduling, which improves small IO processing performance and reduces IO queuing delay.
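  • A priority scheme of this kind can be sketched with a priority queue. The sketch below is hypothetical: the class `WriteScheduler`, the two priority constants, and the strict-priority policy are assumptions of the example, not the scheduler of this application.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple

BYTE_LEVEL = 0   # smaller number = higher priority (small IO writes)
PAGE_LEVEL = 1   # page-level writes


class WriteScheduler:
    """Dispatch byte-level writes before page-level writes."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        # The sequence number keeps FIFO order within one priority class.
        self._seq = itertools.count()

    def submit(self, kind: int, request: Any) -> None:
        heapq.heappush(self._heap, (kind, next(self._seq), request))

    def next_request(self) -> Optional[Any]:
        """Return the next request to dispatch, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


sched = WriteScheduler()
sched.submit(PAGE_LEVEL, "P1")
sched.submit(BYTE_LEVEL, "B1")
sched.submit(BYTE_LEVEL, "B2")
order = [sched.next_request() for _ in range(4)]
```

  • A strict-priority queue like this one can delay page-level writes indefinitely under a steady stream of small IO; a real scheduler would likely bound that delay, which this sketch does not attempt.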
  • Step 830 The main controller 211 of the hard disk 134-1 receives a byte-level or page-level write request.
  • the main controller 211 receives the write request sent by the controller 122 , and the write request carries the starting address of the data to be written, the data length and the data to be written.
  • the starting address is generally referred to as a logical block address (logical block address, LBA) by those skilled in the art, which is simply referred to as a logical address in this embodiment of the present application.
  • the controller 122 can access the corresponding physical address (Physical Block Address, PBA) in the hard disk through the logical block address LBA, so as to realize the reading and writing of data.
  • for a page-level write request, the length of the data carried in the request is equal to the size of a full page in the flash memory chip.
  • for a byte-level write request, the length of the data carried in the request is less than the size of a full page in the flash memory chip.
  • in one case, the length of the data is the length of one minimum management unit in the SCM chip, such as 64B.
  • the controller 122 can complete the splitting of the write request.
  • the aforementioned file system can split the page write and issue one or more byte-level write operations.
  • the data length may also be a multiple of the size of the smallest management unit in the SCM chip, such as 128B, 512B, and the like.
  • the main controller 211 can split the data into multiple minimum management units and write them into the SCM chip.
  • Step 840 The main controller 211 writes byte-level data into the SCM chip for persistence, or writes page-level data into the flash memory chip for persistence.
  • after receiving the write request sent by the controller 122, the main controller 211 of the SSD writes the data into the SCM chip or the flash memory chip according to the type of the write request, and updates the indexes of the Flash and the SCM. In a possible implementation, the main controller 211 decides whether to write the data to the SCM chip or the flash memory chip according to the length of the arriving data; that is, small IO is written to the SCM chip, and large IO is written to the flash memory chip. In an optional implementation, the main controller may directly determine whether to write the data to the SCM chip or the flash memory chip according to the type of interface through which the controller 122 wrote the data.
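  • The length-based variant, together with the splitting of a byte-level request into minimum management units, can be sketched as follows. The function `route_write`, the string labels "scm" and "flash", and the rejection of any other length are assumptions of this sketch; the text above does not say how other lengths are handled.

```python
from typing import List, Tuple

PAGE_SIZE = 8 * 1024   # assumed flash page size (8KB)
SUBPAGE_SIZE = 64      # assumed SCM minimum management unit (64B)


def route_write(lba: int, data: bytes) -> Tuple[str, List[Tuple[int, bytes]]]:
    """Pick the target medium by data length and split SCM writes into units."""
    if len(data) == PAGE_SIZE:
        # A full page goes to the flash memory chip as one page write.
        return "flash", [(lba, data)]
    if 0 < len(data) < PAGE_SIZE and len(data) % SUBPAGE_SIZE == 0:
        # Less than a page: split into 64B units for the SCM chip.
        chunks = [(lba + off, data[off:off + SUBPAGE_SIZE])
                  for off in range(0, len(data), SUBPAGE_SIZE)]
        return "scm", chunks
    # Assumption: other lengths are rejected in this sketch.
    raise ValueError("unsupported write length")


small_target, small_chunks = route_write(0, bytes(128))
page_target, page_chunks = route_write(PAGE_SIZE, bytes(PAGE_SIZE))
```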
  • Step 850 the main controller 211 updates the index in the SCM translation layer or the flash translation layer.
  • the index in the SCM chip is managed at the granularity of a "sub-page" (for example, 64 bytes) using the two-level translation layer in FIG. 7: all sub-page addresses are found through the root page.
  • Each sub-page address corresponds to a physical address on the SCM memory.
  • the size of the data stored in the physical address PBA can be the size of the cache line in the SCM chip.
  • the page-level index table in FIG. 7 can be implemented using a hash table structure. The hash table maps a page address (key) to a sub-page table (value), and lookup within each sub-page table is implemented using a radix tree or a red-black tree.
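  • A minimal sketch of such a two-level index follows. The class name `ScmTranslationLayer` and the helper `split_lba` are hypothetical, the logical address is assumed to be byte-granular, and a plain Python dict stands in for the radix tree or red-black tree that the text names for the sub-page table.

```python
PAGE_SIZE = 8 * 1024   # root page size (8KB in the text's examples)
SUBPAGE_SIZE = 64      # sub-page size (64B in the text's examples)


def split_lba(lba: int):
    """Derive (root page address, sub-page address) from a byte-granular logical address."""
    return lba // PAGE_SIZE, (lba % PAGE_SIZE) // SUBPAGE_SIZE


class ScmTranslationLayer:
    def __init__(self):
        # Level 1: hash table keyed by root page address.
        # Level 2: sub-page table mapping sub-page address -> SCM physical address.
        self.pages = {}

    def update(self, lba: int, paddr: int) -> None:
        root, sub = split_lba(lba)
        self.pages.setdefault(root, {})[sub] = paddr

    def lookup(self, lba: int):
        """Return the SCM physical address for the sub-page, or None if not indexed."""
        root, sub = split_lba(lba)
        return self.pages.get(root, {}).get(sub)
```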
  • the flash memory chips are managed according to page granularity (eg, 8KB), and a traditional page-level/block-level/block-page mixed management algorithm may be used, which will not be described in detail here.
  • the hard disk 134 also records the mapping from LBA to PBA (ie, the flash translation layer FTL), and the main controller 211 can find the page data on the corresponding PBA in the flash chip according to the LBA.
  • the hard disk 134 provides a byte-level write interface.
  • when the main controller 211 receives a byte-level write request (for example, 64B), it first recognizes that the requested data length is less than one page and determines that the data should be written to the SCM. It is assumed that the main controller 211 finds the root page and the sub-page index corresponding to the 64B sub-page data according to the two-level translation layer in FIG. 7.
  • the main controller 211 checks whether the SCM chip 213 has free storage capacity. If the SCM chip is full, a data migration (TierDown) operation must be performed first to ensure that there is enough space on the SCM for writing. In essence, the data migration operation migrates data in the SCM chip into the Flash chip, or migrates data in the Flash chip into the SCM chip, as described in detail later.
  • Figure 10(a) is a schematic flowchart of a method for byte-level writing provided by the present application, specifically:
  • Step a1 write the byte-level data into the SCM chip.
  • the 64-byte sub-page data is written into the SCM chip (SCM chip 213) for persistence, and its physical address is recorded as PA1.
  • Step a2 update the index of the SCM translation layer.
  • the root page address and the sub-page address are obtained according to the logical address LBA.
  • the paddr in the sub-page table in the hash table is modified to point to the SCM address PA1 where the 64B data is newly written.
  • the page data and index in the Flash chip 212 may not be modified in any way.
  • Step a3 Optionally, write the byte-level data into the cache in the DRAM chip.
  • FIG. 9 is a schematic structural diagram of another hard disk 134-1 provided in Embodiment 1 of the present application.
  • the hard disk 134-1 further includes a dynamic random access memory (Dynamic Random Access Memory, DRAM) chip 215.
  • the chip 215 can be used to cache data.
  • the main controller 211 checks whether the 8KB page where the sub-page is located exists in the DRAM chip 215 . If it exists, the sub-page data is updated to the DRAM chip 215 as a copy to support subsequent reading, thereby improving the data reading efficiency of the hard disk. Otherwise, the DRAM data is not updated.
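  • Steps a1 to a3 can be sketched as follows. This is an illustrative model only: the class `HybridDisk`, its dict-based media, and the sequential allocation of SCM physical addresses are assumptions of the sketch, not details given in the source.

```python
PAGE_SIZE = 8 * 1024
SUBPAGE_SIZE = 64


class HybridDisk:
    def __init__(self):
        self.scm = {}        # SCM physical address -> 64B sub-page data
        self.scm_index = {}  # root page address -> {sub-page address -> SCM physical address}
        self.dram = {}       # root page address -> bytearray copy of the full 8KB page
        self.next_pa = 0     # hypothetical allocator: next free SCM physical address

    def write_subpage(self, lba: int, data: bytes) -> int:
        assert len(data) == SUBPAGE_SIZE
        root, sub = lba // PAGE_SIZE, (lba % PAGE_SIZE) // SUBPAGE_SIZE
        # Step a1: persist the sub-page in the SCM at a new physical address.
        pa, self.next_pa = self.next_pa, self.next_pa + 1
        self.scm[pa] = data
        # Step a2: point the sub-page entry at the newly written address.
        self.scm_index.setdefault(root, {})[sub] = pa
        # Step a3 (optional): refresh the DRAM copy only if the page is already cached.
        if root in self.dram:
            off = sub * SUBPAGE_SIZE
            self.dram[root][off:off + SUBPAGE_SIZE] = data
        return pa
```

  • Note that, as in the text, the flash page and its index are left untouched by a byte-level write.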
  • Embodiment 1 of the present application also provides a method for a page-level write operation, which can be applied to the system 120 in FIG. 3 . Specific examples are given below:
  • the hard disk 134 provides a page-level write interface.
  • when the main controller 211 receives a page-level write request (for example, 8KB), it first recognizes that the requested data length is equal to one page and determines that the data should be written to the Flash chip.
  • FIG. 10(b) is a schematic flowchart of a method for page-level writing provided by an embodiment of the present application, and the following steps are performed:
  • Step b1 write the page-level data to the Flash chip.
  • Step b2 update the SCM translation layer index.
  • Step b3 update the flash translation layer index.
  • the index corresponding to the page on the flash translation layer is updated, that is, the physical address PBA pointed to by the logical address of the page is changed to PA2, and the data is successfully written to the page.
  • Step b4 Optionally, write the page-level data into the DRAM chip.
  • the main controller 211 may also write the 8KB page data into the DRAM chip 215 as a copy for caching, so as to support subsequent reading.
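  • Steps b1 to b4 can be sketched as follows. The function `write_page` and the dict layout of `disk` are hypothetical. Step b2 is not detailed in this passage; the sketch assumes it removes the SCM sub-pages that the new full page supersedes, which is consistent with the apparatus description later in the document.

```python
PAGE_SIZE = 8 * 1024


def write_page(disk: dict, page_addr: int, data: bytes) -> int:
    """Page-level write. `disk` holds hypothetical in-memory stand-ins:
    'flash' (append-only list of pages), 'ftl' {page -> flash index},
    'scm' {physical address -> sub-page}, 'scm_index' {page -> {sub-page -> physical address}},
    'dram' {page -> cached copy}."""
    assert len(data) == PAGE_SIZE
    # Step b1: persist the page at a new flash location (list index = physical address PA2).
    pa = len(disk["flash"])
    disk["flash"].append(data)
    # Step b2 (assumed behaviour): drop SCM sub-pages that the full page now supersedes.
    for scm_pa in disk["scm_index"].pop(page_addr, {}).values():
        disk["scm"].pop(scm_pa, None)
    # Step b3: point the logical page at the new physical address.
    disk["ftl"][page_addr] = pa
    # Step b4 (optional): keep a copy in DRAM to serve later reads.
    disk["dram"][page_addr] = bytearray(data)
    return pa
```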
  • Embodiment 1 of the present application also provides a method for a page-level read operation, which can be applied to the system 120 in FIG. 3 . Specific examples are given below:
  • the hard disk 134 can also provide a page-level read interface.
  • when the main controller 211 receives a page-level read request (for example, 8KB), it recognizes that the data length of the request is equal to one page and initiates a corresponding read.
  • the main controller 211 reads data from the SCM and Flash.
  • FIG. 11 is a schematic flowchart of a method for page-level reading provided by the present application, and the specific steps are as follows:
  • Step c1 optionally, query and read data in the DRAM chip.
  • when data copies are updated to the DRAM after data is written to the SCM and the flash memory (as described in steps a3 and b4), the data may already be cached in the DRAM chip 215 when it is read; that is, the DRAM chip serves as the read cache of both the SCM and the Flash.
  • when the main controller 211 reads data, it may first query whether the page data to be read exists in the DRAM chip 215. If the data exists, the cached copy is read directly from the DRAM chip and a successful read response is returned without performing the subsequent steps, which effectively improves read efficiency. If the data does not exist, the process goes to the next step.
  • Step c2 query whether the data exists in the SCM chip and the flash memory chip, and perform step c3 or step c4 or step c5 according to the query result.
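  • the main controller 211 performs the following steps in sequence:
  • when the sub-pages stored in the SCM chip fill the entire page, step c3 is executed; otherwise,
  • when only some sub-pages of the page are stored in the SCM chip, step c4 is executed; or,
  • when no sub-page of the page is stored in the SCM chip, step c5 is performed;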
  • Step c3 read data from the SCM chip and return.
  • the main controller 211 will query the 64B subpage index corresponding to the 8KB page node, and read the subpage data stored on the SCM according to the index. Since all sub-pages in the SCM chip can fill up an entire 8KB page, after the SCM read request is executed, the data of the entire page is returned to the controller 122 without waiting for the Flash read request. After the flash read operation is completed, the data can be discarded directly; this scenario may occur when the user writes data less than 8KB pages multiple times without triggering garbage collection or data migration.
  • Step c4 read data from the SCM chip and the flash memory chip, merge and return.
  • the controller 211 simultaneously reads data from the SCM and the Flash according to their indexes. After the reads from both media succeed, the two pieces of data are merged,
  • and the 8KB page data is returned to the controller 122. This scenario occurs when the user has previously written data smaller than an 8KB page and has not triggered GC/TierDown.
  • Step c5 read data from the flash memory chip and return.
  • the main controller 211 does not read any data from the SCM memory, but directly reads the 8KB page data in the Flash and returns it to the controller 122 . This scenario occurs when the user has never written data smaller than 8KB pages before, or has triggered garbage collection or data migration (TierDown).
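  • The read path of steps c1 to c5 can be sketched as follows. The function `read_page` and its dict arguments are hypothetical stand-ins for the media, and treating a page that is missing from the flash as zero-filled is an assumption of the sketch. The key point it illustrates is the merge in step c4, where SCM sub-pages overlay the flash page.

```python
PAGE_SIZE = 8 * 1024
SUBPAGE_SIZE = 64
SUBPAGES_PER_PAGE = PAGE_SIZE // SUBPAGE_SIZE  # 128 sub-pages per 8KB page


def read_page(page_addr, dram, scm_subpages, flash_pages) -> bytes:
    """dram: {page -> bytes}; scm_subpages: {page -> {sub-page index -> 64B bytes}};
    flash_pages: {page -> bytes}."""
    # Step c1: serve from the DRAM read cache when possible.
    if page_addr in dram:
        return bytes(dram[page_addr])
    subs = scm_subpages.get(page_addr, {})
    # Step c3: the SCM alone holds the whole page.
    if len(subs) == SUBPAGES_PER_PAGE:
        return b"".join(subs[i] for i in range(SUBPAGES_PER_PAGE))
    # Assumption: a page never written to flash reads as zeros.
    base = flash_pages.get(page_addr, bytes(PAGE_SIZE))
    # Step c5: nothing in the SCM, return the flash page as is.
    if not subs:
        return base
    # Step c4: overlay the SCM sub-pages on the flash page.
    page = bytearray(base)
    for idx, chunk in subs.items():
        page[idx * SUBPAGE_SIZE:(idx + 1) * SUBPAGE_SIZE] = chunk
    return bytes(page)
```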
  • FIG. 12 is a schematic flowchart of another page-level reading method provided by the present application.
  • the difference from FIG. 11 is that the method in FIG. 11 presumes that copies of the data written to the SCM and the Flash are also written into the DRAM chip (that is, steps a3 and b4 are executed), whereas the method in FIG. 12 presumes that only the copy of the data written to the Flash is written into the DRAM chip (that is, step b4 is executed); after data is persisted in the SCM, it is not written to the DRAM (step a3 is not executed).
  • the DRAM chip in FIG. 11 is used as a read cache for both the Flash and the SCM, while the DRAM chip in FIG. 12 is used only as a read cache for the Flash. Therefore, when the method of FIG. 12 reads data from the SCM, it does not need to read the cached data in the DRAM first.
  • the specific steps of Figure 12 are as follows:
  • Step c1' read the data from the SCM chip, and return if it is full page data.
  • the main controller 211 will read all 64B sub-pages under the queried page node, and read the sub-page data stored on the SCM according to the index. If the read data can fill an entire 8KB page, the whole page data is returned to the controller 122, and subsequent steps are not performed; otherwise, step c2' is performed.
  • Step c2' query whether the data exists in the flash memory chip and the DRAM cache, and perform step c3' or step c4' according to the query result.
  • the main controller 211 performs the following steps in sequence: when the entire page of data is found in the DRAM cache chip, step c3' is performed; or, when the entire page of data is found in the Flash chip, step c4' is performed.
  • step c3' data is read from the DRAM cache, merged with the data read in step c1', and returned.
  • in this case, a copy of the page data to be read is cached in the DRAM chip, and sub-page data of the page was written into the SCM later.
  • after the page data is read from the DRAM cache, it is combined with the sub-page data read from the SCM chip, and the combined data is returned.
  • the details are similar to the foregoing step c4, and are not repeated here.
  • step c4' data is read from the Flash chip, combined with the data read in step c1' and returned. Refer to the content in the foregoing step c4, which will not be repeated here.
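  • The FIG. 12 variant (steps c1' to c4') can be sketched as follows. The function `read_page_scm_first` and its arguments are hypothetical. Because the DRAM here caches only flash pages, a cached copy can be older than the sub-pages in the SCM, so the SCM data is always overlaid on top of it.

```python
PAGE_SIZE = 8 * 1024
SUBPAGE_SIZE = 64
SUBPAGES_PER_PAGE = PAGE_SIZE // SUBPAGE_SIZE


def read_page_scm_first(page_addr, scm_subpages, dram, flash_pages) -> bytes:
    """scm_subpages: {page -> {sub-page index -> 64B bytes}};
    dram and flash_pages: {page -> full-page bytes}."""
    subs = scm_subpages.get(page_addr, {})
    # Step c1': a full page in the SCM is returned immediately.
    if len(subs) == SUBPAGES_PER_PAGE:
        return b"".join(subs[i] for i in range(SUBPAGES_PER_PAGE))
    # Step c2': look for the full page in the DRAM cache first, then in the flash.
    if page_addr in dram:
        base = dram[page_addr]          # step c3'
    else:
        base = flash_pages[page_addr]   # step c4'
    # Merge: sub-pages from the SCM take precedence over the base page.
    page = bytearray(base)
    for idx, chunk in subs.items():
        page[idx * SUBPAGE_SIZE:(idx + 1) * SUBPAGE_SIZE] = chunk
    return bytes(page)
```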
  • Embodiment 1 of the present application also provides a method for data deletion, and specific examples are given below:
  • When the controller 122 initiates a secure delete (e.g., TRIM) command, the main controller 211 invokes the delete command of the hard disk to implement secure deletion and secure erasure of the specified page. First, the main controller 211 queries whether an index for the page address exists in the translation layers of the SCM chip and the Flash chip. If it exists, then for the data in the Flash, all other valid pages in the block where the page is located must be migrated to other blocks, after which the block is erased. For the data in the SCM chip, the data of all sub-pages under the corresponding root page in the SCM translation layer must be deleted, and the data at the old addresses set to zero.
  • When the controller 122 initiates a normal delete operation, it is only necessary to delete the specified page address from the indexes in the translation layers of the SCM and the Flash.
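  • The two delete paths can be contrasted in a short sketch. The function `delete_page`, the dict layout of `disk`, and the use of a single pre-erased `free_block` as the migration target are all assumptions made for illustration.

```python
SUBPAGE_SIZE = 64


def delete_page(disk: dict, page_addr: int, secure: bool = False) -> None:
    """disk keys (hypothetical): 'ftl' {page -> (block, slot)}, 'blocks' {block -> {slot -> data}},
    'free_block' (id of an erased block), 'scm_index' {page -> {sub-page -> physical address}},
    'scm' {physical address -> 64B data}."""
    # Both delete types remove the page from the flash and SCM indexes.
    loc = disk["ftl"].pop(page_addr, None)
    subs = disk["scm_index"].pop(page_addr, {})
    if not secure:
        return  # normal delete: the data itself is left in place
    if loc is not None:
        block, _ = loc
        target = disk["free_block"]
        # Migrate every other still-valid page out of the block before erasing it.
        for other, (b, slot) in list(disk["ftl"].items()):
            if b == block:
                new_slot = len(disk["blocks"][target])
                disk["blocks"][target][new_slot] = disk["blocks"][block][slot]
                disk["ftl"][other] = (target, new_slot)
        disk["blocks"][block] = {}  # erase the whole block
    # Zero the old SCM locations of every sub-page under the root page.
    for pa in subs.values():
        disk["scm"][pa] = bytes(SUBPAGE_SIZE)
```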
  • Invalid data, also known as garbage data, refers to data to which no mapping relationship points; all other data is valid data.
  • Embodiment 1 of the present application also provides a corresponding garbage collection method, which can be applied to the system 120 in FIG. 3 .
  • the operation of garbage collection is performed by the main controller 211, and a specific example is given below:
  • FIG. 13(a) is a flowchart of a method for garbage collection provided in Embodiment 1 of the present application, and the steps are as follows:
  • Step e1 Determine the valid pages in the block to be reclaimed.
  • the main controller 211 selects a block with the most invalid pages from the blocks of the Flash chip, reads out the valid Flash pages in the block, and migrates them to other blocks.
  • Step e2 reading the sub-page data in the SCM chip.
  • the sub-page data of the valid page (eg logical address is LA5) is read from the SCM memory. For example, if three sub-page data are queried through the sub-page index of the specified page in the SCM, the three sub-page data are read into the memory of the controller 211 .
  • Step e3 reading page data in the flash memory chip.
  • the data of the designated page LA5 is read from the Flash into the memory of the controller 211. If the sub-page data read from the SCM in step e2 fills the whole page, the page does not need to be read from the Flash in this step, and only the data from the SCM chip is used.
  • Step e4 merging the data read from the SCM chip and the Flash chip.
  • the sub-page data read from the SCM chip in step e2 is combined with the page data read from the Flash in step e3. If the sub-page data read from the SCM chip in step e2 fills the whole page, no merging is needed in this step, and the process proceeds directly to step e5 to write the data.
  • Step e5 write new data to the Flash chip, and update the index.
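  • Steps e1 to e5 can be sketched as follows. The function names, the dict model of blocks, and the use of one erased `free_block` as the destination are hypothetical. Dropping the SCM sub-pages once they have been merged is inferred from the read scenarios described earlier rather than stated in these steps.

```python
PAGE_SIZE = 8 * 1024
SUBPAGE_SIZE = 64


def merge(flash_page: bytes, subs: dict) -> bytes:
    """Overlay SCM sub-pages ({sub-page index -> 64B bytes}) on a flash page."""
    page = bytearray(flash_page)
    for idx, chunk in subs.items():
        page[idx * SUBPAGE_SIZE:(idx + 1) * SUBPAGE_SIZE] = chunk
    return bytes(page)


def garbage_collect(blocks, ftl, scm_subpages, free_block):
    """blocks: {block -> {page address -> bytes}} (may hold stale pages);
    ftl: {page address -> block currently holding the valid copy}."""
    def invalid_count(b):
        return sum(1 for p in blocks[b] if ftl.get(p) != b)
    # Step e1: pick the block with the most invalid pages.
    victim = max((b for b in blocks if b != free_block), key=invalid_count)
    for page_addr, data in blocks[victim].items():
        if ftl.get(page_addr) != victim:
            continue  # invalid page: nothing points at it
        # Steps e2 to e4: read the SCM sub-pages and the flash page, then merge them.
        subs = scm_subpages.pop(page_addr, {})
        # Step e5: write the merged page to the new block and update the index.
        blocks[free_block][page_addr] = merge(data, subs)
        ftl[page_addr] = free_block
    blocks[victim] = {}  # erase the reclaimed block
    return victim
```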
  • the data in the SCM needs to be migrated to the Flash.
  • the main controller 211 regularly queries the free capacity of the SCM; if the used capacity of the SCM exceeds a certain threshold, such as 80%, it starts the background task of sub-page migration.
  • Embodiment 1 of the present application also provides a data migration method, which can be applied to the system 120 in FIG. 3 .
  • the operation of data migration is performed by the main controller 211, and a specific example is given below:
  • a page with the highest degree of aggregation that is, a page with the largest number of sub-pages, may be selected from the SCM, and the page may be eliminated from the SCM space.
  • the data migration algorithm can also consider the time attribute, and the oldest page is selected first.
  • Figure 13(b) is a flowchart of a method for data migration provided by an embodiment of the present application, and the steps are as follows:
  • Step f1 determine the page to be migrated in the SCM chip.
  • the update time interval of each root page is calculated, that is, the difference between the time when the task is currently started and the time when the page was last updated.
  • the migration metric is calculated as $\text{migration metric} = \text{update time interval} \times W_1 + \text{number of sub-pages} \times W_2$, where $W_1$ and $W_2$ are the weights of these two dimensions, and the N pages that most need to be migrated, that is, the N pages with the largest migration metric, are determined.
  • the size of N can be adjusted according to the water level of the SCM: the more SCM capacity is used, the more sub-pages need to be migrated at one time. Suppose this step determines that root page A is the page to be migrated this time.
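  • The selection in step f1 can be illustrated with a short sketch. The function `pick_pages_to_migrate` and its argument layout are hypothetical, and the default weights of 1.0 are placeholders, since the source does not give values for the two weights.

```python
def pick_pages_to_migrate(pages: dict, now: float, n: int, w1: float = 1.0, w2: float = 1.0):
    """pages: {root page -> (last update time, number of sub-pages in the SCM)}.

    Returns the n root pages with the largest migration metric, where
    metric = (now - last update time) * w1 + number of sub-pages * w2.
    """
    def metric(item):
        _, (last_update, subpage_count) = item
        return (now - last_update) * w1 + subpage_count * w2

    ranked = sorted(pages.items(), key=metric, reverse=True)
    return [root for root, _ in ranked[:n]]
```

  • For example, with `now = 100`, a page last updated at time 0 holding 100 sub-pages scores 200 and is selected ahead of a page updated at time 50 holding 5 sub-pages, which scores 55.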
  • Step f2 reading the sub-page data in the SCM chip.
  • Step f3 reading page data in the flash memory chip.
  • for each 8KB page A that needs to be migrated, the controller 211 determines whether page A is full in the SCM:
  • when the SCM data read in step f2 fills page A, there is no need to read the page from the Flash, and only the data from the SCM is used; or,
  • when the SCM data read in step f2 is not a full page (for example, the data of one sub-page C of page A is missing), the data B of the page is read from the Flash into the memory of the controller 211 according to the logical address (root page address and sub-page address) of the page to be migrated. The data of page B is then merged with all sub-pages of page A in the SCM. Specifically, the data belonging to sub-page C is read out of the acquired page B and combined with all the sub-pages of page A in the SCM to form a whole page of data.
  • Step f4 merge the data read from the SCM and the flash memory.
  • Step f5 write data into the flash memory chip, and update the index.
  • the controller 211 writes the data merged in step f4 into a newly allocated page address in the flash memory. After the write succeeds, it modifies the index corresponding to the 8KB page in the Flash so that it points to the new location. After the index update succeeds, the controller 211 deletes the data corresponding to all sub-pages under the 8KB page in the SCM and the Flash, together with the related indexes, which completes the data migration (TierDown) flow.
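  • Steps f2 to f5 for a single page can be sketched as follows. The function `tier_down`, the append-only list used for the flash, and leaving the old flash copy in place as garbage are assumptions of the sketch.

```python
PAGE_SIZE = 8 * 1024
SUBPAGE_SIZE = 64
SUBPAGES_PER_PAGE = PAGE_SIZE // SUBPAGE_SIZE


def tier_down(page_addr, scm_subpages, flash, ftl) -> int:
    """scm_subpages: {page -> {sub-page index -> 64B bytes}}; flash: append-only list of pages;
    ftl: {page -> index into flash}. Returns the new flash location."""
    subs = scm_subpages[page_addr]                      # step f2
    if len(subs) == SUBPAGES_PER_PAGE:
        # The SCM holds the full page, so the flash copy is not read.
        merged = b"".join(subs[i] for i in range(SUBPAGES_PER_PAGE))
    else:
        page = bytearray(flash[ftl[page_addr]])         # step f3
        for idx, chunk in subs.items():                 # step f4
            page[idx * SUBPAGE_SIZE:(idx + 1) * SUBPAGE_SIZE] = chunk
        merged = bytes(page)
    # Step f5: write to a newly allocated flash page, then repoint the index.
    new_pa = len(flash)
    flash.append(merged)
    ftl[page_addr] = new_pa
    # Only after the index update are the SCM sub-pages and their index removed.
    del scm_subpages[page_addr]
    return new_pa
```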
  • regarding the DRAM cache elimination strategy: when the used DRAM capacity reaches a certain water level, the DRAM elimination strategy is triggered.
  • the elimination algorithm may be LRU (least recently used), LFU (least frequently used), or the like, which is not described in detail here.
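  • As one possible illustration of the LRU option, the sketch below uses Python's `collections.OrderedDict`. The class `PageCache` is hypothetical, and counting the water level in pages rather than bytes is a simplification.

```python
from collections import OrderedDict


class PageCache:
    """LRU read cache: the least recently used page is evicted first."""

    def __init__(self, capacity: int):
        self.capacity = capacity          # maximum number of cached pages
        self.pages = OrderedDict()        # page address -> data, oldest first

    def get(self, addr):
        if addr not in self.pages:
            return None
        self.pages.move_to_end(addr)      # mark as most recently used
        return self.pages[addr]

    def put(self, addr, data) -> None:
        self.pages[addr] = data
        self.pages.move_to_end(addr)
        while len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
```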
  • FIG. 14 is a schematic structural diagram of another storage system 120 provided by an embodiment of the present application, which may be applied to the storage architecture of FIG. 1 or FIG. 2; the method in the embodiment of the present application may also be applied to the storage system 120 shown in FIG. 14.
  • the storage system 120 includes two types of hard disks 134: the hard disk 134-2 is a storage device based on flash media, and the hard disk 134-3 is a storage device based on SCM media.
  • the difference from the storage system 120 shown in FIG. 3 is that the flash memory chip 212 and the SCM chip 213 are located in two separate hard disks 134.
  • the hard disk 134-2 includes one or more flash memory chips 212, and the hard disk 134-3 includes one or more SCM chips 213.
  • the flash memory chips 212 and SCM chips 213 are used to store data written into the hard disk.
  • the hard disks 134-2 and 134-3 may be raw devices; that is, the functions originally implemented by the hard disk main controller 211, such as the translation layer, garbage collection, data migration, and data merging, can be transferred to the controller 122 for implementation.
  • alternatively, the hard disks 134-2 and 134-3 may contain a main controller (not shown in FIG. 14) that has only simple functions such as sending and receiving commands, while the complex data processing functions are still handed over to the controller 122 for implementation.
  • FIG. 15 is a schematic flowchart of a method for writing data into a storage device according to Embodiment 2 of the present application.
  • Embodiment 2 provides a unified software management layer on top of the driver layers of the SCM storage device and the flash storage device.
  • the driver layer is installed in the controller 122.
  • the software management layer provides a byte interface and a page interface to the operating system; byte-level data is written to the SCM storage device, and page-level data is written to the flash storage device.
  • the application is not aware of the aforementioned data writing location.
  • the functions of the software management layer may be performed by the processor of the controller 122, and the software management layer may also be used to manage the translation layers of the SCM device and the flash device.
  • the garbage collection operation and the data migration operation are executed in the controller 122 , which needs to consume a certain amount of CPU resources of the controller 122 .
  • FIG. 15 is a schematic flowchart of a method for writing data into a storage device provided in Embodiment 2 of the present application.
  • the method can be applied to the storage system shown in FIG. 1 or FIG. 2 or FIG. 14 , and the steps are as follows:
  • Step 1510 The controller 122 receives the service request from the application server.
  • the controller 122 receives a request generated from an application server or an internal application of the storage system 120 . See similar descriptions above, which will not be repeated here.
  • Step 1520 The controller 122 sends a page-level write request to the hard disk 134-2 or a byte-level write request to the hard disk 134-3 according to the service request.
  • the processor 123 of the controller 122 uses the page-level interface provided by the driver software of the hard disk 134-2 and the byte-level interface provided by the driver software of the hard disk 134-3 to send the page-level or byte-level data write request command to the hard disk 134-2 or the hard disk 134-3, respectively.
  • when the data size in the write request is less than one Flash page, the aforementioned software management layer sends the write command to the driver layer of the SCM device 134-3, and the driver layer sends the write command to the SCM storage device 134-3; or, when the data size in the write request is equal to one Flash page, the software management layer sends the write command to the driver layer of the flash device 134-2, and the driver layer sends the write command to the flash memory device 134-2.
  • step 1530 the hard disk 134-2 or the hard disk 134-3 receives the write request.
  • the hard disk 134-2 or the hard disk 134-3 receives a page-level write request or a byte-level write request, respectively.
  • the hard disk 134-2 receives 8KB of write data through the page-level write interface
  • the hard disk 134-3 receives 64B of write data through the byte-level write interface.
  • for the content carried in the request, refer to the description in step 830 above, which is not repeated here.
  • Step 1540 Write byte-level data to the SCM chip for persistence, or write page-level data to the flash memory chip for persistence.
  • the hard disk 134-2 is a flash memory device. After receiving the write request, the main controller (not shown in the figure) of the hard disk 134-2 writes the page data into the flash memory chip persistently.
  • the hard disk 134-3 is an SCM device. After the main controller (not shown in the figure) of the hard disk 134-3 receives the write request, the hard disk 134-3 persistently writes byte-level data into the SCM chip.
  • step 1550 the controller 122 updates the index in the SCM translation layer or the flash translation layer.
  • the SCM translation layer and the flash translation layer in Embodiment 2 are the same as those in step 850 of Embodiment 1; that is, the SCM chip uses the two-level translation layer in FIG. 6, and the flash memory chip uses the traditional FTL, which is not repeated here. It should be noted that, in Embodiment 2, index update and management in the translation layer are moved up to the controller 122.
  • Embodiment 2 also provides the methods A, B, C, D, E, and F of Embodiment 1, that is, byte-level writing, page-level writing, page-level reading, data deletion, garbage collection, and data migration.
  • operations such as data migration, garbage collection, and data merging are also performed by the controller 122 .
  • both can be implemented by invoking instructions by the processor in the corresponding controller (122 or 211).
  • the steps performed by the controller 122 in Embodiment 2 may also be performed by the control unit 131 in FIG. 3.
  • since the hard disk enclosure 130 may not have the control unit 131, the steps may also be executed by the aforementioned smart network card.
  • FIG. 16 is a schematic structural diagram of an apparatus 1600 provided by an embodiment of the present application, which can be applied to the storage system 120 to implement the methods in Embodiment 1 and Embodiment 2 of the present application.
  • the apparatus may include: a byte-level writing module 1601, a page-level writing module 1602, a page-level reading module 1603, a data migration module 1604, a garbage collection module 1605, and a translation layer module 1606. Specifically:
  • the byte-level write module 1601 is used to receive a first write request through a byte-level write interface, the first write request carries the first data to be written, and the length of the first data is less than one flash memory page size.
  • the foregoing first data is written into the SCM chip of the solid-state hard disk for persistent storage.
  • a copy of the first data is cached in the DRAM chip.
  • the page-level write module 1602 is configured to receive a second write request through a page-level write interface, where the second write request carries second data to be written, and the length of the second data is equal to the size of one flash page.
  • the second data is written into the flash memory chip of the foregoing solid state drive for persistent storage.
  • the method further includes: caching a copy of the second data in the aforementioned DRAM chip.
  • when the data stored in the aforementioned SCM chip contains data that is the same as at least a part of the aforementioned second data, that data stored in the SCM chip is deleted.
  • the page-level read module 1603 is configured to receive a first read request through the page-level read interface to read the aforementioned second data, where the first read request carries the logical address of the second data; determine, according to the logical address, whether the SCM chip has stored third data that has the same root page address as the second data; and, when the third data has been stored in the SCM chip, read the third data from the SCM chip according to the logical address of the second data, read the second data from the flash memory chip, and combine the third data and the second data into a whole page of data to be returned.
  • reading the third data from the SCM chip according to the logical address of the second data includes: obtaining the root page address of the second data according to the logical address of the second data; obtaining the sub-page address of the third data according to the root page address of the second data; and obtaining the physical address of the third data according to the sub-page address of the third data, and reading the third data according to that physical address.
  • when reading the first data from the SCM chip: determine whether a copy of the first data exists in the DRAM chip; if so, read the first data from the DRAM chip and return it; otherwise, read the first data from the SCM chip.
  • when reading the second data from the flash memory chip: determine whether a copy of the second data exists in the DRAM chip; if so, read the second data from the DRAM chip and return it; otherwise, read the second data from the flash memory chip.
  • the data migration module 1604 is used for migrating the first data and other data to the aforementioned flash memory chip at page granularity, where the other data and the first data belong to the same root page, and the other data is located in the SCM chip or the flash memory chip before the migration. Specifically:
  • when the data corresponding to the first root page read from the SCM chip is full-page data, the data corresponding to the first root page is read from the SCM chip and written into the aforementioned flash memory chip.
  • when the data corresponding to the first root page read from the SCM chip is not full-page data, the data corresponding to the first root page is read from the SCM chip and the flash memory chip respectively, merged, and then written into the aforementioned flash memory chip.
  • the root page to be migrated in the SCM chip is determined according to a page aggregation degree, where the page aggregation degree refers to the number of sub-pages included in the root page in the SCM chip, and the root page is composed of multiple sub-pages.
  • a root page with a higher degree of page aggregation or a root page with a longer time from the last update will be preferentially determined as the root page to be migrated.
  • the garbage collection module 1605 is configured to determine the first block to be collected in the aforementioned flash memory chip, where the first block includes at least a first part of data and a second part of data, the first part of data is located in the aforementioned SCM chip, the second part of data is located in the flash memory chip, and both parts are valid data; read the first part of data from the SCM chip and the second part of data from the flash memory chip; write the first part of data and the second part of data into a second block in the flash memory chip; and erase the aforementioned first block.
  • the translation layer module 1606 is configured to obtain the root page address and sub-page address of the first data according to the logical address of the first data carried in the first write request, and to save the index relationship between the root page address and the sub-page address, as well as the index relationship between the sub-page address and the first physical address, where the first physical address is the physical address at which the first data is written into the aforementioned SCM chip for persistence.
  • when the third data is read from the SCM chip according to the logical address of the second data, this module can be used to obtain the root page address of the second data according to the logical address of the second data; obtain the sub-page address of the third data according to the root page address of the second data; and obtain the physical address of the third data according to the sub-page address of the third data, and read the third data according to that physical address.
  • the module division manner of the apparatus 1600 may also take other forms, and FIG. 16 is only an example.
  • the three interfaces, the byte-level write interface, the page-level write interface, and the page-level read interface, may be software programming interfaces.
  • the software programming interface can identify different operation instructions and their contents to realize the functions of byte-level writing, page-level writing, and page-level reading.
  • the byte-level write interface and the page-level write interface can be used to identify the same operation command; for example, both use the same write command. In other words, the two interfaces are collectively referred to as write interfaces, and they differ only in the data length carried in the instruction.
  • the data length recognized by the byte-level write interface is less than the flash page, while the page-level write interface recognizes that the data length is equal to the flash page.
  • the byte-level write interface and the page-level write interface may also be different write commands, which are implemented through two write interfaces.
  • the byte-level write interface, page-level write interface, and page-level read interface can also be implemented in hardware. For example, the device communicates with upper-layer devices through different physical interfaces and data lines to achieve the different functions (byte-level writing, page-level writing, and page-level reading); as another example, the different functions are implemented through different hardware modules of the device.
  • these three types of interfaces may also be implemented by a combination of software and hardware, and the specific implementation manner is not limited in the present invention.
  • ROM read-only memory
  • RAM random-access memory
  • The above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When software is used, they can be implemented in whole or in part in the form of a computer program product.
  • The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated.
  • The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave).
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media, or semiconductor media (e.g., a solid state disk (SSD)).
  • It should be understood that the disclosed apparatus and method may be implemented in other manners without exceeding the scope of the present application.
  • For example, the above-described embodiments are only illustrative.
  • For example, the division of the modules or units is only a logical function division, and other divisions are possible in an actual implementation.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • A unit described as a separate component may or may not be physically separate, and a component displayed as a unit may or may not be a physical unit; that is, it may be located in one place or distributed across multiple network units.
  • Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

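This application proposes a method for writing data to a solid-state drive. The method provides both byte-level and page-level write interfaces. For full-page IO, the page-level interface is used and the data is written to the Flash chip to achieve high performance and space efficiency. For small IO, the byte-level interface is used and the data is written to the SCM chip, exploiting the byte-level addressing and persistence of the SCM chip to improve the efficiency of small-IO writes. The method in the embodiments of this application alleviates the write amplification caused by small-IO writes, reduces wasted space, and extends the service life of the solid-state drive.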

Description

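A Method for Writing Data to a Solid-State Drive
Technical Field
The present invention relates to the field of storage systems, and more specifically, to a method for writing data to a solid-state drive.
Background
Flash-based solid-state drives (Solid State Drive, SSD) read and write data in units of pages; the default page size is typically 4KB, 8KB, or 16KB. Upper-layer systems that communicate with the SSD (such as file systems and block systems) therefore widely use the page as the unit of data writing, which simplifies space management and improves metadata storage efficiency.
When an upper-layer system needs to write data smaller than the page size (small-IO data) to the SSD, it usually first reads from the SSD the data that belongs to the same page as the small-IO data, merges the two into new page data, and finally writes the merged page data to the SSD as a new page. Executing one small-IO write request thus incurs a read and a write of an entire page, which causes read/write amplification and shortens the life of the SSD.
Summary
In view of this, this application provides a method, an apparatus, a solid-state drive, and a system for writing data to a solid-state drive, which can improve the poor write performance of solid-state drives under small IO.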
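In a first aspect, an embodiment of this application provides a method for writing data to a solid-state drive. The method provides two write interfaces: a byte-level write interface and a page-level write interface. Specifically, a first write request is received through the byte-level write interface, the first write request carrying first data to be written, where the length of the first data is less than the size of one flash page; and a second write request is received through the page-level write interface, the second write request carrying second data to be written, where the length of the second data is equal to the size of one flash page.
The method provides both byte-level and page-level write interfaces. Whereas a conventional solid-state drive can only write data in units of one flash page, this method provides different write interfaces according to the size of the data to be written, so that data shorter than one flash page can still be accepted for writing, which mitigates the low write efficiency for small-IO data.
In a possible design, the first data is written to a storage class memory (SCM) chip of the solid-state drive for persistent storage, and the second data is written to a flash chip of the solid-state drive for persistent storage. The method writes full-page data to the Flash chip to achieve high performance and space efficiency, and writes small-IO data to the SCM chip, reducing wasted space and eliminating the write amplification problem. The method exploits the byte-level addressing and persistence of the SCM chip to improve the efficiency of small-IO writes, alleviate the write amplification they cause, and extend the service life of the solid-state drive.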
一种可能的设计方式,该第一数据的长度大于或等于该SCM芯片的最小管理单元;将该第一数据写入存储级存储SCM芯片中持久化存储包括:以该最小管理单元存储该第一数据。
一种可能的设计方式,该第一写入请求中还携带该第一数据的逻辑地址,该第一数据写入前述SCM芯片进行持久化的物理地址为第一物理地址;根据该第一数据的逻辑地址,获得所述第一数据的根页地址和子页地址;保存该根页地址和该子页地址的索引关系,以及该子页地址和所述第一物理地址之间的索引关系。本申请通过多级的索引关系管理,建立页级数据和字节级数据的翻译层,有效提升了存储系统的管理效率。
一种可能的设计方式,在将该第二数据写入该闪存芯片中持久化存储之后:当前述SCM芯片中存储的数据包含与前述第二数据的至少一部分数据相同的数据时,删除前述SCM芯片中存储的与前述的至少一部分数据相同的数据。本方法及时删除SCM芯片中的无效数据和无效索引,能提高SCM芯片的使用效率。
一种可能的设计方式,提供页级读取接口,通过该页级读取接口接收第一读取请求以读取前述第二数据,该第一读取请求中携带该第二数据的逻辑地址;根据该第二数据的逻辑地址,判断该SCM芯片中是否已存储与该第二数据的根页地址相同的第三数据;当前述的SCM芯片中已存储该第三数据时,根据该第二数据的逻辑地址从该SCM芯片中读取该第三数据,以及从该闪存芯片中读取的该第二数据;将前述读取到的所述第三数据和所述第二数据合并成整页数据后返回。
一种可能的设计方式,根据前述第二数据的逻辑地址从前述SCM芯片中读取该第三数据,包括:根据该第二数据的逻辑地址,获得该第二数据的根页地址;根据该第二数据的根页地址,获得该第三数据的子页地址;根据该第三数据的子页地址,获得该第三数据的物理地址,并根据该第三数据的物理地址读取所述第三数据。
一种可能的设计方式,在前述将第二数据的写入所述闪存芯片持久化存储之后,还包括:将该第二数据的副本缓存在前述固态硬盘的DRAM芯片中。
一种可能的设计方式,在前述将第一数据的写入所述固态硬盘的SCM芯片中进行持久化存储之后,还包括:将该第一数据的副本缓存在前述DRAM芯片中。
可选的,当从前述SCM芯片中读取前述第一数据时:判断前述DRAM芯片中是否存在第一数据的副本;如果存在,则从前述DRAM芯片中读取所述第一数据并返回;否则,从前述SCM芯片中读取该第一数据。
可选的,当从前述闪存芯片中读取前述第二数据时:判断前述DRAM芯片中是否存在第二数据的副本;如果存在,则从前述DRAM芯片中读取前述第二数据并返回;否则,从前述闪存芯片中读取该第二数据。
一种可能的设计方式,将该第一数据与其他数据以页为粒度迁移至前述闪存芯片中,该所述其他数据与该第一数据属于同一个根页,该其他数据在迁移之前位于前述SCM芯片或前述闪存芯片中。
本申请实施例第一方面还提供了一种数据迁移的方法,该方法可应用于固态硬盘中,该方法包括:确定SCM芯片中需要迁移的第一根页,该SCM芯片位于该闪存固态硬盘中;从该SCM芯片和/或闪存芯片中读取所述第一根页对应的数据,该闪存芯片位于前述闪存固态硬盘中;将所述第一页面对应的数据合并或直接写入闪存芯片中。
一种可能的设计方式,当从SCM芯片中读取第一根页对应的数据为整页数据,则只从SCM芯片读取该第一根页对应的数据,并将其写入前述闪存芯片中。
一种可能的设计方式,当从SCM芯片中读取第一根页对应的数据为不是整页数据,则分别从SCM芯片和闪存芯片中读取该第一根页对应的数据,并将其合并后写入前述闪存芯片中。
一种可能的设计方式,删除该SCM芯片中的所述第一根页对应的数据。
一种可能的设计方式,删除该第一根页对应的索引,并建立新索引。
一种可能的设计方式,根据页面聚合度确定SCM芯片中需要迁移的根页,所述页面聚合度指SCM芯片中根页面包含的子页面的数量,所述根页面由多个子页面组成。
一种可能的设计方式,前述页面聚合度越高的根页面或距离上次更新时间越久的根页面, 会被优先确定为所述需要迁移的根页。
本申请实施例第一方面还提供了一种垃圾回收的方法,该方法包括:确定前述闪存芯片中需要回收的第一块,该第一块至少包括第一部分数据和第二部分数据,其中,该第一部分数据位于前述SCM芯片中,该第二部分数据位于该闪存芯片中,并且该第一部分数据和该第二部分数据是有效数据;从该SCM芯片中读取所述第一部分数据,从所述闪存芯片中读取所述第二部分数据;将该第一部分数据和该第二部分数据写入闪存芯片中的第二块中;擦除前述第一块。
第二方面,本申请实施例提供了一种固态硬盘,其特征在于,该固态硬盘中包括主控制器、多个闪存芯片以及一个或多个SCM芯片;该主控制器执行计算机指令第一方面以及其任一种可能的设计方式中的方法。
一种可能的设计方式,该SCM芯片与该闪存芯片连接在不同通道控制器上,其中,该通道控制器位于该主控制器中。
第三方面,本申请实施例提供了一种将数据写入固态硬盘的装置,该装置包括多个模块,该装置用于实现由第一方面以及其任一种可能的设计方式中的方法实现的功能。
一种可能的设计方式,该装置包括:
字节级写入模块,用于通过字节级写入接口,接收第一写入请求,该第一写入请求中携带待写入的第一数据,该第一数据的长度小于一个闪存页的大小。
可选的,将前述第一数据的写入所述固态硬盘的SCM芯片中进行持久化存储。
可选的,在前述将第一数据的写入所述固态硬盘的SCM芯片中进行持久化存储之后,将该第一数据的副本缓存在前述DRAM芯片中。
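A page-level write module, configured to receive a second write request through the page-level write interface, the second write request carrying second data to be written, where the length of the second data is equal to the size of one flash page.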
可选的,将该第二数据写入前述固态硬盘的闪存芯片中进行持久化存储。
可选的,当前述SCM芯片中存储的数据包含与前述第二数据的至少一部分数据相同的数据时,删除前述SCM芯片中存储的与前述的至少一部分数据相同的数据。
可选的,在前述将第一数据的写入所述固态硬盘的SCM芯片中进行持久化存储之后,还包括:将该第一数据的副本缓存在前述DRAM芯片中。
一种可能的设计方式,该装置还包括:
页级读取模块,通过该页级读取接口接收第一读取请求以读取前述第二数据,该第一读取请求中携带该第二数据的逻辑地址;根据该第二数据的逻辑地址,判断该SCM芯片中是否已存储与该第二数据的根页地址相同的第三数据;当前述的SCM芯片中已存储该第三数据时,根据该第二数据的逻辑地址从该SCM芯片中读取该第三数据,以及从该闪存芯片中读取的该第二数据;将前述读取到的所述第三数据和所述第二数据合并成整页数据后返回。
可选的,根据前述第二数据的逻辑地址从前述SCM芯片中读取该第三数据,包括:根据该第二数据的逻辑地址,获得该第二数据的根页地址;根据该第二数据的根页地址,获得该第三数据的子页地址;根据该第三数据的子页地址,获得该第三数据的物理地址,并根据该第三数据的物理地址读取所述第三数据。
可选的,当从前述SCM芯片中读取前述第一数据时:判断前述DRAM芯片中是否存在第一 数据的副本;如果存在,则从前述DRAM芯片中读取所述第一数据并返回;否则,从前述SCM芯片中读取该第一数据。
可选的,当从前述闪存芯片中读取前述第二数据时:判断前述DRAM芯片中是否存在第二数据的副本;如果存在,则从前述DRAM芯片中读取前述第二数据并返回;否则,从前述闪存芯片中读取该第二数据。
一种可能的设计方式,该装置还包括数据迁移模块,用于:将该第一数据与其他数据以页为粒度迁移至前述闪存芯片中,该所述其他数据与该第一数据属于同一个根页,该其他数据在迁移之前位于前述SCM芯片或前述闪存芯片中。具体的:
确定SCM芯片中需要迁移的第一根页,该SCM芯片位于该闪存固态硬盘中;从该SCM芯片和/或闪存芯片中读取所述第一根页对应的数据,该闪存芯片位于前述闪存固态硬盘中;将所述第一页面对应的数据合并或直接写入闪存芯片中。
可选的,当从SCM芯片中读取第一根页对应的数据为整页数据,则从SCM芯片读取该第一根页对应的数据,并将其写入前述闪存芯片中。
可选的,当从SCM芯片中读取第一根页对应的数据为不是整页数据,则分别从SCM芯片和闪存芯片中读取该第一根页对应的数据,并将其合并后写入前述闪存芯片中。
可选的,删除该SCM芯片中的所述第一根页对应的数据。
可选的,删除该第一根页对应的索引,并建立新索引。
可选的,根据页面聚合度确定SCM芯片中需要迁移的根页,所述页面聚合度指SCM芯片中根页面包含的子页面的数量,所述根页面由多个子页面组成。
可选的,前述页面聚合度越高的根页面或距离上次更新时间越久的根页面,会被优先确定为前述需要迁移的根页。
一种可能的设计方式,该装置还包括垃圾回收模块,用于:确定前述闪存芯片中需要回收的第一块,该第一块至少包括第一部分数据和第二部分数据,其中,该第一部分数据位于前述SCM芯片中,该第二部分数据位于该闪存芯片中,并且该第一部分数据和该第二部分数据是有效数据;从该SCM芯片中读取所述第一部分数据,从所述闪存芯片中读取所述第二部分数据;将该第一部分数据和该第二部分数据写入闪存芯片中的第二块中;擦除前述第一块。
一种可能的设计方式,该装置还包括翻译层模块,用于:根据该第一写入请求中还携带该第一数据的逻辑地址,获得所述第一数据的根页地址和子页地址;保存该根页地址和该子页地址的索引关系,以及该子页地址和所述第一物理地址之间的索引关系,第一物理地址为该第一数据写入前述SCM芯片进行持久化的物理地址。
可选的,当根据前述第二数据的逻辑地址从前述SCM芯片中读取该第三数据时,本模块可以用于:根据该第二数据的逻辑地址,获得该第二数据的根页地址;根据该第二数据的根页地址,获得该第三数据所的子页地址;根据该第三数据的子页地址,获得该第三数据的物理地址,并根据该第三数据的物理地址读取所述第三数据。
第四方面,本申请实施例提供了一种将数据写入存储设备的方法,该方法包括:控制器向SCM存储设备发送第一写入请求,该第一写入请求中携带待写入的第一数据,该第一数据的长度小于一个闪存页的大小;该SCM存储设备通过字节级写入接口接收该第一写入请求;该控制器向闪存存储设备发送第二写入请求,所述第二写入请求中携带待写入的第二数据,该第二数据长度等于一个闪存页的大小;该闪存存储设备通过页级写入接口接收该第二写入请求。
可选的,将一个闪存页的写入请求拆分成一个或多个所述第一写入操作。
可选的,为该第一写入请求和第二写入请求设置不同的优先级。
一种可能的设计方式,该SCM存储设备持久化存储该第一数据,该闪存存储设备持久化存储该第二数据。
一种可能的设计方式,该第一数据的长度大于或等于所述SCM存储设备的最小管理单元;该SCM存储设备持久化存储该第一数据包括:以该最小管理单元存储前述第一数据。
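In a possible design, the first write request further carries the logical address of the first data, and the physical address at which the first data is persisted in the SCM storage device is a first physical address; the controller obtains the root page address and the sub-page address of the first data according to the logical address of the first data; and the controller saves the index relationship between the root page address and the sub-page address, and the index relationship between the sub-page address and the first physical address.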
一种可能的设计方式,在前述闪存存储设备持久化存储前述第二数据之后:当前述SCM存储设备中存储的数据包含与该第二数据的至少一部分数据相同的数据时,前述控制器删除前述SCM存储设备中存储的与前述的至少一部分数据相同的数据。
一种可能的设计方式,前述控制器发送第一读取请求以读取前述第二数据,前述第一读取请求中携带该第二数据的逻辑地址;根据该第二数据的逻辑地址,判断前述SCM存储设备中是否已存储与前述第二数据的根页地址相同的第三数据;当前述SCM存储设备中已存储该第三数据时,根据该第二数据的逻辑地址从前述SCM存储设备中读取该第三数据,以及从所述闪存存储设备中读取该第二数据;将读取到的第三数据和第二数据合并成整页数据后返回。
一种可能的设计方式,根据前述第二数据的逻辑地址从前述SCM存储设备中读取该第三数据,包括:根据该第二数据的逻辑地址,获得该第二数据的根页地址;根据该第二数据的根页地址,获得该第三数据所的子页地址;根据该第三数据的子页地址,获得该第三数据的物理地址,并根据该第三数据的物理地址读取所述第三数据。
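In a possible design, the controller migrates the first data together with other data to the flash storage device at page granularity, where the other data belongs to the same root page as the first data and, before the migration, is located in the SCM storage device or the flash storage device.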
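In a fifth aspect, an embodiment of this application provides a storage system, which includes a controller, an SCM storage device, and a flash storage device. The controller sends a first write request to the SCM storage device, the first write request carrying first data to be written, where the length of the first data is less than the size of one flash page; the SCM storage device receives the first write request through a byte-level write interface; the controller sends a second write request to the flash storage device, the second write request carrying second data to be written, where the length of the second data is equal to the size of one flash page; and the flash storage device receives the second write request through a page-level write interface.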
可选的,该控制器将一个闪存页的写入请求拆分成一个或多个所述第一写入操作。
可选的,该控制器为该第一写入请求和第二写入请求设置不同的优先级。
一种可能的设计方式,该SCM存储设备持久化存储该第一数据,该闪存存储设备持久化存储该第二数据。
一种可能的设计方式,该第一数据的长度大于或等于所述SCM存储设备的最小管理单元;该SCM存储设备持久化存储该第一数据包括:以该最小管理单元存储前述第一数据。
一种可能的设计方式,该第一写入请求中还携带该第一数据的逻辑地址,该第一数据写入该SCM存储设备中进行持久化的物理地址为第一物理地址;该控制器根据该第一数据的逻辑地址,获得该第一数据的根页地址和子页地址;该控制器保存该根页地址和该子页地址的索引关系,以及该子页地址和该第一物理地址之间的索引关系。
一种可能的设计方式,在前述闪存存储设备持久化存储前述第二数据之后,该控制器设备还用于:当该SCM存储设备中存储的数据包含与该第二数据的至少一部分数据相同的数据时,删除该SCM存储设备中存储的与该的至少一部分数据相同的数据。
一种可能的设计方式,前述控制器发送第一读取请求以读取前述第二数据,前述第一读 取请求中携带该第二数据的逻辑地址;根据该第二数据的逻辑地址,判断前述SCM存储设备中是否已存储与前述第二数据的根页地址相同的第三数据;当前述SCM存储设备中已存储该第三数据时,根据该第二数据的逻辑地址从前述SCM存储设备中读取该第三数据,以及从所述闪存存储设备中读取该第二数据;将读取到的第三数据和第二数据合并成整页数据后返回。
一种可能的设计方式,前述控制器还用于:根据前述第二数据的逻辑地址从前述SCM存储设备中读取该第三数据,包括:根据该第二数据的逻辑地址,获得该第二数据的根页地址;根据该第二数据的根页地址,获得该第三数据所的子页地址;根据该第三数据的子页地址,获得该第三数据的物理地址,并根据该第三数据的物理地址读取所述第三数据。
一种可能的设计方式,将该第一数据与其他数据以页为粒度迁移至该闪存存储设备中,该其他数据与该第一数据属于同一个根页,该其他数据在迁移之前位于该SCM存储设备或该闪存存储设备中。
第六方面,本申请实施例提供了一种计算机可读存储介质,该计算机可读存储介质包括程序指令,当该程序指令在计算机或处理器上运行时,使得该计算机或该处理器执行第一方面及其任一种可能的设计中的方法。
本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质包括程序指令,当该程序指令在计算机或处理器上运行时,使得该计算机或该处理器执行第二方面及其任一种可能的设计中的由控制器执行的方法。
第七方面,本申请提供一种计算机程序产品,该计算机程序产品包括指令,该指令存储在计算机可读存储介质中。控制器的处理器可以从计算机可读存储介质读取该指令;该处理器执行该指令,使得存储系统或存储设备实现上述第一方面或者第一方面的各种可能设计提供的方法。
本申请提供一种计算机程序产品,该计算机程序产品包括指令,该指令存储在计算机可读存储介质中。控制器的处理器可以从计算机可读存储介质读取该指令;该处理器执行该指令,使得存储系统或存储设备实现上述第二方面或者第二方面的各种可能设计提供的方法。
附图说明
为了更清楚的说明本发明实施例或现有技术中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例。
图1是本申请实施例提供一种存储网络架构示例图。
图2是本申请实施例提供另一种存储网络架构示例图。
图3是本申请实施例提供的一种存储系统120内部的结构示意图。
图4是本申请实施例提供的一种固态硬盘SSD 300的结构示意图。
图5是本申请实施例提供的一种闪存芯片212的结构示意图。
图6是本申请实施例提供的一种SCM芯片213的结构示意图。
图7是本申请实施例提供的一种两级翻译层的软件方案示意图。
图8是本申请实施例提供的一种将数据写入固态硬盘的方法流程示意图.
图9是本申请实施例提供的一种硬盘134-1的结构示意图。
图10(a)是本申请实施例提供的一种字节级写入的方法流程示意图。
图10(b)是本申请实施例提供的一种页级写入的方法流程示意图
图11是本申请实施例提供的一种页级读取的方法流程示意图。
图12是本申请实施例提供的一种页级读取的方法流程示意图。
图13(a)是本申请实施例提供的一种垃圾回收的方法流程图。
图13(b)是本申请实施例提供的一种数据迁移的方法流程图。
图14是本申请实施例提供的一种存储系统120内部的结构示意图。
图15是本申请实施例提供的一种将数据写入存储设备的方法流程示意图。
图16是本申请实施例提供的一种装置1600结构示意图。
具体实施方式
为了使本技术领域的人员更好地理解本发明方案,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚的描述。显然,所描述的实施例仅仅是本发明一部分的实施例,而不是全部的实施例。
图1是本申请实施例提供一种存储网络架构示例图,该存储网络架构中包括应用服务器100、交换机110和存储系统120(或存储节点),本申请提供的方法可以应用于如图1所示的存储网络架构中。
应用服务器100可以是物理机,也可以是虚拟机。物理机包括但不限于桌面电脑、服务器、笔记本电脑以及移动设备。在图1所示的应用场景中,用户通过应用服务器100上运行的应用程序来存取数据。
交换机110是一个可选设备,应用服务器100可以通过光纤交换机110访问存储系统120以存取数据。应用服务器100也可以直接通过网络与存储系统120通信。此外,光纤交换机110也可以替换成以太网交换机、InfiniBand交换机、RoCE(RDMA over Converged Ethernet)交换机等。
存储系统120包括引擎121和一个或多个硬盘134,引擎121是集中式存储系统中最为核心的部件,许多存储系统的高级功能都在其中实现。
图1中所示的存储系统120是盘控一体的存储系统,引擎121具有硬盘槽位,硬盘134可直接部署在引擎121中,即硬盘134和引擎121部署于同一台设备。
引擎121中包括一个或多个控制器,以图1中的两个控制器122为例,这两个控制器之间具有镜像通道,使得两个控制器122可以互为备份,从而避免硬件故障导致整个存储系统120的不可用。引擎121还包含前端接口125和后端接口126,其中前端接口125用于与应用服务器100通信,从而为应用服务器100提供存储服务。而后端接口126用于与硬盘134通信,以扩充存储系统的容量。
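The controller 122 includes at least a processor 123 and a memory 124. The processor 123 is a central processing unit (CPU) that processes data access requests from outside the storage system (from a server or another storage system) as well as requests generated inside the storage system. The memory 124 is internal memory that exchanges data directly with the processor. The memory 124 includes at least two kinds of memory; for example, it may be random access memory (Random Access Memory, RAM) or read-only memory (read only memory, ROM). The random access memory may be, for example, dynamic random access memory (DRAM) or static random access memory (SRAM). This application does not limit the specific type of the memory 124; any memory that can serve as the memory 124 and interact with the processor is applicable to the embodiments of this application. The memory 124 stores software programs, and the processor 123 runs them to manage the hard disks, for example by abstracting the hard disks into a storage resource pool and then dividing the pool into LUNs for use by a file server; a LUN here is in effect the hard disk seen on the application server. Some centralized storage systems are themselves file servers and can provide shared file services to application servers.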
硬盘134用于提供存储资源,例如存储数据。按照引擎121与硬盘134之间通信协议的 类型,硬盘134可能是串行连接SCSI(Serial Attached SCSI,SAS)硬盘、非易失性存储器标准(Non-Volatile Memory Express,NVMe)硬盘、外围设备高速连接标准(Peripheral Component Interconnect express,PCIe)硬盘、串行ATA(Serial Advanced Technology Attachment,SATA)硬盘以及其他类型的硬盘框。在本申请实施例中,硬盘134可以为闪存SSD。硬盘134还可以是磁盘或者其他类型的存储介质,例如固态硬盘或者叠瓦式磁记录硬盘等。
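This application also provides another structure of the storage system 120. As shown in FIG. 2, the storage system 120 may also be a storage system in which disks and controllers are separated: the engine 121 may have no disk slots, that is, the engine 121 and the hard disks 134 are deployed on two devices. In this case, the system 120 further has a disk enclosure 130, which includes a control unit 131, several hard disks 134, and a network card (not shown in the figure). The control unit 131 may take various forms.
In one implementation, the disk enclosure 130 is a smart disk enclosure, and the control unit 131 includes a CPU and memory. The CPU performs operations such as address translation and reading and writing data. The network card is used to communicate with other servers 110. In another implementation, the functions of the control unit 131 can be offloaded to the network card. In other words, in that implementation the disk enclosure 130 may have no control unit 131, and the network card performs data reads and writes, address translation, and other computing functions. In this case, the network card is a smart network card.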
可选的,上述的系统120除了可以应用于如图1和图2的集中式存储架构中,还可以应用于各种形态的分布式存储系统中。
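FIG. 3 is a schematic diagram of the internal structure of the storage system 120 shown in FIG. 1 or FIG. 2. As shown in FIG. 3, the storage system 120 includes a hard disk 134-1, which may be one implementation of the aforementioned hard disk 134. The storage system 120 includes mixed storage media; that is, the hard disk 134 contains two types of storage media, flash memory (Flash) and storage class memory (Storage Class Memory, SCM). SCM, also called persistent memory, is a hybrid storage technology that combines the characteristics of traditional storage devices and memory, offering both persistence and fast byte-level access.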
一种可能的实施方式,如图3所示,硬盘134-1内部包括闪存芯片212和SCM芯片213,以及主控制器211。其中,SCM芯片采用独立线路连接SSD主控制器211。主控制器211是一种嵌入式微芯片,当中还包含处理器、存储器、通信接口(图3中未示出),其功能就像命令中心,执行和硬盘相关的操作请求,详细介绍请参照图4的描述。
SCM芯片213主要用于小IO(字节级)的写入,闪存芯片212可用于页级写入。关于SCM芯片和闪存芯片的相关内容将在后面图5和图6详细描述。
需要说明的是:在存储系统中,小IO指不足一个满页数据的请求,以小IO写为例:当满页为8KB定长时,低于8KB的数据写入就称作小IO写,例如64B、2KB、4KB大小的数据都写入都可以称作小IO写。同理可得,当满页为4KB定长时,低于4KB的数据写入就称作小IO写。实际中,一个满页的大小还可以是16KB、32KB或64KB等等,本申请不对页的大小进行限定。
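FIG. 4 is a schematic structural diagram of a solid-state drive (SSD) 300 provided by an embodiment of this application; the SSD 300 may be the hard disk 134-1 in FIG. 3. The SSD 300 is a storage device that mainly uses flash memory (for example, NAND Flash) and SCM chips as persistent storage.
As shown in FIG. 4, the SSD 300 includes a NAND flash chip 212, a main controller 211, and an SCM chip 213; the flash chip 212 and the SCM chip 213 are used to persist page-level and byte-level data, respectively. The SCM chip 213 may be phase-change memory (Phase-change memory, PCM), resistive random access memory (Resistive Random Access Memory, ReRAM), magnetic random access memory (Magnetic Random Access Memory, MRAM), carbon nanotube random access memory (Nantero's CNT Random Access Memory, NRAM), or any other SCM chip that offers both byte-level addressing and persistence.
The main controller 211 is the control center of the SSD and handles complex tasks such as managing data storage and maintaining SSD performance and service life. It includes a processor 102, which issues all operation requests of the SSD; functions mentioned later, such as translation layer management, data merging, garbage collection, and data migration, may also be performed by the main controller 211. For example, the processor 102 in the main controller 211 may run firmware in a buffer to perform functions such as reading and writing data, garbage collection, and wear leveling. The main controller 211 further includes a host interface 104 and several channel controllers. The host interface 104 is used to communicate with a host, where the host may be a server, a personal computer, or any device such as the controller 122. Through the channel controllers (for example, channel controllers 0 and 1), the main controller 211 can operate the flash chip 212 on channel 0 and the SCM chip 213 on channel 1 in parallel, thereby increasing the underlying bandwidth.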
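FIG. 5 is a schematic structural diagram of the flash chip (or Flash chip) 212 in FIG. 3 or FIG. 4. As shown in FIG. 5, a die is the package of one or more flash chips. A die can contain multiple planes (Plane); multi-plane NAND is a design that effectively improves performance. As shown in FIG. 5, a die is divided internally into 2 planes, a plane contains multiple blocks, and a block consists of a number of pages.
Take a flash chip with a capacity of 16GB as an example: every 4314*8=34512 cells logically form a page, and each page can hold 4KB of content plus 218B of ECC check data; the page is also the smallest unit of IO operations. Every 128 pages form a block, every 2048 blocks form a plane, and a whole flash chip consists of two planes that can be operated in parallel. This is only an example; the page size, block capacity, and flash chip capacity may follow different specifications, which this embodiment does not limit.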
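The geometry in this example can be checked with a few lines of arithmetic. The sketch below only multiplies out the figures quoted above; the assumption that one cell stores one bit is ours. Under that assumption the figures give 2 GiB of user data, which is 16 Gibit, so the "16GB" in the example may be intended as 16 gigabits; that reading is a guess, not something the source states.

```python
# Geometry figures quoted in the example; one cell is assumed to hold one bit.
CELLS_PER_PAGE = 4314 * 8
DATA_BYTES_PER_PAGE = 4 * 1024   # 4KB of user content per page
ECC_BYTES_PER_PAGE = 218         # ECC check data per page
PAGES_PER_BLOCK = 128
BLOCKS_PER_PLANE = 2048
PLANES_PER_CHIP = 2

# Bytes physically held by one page (user data plus ECC).
page_bytes = CELLS_PER_PAGE // 8

# User-visible capacity of the whole chip, excluding ECC.
user_capacity_bytes = (DATA_BYTES_PER_PAGE * PAGES_PER_BLOCK
                       * BLOCKS_PER_PLANE * PLANES_PER_CHIP)
user_capacity_gib = user_capacity_bytes / 2**30
user_capacity_gibit = user_capacity_bytes * 8 / 2**30

print(page_bytes, user_capacity_gib, user_capacity_gibit)
```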
页是闪存芯片中数据写入的最小单位,换言之,主控211是以页为粒度往块里面写入数据的。同样,页也是闪存芯片中数据读取的最小单位。
当一个块写满的时候,SSD的主控211会挑选下一个块继续写入。块是数据擦除和垃圾回收的最小单位,主控211在擦除数据时,每次只能擦除整个块。
图6是图3或图4所示的SCM芯片213的结构示意图。如图6所示,SCM芯片213包括多个存储体(bank),bank是芯片中用于存储数据的部件。例如,一个容量为32GB的SCM芯片,其中包括32个存储体,那么每个存储体具有1GB的容量。可选的,bank又被划分为多个分区(partition),在同一存储体内部的多个分区之间提供并行性。
在bank中,存储数据是按照行(row)和列(col)的管理实现字节级寻址功能,每个字线和位线相交处可确定一个存储单元(例如相变存储单元、浮栅晶体管),其中字线和位线是相互垂直的两条数据线,用于连接多个存储单元。在bank中,一条字线连接一个row的多个存储单元。可以通过在选中的字线上施加和其他字线不同的电平值,从而选择某一个row的多个存储单元,进行数据读写等操作。实际中,SCM芯片213中还可以包括行解码器、列解码器、命令编译器、驱动电路和数字控制器等模块(图6中未示出)。
此外,图6的SCM芯片中还具有“行缓存”的模块,用于缓存前述一个row的数据。可见,SCM芯片213可按照一个row的长度来进行读取和写入的。换句话说,row可作为SCM芯片的最小管理单元。
以SCM芯片中一个row的大小为64字节举例,在接收到控制器211的读取命令后,SCM芯片先将64字节数据从存储体的一个row中读取到行缓存里,然后再将该64字节数据从行缓存发送到控制器211。同理,写入的64字节数据需要先从控制器211写回到指定行缓存,然后再被写入存储体的一个row中。另,缓存行的大小本实施例不予限定。
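In a possible implementation, the data in a "sub-page" referred to later in the embodiments of this application may be the data of one row in the SCM described here. A row is usually far smaller than a "page" in the flash memory 212; in other words, "sub-page" data is smaller than "page" data. A page can therefore consist of multiple sub-pages; for example, an 8KB page can include 128 sub-pages of 64 bytes.
Flash SSDs generally contain a flash translation layer (Flash Translation Layer, FTL) module, which translates between the host's logical address (Logical Block Address, LBA) and the physical address (Physical Block Address, PBA) in the flash memory. For example, data is read from the Flash chip in units of pages: the main controller 211 can use the logical address of a page sent by the controller 122 to find the physical address in the flash chip and read the required data from it. In a possible implementation, the FTL can be implemented with a hash table.
However, the solution proposed in this application involves data reads and writes at two granularities in the flash chip and the SCM chip, namely page-level and byte-level data. To manage both, a translation layer covering page-level and byte-level data must also be established, which improves the management efficiency of the storage system.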
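To address this, FIG. 7 is a schematic diagram of a two-level translation layer software scheme proposed in this application. It manages the byte-level data written to the SCM chip through the abstraction of a "sub-page" and can be applied to the system 120 in the embodiments of this application. In a possible implementation, the two-level translation layer method is executed in the SSD main controller (for example, the main controller 211).
For example, the two-level translation layer in FIG. 7 includes two levels of index: (1) the corresponding root page is found according to the root page address; (2) the physical address of the corresponding sub-page data is found in that root page's sub-page table according to the sub-page address. The corresponding data can then be accessed in the SCM chip according to the physical address. Specifically:
First, as shown in FIG. 7, the main controller 211 receives a byte-level write request req, which carries the data to be written, the logical address (lba) of that data, and the data length (length). Assume that the flash page size in the hard disk 134-1 is 8KB and that the data length in req is 64 bytes. The main controller 211 can derive the corresponding root page address and sub-page address from the logical address lba in the request.
For example: root page address = lba / 8KB, and sub-page address = (lba % 8KB) / 64B, where / denotes integer division.
Next, the page-level index table in FIG. 7 (leftmost) is queried with the root page address (lba / 8KB). If the root page address exists in the page-level index table, the SCM stores data of that page, and the sub-page table corresponding to the page can be obtained. For example, when lba / 8KB = 2, the sub-page table table2 corresponding to root page 2 is found in the page-level index table.
Then, the sub-page table table2 is queried with the sub-page address ((lba % 8KB) / 64B). If the sub-page address exists in table2, the SCM stores that sub-page's data. For example, when (lba % 8KB) / 64B = 12, the physical address Paddr12 corresponding to sub-page 12 is found. If the sub-page address is not found in table2, the sub-page has no valid data in the SCM chip, and a new sub-page is inserted into the root page's sub-page table, for example by appending a <sub-page address, physical address> mapping to table2.
Finally, the SCM storage space pointed to by Paddr12 is accessed according to the physical address, for example to write the data carried by req into that storage space.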
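The two-level lookup walked through above can be sketched as follows. This is a minimal illustration only, assuming an 8KB flash page and a 64B sub-page as in the example; the class `TwoLevelIndex`, the function `split_lba`, and the use of nested dictionaries are hypothetical choices for illustration, not the structures the application prescribes.

```python
PAGE_SIZE = 8 * 1024   # flash page size in bytes (8KB, as in the example)
SUBPAGE_SIZE = 64      # SCM sub-page (row) size in bytes

def split_lba(lba):
    """Return (root page address, sub-page address) for a byte-granular lba."""
    return lba // PAGE_SIZE, (lba % PAGE_SIZE) // SUBPAGE_SIZE

class TwoLevelIndex:
    """Hypothetical model: root page -> sub-page table -> SCM physical address."""

    def __init__(self):
        self.pages = {}  # root page address -> {sub-page address: paddr}

    def lookup(self, lba):
        """Return the SCM physical address for lba, or None if SCM holds nothing."""
        root, sub = split_lba(lba)
        return self.pages.get(root, {}).get(sub)

    def update(self, lba, paddr):
        """Insert or overwrite the <sub-page address, physical address> mapping."""
        root, sub = split_lba(lba)
        self.pages.setdefault(root, {})[sub] = paddr

index = TwoLevelIndex()
lba = 2 * PAGE_SIZE + 12 * SUBPAGE_SIZE   # root page 2, sub-page 12
index.update(lba, 0x1000)                 # 0x1000 is an arbitrary example address
```

A lookup for an address whose root page or sub-page is absent returns `None`, which corresponds to the "no valid data in the SCM chip" case in the text.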
一种可能的实施方式,该两级翻译层中的两层索引是存放于SCM芯片中隔离存储空间中。
一种可选的实施方式,上述的两级翻译层的方法还可以由固态硬盘上层系统中处理器执行(例如控制框中的控制器122),具体方法和由主控制器211执行是类似的,此处不再赘述。
基于前述内容,本申请还提供了一种将数据写入固态硬盘的方法,该方法同时提供字节级和页级的写入接口,能够解决小IO数据下写性能效率低下的问题。具体的:对于满页的IO,本方法实施例,采用使用传统页级接口写入Flash芯片中以达到高性能和空间利用效率。对于小IO,写数据使用字节级接口写入SCM芯片中,减少空间浪费并消除写放大问题,利用 了SCM芯片字节级编址的优点和持久化的特性,有效提升了小IO写的效率,缓解了小IO写带来的写放大问题,有效提升了固态硬盘的使用寿命。针对图2和图3提出两种系统架构,本申请给出2个实施例进一步介绍本申请提出的方法:
实施例1:
基于图3中的存储系统120,本实施例提供了一种将数据写入固态硬盘的方法,图8是实施例1给出的一种将数据写入固态硬盘的方法流程示意图。本方法实施例中的SCM芯片位于在闪存SSD(硬盘134-1)内部,一种可能的实施方式,SCM作为持久化缓存,对控制器122中的软件透明,即控制器122中的软件只能看到Flash芯片的可用空间。控制器122的软件中发出的读写请求,通过SSD驱动软件提供字节级写入接口和页级写入接口,将数据写入硬盘134-1中的闪存芯片或SCM芯片。此外,本实施例1中的Flash的垃圾回收操作和SCM的数据迁移操作可以由硬盘134-1的主控制器211中执行,不消耗控制器122的CPU资源,详细介绍参见后文。
图8中的步骤如下:
步骤810、控制器122接收应用服务器的业务请求。
存储系统120中的控制器122接收来自存储系统外部(应用服务器100或者其他存储系统)的发来的业务请求,该业务请求可用于访问存储系统中的存储的数据。可选的,控制器122也可以接收的是存储系统内部应用生成的请求。
步骤820、控制器122根据业务请求,向硬盘134-1发送字节级或页级的写入请求。
控制器122接收到访问请求后,控制器122的处理器123会通过硬盘134-1的驱动层(驱动软件)提供的两种接口,将字节级或页级的数据请求发送至固态硬盘134-1中。
一种可能的实施方式,在将上述数据请求发送至硬盘134-1中之前,处理器123调用系统内核中的文件系统,由文件系统决策是发起字节级还是页级别的请求。然后,通过系统内核中的I/O调度器,对这两类请求进行调度优化。最终,经驱动软件提供的字节级和页级的接口,把这两类请求发送至硬盘134-1中。
关于这部分的涉及到的文件系统决策和IO调度,下面提供具体举例说明:
(1)文件系统决策
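The software stack used by existing file/block systems evicts data to the SSD in units of memory pages (4KB or 8KB) and cannot make use of a byte-level interface. For the small-IO write scenario in the embodiments of this application, this application also provides a file system adaptation scheme. For example, the file/block system can decide whether to split an 8KB page into one or more 64B write operations, which are written to the main controller 211 through the device driver.
The minimum access unit of DRAM memory (for example, the memory 124 in FIG. 1) is the cache line, whereas the file/block system of the controller 122 manages memory at page granularity. A cache line (for example, 64B) is smaller than a page and is similar in size to the data management unit of the SCM chip. In a possible implementation, the file system decides between byte writes and page writes according to the amount of cache-line data to be modified (that is, the number of sub-pages); because SCM space in a hybrid SSD (for example, the hard disk 134-1) is limited, caching more sub-pages yields a higher benefit. Specifically:
The controller 122 adds a 1-bit flag for each sub-page (the same size as a cache line) in a memory page to record whether that cache line has been modified; a modified cache line is marked as a dirty sub-page. When a dirty memory page is flushed to the hard disk 134-1, the file system can issue multiple 64B small-IO write requests that match the byte-level write interface of the hard disk 134-1 and write the dirty sub-page data to the SCM chip. Clean sub-page data is identical to the data in the flash chip and need not be written.
As an optional example, when the total amount of dirty sub-page data in the dirty memory page is below a threshold (for example, 4KB), the byte write interface is called and multiple byte-level write requests are issued to write the dirty sub-pages to the SCM chip; when the total amount of dirty sub-page data exceeds the threshold, the page-level (for example, 8KB) write interface is used and a single page-level write request is issued to write the dirty memory page to the flash chip.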
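The dirty-sub-page decision described above might look like the following sketch. The function name `plan_flush`, the tuple format of the returned requests, and the treatment of a dirty total exactly equal to the threshold (handled here as a page write) are assumptions; the text only specifies the "below" and "above" cases.

```python
SUBPAGE = 64          # cache line / sub-page size in bytes
PAGE = 8 * 1024       # memory page and flash page size in bytes
THRESHOLD = 4 * 1024  # example threshold from the text

def plan_flush(page_lba, dirty_bits):
    """Decide how to flush one dirty memory page.

    dirty_bits holds one boolean per sub-page (the 1-bit flag in the text).
    Returns a list of (kind, lba, length) write requests.
    """
    dirty = [i for i, is_dirty in enumerate(dirty_bits) if is_dirty]
    if not dirty:
        return []  # clean page: nothing to write
    if len(dirty) * SUBPAGE < THRESHOLD:
        # Few dirty sub-pages: one byte-level write per dirty sub-page.
        return [("byte_write", page_lba + i * SUBPAGE, SUBPAGE) for i in dirty]
    # Many dirty sub-pages: a single page-level write of the whole page.
    return [("page_write", page_lba, PAGE)]

bits = [False] * (PAGE // SUBPAGE)
bits[1] = bits[5] = True
requests = plan_flush(2 * PAGE, bits)
```

With two dirty sub-pages the plan is two 64B byte-level writes; with every sub-page dirty it collapses to one 8KB page-level write.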
(2)IO调度
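An embodiment of this application also provides a small-IO-write-first scheduling method, which can be executed by the processor of the controller 122. For example, after the file/block system issues page-level and byte-level write requests, the IO scheduler in the system kernel can assign a higher priority to byte-level write requests (small-IO writes) to improve small-IO write performance. Because small-IO writes have lower write latency, giving the 64B small-IO write queue a higher priority in IO scheduling improves small-IO processing performance and reduces IO queuing delay. In scenarios where large and small IO coexist, the hybrid SSD proposed in the embodiments of this application can achieve better IOPS (Input/Output Operations Per Second) performance.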
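One way to picture small-IO-first scheduling is a priority queue in which byte-level requests always dispatch before page-level ones, with arrival order preserved inside each class. The class `IOScheduler` and its two priority levels are illustrative assumptions; a real kernel IO scheduler would also need fairness and starvation handling, which the text does not discuss.

```python
import heapq
import itertools

class IOScheduler:
    """Hypothetical scheduler: small (sub-page-size) writes are dispatched first."""

    def __init__(self, page_size=8 * 1024):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that keeps FIFO order per class
        self.page_size = page_size

    def submit(self, name, length):
        # Priority 0 for byte-level (small-IO) writes, 1 for page-level writes.
        priority = 0 if length < self.page_size else 1
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def dispatch(self):
        """Pop the next request name, highest priority first."""
        return heapq.heappop(self._heap)[2]

sched = IOScheduler()
for name, length in [("p1", 8192), ("b1", 64), ("p2", 8192), ("b2", 64)]:
    sched.submit(name, length)
order = [sched.dispatch() for _ in range(4)]
```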
步骤830、硬盘134-1主控制器211接收字节级或页级的写入请求。
主控制器211接收到控制器122发送的写入请求,该写入请求中携带了待写入数据的起始地址、数据长度以及待写入的数据。对于起始地址,本领域技术人员通常称作逻辑块地址(logical block address,LBA),在本申请实施例中简称为逻辑地址。控制器122可以通过逻辑块地址LBA,访问硬盘中的对应物理地址(Physical Block Address,PBA),实现数据的读取和写入。
对于页级写入请求,该请求中携带的数据长度是等于闪存芯片中一个满页的大小。
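For a byte-level write request, the data length carried in the request is less than the size of one full page in the flash chip. In a possible implementation, the data length is the length of one minimum management unit of the SCM chip, for example 64B. In this case, the controller 122 can split the write request; for example, the file system described above can split a page write and issue one or more byte-level write operations.
Optionally, the data length may also be a multiple of the minimum management unit of the SCM chip, for example 128B or 512B. In this case, the main controller 211 can split the data into multiple minimum management units and write them to the SCM chip.
Step 840. The main controller 211 writes byte-level data to the SCM chip for persistence, or writes page-level data to the flash chip for persistence.
For a write request, after receiving the write request sent by the controller 122, the SSD main controller 211 writes the data to the SCM or to Flash depending on the request, and updates the Flash and SCM indexes. In a possible implementation, the main controller 211 decides according to the length of the arriving data whether to write it to the SCM chip or the flash chip: small IO goes to the SCM chip and large IO to the flash chip. In an optional implementation, the SSD main controller can instead determine the target chip directly from the type of interface through which the controller 122 wrote the data.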
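The length-based routing in step 840, including the splitting of a multi-unit request into minimum management units, can be sketched as below. The function `route_write`, the returned tuple layout, and the decision to reject lengths that are neither a full page nor a multiple of 64B are assumptions made for illustration.

```python
PAGE = 8 * 1024  # flash page size in bytes
UNIT = 64        # SCM minimum management unit in bytes

def route_write(lba, data):
    """Return a list of (target, lba, chunk) operations for one write request."""
    if len(data) == PAGE:
        # Full-page data goes to the flash chip as a single page write.
        return [("flash", lba, data)]
    if 0 < len(data) < PAGE and len(data) % UNIT == 0:
        # Small IO goes to SCM, split into minimum management units.
        return [("scm", lba + off, data[off:off + UNIT])
                for off in range(0, len(data), UNIT)]
    raise ValueError("unsupported write length: %d" % len(data))

ops = route_write(0, b"a" * 128)
```

A 128B request therefore becomes two 64B SCM writes at consecutive addresses, while an 8KB request is passed through unchanged to flash.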
步骤850、主控制器211更新SCM翻译层或闪存翻译层中的索引。
本申请实施例中,SCM芯片中的索引是按照“子页”(例如64字节)粒度管理的,采用图7中的两级翻译层:即先根据逻辑地址lba找到一个根页,然后可以在根页中在查找到所有子页地址,每个子页地址对应SCM存储器上的一个物理地址,该物理地址PBA存储的数据大小可以为SCM芯片中的缓存行的大小,具体内容请参照前文相关描述,此处不再赘述。
一种可能的实施方式,图7中页级索引表可以使用哈希表(Hash table)结构实现,哈希表中包括页地址(key)和子页表(value),哈希表内部的子页表的查找采用基数树(Radix tree)或红黑树(Red–black tree)实现。
在本申请实施例中,闪存芯片是按照页粒度(如8KB)管理,可以采用传统页级/块级/块页混合管理算法,此处不再详细介绍。例如,硬盘134内部也记录了LBA到PBA的映射(即闪存翻译层FTL),主控制器211可以根据LBA在闪存芯片中找到对应PBA上的页数据。
为了进一步介绍本申请实施例1中涉及的写入操作,下面给出具体举例:
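A. Byte-level write
The hard disk 134 provides a byte-level write interface. When the main controller 211 receives a byte-level write request (for example, 64B), it first recognizes that the data length of the request is less than one page and determines that the data should be written to the SCM. Suppose the main controller 211 finds the root page and sub-page indexes for the 64B sub-page data using the two-level translation layer in FIG. 7. The main controller 211 then checks whether the SCM chip 213 has storage capacity remaining. If it is full, a data migration (TierDown) operation must be performed first to ensure there is enough space on the SCM for the write; data migration essentially moves data from the SCM chip into the Flash chip or from Flash into the SCM chip, as described in detail later. FIG. 10(a) is a schematic flowchart of a byte-level write method provided by this application. Specifically:
Step a1. Write the byte-level data to the SCM chip.
The 64-byte sub-page data is written to the SCM chip (SCM chip 213) for persistence, and its physical address is recorded as PA1.
Step a2. Update the index of the SCM translation layer.
Referring to FIG. 7, the root page address and sub-page address are derived from the logical address lba. The hash table (that is, the page-level index table) in the SCM two-level translation layer is queried for the root page and for the sub-page under that root page. If they are not found, the data is a newly written sub-page, and the corresponding root page address and sub-page address entries must be added to the hash table; a sub-page address entry <offset, paddr> consists of the sub-page address (offset) and the SCM physical storage address (paddr). If they are found, the paddr in the sub-page table under the root page of the corresponding 8KB page is modified to point to PA1, the SCM address of the newly written 64B data. See the related description above for details.
Thus the entire 64-byte write and index update can complete without any modification to the page data or indexes in the Flash chip 212.
Step a3. Optionally, write the byte-level data to the DRAM chip as a cache.
Optionally, FIG. 9 is a schematic structural diagram of another hard disk 134-1 provided by Embodiment 1 of this application, in which the hard disk 134-1 further includes a dynamic random access memory (Dynamic Random Access Memory, DRAM) chip 215 that can be used to cache data. After the sub-page data has been written to the SCM in the preceding steps, the main controller 211 checks whether the 8KB page containing the sub-page is present in the DRAM chip 215. If it is, the sub-page data is updated in the DRAM chip 215 as a copy to serve subsequent reads, which improves the read efficiency of the hard disk. Otherwise, the DRAM data is not updated.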
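Steps a1 to a3 can be modelled with a small in-memory sketch. The class `ByteWritePath`, its dictionary-based SCM, index, and DRAM stand-ins, and the bump allocator for physical addresses are all hypothetical; the sketch also leaves out capacity checks and reclamation of the old SCM location on overwrite.

```python
PAGE = 8 * 1024  # flash page size in bytes
SUB = 64         # sub-page size in bytes

class ByteWritePath:
    """Hypothetical in-memory model of the byte-level write flow (a1-a3)."""

    def __init__(self):
        self.scm = {}        # SCM physical address -> 64B payload
        self.index = {}      # root page -> {sub-page: SCM physical address}
        self.dram = {}       # root page -> bytearray copy of a cached 8KB page
        self.next_paddr = 0  # naive bump allocator for SCM addresses

    def write(self, lba, data):
        if len(data) != SUB or lba % SUB != 0:
            raise ValueError("expected one aligned 64B sub-page")
        # a1: persist the sub-page in SCM and remember its physical address.
        paddr = self.next_paddr
        self.next_paddr += SUB
        self.scm[paddr] = bytes(data)
        # a2: insert or update the two-level index entry.
        root, sub = lba // PAGE, (lba % PAGE) // SUB
        self.index.setdefault(root, {})[sub] = paddr
        # a3 (optional): refresh the DRAM copy only if the page is already cached.
        if root in self.dram:
            self.dram[root][sub * SUB:(sub + 1) * SUB] = data
        return paddr

ssd = ByteWritePath()
ssd.dram[2] = bytearray(PAGE)  # pretend page 2 is already cached in DRAM
pa1 = ssd.write(2 * PAGE + 12 * SUB, b"\xab" * SUB)
```

Note that nothing in this path touches a flash structure, which mirrors the observation above that the 64-byte write leaves Flash data and indexes unchanged.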
本申请实施例1还提供了一种页级写入操作的方法,可以应用于图3中的系统120中,下面给出具体举例:
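B. Page-level write
The hard disk 134 provides a page-level write interface. When the main controller 211 receives a page-level write request (for example, 8KB), it first recognizes that the data length of the request equals one page and determines that the data should be written to the Flash chip. FIG. 10(b) is a schematic flowchart of a page-level write method provided by an embodiment of this application, which performs the following steps:
Step b1. Write the page-level data to the Flash chip.
The 8KB of data is first written to a newly allocated free page in Flash, at physical address PA2. The following steps are executed only after the write is confirmed successful. If the write fails, a write failure is returned directly to the controller 122 and the subsequent steps are not executed.
Step b2. Update the SCM translation layer index.
The hash table of the SCM index is searched for a root page corresponding to the 8KB page address. If one exists, the indexes and data of all sub-pages under that root page are deleted, ensuring that none of the 64B sub-pages under the 8KB page can still be read from the SCM. Once the sub-page data is removed from the SCM index, its space is reclaimed to serve subsequent small-IO writes. If no root page exists, nothing needs to be done to the SCM two-level translation layer and the flow proceeds directly to the next step.
Step b3. Update the flash translation layer index.
Once the two preceding steps are confirmed successful, the index for the page in the flash translation layer is updated, that is, the physical address (PBA) that the page's logical address points to is changed to PA2, and success of the page write is returned to the upper layer.
Step b4. Optionally, write the page-level data to the DRAM chip.
After the data has been written to flash in the preceding steps, the main controller 211 can also write the 8KB page data to the DRAM chip 215 as a cached copy to serve subsequent reads.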
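Steps b1 to b4 can be sketched in the same style. The class `PageWritePath` and its dictionary stand-ins are hypothetical, flash write failure is not modelled, and the key point illustrated is step b2: a full-page write supersedes every sub-page of the same root page held in the SCM.

```python
PAGE = 8 * 1024  # flash page size in bytes

class PageWritePath:
    """Hypothetical in-memory model of the page-level write flow (b1-b4)."""

    def __init__(self):
        self.flash = {}      # flash physical page address -> 8KB payload
        self.ftl = {}        # root page (logical) -> flash physical page address
        self.scm = {}        # SCM physical address -> 64B payload
        self.scm_index = {}  # root page -> {sub-page: SCM physical address}
        self.dram = {}       # root page -> cached copy of the page
        self.next_pa = 0     # naive allocator for free flash pages

    def write_page(self, lba, data):
        if len(data) != PAGE or lba % PAGE != 0:
            raise ValueError("expected one aligned full page")
        root = lba // PAGE
        # b1: write the page into a newly allocated free flash page.
        pa = self.next_pa
        self.next_pa += 1
        self.flash[pa] = bytes(data)
        # b2: drop every SCM sub-page of this root page; the new page supersedes them.
        for paddr in self.scm_index.pop(root, {}).values():
            self.scm.pop(paddr, None)
        # b3: point the FTL entry at the new physical page.
        self.ftl[root] = pa
        # b4 (optional): keep a copy in DRAM for later reads.
        self.dram[root] = bytearray(data)
        return pa

ssd = PageWritePath()
ssd.scm_index[3] = {0: 500}     # a stale sub-page of root page 3 sits in SCM
ssd.scm[500] = b"x" * 64
pa2 = ssd.write_page(3 * PAGE, b"\x11" * PAGE)
```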
本申请实施例1还提供了一种页级的读操作的方法,可以应用于图3中的系统120中,下面给出具体举例:
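C. Page-level read
The hard disk 134 can also provide a page-level read interface. When the main controller 211 receives a page-level read request (for example, 8KB), it recognizes that the data length of the request equals one page, initiates the corresponding read operation, and reads the data from the SCM and Flash.
FIG. 11 is a schematic flowchart of a page-level read method provided by this application. The specific steps are as follows:
Step c1. Optionally, query and read the data in the DRAM chip.
In a possible implementation in which a copy of the data is updated in DRAM after every write to the SCM and to flash (as described in steps a3 and b4), the data may already be cached in the DRAM chip 215 when it is read; that is, the DRAM chip serves as a read cache for both SCM and Flash.
Optionally, when reading data, the main controller 211 can first check whether the page to be read is present in the DRAM chip 215. If it is, the cached copy is read directly from the DRAM chip and a read-success response is returned without performing the following steps, which improves read efficiency. If it is not, the flow proceeds to the next step.
Step c2. Query whether the data exists in the SCM chip and the flash chip, and execute step c3, c4, or c5 according to the result.
Specifically, the SCM translation layer and the flash translation layer are queried for an index of the 8KB page address, that is, the corresponding page address is looked up in the hash table according to the logical address. Depending on whether the page address exists in the page-level index tables, the main controller 211 proceeds as follows:
when all the (sub-page) indexes found for the page in the SCM together cover a complete page, execute step c3; otherwise,
when the corresponding page index is found in both the SCM and Flash, execute step c4; or,
when no corresponding root page node is found in the SCM but the page's index address is found in Flash, execute step c5; or,
when the corresponding page index is found in neither the SCM nor Flash, return a read failure.
Step c3. Read the data from the SCM chip and return it.
The main controller 211 looks up the 64B sub-page indexes under the 8KB page node and reads the sub-page data stored in the SCM according to those indexes. Because the sub-pages in the SCM chip make up a full 8KB page, the whole page is returned to the controller 122 as soon as the SCM reads complete, without waiting for the Flash read request; when the Flash read completes, its data is simply discarded. This case can arise when the user has written data shorter than an 8KB page many times and neither garbage collection nor data migration has been triggered.
Step c4. Read data from the SCM chip and the flash chip, merge it, and return it.
The main controller 211 reads data through both the SCM index and the Flash index. After confirming that both reads succeeded, it merges the two pieces of data. As an example: after traversing the 64B sub-pages under the 8KB page address in the SCM, it finds that the 1st to 3rd sub-pages of the page are stored in the SCM, so the data of sub-pages 1 to 3 replaces the data at the corresponding positions 0 to 2 in the 8KB page read from Flash (an 8KB Flash page contains 8KB/64B=128 sub-pages, so one Flash page can hold 128 SCM sub-pages). The merged, complete 8KB page is returned to the controller 122. This case arises when the user has previously written data shorter than 8KB and GC/TierDown has not been triggered.
Step c5. Read the data from the flash chip and return it.
The main controller 211 reads nothing from the SCM; it reads the 8KB page from Flash and returns it directly to the controller 122. This case arises when the user has never written data shorter than an 8KB page, or when garbage collection or data migration (TierDown) has already been triggered.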
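The case analysis of steps c2 to c5 reduces to an overlay of SCM sub-pages on top of the flash page. The function `read_page` below is a hypothetical sketch that omits the DRAM cache of step c1 and reads sequentially rather than in parallel; treating "SCM holds only part of the page and Flash holds nothing" as a read failure is an assumption, since the text does not cover that combination.

```python
PAGE = 8 * 1024
SUB = 64
SUBS_PER_PAGE = PAGE // SUB  # 128 sub-pages per 8KB page

def read_page(root, scm_index, scm, ftl, flash):
    """Return the 8KB page for a root page address, or None on read failure."""
    subs = scm_index.get(root, {})
    if len(subs) == SUBS_PER_PAGE:
        # c3: SCM alone covers the whole page, no flash data is needed.
        return b"".join(scm[subs[i]] for i in range(SUBS_PER_PAGE))
    if root not in ftl:
        # Neither a full SCM page nor a flash page is available.
        return None
    # c5 when subs is empty, otherwise c4: overlay SCM sub-pages on the flash page.
    page = bytearray(flash[ftl[root]])
    for sub, paddr in subs.items():
        page[sub * SUB:(sub + 1) * SUB] = scm[paddr]
    return bytes(page)

scm = {100: b"\x01" * SUB}
scm_index = {7: {2: 100}}   # sub-page 2 of root page 7 lives in SCM
ftl = {7: 0}
flash = {0: b"\x00" * PAGE}
merged = read_page(7, scm_index, scm, ftl, flash)
```

The SCM copy always wins in the overlay because a sub-page is only present in the SCM index if it was written after the flash page.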
图12是本申请提供的另一种页级读取的方法流程示意图,与图11的区别在于:图11方法的前提是需要将写入SCM和Flash中数据的副本都写入DRAM芯片中(即执行a3和b4)。而图12的方法的前提是只将写入Flash的数据副本写入到DRAM芯片中(即执行步骤b4),而在SCM中持久化数据后,不将数据写入DRAM中(不执行步骤a3)。
换句话说,图11的DRAM芯片是作为Flash和SCM的读缓存,图12中DRAM芯片仅是作为Flash的读缓存。因此图12的方法在读取SCM中数据时,无需先读取DRAM中缓存数据。图12的具体步骤如下:
步骤c1'、从SCM芯片中读取该数据,如果是满页数据则返回。主控制器211会读取查询到的页节点下所有64B的子页,并根据索引读取SCM上存储的子页数据。如果读取的数据能凑满一个8KB整页,就将整页数据返回给控制器122,不执行后续步骤;否则,执行步骤c2'。
步骤c2'、查询闪存芯片和DRAM缓存中是否存在该数据,根据查询结果,执行步骤c3'或步骤c4'。
具体的:查询闪存芯片和DRAM芯片中缓存的数据中是否存在该8KB页数据,根据查询到的情况,主控制器211会依次执行以下的步骤:当在DRAM缓存芯片中查询到该整页数据,则执行步骤c3';或,当在Flash芯片中查询到该整页数据,则执行步骤c4'。
步骤c3'、从DRAM缓存中读取数据,与步骤c1'中读取的数据合并后返回。在这种情况下,待读取的页数据的副本在DRAM芯片中缓存过,并且该页的子页数据在之后写入了SCM中。从DRAM缓存中读取到该页数据后,将其与从SCM芯片中读取的子页数据进行合并,并将合并后的数据返回,详细内容和前述步骤c4类似,此处不再赘述。
步骤c4'、从Flash芯片中读取数据,与步骤c1'中读取的数据合并后返回。参考前述步骤c4中内容,此处不再赘述。
本申请实施例1还提供了一种数据删除的方法,下面给出具体举例:
D.数据删除
当控制器122发起一个安全删除(如,TRIM)命令时,主控制器211会调用硬盘的删除命令实施指定页的安全删除和安全擦除。首先,主控制器211需要查询SCM芯片和Flash芯片的翻译层中是否存在该页地址的索引。如果存在,对于Flash中数据,需要将该页所在的块(block)中其他有效页全部迁移到其他块中,再擦除该块数据。对于SCM芯片中数据,则需要删除SCM翻译层中对应根页下所有子页的数据,对旧地址中的数据置零。
当控制器122发起一个的是普通删除操作时,只需要将指定页地址从SCM和Flash的翻译层中的索引删除。
随着存储系统120的写入的Flash中的数据越来越多,修改的数据也越来越多,很多数据都变成了无效数据。无效数据(也称垃圾数据),是指没有任何映射关系指向的数据,反之 则为有效数据。这些无效数据占用不少存储空间,有必要进行垃圾回收(Garbage Collection,GC)。
本申请实施例1还提供了相应的垃圾回收的方法,可以应用于图3的系统120中。一种可能的实施方式,垃圾回收的操作由主控制器211执行,下面给出具体举例:
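E. Garbage collection
When few blank Flash pages remain, the main controller 211 initiates a garbage collection (GC) operation: it selects, from all blocks of the Flash chip, the block with the most invalid pages, and migrates the valid Flash pages in that block, together with any sub-pages stored in the SCM chip that belong to those valid Flash pages, into newly allocated pages in the Flash chip. FIG. 13(a) is a flowchart of a garbage collection method provided by Embodiment 1 of this application. The steps are as follows:
Step e1. Determine the valid pages in the block to be reclaimed.
For example, the main controller 211 selects the block with the most invalid pages from the blocks of the Flash chip, reads out the valid Flash pages in that block, and migrates them to other blocks.
Step e2. Read the sub-page data in the SCM chip.
The sub-page data of a valid page (for example, with logical address LA5) is read from the SCM. For example, if three sub-pages are found through the sub-page index of the specified page in the SCM, those three sub-pages are read into the memory of the main controller 211.
Step e3. Read the page data in the flash chip.
The data of the specified page LA5 is read from Flash into the memory of the main controller 211. If the SCM data in step e2 forms a full page, this step need not read the page from Flash, and the SCM data is used in its entirety.
Step e4. Merge the data read from the SCM chip and the Flash chip.
In the main controller 211, the sub-page data read from the SCM chip is merged with the page data read from Flash. If the SCM data in step e2 forms a full page, no merge is needed and the flow proceeds directly to step e5 to write the data.
Step e5. Write the new data to the Flash chip and update the indexes.
The merged data is written to a newly allocated location PA6 in Flash, a new index for LA5 (pointing to PA6) is created in the FTL, and the two-level index in the SCM and the old index in Flash are deleted. Finally, the data in the block to be reclaimed is erased.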
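Steps e1 to e5 can be illustrated with the sketch below. The function `garbage_collect`, the block and FTL layouts, and the assumption that a single pre-allocated free block receives all migrated pages are hypothetical simplifications; the sketch also always reads the flash page, skipping the shortcut for the case where the SCM already holds a full page.

```python
SUB = 64
PAGE = 8 * 1024

def garbage_collect(blocks, ftl, scm_index, scm, free_block):
    """Reclaim the block with the most invalid pages and return its id.

    blocks:    block id -> {page number: 8KB payload}
    ftl:       root page -> (block id, page number) of the valid flash copy
    scm_index: root page -> {sub-page: SCM physical address}
    scm:       SCM physical address -> 64B payload
    """
    # e1: group valid pages by block and pick the block with most invalid pages.
    valid = {}
    for root, (blk, page_no) in ftl.items():
        valid.setdefault(blk, []).append((root, page_no))
    candidates = [b for b in blocks if b != free_block]
    victim = max(candidates, key=lambda b: len(blocks[b]) - len(valid.get(b, [])))
    for slot, (root, page_no) in enumerate(valid.get(victim, [])):
        # e3: read the valid flash page.
        page = bytearray(blocks[victim][page_no])
        # e2 + e4: read the page's SCM sub-pages and overlay them, dropping them from SCM.
        for sub, paddr in scm_index.pop(root, {}).items():
            page[sub * SUB:(sub + 1) * SUB] = scm.pop(paddr)
        # e5: write the merged page to the new block and repoint the FTL.
        blocks[free_block][slot] = bytes(page)
        ftl[root] = (free_block, slot)
    blocks[victim] = {}  # erase the reclaimed block
    return victim

blocks = {0: {0: b"\x00" * PAGE, 1: b"\x00" * PAGE, 2: b"\x00" * PAGE},
          1: {0: b"\x22" * PAGE, 1: b"\x22" * PAGE},
          2: {}}
ftl = {5: (0, 1), 6: (1, 0), 7: (1, 1)}   # block 0 has two invalid pages
scm_index = {5: {3: 100}}
scm = {100: b"\xff" * SUB}
victim = garbage_collect(blocks, ftl, scm_index, scm, free_block=2)
```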
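When the SCM space reaches or approaches saturation, data in the SCM must be migrated to Flash so that enough SCM space remains to keep serving byte-level writes. For example, when the main controller 211 periodically checks the free capacity of the SCM and finds that the used capacity exceeds a threshold, for example 80%, it starts a background sub-page migration task.
Embodiment 1 of this application therefore also provides a data migration method, which can be applied to the system 120 in FIG. 3. In a possible implementation, the data migration operation is performed by the main controller 211. A specific example follows:
F. Data migration
In a possible implementation, to optimize performance, the page with the highest aggregation degree in the SCM, that is, the page with the most sub-pages, is selected and evicted from the SCM. The migration algorithm can also take time into account, so that the page that has gone longest without an update is selected first. FIG. 13(b) is a flowchart of a data migration method provided by an embodiment of this application. The steps are as follows:
Step f1. Determine the pages to be migrated in the SCM chip.
First, all root pages present in the hash table are traversed and the number of valid sub-pages in each root page is counted. Next, the update interval of each root page is computed, that is, the difference between the time the current task started and the time the page was last updated. Finally, the two are combined as a weighted sum, for example: migration metric = update interval * W1 + sub-page count * W2, where W1 and W2 are the weights of the two dimensions, and the N pages most in need of migration, that is, the N pages with the largest migration metric, are selected. N can be made adjustable according to the SCM watermark: the more SCM capacity is in use, the more sub-pages need to be moved in a single pass. Assume this step selects root page A as the page to be migrated.
Step f2. Read the sub-page data in the SCM chip.
The sub-page data of page A is read from the SCM chip; see step e2 for details.
Step f3. Read the page data in the flash chip.
In the main controller 211, for each 8KB page A to be migrated, determine whether page A is a full page in the SCM:
when the SCM data read in step f2 is a full page A, no page read from Flash is needed and the SCM data is used in its entirety; or,
when the SCM data read in step f2 is not a full page, for example because the data of one sub-page C of page A is missing, the data B of the page is read from Flash into the memory of the main controller 211 according to the logical address (root page address and sub-page address) of the page to be migrated. The data of page B is then merged with all the sub-pages of page A in the SCM: the part of page B that belongs to sub-page C is extracted and combined with the sub-pages of page A in the SCM to form a full page.
Step f4. Merge the data read from the SCM and flash.
Specifically, the sub-page data of page A read from the SCM overwrites the data at the same positions in the Flash page data B; in the example, the sub-pages of page A held in the SCM plus the one 64B sub-page taken from page B form the new 8KB page. See the merge operation described earlier.
Step f5. Write the data to the flash chip and update the indexes.
The main controller 211 writes the data merged in step f4 to a newly allocated page address in flash. After the write succeeds, it modifies the Flash index for the 8KB page to point to the new location. After the index update succeeds, the main controller 211 deletes the data and related indexes of all sub-pages under that 8KB page in the SCM and Flash, completing the TierDown data migration flow.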
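The candidate selection in step f1 is a weighted ranking, which can be sketched as follows. The function `pick_migration_candidates`, the `(sub-page count, last update time)` tuple layout, and the default weights of 1.0 are illustrative assumptions; the text leaves W1, W2, and N as tunable values.

```python
def pick_migration_candidates(pages, now, n, w1=1.0, w2=1.0):
    """Return the n root pages with the largest migration metric.

    pages: root page -> (valid sub-page count, last update time)
    metric = update interval * w1 + sub-page count * w2
    """
    def metric(item):
        _root, (subpage_count, last_update) = item
        return (now - last_update) * w1 + subpage_count * w2

    ranked = sorted(pages.items(), key=metric, reverse=True)
    return [root for root, _ in ranked[:n]]

# Three root pages in SCM: (sub-page count, last update time).
pages = {1: (10, 90.0), 2: (100, 99.0), 3: (5, 10.0)}
chosen = pick_migration_candidates(pages, now=100.0, n=2)
```

With equal weights, page 2 ranks first because of its many sub-pages (metric 101), and page 3 ranks second because it has not been updated for a long time (metric 95), ahead of page 1 (metric 20).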
此外,关于DRAM缓存淘汰策略:当DRAM容量达到一定水位时,触发DRAM淘汰策略,淘汰算法可以选择使用LRU(Least Recently Used,最近最少使用)、LFU(Least Frequently Used,最不经常使用)等,此处不再详细描述。
图14是本申请实施例提供的另一种存储系统120的结构示意图,可以应用于图1或图2的存储架构中,本申请实施例中的方法也可以应用于图14所示的存储系统120中。
图14中,存储系统120包括两种类型的硬盘:硬盘134-2为基于闪存介质的存储设备,硬盘134-3为基于SCM介质的存储设备,这两种硬盘也就是图1或图2中硬盘134。与图3所示的存储系统120的不同之处在于闪存芯片212和SCM芯片213是位于两个硬盘134中的。
硬盘134包括两种:硬盘134-2为基于闪存的存储设备,硬盘134-3为基于SCM的存储设备。其中,硬盘134-2中包括一个或多个闪存芯片212,硬盘134-3中包括一个或多个SCM芯片213,闪存芯片212和SCM芯片213用于存储写入硬盘中的数据。
一种可能的实施方式,硬盘134-2和134-3可以是裸设备,即原来由硬盘控制器211实现的相关功能,例如翻译层、垃圾回收、数据流动、数据合并等操作,可以移交给控制器122来实现。一种可能的实施方式中,硬盘134-2和134-3可以包含主控制器(图14中未示出),只不过它们只具备一些诸如发送/接收指令等的简单功能,仍然要将复杂的数据处理功能交给控制器122实现。
实施例2:
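Based on the storage system 120 in FIG. 14, FIG. 15 is a schematic flowchart of a method for writing data to a storage device provided by Embodiment 2 of this application.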
本实施例2在SCM存储设备和闪存存储设备的驱动层之上提供统一的软件管理层,该驱动层安装在控制器122中,该软件管理层向操作系统提供字节接口和页接口,把字节级数据写入SCM存储设备,将页级数据写入闪存存储设备中。一种可能的实施方式,应用程序不会感知前述数据写入位置。该软件层的功能可以由控制器122的处理器执行,该软件管理层的功能还可以用于管理SCM设备和闪存设备的翻译层。此外,垃圾回收操作和数据迁移操作放到控制器122里面执行,需要消耗一定的控制器122的CPU资源。
图15是本申请实施例2提供的一种将数据写入存储设备的方法流程示意图,该方法可应用在图1或图2或图14所示的存储系统中,步骤如下:
步骤1510、控制器122接收应用服务器的业务请求。
控制器122接收来自应用服务器或存储系统120的内部应用生成的请求,参见前文类似描述,此处不再赘述。
步骤1520、控制器122根据业务请求,向硬盘134-2发送页级的写入请求,或向硬盘134-3发送字节级写入请求。
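After the controller 122 receives the access request, its processor 123 sends page-level or byte-level data write request instructions to the hard disk 134-2 or the hard disk 134-3, respectively, through the page-level interface provided by the driver software of the hard disk 134-2 and the byte-level interface provided by the driver software of the hard disk 134-3.
In a possible implementation, when the data in the write request is smaller than one Flash page, the software management layer sends the write instruction to the driver layer of the SCM device 134-3, which sends it on to the SCM storage device 134-3; or, when the data in the write request equals one Flash page in size, the software management layer sends the write instruction to the driver layer of the flash device 134-2, which sends it on to the flash device 134-2.
In a possible implementation, before the data requests are sent to the hard disks 134-2 and 134-3, optimization by the file system and the IO scheduler in the kernel is also involved, which further improves small-IO (byte-level) write efficiency; see the description of step 820 above.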
步骤1530、硬盘134-2或硬盘134-3接收到写入请求。
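The hard disk 134-2 receives page-level write requests and the hard disk 134-3 receives byte-level write requests. For example, the hard disk 134-2 receives 8KB of write data through the page-level write interface, and the hard disk 134-3 receives 64B of write data through the byte-level write interface. For the content carried in the requests, see the description of step 830 above.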
步骤1540、将字节级数据写入SCM芯片中持久化,或将页级数据写入闪存芯片中持久化。
硬盘134-2是闪存设备,硬盘134-2的主控制器(图中未示出)接收到写请求后,将页数据持久化写入闪存芯片中。硬盘134-3是SCM设备,硬盘134-3的主控制器(图中未示出)接收到写请求后,硬盘134-3将字节级数据持久化写入SCM芯片中。
步骤1550、控制器122更新SCM翻译层或闪存翻译层中的索引。
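The SCM translation layer and flash translation layer in Embodiment 2 are the same as in step 850 of Embodiment 1: the SCM chip uses the two-level translation layer in FIG. 7, and the flash chip uses a conventional FTL; see the earlier description for details. Note that in Embodiment 2 the updating and management of the translation layer indexes are moved up into the controller 122.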
实施例2也提供诸如实施例1中的A、B、C、D、E和F等方法,即字节级写入、页级写入、页级读取、数据删除、垃圾回收和数据迁移等。具体步骤请参见实施例1中的描述,此处不再赘述。相对于实施例1,本实施例2在本步骤的区别在于,诸如数据迁移、垃圾回收、数据合并等操作,也是由控制器122完成。在执行实施例1和实施例2中的方法时,都可通过相应控制器(122或211)中处理器调用指令实现的。
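In an optional implementation, the steps executed by the controller 122 in Embodiment 2, such as translation layer management, data migration, garbage collection, and data merging, can also be executed by the control unit 131 in FIG. 2. Optionally, when the disk enclosure 130 has no control unit 131, they can also be executed by the aforementioned smart network card.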
图16是本申请实施例提供的一种装置1600结构示意图,可应用于存储系统120中,用以实现本申请实施例1和实施例2中的方法。该装置可以包括:字节级写入模块1601,页级写入模块1602,页级读取模块口1603、数据迁移模块1604、垃圾回收模块1605、翻译层模块1606,具体的:
字节级写入模块1601,用于通过字节级写入接口,接收第一写入请求,该第一写入请求中携带待写入的第一数据,该第一数据的长度小于一个闪存页的大小。
可选的,将前述第一数据的写入所述固态硬盘的SCM芯片中进行持久化存储。
可选的,在前述将第一数据的写入所述固态硬盘的SCM芯片中进行持久化存储之后,将该第一数据的副本缓存在前述DRAM芯片中。
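The page-level write module 1602 is configured to receive a second write request through the page-level write interface, the second write request carrying second data to be written, where the length of the second data is equal to the size of one flash page.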
可选的,将该第二数据写入前述固态硬盘的闪存芯片中进行持久化存储。
可选的,在前述将第一数据的写入所述固态硬盘的SCM芯片中进行持久化存储之后,还包括:将该第一数据的副本缓存在前述DRAM芯片中。
可选的,当前述SCM芯片中存储的数据包含与前述第二数据的至少一部分数据相同的数据时,删除前述SCM芯片中存储的与前述的至少一部分数据相同的数据。
页级读取模块1603,通过该页级读取接口接收第一读取请求以读取前述第二数据,该第一读取请求中携带该第二数据的逻辑地址;根据该第二数据的逻辑地址,判断该SCM芯片中是否已存储与该第二数据的根页地址相同的第三数据;当前述的SCM芯片中已存储该第三数据时,根据该第二数据的逻辑地址从该SCM芯片中读取该第三数据,以及从该闪存芯片中读取的该第二数据;将前述读取到的所述第三数据和所述第二数据合并成整页数据后返回。
可选的,根据前述第二数据的逻辑地址从前述SCM芯片中读取该第三数据,包括:根据该第二数据的逻辑地址,获得该第二数据的根页地址;根据该第二数据的根页地址,获得该第三数据的子页地址;根据该第三数据的子页地址,获得该第三数据的物理地址,并根据该第三数据的物理地址读取所述第三数据。
可选的,当从前述SCM芯片中读取前述第一数据时:判断前述DRAM芯片中是否存在第一数据的副本;如果存在,则从前述DRAM芯片中读取所述第一数据并返回;否则,从前述SCM芯片中读取该第一数据。
可选的,当从前述闪存芯片中读取前述第二数据时:判断前述DRAM芯片中是否存在第二数据的副本;如果存在,则从前述DRAM芯片中读取前述第二数据并返回;否则,从前述闪存芯片中读取该第二数据。
数据迁移模块1604,用于将该第一数据与其他数据以页为粒度迁移至前述闪存芯片中,该所述其他数据与该第一数据属于同一个根页,该其他数据在迁移之前位于前述SCM芯片或前述闪存芯片中。具体的:
确定SCM芯片中需要迁移的第一根页,该SCM芯片位于该闪存固态硬盘中;从该SCM芯片和/或闪存芯片中读取所述第一根页对应的数据,该闪存芯片位于前述闪存固态硬盘中;将所述第一页面对应的数据合并或直接写入闪存芯片中。
可选的,当从SCM芯片中读取第一根页对应的数据为整页数据,则从SCM芯片读取该第 一根页对应的数据,并将其写入前述闪存芯片中。
可选的,当从SCM芯片中读取第一根页对应的数据为不是整页数据,则分别从SCM芯片和闪存芯片中读取该第一根页对应的数据,并将其合并后写入前述闪存芯片中。
可选的,删除该SCM芯片中的所述第一根页对应的数据。
可选的,删除该第一根页对应的索引,并建立新索引。
可选的,根据页面聚合度确定SCM芯片中需要迁移的根页,所述页面聚合度指SCM芯片中根页面包含的子页面的数量,所述根页面由多个子页面组成。
可选的,前述页面聚合度越高的根页面或距离上次更新时间越久的根页面,会被优先确定为前述需要迁移的根页。
垃圾回收模块1605,用于确定前述闪存芯片中需要回收的第一块,该第一块至少包括第一部分数据和第二部分数据,其中,该第一部分数据位于前述SCM芯片中,该第二部分数据位于该闪存芯片中,并且该第一部分数据和该第二部分数据是有效数据;从该SCM芯片中读取所述第一部分数据,从所述闪存芯片中读取所述第二部分数据;将该第一部分数据和该第二部分数据写入闪存芯片中的第二块中;擦除前述第一块。
翻译层模块1606,用于根据该第一写入请求中还携带该第一数据的逻辑地址,获得所述第一数据的根页地址和子页地址;保存该根页地址和该子页地址的索引关系,以及该子页地址和所述第一物理地址之间的索引关系,第一物理地址为该第一数据写入前述SCM芯片进行持久化的物理地址。
可选的,当根据前述第二数据的逻辑地址从前述SCM芯片中读取该第三数据时,本模块可以用于:根据该第二数据的逻辑地址,获得该第二数据的根页地址;根据该第二数据的根页地址,获得该第三数据所的子页地址;根据该第三数据的子页地址,获得该第三数据的物理地址,并根据该第三数据的物理地址读取所述第三数据。
上述各个模块中方法的详细内容,请参照实施例1和实施例2中的相关描述,此处不再赘述。实际中,装置1600的模块划分方式还可以由其他方式,图16仅作为示例。
其中,字节级写入接口、页级写入接口、页级读取接口这三个接口可以是软件编程接口。该软件编程接口可以识别不同的操作指令及其内容,以实现字节级写入、页级写入、页面级别读取的功能。
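In a possible implementation, the byte-level write interface and the page-level write interface can be used to identify the same operation instruction, for example the same write instruction; in other words, the two interfaces are collectively referred to as write interfaces. They differ in the data length carried in the instruction: the data length recognized by the byte-level write interface is less than a flash page, while the data length recognized by the page-level write interface is equal to a flash page. Optionally, the byte-level write interface and the page-level write interface may also correspond to different write instructions, implemented through two write interfaces.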
一种可选的实施方式,字节级写入接口、页级写入接口、页级读取接口也可以是通过硬件方式实现的接口,例如:通过不同的物理接口和数据线和上层设备通信,实现不同的功能(字节级写入、页级写入、页面级别读取),再例如:通过设备在不同的硬件模块,来实现不同的功能。
可选的,这三类接口还可以是通过软件硬件结合的方式实现的,具体实现方式本发明不做限制。
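It should be noted that the embodiments provided in this application are merely illustrative. A person skilled in the art will clearly understand that, for convenience and brevity of description, each of the foregoing embodiments has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments. The features disclosed in the embodiments, claims, and drawings of the present invention may exist independently or in combination. Features described in hardware form in the embodiments of the present invention may be implemented by software, and vice versa. No limitation is imposed here.
A person skilled in the art will understand that all or some of the steps of the foregoing method embodiments may be carried out by hardware under the direction of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs all or some of the steps of the foregoing method embodiments; the storage medium includes any medium that can store program code, such as a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disc.
The foregoing embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, wholly or partly, of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are wholly or partly produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium, or a semiconductor medium (for example, a solid-state disk (SSD)).
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways without departing from the scope of this application. For example, the embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
In addition, the described apparatus and methods, and the schematic diagrams of the different embodiments, may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of this application. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.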

Claims (29)

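  1. A method for writing data to a solid-state drive, wherein the method comprises:
    providing a byte-level write interface;
    receiving a first write request through the byte-level write interface, wherein the first write request carries first data to be written, and the length of the first data is less than the size of one flash page;
    providing a page-level write interface; and
    receiving a second write request through the page-level write interface, wherein the second write request carries second data to be written, and the length of the second data is equal to the size of one flash page.
  2. The method according to claim 1, wherein the method further comprises:
    writing the first data to a storage class memory (SCM) chip of the solid-state drive for persistent storage; and
    writing the second data to a flash memory chip of the solid-state drive for persistent storage.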
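  3. The method according to claim 2, wherein
    the length of the first data is greater than or equal to a minimum management unit of the SCM chip; and
    the writing the first data to the storage class memory (SCM) chip for persistent storage comprises: storing the first data at the granularity of the minimum management unit.
  4. The method according to claim 2 or 3, wherein the first write request further carries a logical address of the first data, and the address at which the first data is written to the SCM chip is a first physical address; and the method further comprises:
    obtaining a root page address and a subpage address of the first data according to the logical address of the first data; and
    storing an index relationship between the root page address and the subpage address, and an index relationship between the subpage address and the first physical address.
  5. The method according to any one of claims 2 to 4, wherein after the second data is written to the flash memory chip for persistent storage, the method further comprises:
    when the data stored in the SCM chip includes data identical to at least a part of the second data, deleting from the SCM chip the stored data that is identical to the at least a part of the second data.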
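  6. The method according to any one of claims 2 to 5, wherein the method further comprises:
    providing a page-level read interface;
    receiving, through the page-level read interface, a first read request for reading the second data, wherein the first read request carries a logical address of the second data;
    determining, according to the logical address of the second data, whether third data having the same root page address as the second data is stored in the SCM chip;
    when the third data is stored in the SCM chip, reading the third data from the SCM chip according to the logical address of the second data, and reading the second data from the flash memory chip; and
    merging the read third data and second data into a full page of data and returning the full page of data.
  7. The method according to claim 6, wherein the reading the third data from the SCM chip according to the logical address of the second data comprises:
    obtaining a root page address of the second data according to the logical address of the second data;
    obtaining a subpage address of the third data according to the root page address of the second data; and
    obtaining a physical address of the third data according to the subpage address of the third data, and reading the third data according to the physical address of the third data.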
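  8. The method according to any one of claims 1 to 7, wherein after the second data is written to the flash memory chip for persistent storage, the method further comprises:
    caching a copy of the second data in a dynamic random access memory (DRAM) chip of the solid-state drive.
  9. The method according to claim 8, wherein after the first data is written to the SCM chip of the solid-state drive for persistent storage, the method further comprises:
    caching a copy of the first data in the DRAM chip.
  10. The method according to any one of claims 1 to 9, wherein the method further comprises:
    migrating the first data together with other data to the flash memory chip at page granularity, wherein the other data and the first data belong to the same root page, and the other data is located in the SCM chip or the flash memory chip before the migration.
  11. A solid-state drive, wherein the solid-state drive comprises a main controller, a plurality of flash memory chips, and one or more SCM chips; and the main controller executes computer instructions to implement the method according to any one of claims 1 to 10.
  12. The solid-state drive according to claim 11, wherein the SCM chips and the flash memory chips are connected to different channel controllers, and the channel controllers are located in the main controller.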
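  13. A method for writing data to a storage device, wherein the method comprises:
    sending, by a controller, a first write request to a storage class memory (SCM) storage device, wherein the first write request carries first data to be written, and the length of the first data is less than the size of one flash page;
    receiving, by the SCM storage device, the first write request through a byte-level write interface;
    sending, by the controller, a second write request to a flash storage device, wherein the second write request carries second data to be written, and the length of the second data is equal to the size of one flash page; and
    receiving, by the flash storage device, the second write request through a page-level write interface.
  14. The method according to claim 13, wherein the method further comprises:
    persistently storing, by the SCM storage device, the first data; and
    persistently storing, by the flash storage device, the second data.
  15. The method according to claim 14, wherein the length of the first data is greater than or equal to a minimum management unit of the SCM storage device; and the persistently storing, by the SCM storage device, the first data comprises: storing the first data in units of the minimum management unit.
  16. The method according to claim 14 or 15, wherein the first write request further carries a logical address of the first data, and the address at which the first data is written to the SCM storage device is a first physical address;
    and the method further comprises:
    obtaining, by the controller, a root page address and a subpage address of the first data according to the logical address of the first data; and
    storing, by the controller, an index relationship between the root page address and the subpage address, and an index relationship between the subpage address and the first physical address.
  17. The method according to any one of claims 14 to 16, wherein after the flash storage device persistently stores the second data, the method further comprises:
    when the data stored in the SCM storage device includes data identical to at least a part of the second data, deleting, by the controller, from the SCM storage device the stored data that is identical to the at least a part of the second data.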
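  18. The method according to any one of claims 13 to 17, wherein the method further comprises:
    sending, by the controller, a first read request to the flash storage device to read the second data, wherein the first read request carries a logical address of the second data;
    determining, according to the logical address of the second data, whether third data having the same root page address as the second data is stored in the SCM storage device;
    when the third data is stored in the SCM storage device, reading the third data from the SCM storage device according to the logical address of the second data, and reading the second data from the flash storage device; and
    merging the read third data and second data into a full page of data and returning the full page of data.
  19. The method according to claim 18, wherein the reading the third data from the SCM storage device according to the logical address of the second data comprises:
    obtaining, by the controller, a root page address of the second data according to the logical address of the second data;
    obtaining, by the controller, a subpage address of the third data according to the root page address of the second data; and
    obtaining, by the controller, a physical address of the third data according to the subpage address of the third data, and reading the third data according to the physical address of the third data.
  20. The method according to any one of claims 13 to 19, wherein the method further comprises:
    migrating, by the controller, the first data together with other data to the flash storage device at page granularity, wherein the other data and the first data belong to the same root page, and the other data is located in the SCM storage device or the flash storage device before the migration.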
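  21. A storage system, wherein the storage system comprises a controller, an SCM storage device, and a flash storage device;
    the controller is configured to:
    send a first write request, wherein the first write request carries first data to be written, and the length of the first data is less than the size of one flash page; and
    send a second write request, wherein the second write request carries second data to be written, and the length of the second data is equal to the size of one flash page;
    the SCM storage device is configured to receive the first write request through a byte-level write interface; and
    the flash storage device is configured to receive the second write request through a page-level write interface.
  22. The storage system according to claim 21, wherein:
    the SCM storage device is configured to persistently store the first data; and
    the flash storage device is configured to persistently store the second data.
  23. The storage system according to claim 22, wherein the length of the first data is greater than or equal to a minimum management unit of the SCM storage device; and the SCM storage device is specifically configured to store the first data at the granularity of the minimum management unit.
  24. The storage system according to claim 22 or 23, wherein the first write request further carries a logical address of the first data, and the address at which the first data is written to the SCM storage device is a first physical address;
    the controller is further configured to:
    obtain a root page address and a subpage address of the first data according to the logical address of the first data; and
    store an index relationship between the root page address and the subpage address, and an index relationship between the subpage address and the first physical address.
  25. The storage system according to any one of claims 22 to 24, wherein the controller is further configured to, after the flash storage device persistently stores the second data:
    when the data stored in the SCM storage device includes data identical to at least a part of the second data, delete from the SCM storage device the stored data that is identical to the at least a part of the second data.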
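  26. The storage system according to any one of claims 21 to 25, wherein the controller is further configured to:
    send a first read request to the flash storage device to read the second data, wherein the first read request carries a logical address of the second data;
    determine, according to the logical address of the second data, whether third data having the same root page address as the second data is stored in the SCM storage device;
    when the third data is stored in the SCM storage device, read the third data from the SCM storage device according to the logical address of the second data, and read the second data from the flash storage device; and
    merge the read third data and second data into a full page of data and return the full page of data.
  27. The storage system according to claim 26, wherein the reading the third data from the SCM storage device according to the logical address of the second data comprises:
    obtaining, by the controller, a root page address of the second data according to the logical address of the second data;
    obtaining, by the controller, a subpage address of the third data according to the root page address of the second data; and
    obtaining, by the controller, a physical address of the third data according to the subpage address of the third data, and reading the third data according to the physical address of the third data.
  28. The storage system according to any one of claims 21 to 27, wherein the controller is further configured to:
    migrate the first data together with other data to the flash storage device at page granularity, wherein the other data and the first data belong to the same root page, and the other data is located in the SCM storage device or the flash storage device before the migration.
  29. A computer-readable storage medium, wherein the computer-readable storage medium comprises program instructions, and when the program instructions run on a computer or a processor, the computer or the processor is caused to perform the method according to any one of claims 1 to 10.
    A computer-readable storage medium, wherein the computer-readable storage medium comprises program instructions, and when the program instructions run on a computer or a processor, the computer or the processor is caused to perform the method performed by the controller according to any one of claims 13 to 20.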
PCT/CN2022/077686 2021-04-08 2022-02-24 Method for writing data to a solid-state drive WO2022213736A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22783806.7A EP4307129A1 (en) 2021-04-08 2022-02-24 Method for writing data into solid-state hard disk
US18/477,160 US20240020014A1 (en) 2021-04-08 2023-09-28 Method for Writing Data to Solid-State Drive

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110378320 2021-04-08
CN202110378320.0 2021-04-08
CN202110662875.8 2021-06-15
CN202110662875.8A CN115203079A (zh) 2021-04-08 2021-06-15 Method for writing data to a solid-state drive

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/477,160 Continuation US20240020014A1 (en) 2021-04-08 2023-09-28 Method for Writing Data to Solid-State Drive

Publications (1)

Publication Number Publication Date
WO2022213736A1 true WO2022213736A1 (zh) 2022-10-13

Family

ID=83545153

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/077686 WO2022213736A1 (zh) 2021-04-08 2022-02-24 Method for writing data to a solid-state drive

Country Status (3)

Country Link
US (1) US20240020014A1 (zh)
EP (1) EP4307129A1 (zh)
WO (1) WO2022213736A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102662856A (zh) * 2012-04-27 2012-09-12 中国科学院计算技术研究所 一种固态硬盘及其存取方法
CN105630700A (zh) * 2015-04-29 2016-06-01 上海磁宇信息科技有限公司 一种具有二级缓存结构的存储系统及读写方法
US20200272566A1 (en) * 2019-02-21 2020-08-27 Hitachi, Ltd. Data processing device, storage device, and prefetch method
CN111625191A (zh) * 2020-05-21 2020-09-04 苏州浪潮智能科技有限公司 一种数据读写方法、装置及电子设备和存储介质
CN112035065A (zh) * 2020-08-28 2020-12-04 北京浪潮数据技术有限公司 一种数据写入方法、装置、设备及计算机可读存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115951841A (zh) * 2023-02-27 2023-04-11 浪潮电子信息产业股份有限公司 存储系统及创建方法、数据处理方法、装置、设备和介质
CN115951841B (zh) * 2023-02-27 2023-06-20 浪潮电子信息产业股份有限公司 存储系统及创建方法、数据处理方法、装置、设备和介质

Also Published As

Publication number Publication date
EP4307129A1 (en) 2024-01-17
US20240020014A1 (en) 2024-01-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22783806

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022783806

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022783806

Country of ref document: EP

Effective date: 20231011

NENP Non-entry into the national phase

Ref country code: DE