WO2020073233A1 - System and method for data recovery in parallel multi-tenancy ssd with finer granularity - Google Patents


Info

Publication number
WO2020073233A1
Authority
WO
WIPO (PCT)
Prior art keywords
block, page, volatile memory, pages, blocks
Application number
PCT/CN2018/109650
Other languages
English (en)
French (fr)
Inventor
Shu Li
Ping Zhou
Yu Du
Original Assignee
Alibaba Group Holding Limited
Application filed by Alibaba Group Holding Limited
Priority to CN201880098408.3A (CN112823331B)
Priority to PCT/CN2018/109650 (WO2020073233A1)
Publication of WO2020073233A1

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/0246: Memory management in non-volatile memory (e.g., resistive RAM or ferroelectric memory) in block erasable memory, e.g., flash memory
    • G06F 11/108: Parity data distribution in semiconductor storages, e.g., in SSD
    • G06F 2212/1024: Latency reduction (performance improvement)
    • G06F 2212/1036: Life time enhancement (reliability improvement, data loss prevention, degraded operation, etc.)
    • G06F 2212/7205: Cleaning, compaction, garbage collection, erase control (flash memory management)
    • G06F 2212/7207: Management of metadata or control data (flash memory management)
    • G06F 2212/7208: Multiple device management, e.g., distributing data over multiple flash devices (flash memory management)

Definitions

  • This disclosure is generally related to the field of data storage. More specifically, this disclosure is related to a system and method for data recovery in parallel multi-tenancy SSD with finer granularity.
  • a storage system or server can include volatile memory (e.g., dynamic random access memory (DRAM)) and multiple drives (e.g., a solid state drive (SSD)).
  • a drive can include non-volatile memory for persistent storage (e.g., NAND flash) .
  • the memory in a server plays a crucial role in the performance and capacity of a storage system.
  • the system has no visibility into the data’s lifespan, nor into which data is updated and at what frequency.
  • data received from the host is considered equal, with no distinction between “hot” and “cold” data (i.e., frequently accessed data and not frequently accessed data, respectively) .
  • because hot and cold data may be mixed together in read and write operations, an overhead in garbage collection may occur.
  • during garbage collection, the system copies valid pages from the first NAND block into one or more new blocks.
  • the first block is only “erased” after all valid data in the first block has been copied into the new block(s).
  • the large size of the superblock can result in a decreased efficiency in garbage collection and the overall organization of the NAND flash. This can result in a high write amplification, whereby the NAND bandwidth consumed by copying data can result in a decreased Quality of Service (QoS) and an increased latency.
  • One solution is to separate the data by its access frequency and create separate streams based on the access frequency, e.g., separate streams for hot data and cold data, or separate multi-stream regions in an SSD.
  • hot pages in a first stream have already expired because a more recent version has been written, but cold pages in a second stream are still valid.
  • this solution still results in a high write amplification, whereby the cold valid data must still be copied out in order to recycle a superblock during garbage collection.
  • this solution can result in an overdesign which affects the overall performance of the storage system. For example, if the system reserves a large number of blocks for future potential new streams, the free block pool becomes limited. Subsequently, if the system experiences intense random write operations, the performance of the storage system may suffer from decreased efficiency (e.g., sub-optimal performance) .
  • while multi-stream regions in an SSD may provide some benefits, some issues remain, including a high write amplification, an increased latency, and a decreased efficiency.
  • One embodiment facilitates data recovery.
  • the system receives a request to write a first page of data to a non-volatile memory.
  • the system writes the first page to a first block in a first group of blocks of the non-volatile memory, wherein a number of blocks in a respective group is less than a number of blocks in a superblock, wherein data is written in a stripe one page at a time from a beginning block of the respective group to a last block of the respective group, wherein a stripe includes physical pages which each correspond to a sequentially ordered die of the non-volatile memory, and wherein the first block is a next sequentially available block in the first group of blocks.
  • the system generates, by a controller, a first incremental parity page based on at least the first page.
  • the system writes the first incremental parity page to a second block of the first group of blocks, wherein the second block is the next sequentially available block in the first group of blocks.
  • the system stores, by the controller, the first incremental parity page in a volatile memory buffer of the controller, thereby enhancing a garbage collection process based on a reduced granularity of a size of a group of blocks that is less than a size of a superblock.
  • the system receives a request to write a second page of data to the non-volatile memory.
  • the system determines that the first group of blocks is full.
  • the system writes the second page to a third block in a second group of blocks.
  • the system updates the first incremental parity page based on the second page.
  • the system stores, by the controller, the updated first incremental parity page in the volatile memory of the controller.
  • in response to detecting a condition which triggers recycling of the first block, the system: writes the first page to a new block; updates the first incremental parity page based on at least the first page, to remove the first page from a calculation of parity in the first incremental parity page; stores the updated first incremental parity page in the volatile memory of the controller; and releases the first block to a free block pool.
  • in response to detecting a condition which triggers recycling of the first block, the system: writes valid pages in the first block to one or more new blocks, wherein the valid pages do not include incremental parity pages; updates a set of incremental parity pages associated with the valid pages in the first block, to remove the valid pages from a respective calculation of parity in the set of incremental parity pages; stores the set of updated incremental parity pages in the volatile memory of the controller; maps, by a flash translation layer component, any incremental parity pages in the first block to new physical block addresses; and releases the first block to a free block pool.
  • the system maps, by a flash translation layer component, the first incremental parity page to a new physical block address.
  • in response to detecting a loss of power or determining that data in the volatile memory of the controller cannot withstand a power loss, the system performs one or more of: flushing at least the first incremental parity page from the volatile memory of the controller to the non-volatile memory; and flushing all incremental parity pages from the volatile memory of the controller to the non-volatile memory.
  • prior to receiving the request to write the first page, the system: receives a request to write a first number of pages to the non-volatile memory, wherein the first number of pages includes the first page; and assigns, by a host, a second number of physical block addresses to the first number of pages and a third number of parity pages, wherein the second number is a sum of the first number and the third number, and wherein the third number of parity pages includes the first incremental parity page.
  • the number of blocks in the first group or the respective group is based on one or more of: a predetermined optimal block size for data recovery; a size configured by the controller; a size configured by a host; and a size determined by a user of the storage server.
  • FIG. 1A illustrates an exemplary environment that facilitates data recovery, in accordance with an embodiment of the present application.
  • FIG. 1B illustrates an exemplary storage device that facilitates data recovery, in accordance with an embodiment of the present application.
  • FIG. 2 illustrates an exemplary storage device which uses multiple streams based on access frequency, in accordance with the prior art.
  • FIG. 3 illustrates an exemplary organization of physical space in NAND based on multiple streams, in accordance with the prior art.
  • FIG. 4 illustrates an exemplary organization of physical space in NAND that facilitates data recovery, including incremental parity pages, in accordance with an embodiment of the present application.
  • FIG. 5 illustrates an exemplary storage device with power loss protection for flushing parity pages from DRAM to NAND, in accordance with an embodiment of the present application.
  • FIG. 6A illustrates an exemplary organization of physical space in NAND that facilitates data recovery, including updating an incremental parity page and recycling a block in a garbage collection process, in accordance with an embodiment of the present application.
  • FIG. 6B illustrates an exemplary environment that facilitates data recovery, including recycling a block in a garbage collection process, in accordance with an embodiment of the present application.
  • FIG. 7A presents a flowchart illustrating a method for facilitating data recovery, in accordance with an embodiment of the present application.
  • FIG. 7B presents a flowchart illustrating a method for facilitating data recovery, including updating an incremental parity page, in accordance with an embodiment of the present application.
  • FIG. 7C presents a flowchart illustrating a method for facilitating data recovery, including recycling a block, in accordance with an embodiment of the present application.
  • FIG. 7D presents a flowchart illustrating a method for facilitating data recovery, including flushing parity pages based on a power loss, in accordance with an embodiment of the present application.
  • FIG. 8 illustrates an exemplary computer system that facilitates data recovery, in accordance with an embodiment of the present application.
  • FIG. 9 illustrates an exemplary apparatus that facilitates data recovery, in accordance with an embodiment of the present application.
  • the embodiments described herein provide a system which solves the inefficiency problems inherent in multi-stream SSDs by providing an incremental parity generation for groups of blocks which are smaller in granularity than a superblock.
  • data may be placed in a “stream” (e.g., a dedicated independent region of physical space in the non-volatile memory) based on access frequency, such as “hot pages” or “cold pages. ”
  • hot pages in a stream may expire when a more recent version is written to a new stream, while cold pages with valid data are still retained in their respective streams.
  • during garbage collection, the cold pages with valid data must still be copied out in order to recycle the blocks associated with the streams. This results in a high write amplification, and can also result in an overdesign which affects the overall performance of the storage system. For example, if the system reserves a large number of blocks for future potential new streams, the free block pool becomes limited. Subsequently, if the system experiences intense random write operations, the performance of the storage system may suffer from decreased efficiency (e.g., sub-optimal performance).
  • Multi-tenancy can refer to executing multiple independent write operations simultaneously by programming data into NAND flash.
  • the system can place data horizontally in “stripes” into groups of blocks which are of a size smaller than a size of a superblock. That is, data is written in a stripe (horizontally) one page at a time from a beginning block of a group to a last block of the group, and, if space remains in the group and upon reaching the last block, repeats at the next available page of the beginning block.
  • a stripe includes physical pages which each correspond to a sequentially ordered die of non-volatile memory (e.g., NAND) .
  • the system can determine the size of a group of blocks based on throughput requirements of the system.
  • the system can further generate incremental parity pages based on pages in blocks of the group, and can update the incremental parity pages when writing new pages to the stripe or to other blocks which are not in the group of blocks. Placing data and updating incremental parity pages is described below in relation to FIG. 4.
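  • As a rough illustration of this placement order, the sketch below (in Python, with hypothetical names and sizes such as GroupOfBlocks, GROUP_SIZE, and PAGES_PER_BLOCK) writes pages one at a time across the blocks of a group that is smaller than a superblock, wrapping back to the beginning block once the last block of the group is reached; it is a simplified model of the described behavior, not the actual firmware.

```python
# Illustrative sketch of horizontal (stripe-wise) page placement within a
# group of blocks smaller than a superblock. GROUP_SIZE and PAGES_PER_BLOCK
# are hypothetical parameters, not values taken from the disclosure.

GROUP_SIZE = 4        # blocks per group (fewer than the blocks in a superblock)
PAGES_PER_BLOCK = 3   # pages per block, kept tiny for illustration

class GroupOfBlocks:
    """Places pages one at a time across the sequentially ordered blocks/dies."""

    def __init__(self, group_size=GROUP_SIZE, pages_per_block=PAGES_PER_BLOCK):
        # blocks[b][p] holds the page written to block b at page offset p
        self.blocks = [[None] * pages_per_block for _ in range(group_size)]
        self.next_slot = 0  # monotonically increasing write pointer
        self.capacity = group_size * pages_per_block

    def is_full(self):
        return self.next_slot >= self.capacity

    def write_page(self, page):
        """Write a page to the next sequentially available block in the group."""
        if self.is_full():
            raise RuntimeError("group is full; open a new group of blocks")
        block_idx = self.next_slot % len(self.blocks)   # horizontal placement
        page_idx = self.next_slot // len(self.blocks)   # wrap to the next row
        self.blocks[block_idx][page_idx] = page
        self.next_slot += 1
        return block_idx, page_idx

if __name__ == "__main__":
    group = GroupOfBlocks()
    for name in ["Page_1", "Page_2", "Page_3", "Parity_1", "Page_21"]:
        block_idx, page_idx = group.write_page(name)
        print(f"{name} -> block {block_idx}, page {page_idx}")
```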
  • the embodiments of the system described herein also provide power loss protection for the incremental parity pages by storing (and updating) the incremental parity pages in a volatile memory buffer of a controller of the storage device (e.g., in the internal DRAM of an SSD controller) , as described below in relation to FIG. 5.
  • the system can efficiently perform garbage collection and recycle a block A by: 1) copying out valid pages from block A to a new block; 2) re-calculating the associated parity (e.g., updating the associated incremental parity page for a respective valid page from block A, by removing the respective valid page from the calculation of parity in the associated incremental parity page); and 3) mapping any incremental parity pages in block A to a new physical location.
  • Garbage collection and recycling a block is described below in relation to FIGs. 6A and 6B.
  • the embodiments described herein can avoid the high write amplification involved with the multi-stream regions in a conventional SSD. That is, the described system does not need to reserve a large number of blocks for future new streams, which results in eliminating the limitations on the free block pool. This in turn allows the system to execute intensive write operations without suffering from a decreased efficiency. Furthermore, the described system does not need to use a data recovery group which is a stripe across all blocks in a superblock, which results in eliminating the open parts of blocks which include unused and wasted space in the superblocks.
  • the embodiments described herein provide a system which improves and enhances the efficiency and performance of a storage system.
  • the system can significantly reduce both the number of reserved open blocks and the unused portions of blocks in a superblock.
  • by using groups of blocks which are smaller in size than a superblock, the system can place data in a horizontal stripe fashion, and can also generate and update incremental parity pages within the smaller groups. This can result in an improved efficiency, e.g., by enhancing a garbage collection process based on a reduced granularity of a size of a group of blocks that is less than a size of a superblock.
  • conventional multi-stream SSDs may leave many parts of blocks of a superblock open, which can result in a large amount of unused and wasted space and can also result in a high write amplification for garbage collection at the granularity of a superblock.
  • the system improves the conventional SSDs by allowing for efficient multi-stream SSDs which can both efficiently use the non-volatile memory storage and perform data recovery in a parallel multi-tenancy SSD with a finer granularity, i.e., at a single block level.
  • the system provides a technological solution (e.g., enhancing garbage collection and data recovery in multi-stream SSDs based on a finer granularity) to the technological problem in the software arts (e.g., increased write amplification, wasted space, inefficient garbage collection, and decreased overall efficiency of a storage system) .
  • FIG. 1A illustrates an exemplary environment 100 that facilitates data recovery, in accordance with an embodiment of the present application.
  • Environment 100 can include a computing device 102 and an associated user 104.
  • Computing device 102 can communicate via a network 110 with storage servers 112, 114, and 116, which can be part of a distributed storage system and accessed via client servers (not shown) .
  • a storage server can include multiple storage drives, and each drive can include a controller and multiple physical media for data storage.
  • server 116 can include a network interface card (NIC) 122, a CPU 124, a DRAM dual in-line memory module (DIMM) 126, and SSDs 132, 136, 140, and 144 with, respectively, controllers 134, 138, 142, and 146.
  • a controller can include interfaces to a host and to a non-volatile memory.
  • a controller can also include a buffer as well as firmware which includes instructions and/or code to execute the methods described herein.
  • SSD 140 can include SSD controller 142.
  • SSD controller 142 can include: a host interface 150; an embedded processor 152, which includes a buffer 154 and a firmware 156; and a channel management 158.
  • SSD controller 142 can communicate with a host (e.g., via host interface 150 and a communication to/from host 149) .
  • SSD controller 142 can also communicate with the non-volatile memory (via channel management 158) .
  • the non-volatile memory can be accessed via multiple channels. For example, NAND dies 162, 164, and 166 may be accessed via a channel 160, and NAND dies 172, 174, and 176 may be accessed via a channel 170.
  • firmware 156 can include instructions and/or code, which allow incoming write data from the host to be written in a “horizontal” manner in the physical pages of a page stripe across multiple blocks in a group of blocks, one page at a time from a beginning block of the group of blocks to a last block of the group of blocks, as described below in relation to FIG. 4.
  • FIG. 1B illustrates an exemplary storage device (e.g., SSD) 140 that facilitates data recovery, in accordance with an embodiment of the present application.
  • SSD 140 can receive data 180 from the host (via a communication 149) .
  • Data 180 can include k pages of data (e.g., Page_1, Page_2, Page_3, Page_21, Page_22, Page_23, Page_31, Page_32, Page_33, ..., Page_4, Page_5, Page_6, ..., and Page_k) .
  • the host can manage the data placement and the physical addresses in the non-volatile memory. That is, the host can define N physical addresses, where N is comprised of the k pages (or parts) of data and N-k parity pages (or parts) .
  • host interface 150 can receive data 180 and host-defined N physical addresses 184.
  • Host interface 150 can send data 180 to a data recovery engine 157, which can encode and decode data.
  • Data recovery engine 157 can be included in, e.g., firmware 156 and/or embedded processor 152 of FIG. 1A.
  • Data recovery engine 157 can generate processed data 182, which can include N pages (e.g., Page_1, Page_2, ..., Page_6, ..., Page_k, Parity_k+1, Parity_k+2, ..., and Parity_n).
  • Data recovery engine 157 can send processed data 182 to channel management 158.
  • Host interface 150 can also send host-defined N physical addresses 184 to channel management 158.
  • SSD controller 142 via channel management 158, can write processed data 182 to the locations corresponding to host-defined N physical addresses 184 (e.g., to one or more of NAND dies 162-166 and 172-176 via channels 160 and 170, respectively) .
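  • The host-defined address assignment can be pictured with a short, hypothetical sketch: for k incoming data pages, the host reserves N physical block addresses, where the extra N-k addresses are earmarked for incremental parity pages. The helper below (assign_physical_addresses, an illustrative name) interleaves one parity address after every few data pages, which is just one possible policy and not necessarily the one used in the embodiments.

```python
# Sketch of host-side address assignment: k data pages plus (N - k) parity
# pages share one pool of N host-defined physical block addresses (PBAs).
# assign_physical_addresses and pages_per_parity are illustrative names.

def assign_physical_addresses(data_pages, pages_per_parity=3, first_pba=0x1000):
    """Return (page name -> PBA) with one parity PBA per pages_per_parity data pages."""
    layout = {}
    pba = first_pba
    parity_index = 0
    for i, page in enumerate(data_pages, start=1):
        layout[page] = pba
        pba += 1
        # After every pages_per_parity data pages, reserve a PBA for a parity page.
        if i % pages_per_parity == 0:
            parity_index += 1
            layout[f"Parity_{parity_index}"] = pba
            pba += 1
    # Trailing parity page if the last group of data pages is not full.
    if len(data_pages) % pages_per_parity:
        parity_index += 1
        layout[f"Parity_{parity_index}"] = pba
    return layout

if __name__ == "__main__":
    k_pages = ["Page_1", "Page_2", "Page_3", "Page_4", "Page_5", "Page_6"]
    for name, addr in assign_physical_addresses(k_pages).items():
        print(f"{name}: PBA {addr:#x}")
```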
  • FIG. 2 illustrates an exemplary storage device 202 which uses multiple streams based on access frequency, in accordance with the prior art.
  • SSD 202 can include multiple streams, where a respective stream is filled with data based on the access frequency (hot or cold) of the data.
  • stream 210 can include hot data 212, 214, 216, and 218
  • stream 220 can include cold data 222, 224, 226, and 228
  • stream 230 can be reserved for incoming hot or cold data.
  • in multi-stream SSD 202, at a time t2 subsequent to time t1, because stream 210 updates frequently, the original hot data 212-218 may have already “expired,” and new hot data can be written to stream 230 as hot_new data 212.1-218.1.
  • the system can erase the entire unit and copy fewer valid pages because of the expiration of the frequently accessed pages.
  • the conventional SSD may attempt to provide the optimal number of streams based on, e.g., the amount of data, the reliability of data, and the capacity of the drive.
  • this may lead to an overdesign which affects the overall performance of the storage system. For example, if the system reserves a large number of blocks for future potential new streams, the free block pool becomes limited. Subsequently, if the system experiences intense random write operations, the performance of the storage system may suffer from a decreased efficiency (e.g., sub-optimal performance) .
  • FIG. 3 illustrates an exemplary organization of physical space 300 in NAND based on multiple streams, in accordance with the prior art.
  • Physical space 300 can include a stream 310 (which corresponds to a superblock 312) and a stream 330 (which corresponds to a superblock 332) .
  • Stream 310 can include multiple blocks, including: a block 1 321; a block 2 322; a block 3 323; a block n-2 324; a block n-1 325; and a block n 326.
  • a data recovery group includes a “superpage stripe, ” which is a page stripe across all dies on the SSD, i.e., that includes one physical page from each die on the SSD, or one physical page across all blocks in a superblock.
  • a data recovery group 314 includes Page_1, Page_2, Page_3, ..., Page_n-2, Page n-1, and Page_n across, respectively, block 1 321 to block n 326.
  • a data recovery group 316 includes Page_n+1, Page_n+2, Page_n+3, ..., Page_2n-2, Page 2n-1, and Page_2n across, respectively, block 1 321 to block n 326.
  • a data recovery group 334 includes Page_1, Page_2, Page_3, ..., Page_n-2, Page n-1, and Page_n across, respectively, a block 1 341 to a block n 346.
  • a data recovery group 336 includes Page_n+1, Page_n+2, Page_n+3, ..., Page_2n-2, Page 2n-1, and Page_2n across, respectively, block 1 341 to block n 346.
  • each data recovery group is striped across a NAND superblock, and each data recovery stripe must have n parts, where n corresponds to the number of blocks in the superblock, such as 128 blocks.
  • the superblock must remain open until all of the data recovery groups are written.
  • the entire superblock with n blocks must be erased together, which can significantly impact the garbage collection process. For example, in order to erase the entire superblock, the system must copy out all valid data from each data recovery group (across all the blocks of the superblock) before releasing the blocks of the superblock to the free block pool.
  • each of streams 310 and 330 includes a significant amount of open space in the respective blocks, e.g., an open space 318 in stream 310 and an open space 338 in stream 330. Depending on the write operations, this open space may be unused or wasted while waiting for garbage collection to occur.
  • FIG. 4 illustrates an exemplary organization of physical space 400 in NAND that facilitates data recovery, including incremental parity pages, in accordance with an embodiment of the present application.
  • Physical space 400 can include blocks 1-7 (e.g., block 1 421 to block 7 427) .
  • the system can define a number of parallel blocks (e.g., sequentially ordered dies of the non-volatile memory) as an optimal size for a group of blocks. This number can be based on the throughput requirements of the system.
  • the system does not open all blocks in the horizontal direction (e.g., as a superblock) to be the original data recovery group.
  • a first group of blocks 402 includes 4 blocks: block 1 421; block 2 422; block 3 423; and block 4 424.
  • the system can place Page_1 in block 1 421, Page_2 in block 2 422, and Page_3 in block 3 423.
  • the system can subsequently generate an incremental parity page based on these three pages. That is, the system can perform a function 406 by taking as input data from these three pages (via a communication 404) to generate a Parity_1 page.
  • the system can then write the generated Parity_1 to block 4 424 (via a communication 408) .
  • the system can store the generated Parity_1 in a volatile memory, e.g., in the internal DRAM buffer of the SSD, as in DRAM 540 depicted in FIG. 5.
  • the system can place the data of Page_21 in the second row in block 1 421 of the first group of blocks 402, and generate an associated incremental parity page based at least on Page_21 (e.g., Parity_2) .
  • the system can determine where and when to generate the incremental parity pages within a particular group of blocks. This is in contrast to the conventional SSD, which would place Page_21 in block 4 424 immediately after placing Page_3 in block 3 423 (e.g., in the same page stripe across all the blocks of a superblock) .
  • the system can write incoming data to a next group of blocks. For example, the system can place Page_4 in block 5 425, Page_5 in block 6 426, and Page_6 in block 7 427.
  • the system can subsequently update an incremental parity page based on these three pages. Specifically, the system can perform function 406 by taking as input data from these three pages (via a communication 410) and the current value of Parity_1 (via a communication 412) to update the Parity_1 page.
  • the system can then “write” the updated Parity_1 to block 4 424 (via a communication 414) by updating a corresponding value in the volatile memory of the SSD. Because a page in a block of NAND flash cannot be overwritten unless the entire block is erased, the system can update the previously stored Parity_1 in the internal DRAM of the SSD.
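  • The incremental generation and update of Parity_1 can be sketched as follows, assuming for illustration a simple XOR-based parity (the exact parity function is not fixed by this description); the page contents and the xor_pages helper are hypothetical.

```python
# Sketch of incremental parity generation and update, assuming an XOR-based
# parity purely for illustration; the embodiments do not mandate XOR.

PAGE_SIZE = 8  # bytes per page, kept tiny for illustration

def xor_pages(*pages: bytes) -> bytes:
    """XOR an arbitrary number of equal-sized pages together."""
    result = bytearray(PAGE_SIZE)
    for page in pages:
        for i, byte in enumerate(page):
            result[i] ^= byte
    return bytes(result)

# FIG. 4, first step: Parity_1 is generated from Page_1..Page_3, written to
# block 4, and a copy is kept in the controller's internal DRAM buffer.
page_1, page_2, page_3 = (bytes([v] * PAGE_SIZE) for v in (1, 2, 3))
parity_1 = xor_pages(page_1, page_2, page_3)

# Later, Page_4..Page_6 land in the next group of blocks; Parity_1 is updated
# incrementally in DRAM by folding the new pages into its current value.
page_4, page_5, page_6 = (bytes([v] * PAGE_SIZE) for v in (4, 5, 6))
parity_1 = xor_pages(parity_1, page_4, page_5, page_6)

print(parity_1.hex())
```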
  • physical space 400 depicts a system which uses a small group of blocks (i.e., smaller than a superblock) to place data in horizontal page stripes, and inserts an incremental parity page which accounts for data in the same group of blocks or another block.
  • the system can generate this incremental parity for a first set of pages, and subsequently update the incremental parity to: 1) account for additional valid pages which are written to the same or another block, which additional valid pages are written to a location associated with that incremental parity page; and 2) account for valid pages which are to be copied out from a block which is to be recycled during a garbage collection process.
  • the system can store the generated and updated incremental parity page in a volatile memory buffer (e.g., DRAM) of the controller, which volatile memory buffer can be flushed to the non-volatile memory (e.g., NAND) upon detecting a power loss or an inability of data in the volatile memory to withstand a power loss.
  • the system can execute garbage collection on a reduced granularity, i.e., based on a single block at a time, instead of an entire superblock at a time, as in the conventional SSDs.
  • the host-based management can configure the physical NAND space prior to the data being programmed into NAND.
  • FIG. 5 illustrates an exemplary storage device 500 with power loss protection for flushing parity pages from DRAM to NAND, in accordance with an embodiment of the present application.
  • SSD 500 can include an SSD controller 502, which can include: a host interface 504; buffers 506; and a channel management 508.
  • SSD 500 can also include a power loss protection module 530, which can include a plurality of capacitors.
  • SSD 500 can also include a DRAM 540 corresponding to buffers 506.
  • the system can store incremental parity pages in DRAM 540 (e.g., Parity_1 and Parity_2 from FIG. 4) .
  • in response to detecting a power loss, the system can flush the incremental parity pages from DRAM 540 to the NAND (e.g., one of NAND dies 512-516 and 522-526, via channels 510 and 520, respectively).
  • the system can also flush the incremental parity pages in response to determining that data in the volatile memory of the controller (e.g., in the DRAM) cannot withstand a power loss.
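  • A simplified sketch of this power-loss path is shown below: parity pages held in the controller DRAM are flushed to NAND when a power loss (or an inability of the DRAM contents to survive one) is detected, with the capacitors of power loss protection module 530 assumed to supply the energy for the flush. ParityCache, flush_page_to_nand, and on_power_event are illustrative names, not APIs from the disclosure.

```python
# Sketch of flushing incremental parity pages from controller DRAM to NAND on
# power loss. ParityCache, flush_page_to_nand, and on_power_event are
# illustrative names only.

class ParityCache:
    def __init__(self):
        self.dram = {}  # parity page name -> latest parity bytes (e.g., DRAM 540)

    def update(self, name, data):
        self.dram[name] = data

    def flush_all(self, flush_page_to_nand):
        """Flush every cached parity page to NAND, then drop the DRAM copies."""
        for name, data in self.dram.items():
            flush_page_to_nand(name, data)
        self.dram.clear()

def on_power_event(cache, power_lost, data_unsafe, flush_page_to_nand):
    # Flush if power is lost, or if the DRAM contents can no longer be
    # guaranteed to withstand a power loss.
    if power_lost or data_unsafe:
        cache.flush_all(flush_page_to_nand)

if __name__ == "__main__":
    cache = ParityCache()
    cache.update("Parity_1", b"\x07" * 8)
    cache.update("Parity_2", b"\x15" * 8)
    on_power_event(cache, power_lost=True, data_unsafe=False,
                   flush_page_to_nand=lambda n, d: print("flushed", n, d.hex()))
```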
  • FIG. 6A illustrates an exemplary organization of physical space 600 in NAND that facilitates data recovery, including updating an incremental parity page and recycling a block in a garbage collection process, in accordance with an embodiment of the present application.
  • Physical space 600 can include blocks and data in pages of the blocks which are similar to physical space 400 of FIG. 4.
  • the system can determine during a garbage collection process to recycle block 1 421 (e.g., a block to be recycled 602) .
  • the system can write the valid pages of block 1 421 to a new block.
  • the system can update a set of incremental parity pages associated with the valid pages of block 1 421, to remove the valid pages from a calculation of parity in the set of incremental parity pages.
  • the system can write Page_1 (valid data) to a new block (not shown) , i.e., copying valid data to a new block.
  • the system can update Parity_1 associated with Page_1 by performing a function 608 based on Page_1 (via a communication 604) and the current value of Parity_1 (via a communication 606) .
  • Function 608 results in a Parity_1 new 610 value, which essentially removes the data from Page_1 from the calculation of parity.
  • the system can also determine that a valid page in block 1 421 is an incremental parity page. For these valid parity pages, the system does not need to write the valid incremental parity page to a new block. Instead, the system, via a flash translation layer (FTL) component, need only map the valid incremental parity page to a new physical block address. For example, an FTL component can map Parity_4 to a new physical block address.
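  • An illustrative sketch of this recycle step follows, again assuming an XOR-based parity so that removing a page from the parity calculation is simply another XOR: valid data pages are copied to a new block and XORed out of their incremental parity, while parity pages found in the recycled block are not copied; only their flash translation layer mapping is moved to a new physical block address. The names recycle_block and ftl_map are hypothetical.

```python
# Sketch of recycling a block: copy out valid data pages, remove them from
# their incremental parity (assuming XOR parity, so removal is another XOR),
# and remap parity pages through the FTL instead of copying them.

def xor_pages(a: bytes, b: bytes) -> bytes:
    """XOR two equal-sized pages (used both to add and to remove a page)."""
    return bytes(x ^ y for x, y in zip(a, b))

def recycle_block(block_pages, parity_cache, ftl_map, copy_page, new_pba_alloc):
    """block_pages: list of (name, data, is_parity, parity_name) tuples."""
    for name, data, is_parity, parity_name in block_pages:
        if is_parity:
            # Parity pages are not rewritten; the FTL simply points the parity
            # page at a new physical block address.
            ftl_map[name] = new_pba_alloc()
        else:
            copy_page(name, data)  # valid data page copied to a new block
            # Remove the page from its incremental parity ("Parity_x new").
            parity_cache[parity_name] = xor_pages(parity_cache[parity_name], data)
    # The recycled block can now be erased and released to the free block pool.

if __name__ == "__main__":
    parity_cache = {"Parity_1": bytes([7] * 8)}
    ftl_map = {}
    next_pba = iter(range(0x2000, 0x2010))
    block_1 = [
        ("Page_1", bytes([1] * 8), False, "Parity_1"),
        ("Parity_4", bytes([9] * 8), True, None),
    ]
    recycle_block(block_1, parity_cache, ftl_map,
                  copy_page=lambda n, d: print("copied", n, "to a new block"),
                  new_pba_alloc=lambda: next(next_pba))
    print(parity_cache["Parity_1"].hex(), ftl_map)
```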
  • FIG. 6B illustrates an exemplary environment 640 that facilitates data recovery, including recycling a block in a garbage collection process, in accordance with an embodiment of the present application.
  • Environment 640 can include multiple streams, such as a stream_h 642, a stream_i 644, and a stream_n 648.
  • a stream can include multiple blocks, and a “current stream” can be a group of blocks with a size that is less than a size of a superblock.
  • stream_i 644 can include multiple blocks (e.g., block 1 421, block 2 422, block 3 423, block 4 424, block 5 425, ..., block 7 427)
  • a current stream_i 646 can include 4 blocks (block 1 421 to block 4 424) .
  • the system can determine to recycle block 1 421, and perform the operations described above in relation to FIG. 6A.
  • the system can assign a new block 652, to which the valid pages of block 1 421 can be copied, and perform an erase 662 function by releasing block 1 421 back to a free block pool 660.
  • Environment 640 also depicts other new blocks (e.g., new blocks 654 and 656) being assigned or allocated to stream_i 644, to recycle blocks of the stream (e.g., block 5 425 and block 7 427) .
  • the system can assign a new block 664 to handle other operations, such as a data refresh 670, a bad block management 672, and a burst write 674.
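  • The free-block-pool bookkeeping implied by FIG. 6B can be sketched as follows: a recycled block is erased and returned to the pool, and new blocks are drawn from the pool either to receive copied-out valid pages or to serve other operations such as a data refresh, bad block management, or a burst write. FreeBlockPool and the block identifiers are illustrative.

```python
# Sketch of free-block-pool bookkeeping during block recycling (cf. FIG. 6B).
# FreeBlockPool and the block identifiers are illustrative only.

from collections import deque

class FreeBlockPool:
    def __init__(self, block_ids):
        self.free = deque(block_ids)

    def allocate(self, purpose):
        """Hand out a free block for copied-out valid pages, a data refresh,
        bad block management, a burst write, etc."""
        block = self.free.popleft()
        print(f"allocated block {block} for {purpose}")
        return block

    def release(self, block):
        """Erase a recycled block and return it to the pool."""
        print(f"erased block {block}; returned to free block pool")
        self.free.append(block)

if __name__ == "__main__":
    pool = FreeBlockPool(["new_652", "new_654", "new_656", "new_664"])
    pool.allocate("valid pages copied out of block 1 421")
    pool.release("block 1 421")
    pool.allocate("burst write")
```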
  • FIG. 7A presents a flowchart 700 illustrating a method for facilitating data recovery, in accordance with an embodiment of the present application.
  • the system receives a request to write a first number of pages of data to a non-volatile memory (operation 702) .
  • the system assigns, by a host, a second number of physical block addresses to the first number of pages and a third number of parity pages, wherein the second number is a sum of the first number and the third number, wherein the first number of pages includes a first page, and wherein the third number of parity pages includes a first incremental parity page (operation 704) .
  • Host-defined physical addresses are described above in relation to FIG. 1B.
  • the system writes the first page to a first block of a first group of blocks of the non-volatile memory, wherein a number of blocks in a respective group is less than a number of blocks in a superblock, wherein data is written in a stripe one page at a time from a beginning block of the respective group to a last block of the respective group, wherein a stripe includes physical pages which each correspond to a sequentially ordered die of the non-volatile memory, and wherein the first block is a next sequentially available block in the first group of blocks (operation 706) .
  • the number of blocks in the first group of blocks or a respective group of blocks can define a data recovery group, as described above in relation to FIG. 4, and can be based on the throughput requirements of the system, e.g., as a predetermined optimal block size for data recovery.
  • the controller, the host, or a user of the storage server can also configure the number of blocks.
  • the system generates, by a controller, the first incremental parity page based on at least the first page (operation 708) .
  • the system writes the first incremental parity page to a second block of the first group of blocks, wherein the second block is the next sequentially available block in the first group of blocks (operation 710) .
  • the system stores, by the controller, the first incremental parity page in a volatile memory buffer of the controller (operation 712) .
  • the system enhances a garbage collection process based on a reduced granularity of a size of a group of blocks that is less than a size of a superblock (operation 714) .
  • the operation can subsequently continue as depicted at any of Labels A, B, and C in FIGs. 7B, 7C, and 7D, respectively.
  • FIG. 7B presents a flowchart 720 illustrating a method for facilitating data recovery, including updating an incremental parity page, in accordance with an embodiment of the present application.
  • the system receives a request to write a second page of data to the non-volatile memory (operation 722) . If the first group of blocks is not full (decision 724) , the operation continues as described above at operation 706 of FIG. 7A, where the second page is written to a next available page of the next sequentially ordered die in the first group of blocks. If the first group of blocks is full (decision 724) , the system writes the second page to a third block in a second group of blocks (operation 726) , as described above in relation to FIG. 4.
  • if the second page is not written to a location associated with the first incremental parity page (decision 728), the system updates, based on the second page, a second incremental parity page associated with the location of the second page (operation 730), and the operation continues at operation 734. If the second page is written to a location associated with the first incremental parity page (decision 728), the system updates the first incremental parity page based on the second page (operation 732), as described above in relation to FIG. 4. The system stores, by the controller, the updated first (or second) incremental parity page in the volatile memory of the controller (operation 734), and the operation returns.
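  • A toy sketch of this decision is given below, assuming (purely for illustration) that each data location is mapped to the incremental parity page covering it by a small dictionary, and again assuming XOR parity; parity_for_location and update_parity_for_write are hypothetical names.

```python
# Sketch of flowchart 7B: fold a newly written page into whichever incremental
# parity page is associated with its location. The location-to-parity mapping
# and the XOR parity are assumptions for illustration only.

def xor_pages(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def update_parity_for_write(location, page, parity_for_location, parity_cache):
    """Update (in controller DRAM) the parity page associated with `location`."""
    parity_name = parity_for_location[location]  # decision 728: which parity?
    parity_cache[parity_name] = xor_pages(parity_cache[parity_name], page)  # 730/732
    return parity_name  # updated value is kept in the controller DRAM (734)

if __name__ == "__main__":
    parity_cache = {"Parity_1": bytes(8), "Parity_2": bytes(8)}
    parity_for_location = {("block_5", 0): "Parity_1", ("block_1", 1): "Parity_2"}
    updated = update_parity_for_write(("block_5", 0), bytes([4] * 8),
                                      parity_for_location, parity_cache)
    print("updated", updated, parity_cache[updated].hex())
```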
  • FIG. 7C presents a flowchart 740 illustrating a method for facilitating data recovery, including recycling a block, in accordance with an embodiment of the present application.
  • if the system does not detect a condition which triggers recycling of the first block (decision 742), the operation returns. If the system does detect a condition which triggers recycling of the first block (decision 742), the system writes the first page and valid pages in the first block to one or more new blocks (operation 744).
  • the system updates the first incremental parity page based on at least the first page, to remove the first page from a calculation of parity in the first incremental parity page (operation 746) .
  • the system also updates a set of incremental parity pages associated with the valid pages in the first block, to remove the valid pages from a respective calculation of parity in the set of incremental parity pages (operation 748) .
  • the system stores the updated first incremental parity page and the updated set of incremental parity pages in the volatile memory of the controller (operation 750) .
  • the system maps, by a flash translation layer component, any incremental parity pages in the first block to new physical addresses (operation 752) .
  • the system releases the first block to a free block pool (operation 754) , and the operation returns.
  • FIG. 7D presents a flowchart 760 illustrating a method for facilitating data recovery, including flushing parity pages based on a power loss, in accordance with an embodiment of the present application.
  • the system detects a loss of power or determines that data in the volatile memory of the controller cannot withstand a power loss (operation 762) .
  • the system flushes at least the first incremental parity page from the volatile memory of the controller to the non-volatile memory (operation 764) .
  • the system flushes all incremental parity pages from the volatile memory of the controller to the non-volatile memory (operation 766) .
  • FIG. 8 illustrates an exemplary computer system that facilitates data recovery, in accordance with an embodiment of the present application.
  • Computer system 800 includes a processor 802, a volatile memory 804, a non-volatile memory 806, and a storage device/firmware 808.
  • Computer system 800 may be a computing device or a storage device.
  • Volatile memory 804 can include memory (e.g., RAM) that serves as a managed memory, and can be used to store one or more memory pools.
  • Non-volatile memory 806 can include memory (e.g., NAND flash) which is used for persistent storage.
  • computer system 800 can be coupled to a display device 810, a keyboard 812, and a pointing device 814.
  • Storage device/firmware 808 can store an operating system 816, a content-processing system 818, and data 832. Note that firmware 808 may alternatively be located in or included in other components of computer system 800.
  • Content-processing system 818 can include instructions, which when executed by computer system 800, can cause computer system 800 to perform methods and/or processes described in this disclosure.
  • content-processing system 818 can include instructions for receiving and transmitting data packets, including a request to write or read data, data to be encoded, decoded, stored, deleted, or accessed, or a block or a page of data.
  • Content-processing system 818 can further include instructions for receiving a request to write a first page of data to a non-volatile memory (communication module 820) .
  • Content-processing system 818 can include instructions for writing the first page to a first block in a first group of blocks of the non-volatile memory, wherein a number of blocks in a respective group is less than a number of blocks in a superblock, wherein data is written in a stripe one page at a time from a beginning block of the respective group to a last block of the respective group, wherein a stripe includes physical pages which each correspond to a sequentially ordered die of the non-volatile memory, and wherein the first block is a next sequentially available block in the first group of blocks (block-writing module 822) .
  • Content-processing system 818 can include instructions for generating, by a controller, a first incremental parity page based on at least the first page (incremental parity-managing module 824) .
  • Content-processing system 818 can include instructions for writing the first incremental parity page to a second block of the first group of blocks, wherein the second block is the next sequentially available block in the first group of blocks (block-writing module 822) .
  • Content-processing system 818 can include instructions for storing, by the controller, the first incremental parity page in a volatile memory buffer of the controller (buffer-managing module 826) .
  • Content-processing system 818 can include instructions for enhancing a garbage collection process based on a reduced granularity of a size of a group of blocks that is less than a size of a superblock (garbage collection-processing module 830).
  • Content-processing system 818 can also include instructions for receiving a request to write a second page of data to the non-volatile memory (communication module 820) .
  • Content-processing system 818 can include instructions for determining that the first group of blocks is full (block-writing module 822) .
  • Content-processing system 818 can include instructions for writing the second page to a third block in a second group of blocks (block-writing module 822) .
  • Content-processing system 818 can include instructions for in response to determining that the second page is written to a location associated with the first incremental parity page (incremental parity-managing module 824) , updating the first incremental parity page based on the second page (incremental parity-managing module 824) .
  • Content-processing system 818 can include instructions for storing, by the controller, the updated first incremental parity page in the volatile memory of the controller (buffer-managing module 826) .
  • Content-processing system 818 can include instructions for operations in response to detecting a condition which triggers recycling of the first or second block (garbage collection-processing module 830) .
  • Content-processing system 818 can include instructions for operations in response to detecting a loss of power or determining that data in the volatile memory of the controller cannot withstand a power loss (power loss-protecting module 828) .
  • Data 832 can include any data that is required as input or that is generated as output by the methods and/or processes described in this disclosure. Specifically, data 832 can store at least: data to be stored, written, read, loaded, moved, retrieved, deleted, or copied; a logical unit of data; a physical unit of data; a physical page of data; a logical page of data; a block of data; a data stripe to which data is written in a stripe one page at a time in a horizontal manner across blocks in a group of blocks; a group of blocks of a size that is less than a superblock; indicators of sequentially ordered dies of non-volatile memory; multiple streams; an incremental parity page; an operation which generates an incremental parity page, either by including or removing a page of data from a calculation of the incremental parity; a new block; an indication that a block is to be recycled for garbage collection; a released block; a location associated with a page of data or an incremental parity page; a calculation of parity
  • FIG. 9 illustrates an exemplary apparatus that facilitates data recovery, in accordance with an embodiment of the present application.
  • Apparatus 900 can comprise a plurality of units or apparatuses which may communicate with one another via a wired, wireless, quantum light, or electrical communication channel.
  • Apparatus 900 may be realized using one or more integrated circuits, and may include fewer or more units or apparatuses than those shown in FIG. 9.
  • apparatus 900 may be integrated in a computer system, or realized as a separate device which is capable of communicating with other computer systems and/or devices.
  • apparatus 900 can comprise units 902-912 which perform functions or operations similar to modules 820-830 of computer system 800 of FIG. 8, including: a communication unit 902; a block-writing unit 904; an incremental parity-managing unit 906; a buffer-managing unit 908; a power loss-protecting unit 910; and a garbage collection-processing unit 912.
  • the data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system.
  • the computer-readable storage medium includes, but is not limited to, volatile memory, non-volatile memory, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs) , DVDs (digital versatile discs or digital video discs) , or other media capable of storing computer-readable media now known or later developed.
  • the methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above.
  • when a computer system reads and executes the code and/or data stored on the computer-readable storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the computer-readable storage medium.
  • the methods and processes described above can be included in hardware modules.
  • the hardware modules can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs) , and other programmable-logic devices now known or later developed.
  • when the hardware modules are activated, the hardware modules perform the methods and processes included within the hardware modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Memory System (AREA)
PCT/CN2018/109650 2018-10-10 2018-10-10 System and method for data recovery in parallel multi-tenancy ssd with finer granularity WO2020073233A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880098408.3A CN112823331B (zh) 2018-10-10 2018-10-10 System and method for data recovery in parallel multi-tenancy SSD with finer granularity
PCT/CN2018/109650 WO2020073233A1 (en) 2018-10-10 2018-10-10 System and method for data recovery in parallel multi-tenancy ssd with finer granularity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/109650 WO2020073233A1 (en) 2018-10-10 2018-10-10 System and method for data recovery in parallel multi-tenancy ssd with finer granularity

Publications (1)

Publication Number Publication Date
WO2020073233A1

Family

ID=70163729

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/109650 WO2020073233A1 (en) 2018-10-10 2018-10-10 System and method for data recovery in parallel multi-tenancy ssd with finer granularity

Country Status (2)

Country Link
CN (1) CN112823331B (zh)
WO (1) WO2020073233A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI821965B (zh) * 2021-09-29 2023-11-11 慧榮科技股份有限公司 編碼歷程資訊的存取方法及電腦程式產品及裝置
CN117149091A (zh) * 2023-10-23 2023-12-01 四川云海芯科微电子科技有限公司 一种固态硬盘数据保存方法及相关装置


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8775901B2 (en) * 2011-07-28 2014-07-08 SanDisk Technologies, Inc. Data recovery for defective word lines during programming of non-volatile memory arrays
US8924820B2 (en) * 2012-07-27 2014-12-30 Kabushiki Kaisha Toshiba Memory controller, semiconductor memory system, and memory control method
US8949692B1 (en) * 2014-01-23 2015-02-03 DSSD, Inc. Method and system for service-aware parity placement in a storage system
US9564212B2 (en) * 2014-05-06 2017-02-07 Western Digital Technologies, Inc. Solid-state memory corruption mitigation
JP2016118815A (ja) * 2014-12-18 2016-06-30 パナソニックIpマネジメント株式会社 Non-volatile memory device
KR102527992B1 (ko) * 2016-03-14 2023-05-03 삼성전자주식회사 Data storage device and data processing system including the same

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080082728A1 (en) * 2006-09-28 2008-04-03 Shai Traister Memory systems for phased garbage collection using phased garbage collection block or scratch pad block as a buffer
CN103530237A (zh) * 2013-10-31 2014-01-22 厦门大学 Garbage collection method for a solid state disk array
US20170249209A1 (en) * 2016-02-26 2017-08-31 SK Hynix Inc. Data storage device and operating method thereof
CN106528004A (zh) * 2016-12-14 2017-03-22 湖南国科微电子股份有限公司 Method for improving garbage collection efficiency of a DRAM-less SSD, block, and garbage collection system
US20180165189A1 (en) * 2016-12-14 2018-06-14 Via Technologies, Inc. Non-volatile memory apparatus and garbage collection method thereof

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111708481A (zh) * 2020-04-24 2020-09-25 浙江大学 Superblock-based dual-zone wear-leveling method for a solid state drive (SSD)
CN111708481B (zh) * 2020-04-24 2021-04-06 浙江大学 Superblock-based dual-zone wear-leveling method for a solid state drive (SSD)
CN111897495A (zh) * 2020-07-28 2020-11-06 深圳忆联信息系统有限公司 Implementation method, apparatus, computer device, and storage medium for improving SSD write performance
CN111897495B (zh) * 2020-07-28 2023-07-04 深圳忆联信息系统有限公司 Implementation method, apparatus, computer device, and storage medium for improving SSD write performance
CN112199044A (zh) * 2020-10-10 2021-01-08 中国人民大学 Multi-tenant-oriented FTL setting method, system, computer program, and storage medium
US11966607B2 (en) 2021-09-29 2024-04-23 Silicon Motion, Inc. Method and non-transitory computer-readable storage medium and apparatus for accessing to encoding-history information
WO2024016257A1 (en) * 2022-07-21 2024-01-25 Micron Technology, Inc. Handling parity data during data folding in a memory device
CN116483280A (zh) * 2023-04-26 2023-07-25 珠海妙存科技有限公司 Firmware storage method, firmware lookup method, device, and medium
CN116483280B (zh) * 2023-04-26 2023-11-28 珠海妙存科技有限公司 Firmware storage method, firmware lookup method, device, and medium

Also Published As

Publication number Publication date
CN112823331B (zh) 2024-03-29
CN112823331A (zh) 2021-05-18

Similar Documents

Publication Publication Date Title
WO2020073233A1 (en) System and method for data recovery in parallel multi-tenancy ssd with finer granularity
JP7366795B2 (ja) Memory system and control method
US10198215B2 (en) System and method for multi-stream data write
TWI805323B (zh) Storage device
US10795586B2 (en) System and method for optimization of global data placement to mitigate wear-out of write cache and NAND flash
US10877898B2 (en) Method and system for enhancing flash translation layer mapping flexibility for performance and lifespan improvements
US11379155B2 (en) System and method for flash storage management using multiple open page stripes
CN112596667B (zh) Method and system for organizing NAND blocks and placing data to facilitate high-throughput random writes in a solid state drive
CN111742291A (zh) Method and system for a user-space storage I/O stack with a user-space flash translation layer
CN110795272B (zh) Method and system for facilitating atomicity and latency assurance on variable-sized I/O
US11200159B2 (en) System and method for facilitating efficient utilization of NAND flash memory
CN114372007A (zh) Memory system and control method for controlling a non-volatile memory
KR20200032527A (ko) Operating method of memory system and memory system
US11449386B2 (en) Method and system for optimizing persistent memory on data retention, endurance, and performance for host memory
US11204869B2 (en) System and method for facilitating data storage with low-latency input/output and persistent data
US9990280B2 (en) Methods for reading data from a storage unit of a flash memory and apparatuses using the same
CN114780018A (zh) Method and system for facilitating multi-stream sequential read performance improvement and reducing read amplification
CN113590505A (zh) Address mapping method, solid state drive controller, and solid state drive
US11372774B2 (en) Method and system for a solid state drive with on-chip memory integration
CN110119245B (zh) Method and system for operating NAND flash physical space to extend memory capacity
JP2007233838A (ja) Memory system control method
US11281575B2 (en) Method and system for facilitating data placement and control of physical addresses with multi-queue I/O blocks
US11307766B2 (en) Apparatus and method and computer program product for programming flash administration tables
US11429519B2 (en) System and method for facilitating reduction of latency and mitigation of write amplification in a multi-tenancy storage drive
US11263132B2 (en) Method and system for facilitating log-structure data organization

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18936512

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18936512

Country of ref document: EP

Kind code of ref document: A1