US20220391115A1 - Dissimilar Write Prioritization in ZNS Devices - Google Patents

Dissimilar Write Prioritization in ZNS Devices

Publication number
US20220391115A1
Authority
US
United States
Prior art keywords
zone
data storage
storage device
zones
controller
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US17/338,487
Other versions
US11537303B1 (en
Inventor
Ramanathan Muthiah
Rakesh Balakrishnan
Eldhose Peter
Judah Gamliel Hahn
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SanDisk Technologies LLC
Original Assignee
Western Digital Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Western Digital Technologies Inc
Priority to US17/338,487
Assigned to WESTERN DIGITAL TECHNOLOGIES, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BALAKRISHNAN, RAKESH, HAHN, JUDAH GAMLIEL, MUTHIAH, RAMANATHAN, PETER, ELDHOSE
Assigned to JPMORGAN CHASE BANK, N.A., AS AGENT. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: WESTERN DIGITAL TECHNOLOGIES, INC.
Assigned to WESTERN DIGITAL TECHNOLOGIES, INC. RELEASE OF SECURITY INTEREST AT REEL 057651 FRAME 0296. Assignors: JPMORGAN CHASE BANK, N.A.
Publication of US20220391115A1
Application granted
Publication of US11537303B1
Assigned to JPMORGAN CHASE BANK, N.A. PATENT COLLATERAL AGREEMENT - A&R LOAN AGREEMENT. Assignors: WESTERN DIGITAL TECHNOLOGIES, INC.
Assigned to JPMORGAN CHASE BANK, N.A. PATENT COLLATERAL AGREEMENT - DDTL LOAN AGREEMENT. Assignors: WESTERN DIGITAL TECHNOLOGIES, INC.
Assigned to SanDisk Technologies, Inc. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: WESTERN DIGITAL TECHNOLOGIES, INC.
Assigned to SanDisk Technologies, Inc. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: SanDisk Technologies, Inc.
Assigned to JPMORGAN CHASE BANK, N.A., AS THE AGENT. PATENT COLLATERAL AGREEMENT. Assignors: SanDisk Technologies, Inc.
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • G06F3/0635Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • G06F12/0238Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G06F12/0246Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0607Interleaved addressing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0616Improving the reliability of storage systems in relation to life time, e.g. increasing Mean Time Between Failures [MTBF]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0625Power saving in storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0644Management of space entities, e.g. partitions, extents, pools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0688Non-volatile semiconductor memory arrays
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/40Specific encoding of data in memory or cache
    • G06F2212/403Error protection encoding, e.g. using parity or ECC codes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/72Details relating to flash memory management
    • G06F2212/7201Logical to physical mapping or translation of blocks or pages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/72Details relating to flash memory management
    • G06F2212/7211Wear leveling

Definitions

  • Embodiments of the present disclosure generally relate to efficient utilization of dies in a zone namespace (ZNS) device.
  • logical blocks are formed across physical blocks to increase the hardware (HW) workload efficiency during sequential performances.
  • the logical blocks are not formed across an entire set of dies, but rather, a logical zone itself could be a single physical block or a few physical blocks, unlike full-fledged interleaving. Multiple such smaller logical blocks constitute a zoned namespace within the ZNS device.
  • the zoned namespaces in a ZNS device are helpful for data segregation, but may impact performance if all dies/flash interface modules (FIMs) are not used in parallel.
  • the host device typically has substantial control over the data storage device.
  • the host device selects the NVMe set/endurance group for zone creation (i.e., zone open and zone append).
  • the host device may not be aware of the workload on physical resources of the data storage device such as dies, flash channel(s), and other storage specific resources such as parity check engines and cache buffers.
  • the data storage device may act upon the zone commands according to the submission queue, and may not be able to bias the zone commands according to the state of the data storage device itself. Biasing the zone commands according to the data storage device state may optimize resource utilization, thereby increasing the quality of service (QoS) of the system.
  • the present disclosure generally relates to creating new zones in a data storage device in a manner that ensures substantially even workload of the memory device storage locations.
  • the data storage device can guide a host device to select a particular zone to open in zone namespace (ZNS) systems where the host device selects which zone to open.
  • the data storage device tracks the workload of the various storage locations and creates zones.
  • the data storage device then provides selected zones having the least used storage locations with the idea of guiding the host device to select the zone having the least used storage locations.
  • the host will select, based upon guidance from the data storage device, zones that contain the least utilized storage location. In so doing, generally even workload of the memory device storage locations is achieved.
  • a data storage device comprises: a memory device; and a controller coupled to the memory device, wherein the controller is configured to: create an empty zone map, wherein the empty zone map ranks zones within the empty zone map from least utilized to most utilized; send the empty zone map to a host device; and receive a zone open command from the host device.
  • a data storage device comprises: a memory device; and a controller coupled to the memory device, wherein the controller is configured to: create a plurality of zones in a backend of the memory device; rank the zones in a list from least utilized area of the backend to most utilized area of the backend; forward at least a part of the list to a host device; and receive a zone open command from the host device.
  • a data storage device comprises: memory means; a controller coupled to the memory means, wherein the controller is configured to: recommend a zone to be opened to a host device, wherein recommending a zone comprises providing a list of zones that can be opened to the host device, wherein the list comprises the recommended zone and other zones; and receive a zone open command from the host device to open the recommended zone.
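  • The ranked empty zone map described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `EmptyZone` type, the per-die workload counters, and the rule of summing die workloads per zone are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmptyZone:
    zone_id: int
    die_ids: tuple  # dies backing this zone (hypothetical layout)


def build_empty_zone_map(empty_zones, die_workload):
    """Rank empty zones from least utilized to most utilized.

    die_workload maps a die id to an assumed utilization counter
    (for example, bytes written or operations pending on that die).
    """
    def zone_load(zone):
        # Assumption: a zone's load is the sum of the loads of its dies.
        return sum(die_workload.get(d, 0) for d in zone.die_ids)

    # sorted() is stable, so zones with equal load keep their input order.
    return sorted(empty_zones, key=zone_load)


zones = [
    EmptyZone(0, (0, 1)),
    EmptyZone(1, (2, 3)),
    EmptyZone(2, (1, 2)),
]
workload = {0: 90, 1: 40, 2: 10, 3: 5}

zone_map = build_empty_zone_map(zones, workload)
ranked_ids = [z.zone_id for z in zone_map]  # least utilized zone first
```

  • In this sketch the device would send `zone_map` (or its leading entries) to the host, which remains free to open any zone in the list.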
  • FIG. 1 is a schematic block diagram illustrating a storage system, according to one embodiment.
  • FIG. 2A illustrates zoned namespaces (ZNSs) utilized in a storage device, according to one embodiment.
  • FIG. 2B illustrates a state diagram for the ZNSs of the storage device of FIG. 2A, according to one embodiment.
  • FIG. 3 illustrates a NVMe set/endurance group in a ZNS system according to one embodiment.
  • FIG. 4 illustrates a ZNS system for efficient backend utilization according to one embodiment.
  • FIG. 5 is a flowchart illustrating efficient backend utilization of a ZNS device.
  • FIG. 1 is a schematic block diagram illustrating a storage system 100 in which data storage device 106 may function as a storage device for a host device 104 , in accordance with one or more techniques of this disclosure.
  • the host device 104 may utilize non-volatile memory 110 included in data storage device 106 to store and retrieve data.
  • the host device 104 includes a host DRAM 138 .
  • the storage system 100 may include a plurality of storage devices, such as the data storage device 106 , which may operate as a storage array.
  • the storage system 100 may include a plurality of data storage devices 106 configured as a redundant array of inexpensive/independent disks (RAID) that collectively function as a mass storage device for the host device 104.
  • the storage system 100 includes the host device 104, which may store and/or retrieve data to and/or from one or more storage devices, such as the data storage device 106. As illustrated in FIG. 1, the host device 104 may communicate with the data storage device 106 via an interface 114.
  • the host device 104 may comprise any of a wide range of devices, including computer servers, network attached storage (NAS) units, desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, so-called “smart” pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, and the like.
  • the data storage device 106 includes a controller 108 , non-volatile memory 110 (NVM 110 ), a power supply 111 , volatile memory 112 , and an interface 114 .
  • the controller 108 comprises an internal memory 120 or buffer.
  • the data storage device 106 may include additional components not shown in FIG. 1 for sake of clarity.
  • the data storage device 106 may include a printed board (PB) to which components of the data storage device 106 are mechanically attached and which includes electrically conductive traces that electrically interconnect components of the data storage device 106 , or the like.
  • the physical dimensions and connector configurations of the data storage device 106 may conform to one or more standard form factors.
  • Some example standard form factors include, but are not limited to, 3.5′′ data storage device (e.g., an HDD or SSD), 2.5′′ data storage device, 1.8′′ data storage device, peripheral component interconnect (PCI), PCI-extended (PCI-X), PCI Express (PCIe) (e.g., PCIe x1, x4, x8, x16, PCIe Mini Card, MiniPCI, etc.).
  • the data storage device 106 may be directly coupled (e.g., directly soldered) to a motherboard of the host device 104 .
  • the interface 114 of the data storage device 106 may include one or both of a data bus for exchanging data with the host device 104 and a control bus for exchanging commands with the host device 104 .
  • the interface 114 may operate in accordance with any suitable protocol.
  • the interface 114 may operate in accordance with one or more of the following protocols: advanced technology attachment (ATA) (e.g., serial-ATA (SATA) and parallel-ATA (PATA)), Fibre Channel Protocol (FCP), small computer system interface (SCSI), serially attached SCSI (SAS), PCI, and PCIe, non-volatile memory express (NVMe), OpenCAPI, GenZ, Cache Coherent Interface Accelerator (CCIX), Open Channel SSD (OCSSD), or the like.
  • the electrical connection of the interface 114 (e.g., the data bus, the control bus, or both) is electrically connected to the controller 108 , providing electrical connection between the host device 104 and the controller 108 , allowing data to be exchanged between the host device 104 and the controller 108 .
  • the electrical connection of the interface 114 may also permit the data storage device 106 to receive power from the host device 104 .
  • the power supply 111 may receive power from the host device 104 via the interface 114 .
  • the data storage device 106 includes NVM 110 , which may include a plurality of media units or memory devices.
  • NVM 110 may be configured to store and/or retrieve data.
  • a media unit of NVM 110 may receive data and a message from the controller 108 that instructs the memory device to store the data.
  • the media unit of NVM 110 may receive a message from the controller 108 that instructs the memory device to retrieve data.
  • each of the media units may be referred to as a die.
  • a single physical chip may include a plurality of dies (i.e., a plurality of memory devices).
  • each memory device may be configured to store relatively large amounts of data (e.g., 128MB, 256MB, 512MB, 1GB, 2GB, 4GB, 8GB, 16GB, 32GB, 64GB, 128GB, 256GB, 512GB, 1TB, etc.).
  • each media unit of NVM 110 may include any type of non-volatile memory devices, such as flash memory devices, phase-change memory (PCM) devices, resistive random-access memory (ReRAM) devices, magnetoresistive random-access memory (MRAM) devices, ferroelectric random-access memory (F-RAM), holographic memory devices, and any other type of non-volatile memory devices.
  • the NVM 110 may comprise a plurality of flash memory devices.
  • Flash memory devices may include NAND or NOR based flash memory devices, and may store data based on a charge contained in a floating gate of a transistor for each flash memory cell.
  • the flash memory device may be divided into a plurality of blocks, which may be divided into a plurality of pages.
  • Each block of the plurality of blocks within a particular memory device may include a plurality of NAND cells.
  • Rows of NAND cells may be electrically connected using a word line to define a page of a plurality of pages.
  • Respective cells in each of the plurality of pages may be electrically connected to respective bit lines.
  • NAND flash memory devices may be 2D or 3D devices, and may be single level cell (SLC), multi-level cell (MLC), triple level cell (TLC), or quad level cell (QLC).
  • the controller 108 may write data to and read data from NAND flash memory devices at the page level and erase data from NAND flash memory devices at the block level.
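  • The page-level write and block-level erase granularity can be illustrated with a toy model. The `NandBlock` class and its method names are hypothetical, and the rule that a programmed page cannot be rewritten until its block is erased is a general property of NAND flash assumed here rather than a statement from this disclosure.

```python
class NandBlock:
    """Toy model of one erase block: program per page, erase per block."""

    def __init__(self, pages_per_block=4):
        self.pages = [None] * pages_per_block  # None means erased

    def program_page(self, page_index, data):
        # Assumption: a programmed page cannot be overwritten in place.
        if self.pages[page_index] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page_index] = data

    def read_page(self, page_index):
        return self.pages[page_index]

    def erase(self):
        # Erase always clears every page in the block at once.
        self.pages = [None] * len(self.pages)


block = NandBlock()
block.program_page(0, b"a")
block.program_page(1, b"b")

try:
    block.program_page(0, b"c")
    overwrite_rejected = False
except ValueError:
    overwrite_rejected = True

page_before_erase = block.read_page(1)
block.erase()
```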
  • the data storage device 106 includes a power supply 111 , which may provide power to one or more components of the data storage device 106 .
  • the power supply 111 may provide power to the one or more components using power provided by an external device, such as the host device 104 .
  • the power supply 111 may provide power to the one or more components using power received from the host device 104 via the interface 114 .
  • the power supply 111 may include one or more power storage components configured to provide power to the one or more components when operating in a shutdown mode, such as where power ceases to be received from the external device. In this way, the power supply 111 may function as an onboard backup power source.
  • the one or more power storage components include, but are not limited to, capacitors, super capacitors, batteries, and the like.
  • the amount of power that may be stored by the one or more power storage components may be a function of the cost and/or the size (e.g., area/volume) of the one or more power storage components. In other words, as the amount of power stored by the one or more power storage components increases, the cost and/or the size of the one or more power storage components also increases.
  • the data storage device 106 also includes volatile memory 112 , which may be used by controller 108 to store information.
  • Volatile memory 112 may be comprised of one or more volatile memory devices.
  • the controller 108 may use volatile memory 112 as a cache. For instance, the controller 108 may store cached information in volatile memory 112 until cached information is written to non-volatile memory 110 . As illustrated in FIG. 1 , volatile memory 112 may consume power received from the power supply 111 .
  • Examples of volatile memory 112 include, but are not limited to, random-access memory (RAM), dynamic random access memory (DRAM), static RAM (SRAM), and synchronous dynamic RAM (SDRAM (e.g., DDR1, DDR2, DDR3, DDR3L, LPDDR3, DDR4, LPDDR4, and the like)).
  • the data storage device 106 includes a controller 108 , which may manage one or more operations of the data storage device 106 .
  • the controller 108 may manage the reading of data from and/or the writing of data to the NVM 110 .
  • the controller 108 may initiate a data storage command to store data to the NVM 110 and monitor the progress of the data storage command.
  • the controller 108 may determine at least one operational characteristic of the storage system 100 and store the at least one operational characteristic to the NVM 110 .
  • when the data storage device 106 receives a write command from the host device 104, the controller 108 temporarily stores the data associated with the write command in the internal memory 120 before sending the data to the NVM 110.
  • FIG. 2A illustrates a zoned namespace (ZNS) 202 view utilized in a storage device 200, according to one embodiment.
  • the storage device 200 may present the ZNS 202 view to a host device.
  • FIG. 2B illustrates a state diagram 250 for the ZNS 202 of the storage device 200, according to one embodiment.
  • the storage device 200 may be the data storage device 106 of the storage system 100 of FIG. 1 .
  • the storage device 200 may have one or more ZNS 202 , and each ZNS 202 may be different sizes.
  • the storage device 200 may further comprise one or more conventional namespaces in addition to the one or more ZNS 202 .
  • the ZNS 202 may be a zoned block command (ZBC) for SAS and/or a zoned-device ATA command set (ZAC) for SATA.
  • the ZNS 202 is the quantity of NVM that can be formatted into logical blocks such that the capacity is divided into a plurality of zones 206a-206n (collectively referred to as zones 206).
  • Each of the zones 206 comprises a plurality of physical or erase blocks (not shown) of a media unit or NVM 204, and each of the erase blocks is associated with a plurality of logical blocks (not shown).
  • when the controller 208 receives a command, such as from a host device (not shown) or the submission queue of a host device, the controller 208 can read data from and write data to the plurality of logical blocks associated with the plurality of erase blocks of the ZNS 202.
  • Each of the logical blocks is associated with a unique LBA or sector.
  • the NVM 204 is a NAND device.
  • the NAND device comprises one or more dies.
  • Each of the one or more dies comprises one or more planes.
  • Each of the one or more planes comprises one or more erase blocks.
  • Each of the one or more erase blocks comprises one or more wordlines (e.g., 256 wordlines).
  • Each of the one or more wordlines may be addressed in one or more pages.
  • an MLC NAND die may use an upper page and a lower page to reach the two bits in each cell of the full wordline (e.g., 16 kB per page).
  • each page can be accessed at a granularity equal to or smaller than the full page.
  • a controller can frequently access NAND in user data granularity LBA sizes of 512 bytes.
  • NAND locations are equal to a granularity of 512 bytes.
  • an LBA size of 512 bytes and a page size of 16 kB for two pages of an MLC NAND results in about 16 NAND locations per wordline.
  • the NAND location size is not intended to be limiting, and is merely used as an example.
  • one or more logical blocks are correspondingly updated within a zone 206 to track where the data is located within the NVM 204 .
  • Data may be written to one zone 206 at a time until a zone 206 is full, or to multiple zones 206 such that multiple zones 206 may be partially full.
  • data may be written to the plurality of erase blocks one block at a time, in sequential order of NAND locations, page-by-page, or wordline-by-wordline, until moving to an adjacent block (i.e., write to a first erase block until the first erase block is full before moving to the second erase block), or to multiple blocks at once, in sequential order of NAND locations, page-by-page, or wordline-by-wordline, to partially fill each block in a more parallel fashion (i.e., writing the first NAND location or page of each erase block before writing to the second NAND location or page of each erase block).
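  • The two write orders described above can be sketched as generators of (block, page) pairs. The function names and the use of pages as the unit of ordering are assumptions for illustration; the same idea applies to NAND locations or wordlines.

```python
def sequential_fill_order(num_blocks, pages_per_block):
    """Fill block 0 completely, then block 1, and so on."""
    return [(b, p) for b in range(num_blocks) for p in range(pages_per_block)]


def parallel_fill_order(num_blocks, pages_per_block):
    """Write page 0 of every block, then page 1 of every block, and so on."""
    return [(b, p) for p in range(pages_per_block) for b in range(num_blocks)]


one_block_at_a_time = sequential_fill_order(2, 2)
across_blocks = parallel_fill_order(2, 2)
```

  • Both orders visit the same set of locations; they differ only in whether one erase block is filled before the next is touched or all blocks are partially filled in step.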
  • Each of the zones 206 is associated with a zone starting logical block address (ZSLBA).
  • the ZSLBA is the first available LBA in the zone 206 .
  • the first zone 206 a is associated with ZaSLBA
  • the second zone 206 b is associated with ZbSLBA
  • the third zone 206 c is associated with ZcSLBA
  • the fourth zone 206 d is associated with ZdSLBA
  • the nth zone 206 n (i.e., the last zone)
  • Each zone 206 is identified by its ZSLBA, and is configured to receive sequential writes (i.e., writing data to the NVM 110 in the order the write commands are received).
  • a write pointer 210 is advanced or updated to point to or to indicate the next available block in the zone 206 to write data to in order to track the next write starting point (i.e., the completion point of the prior write equals the starting point of a subsequent write).
  • the write pointer 210 indicates where the subsequent write to the zone 206 will begin.
  • Subsequent write commands are ‘zone append’ commands, where the data associated with the subsequent write command appends to the zone 206 at the location the write pointer 210 is indicating as the next starting point.
  • An ordered list of LBAs within the zone 206 may be stored for write ordering.
  • Each zone 206 may have its own write pointer 210 . Thus, when a write command is received, a zone is identified by its ZSLBA, and the write pointer 210 determines where the write of the data begins within the identified zone.
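The write-pointer behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Zone` class, its `append` method, and the capacity expressed as a count of logical blocks are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical model of one zone: a ZSLBA, a capacity, and a write pointer."""
    zslba: int              # zone starting LBA, which also identifies the zone
    capacity: int           # writable logical blocks in the zone (assumed unit)
    write_pointer: int = 0  # offset of the next available block from the ZSLBA

    def append(self, num_blocks: int) -> int:
        """Write num_blocks at the write pointer and return the starting LBA."""
        if self.write_pointer + num_blocks > self.capacity:
            raise ValueError("write exceeds zone capacity")
        start_lba = self.zslba + self.write_pointer
        # The completion point of this write is the start of the next one.
        self.write_pointer += num_blocks
        return start_lba


zone = Zone(zslba=4096, capacity=1024)
first = zone.append(8)    # lands at the ZSLBA because the zone was empty
second = zone.append(16)  # lands immediately after the first write
```

Because the device, not the caller, decides the starting LBA, the caller only learns where the data landed from the return value, which mirrors how a zone append reports its location.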
  • FIG. 2 B illustrates a state diagram 250 for the ZNS 202 of FIG. 2 A .
  • each zone may be in a different state, such as empty, active, full, or offline.
  • An empty zone switches to an open and active zone once a write is scheduled to the zone or if a zone open command is issued by the host.
  • Zone management (ZM) commands can be used to move a zone between zone open and zone closed states, which are both active states. If a zone is active, the zone comprises open blocks that may be written to, and the host may be provided a description of recommended time in the active state.
  • the controller may comprise the ZM.
  • the term “written to” includes programming user data on 0 or more word lines in an erase block, erasure, and/or partially filled word lines in an erase block when user data has not filled all of the available word lines.
  • the term “written to” may further include closing a zone due to internal drive handling needs (open block data retention concerns because the bits in error accumulate more quickly on open erase blocks), the storage device 200 closing a zone due to resource constraints, like too many open zones to track or discovered defect state, among others, or a host device closing the zone for concerns such as there being no more data to send the drive, computer shutdown, error handling on the host, limited host resources for tracking, among others.
  • the active zones may be either open or closed.
  • An open zone is an empty or partially full zone that is ready to be written to and has resources currently allocated.
  • the data received from the host device with a write command or zone append command may be programmed to an open erase block that is not currently filled with prior data.
  • New data pulled-in from the host device or valid data being relocated may be written to an open zone.
  • Valid data may be moved from one zone (e.g. the first zone 206 a ) to another zone (e.g. the third zone 206 c ) for garbage collection purposes.
  • a closed zone is an empty or partially full zone that is not currently receiving writes from the host in an ongoing basis. The movement of a zone from an open state to a closed state allows the controller 208 to reallocate resources to other tasks. These tasks may include, but are not limited to, other zones that are open, other conventional non-zone regions, or other controller needs.
  • ZCAP zone capacity
  • the ZM may reset a full zone, scheduling an erasure of the data stored in the zone such that the zone switches back to an empty zone.
  • When a full zone is reset, the zone may not be immediately cleared of data, though the zone may be marked as an empty zone ready to be written to. However, the reset zone must be erased prior to switching to an active zone.
  • a zone may be erased any time between a ZM reset and a ZM open.
  • An offline zone is a zone that is unavailable to write data to. An offline zone may be in the full state, the empty state, or in a partially full state without being active.
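The zone states described above can be summarized as a small transition table. The Python below is a sketch only: the event names (`open`, `close`, `fill`, `reset`) are hypothetical labels, and the open-to-full transition on reaching capacity is an assumption, since the text here does not spell out that step. Offline zones have no outgoing transitions in this model.

```python
from enum import Enum


class ZoneState(Enum):
    EMPTY = "empty"
    OPEN = "open"      # active, resources allocated
    CLOSED = "closed"  # active, not currently receiving host writes
    FULL = "full"
    OFFLINE = "offline"


# Allowed transitions, keyed by (current state, event). Event names are
# hypothetical labels for this sketch.
TRANSITIONS = {
    (ZoneState.EMPTY, "open"): ZoneState.OPEN,    # write scheduled or zone open command
    (ZoneState.OPEN, "close"): ZoneState.CLOSED,  # ZM command
    (ZoneState.CLOSED, "open"): ZoneState.OPEN,   # ZM command
    (ZoneState.OPEN, "fill"): ZoneState.FULL,     # assumption: capacity reached
    (ZoneState.FULL, "reset"): ZoneState.EMPTY,   # ZM reset schedules an erase
}


def next_state(state: ZoneState, event: str) -> ZoneState:
    """Return the state after event, or raise if the transition is not modeled."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event!r}") from None
```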
  • the storage device 200 may mark one or more erase blocks for erasure. When a new zone is going to be formed and the storage device 200 anticipates a ZM open, the one or more erase blocks marked for erasure may then be erased. The storage device 200 may further decide and create the physical backing of the zone upon erase of the erase blocks. Thus, once the new zone is opened and erase blocks are being selected to form the zone, the erase blocks will have been erased.
  • a new order for the LBAs and the write pointer 210 for the zone 206 may be selected, enabling the zone 206 to be tolerant to receive commands out of sequential order.
  • the write pointer 210 may optionally be turned off such that a command may be written to whatever starting LBA is indicated for the command.
  • the controller 208 may select an empty zone 206 to write the data associated with the command to, and the empty zone 206 switches to an active zone 206 .
  • the controller 208 initiating or pulling-in a write command comprises receiving a write command or direct memory access (DMA) reading the write command.
  • the write command may be a command to write new data, or a command to move valid data to another zone for garbage collection purposes.
  • the controller 208 is configured to DMA read or pull-in new commands from a submission queue populated by a host device.
  • the data is written to the zone 206 starting at the ZSLBA, as the write pointer 210 is indicating the logical block associated with the ZSLBA as the first available logical block.
  • the data may be written to one or more erase blocks or NAND locations that have been allocated for the physical location of the zone 206 .
  • the write pointer 210 is updated to point to the next available block in the zone 206 to track the next write starting point (i.e., the completion point of the first write).
  • the controller 208 may select an active zone to write the data to. In an active zone, the data is written to the logical block indicated by the write pointer 210 as the next available block.
  • a NAND location may be equal to a wordline.
  • the controller may optionally aggregate several write commands in another memory location such as DRAM or SRAM prior to programming a full wordline composed of multiple write commands. Write commands that are longer than a wordline will be able to program and fill a complete wordline with some of the data, and the excess data beyond a wordline will be used to fill the next wordline.
  • a NAND location is not limited to being equal to a wordline, and may have a larger or smaller size than a wordline.
  • a NAND location may be equal to the size of a page.
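The aggregation of several write commands into full wordlines can be sketched as simple arithmetic. The function name `pack_wordlines` and the 32 KiB wordline size are assumptions made for this example; the text does not fix a wordline size.

```python
def pack_wordlines(write_sizes, wordline_size):
    """Aggregate buffered writes and count the full wordlines they can program.

    Returns (full_wordlines, leftover_bytes). Leftover bytes stay buffered
    (for example in DRAM or SRAM) until later writes complete a wordline.
    """
    total = sum(write_sizes)
    return divmod(total, wordline_size)


# Three small writes plus one write longer than a wordline (sizes in bytes).
WORDLINE = 32 * 1024  # assumed wordline size for this example
full, leftover = pack_wordlines([4096, 8192, 4096, 40960], WORDLINE)
```

Here the four writes total 57,344 bytes, so one full wordline can be programmed and 24,576 bytes remain buffered toward the next wordline.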
  • the controller 208 may receive, pull-in, or DMA read a first write command to a third zone 206 c , or a first zone append command.
  • the host identifies sequentially which logical block of the zone 206 to write the data associated with the first command to.
  • the data associated with the first command is then written to the first or next available LBA(s) in the third zone 206 c as indicated by the write pointer 210 , and the write pointer 210 is advanced or updated to point to the next available LBA available for a host write (i.e., WP>0).
  • When the controller 208 receives or pulls-in a second write command to the third zone 206 c , the data associated with the second write command is written to the next available LBA(s) in the third zone 206 c identified by the write pointer 210 .
  • FIG. 3 illustrates a NVMe set/endurance group in a ZNS system according to one embodiment.
  • a plurality of applications 302 a - 302 c connect to the data storage device firmware 304 through respective flash translation layers (FTLs) to access the memory devices 306 a - 306 c that each comprise a plurality of dies 308 .
  • FTLs flash translation layers
  • each application, which could be different hosts or virtual hosts, has access to the same data storage device 310 and hence, memory devices 306 a - 306 c and dies 308 .
  • the zones for a ZNS device can be set up across the dies 308 or within any die 308 .
  • a zone open command occurs when the first write triggers the logic that allocates a new zone.
  • the zone write occurs when the start LBA (SLBA) is equal to the zone start LBA (ZSLBA).
  • SLBA start LBA
  • ZSLBA zone start LBA
  • a zone append/write command is where the SLBA does not equal ZSLBA such that the writes are added to an already opened zone.
  • backend units include physical blocks in the ZNS NVMe set/endurance group.
  • a ZNS device looks for zone append and zone read commands in a submission queue and prioritizes those commands to the backend without any conditions.
  • For zone open commands, the zones are created using backend units that are not currently in use by other zones. If those zones are created randomly or utilizing the same dies over time, then the dies do not receive an even workload and hence, the ZNS device has uneven workload, impacting QoS. It is desirable to create each new zone from the least utilized die/FIM so that the device has an even workload and even wear over time.
  • the disclosure involves dissimilar prioritization for writes into zones based on the zone offset.
  • the controller utilizes the non-attachment of the device resources until the zone start for writing. Subsequently, the zone mapping is updated for the new zone with allocated physical blocks as in a typical ZNS system.
  • the zone append and zone read commands are non-negotiable candidates in terms of a particular resource workload. That is, when a zone read is issued, the zone read can only be read from the particular die/FIM (physical block in that ZNS NVMe set/endurance group) where the data is written. Similarly, for a zone append, the zone can be appended to only that particular zone where the write was previously started (already open physical block is in use).
  • Zone open commands are cases where the ZNS host wants to write data from zero zone offset (zone create to start with) to the device, and the host does not care where those data physically reside, as long as the zone is part of the NVMe set/endurance group and that the zone to physical (Z2P) mapping is intact.
  • the ZNS controller can leverage the condition.
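The dissimilar prioritization described above can be sketched as a partition of the submission queue: reads and appends are bound to blocks already in use, so they are routed first, while zero-offset writes (zone opens) are deferred because their physical placement is still flexible. The `Command` type, its fields, and the `schedule` function are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """Hypothetical submission-queue entry."""
    kind: str   # "read" or "write"
    slba: int   # start LBA of the command
    zslba: int  # start LBA of the zone the command targets


def is_zone_open(cmd: Command) -> bool:
    """A write at zero zone offset (SLBA == ZSLBA) implies a new zone."""
    return cmd.kind == "write" and cmd.slba == cmd.zslba


def schedule(queue):
    """Route reads and appends first, in arrival order, then zone opens."""
    bound = [c for c in queue if not is_zone_open(c)]  # tied to existing blocks
    opens = [c for c in queue if is_zone_open(c)]      # placement is flexible
    return bound + opens


queue = [
    Command("write", slba=0, zslba=0),        # zone open
    Command("read", slba=1030, zslba=1024),   # zone read
    Command("write", slba=2050, zslba=2048),  # zone append
]
ordered = schedule(queue)
```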
  • the host device not only instructs the data storage device to open a new zone, but also specifically instructs which zone to open.
  • the data storage device can recommend or suggest which zone to open. More specifically, the data storage device triggers an automatic zone selection process where the host device gives the data storage device a set of priorities and expected workloads and the data storage device in turn provides an empty zone map as to which future zones will be ideal rather than selecting one on its own. For example, in a high priority write request, the host device may choose an appropriate zone from a list of zones provided by the data storage device for corresponding use cases.
  • the ZNS device (i.e., data storage device) populates an empty zone map that ranks the least utilized backend memory devices and/or memory device areas with zero write offsets (i.e., potential new zones).
  • a program-erase parameter is used to determine which zone to open.
  • the program-erase parameter is used when there is a ‘tie’ in terms of workload history such that there are multiple zones that could be open that have the same workload history in terms of die, FIM, cache, and/or error correction engine workload. In so doing, the zone with the fewest program-erase cycles is chosen. Thus, wear leveling across the blocks is achieved.
  • the mapping for a zone open command is populated with those backend units (i.e., physical blocks in the ZNS NVMe set/endurance group) where the workload in terms of die/FIM is least utilized in sync with the provided flowchart.
  • the ZNS device selects the appropriate empty zone from the set of available zones in the zone map and issues a zone open to the device.
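One way to picture the ranking described above is a sort on workload with the program-erase count as the tiebreaker. The sketch below assumes a single integer workload score per candidate zone; the `CandidateZone` type, its fields, and `build_empty_zone_map` are hypothetical names, and the text does not prescribe how the workload score is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateZone:
    """Hypothetical entry in an empty zone map."""
    zone_id: int
    workload: int   # recent activity on the die/FIM backing the zone
    pe_cycles: int  # program-erase count of the backing blocks


def build_empty_zone_map(candidates):
    """Rank candidates from least to most utilized, breaking ties by P/E count."""
    return sorted(candidates, key=lambda z: (z.workload, z.pe_cycles))


candidates = [
    CandidateZone(zone_id=7, workload=120, pe_cycles=300),
    CandidateZone(zone_id=3, workload=40, pe_cycles=900),
    CandidateZone(zone_id=5, workload=40, pe_cycles=250),
]
zone_map = build_empty_zone_map(candidates)
```

Zones 3 and 5 tie on workload, so zone 5 ranks first because its blocks have seen fewer program-erase cycles.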
  • FIG. 4 illustrates a ZNS system 400 for efficient backend utilization according to one embodiment.
  • the ZNS data storage device 404 is coupled to a ZNS host device 402 .
  • the ZNS data storage device 404 includes an empty zone map creation module 406 , and the ZNS host device 402 comprises an empty zone map analyzing module 408 .
  • the empty zone map creation module 406 creates and maintains an empty zone map based on the write offsets and backend loads.
  • the empty zone map may include different maps for different configurations.
  • the ZNS data storage device 404 shares the maps 410 with the ZNS host device 402 where the empty zone map analyzing module 408 determines which zone from the maps would be best for the next zone open command.
  • the ZNS host device 402 then sends to the ZNS data storage device 404 a zone open command with wordline request in the appropriate backend based on the ZNS data storage device 404 hinted zone map 412 . It is to be understood that the ZNS host device 402 may not have an empty zone map analyzing module 408 and thus, simply instructs the ZNS data storage device 404 to open a specific zone from the zone map.
  • ZNS does not differentiate between die layouts, but rather, on the lines of higher and lower performance. If some zones were allocated to a specific die or die group, then the die layouts can be a factor in empty zone map. For example, multiple empty zone maps can be populated for different interleaving configurations, such as 1 DIL or 2 DIL (die interleaving), which itself is created for different performances. In some cases, zone maps can also be populated with zones made up only of worn-out old blocks collected from various dies, which allows the host device a holistic view of the data storage device to enable the host device to make better decisions for zone creation for variable purposes (i.e., input/output, garbage collection, or word lines).
  • the device may also incorporate program-erase cycles of different blocks in the backend logic to populate the empty zone map such that wear leveling of those zones is also honored. Such a logic minimizes unwanted data transfers from zone to zone at a later point.
  • the workload of the storage backend unit (i.e., die/FIM) in any ZNS NVMe set is associated with a credit point according to the read/write activity, and a moving window to decide on a zone open at any point in time.
  • Any known method, either firmware (FW) or hardware (HW), may be employed by the data storage device to evaluate such workloads.
  • FW firmware
  • Counter-based FW methods for different resources are one example.
  • wear leveling schemes may be accommodated during the process wherein if the controller determines that multiple backend units are eligible for zone open in terms of least workload, the unit offering the physical block with a lower program-erase count may be the winner for selection as the new zone to open. Over a long run, such a strategy will provide better sustained mixed load performance as well as better wear leveling.
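A counter-based credit scheme with a moving window might look like the following Python sketch. It assumes the window is measured in a fixed number of recent events rather than in time, which the text does not specify; the `WorkloadWindow` class and its methods are hypothetical names.

```python
from collections import deque


class WorkloadWindow:
    """Counter-based sketch: credits per backend unit over a moving window."""

    def __init__(self, window: int):
        self.events = deque(maxlen=window)  # most recent (unit, credits) pairs

    def record(self, unit: str, credits: int = 1) -> None:
        """Record read/write activity; the oldest event drops out when full."""
        self.events.append((unit, credits))

    def load(self, unit: str) -> int:
        """Total credits for unit inside the current window."""
        return sum(c for u, c in self.events if u == unit)

    def least_loaded(self, units):
        """Return the unit with the fewest credits in the window."""
        return min(units, key=self.load)


window = WorkloadWindow(window=4)
for unit in ["die0", "die0", "die1", "die0", "die0"]:
    window.record(unit)
best = window.least_loaded(["die0", "die1"])
```

A tie between units could then be broken by the lower program-erase count, as the text describes.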
  • the controller chooses a destination block in the backend unit of a NVMe set of a ZNS device which has the least workload for HW and lower program-erase block attached to that set. The controller does the selection after all the host issued ZNS appends and ZNS reads are prioritized.
  • a way to ensure wear leveling is processing a zone open command to open zones that have the least utilized memory devices.
  • the zone open command differs from the zone read command and the zone append command. Both the zone read command and the zone append command are commands that are performed on already opened zones.
  • Already opened zones have memory devices or areas of memory devices that are already allocated thereto.
  • New zones can be created from memory devices or areas of memory devices that are not currently allocated to a zone. The devices or areas not currently allocated to a zone can be evaluated to determine which devices or areas have the least amount of workload history.
  • New zones can be created from the areas or devices that have the least amount of workload history. Number of program-erase cycles can be used as a tiebreaker for any areas or devices that have the same amount of workload history.
  • FIG. 5 is a flowchart 500 illustrating efficient backend utilization of a ZNS device.
  • the data storage device analyzes the backend of the data storage device 502 to determine workload of the memory devices, memory dies, and memory areas of the memory dies. It is to be understood that the memory areas include pages, planes, and blocks of the memory dies.
  • the empty zone map creation module creates one or more maps 504 based upon the available memory locations.
  • the maps contain potential zones that may be opened upon instruction from the host device.
  • the maps may include not only the potential zones, but also the particular configurations for which the potential zones may be ideal.
  • the potential zones are created based upon historical workload.
  • such configurations may include high performance levels and low performance levels.
  • the maps do not include all potential zones that can be created. As the maps are to hint or suggest to the host device the specific zone to open, in order to influence the host device's selection, the data storage device can include less than all potential zones in the maps.
  • the maps are delivered to the host device 506 .
  • the host device selects a particular zone to create 508 and instructs the data storage device 510 .
  • the data storage device then opens the zone 512 and writes data to the new zone 514 .
  • the data storage device then reanalyzes the backend and repeats the process.
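The exchange in the flowchart can be sketched as three small steps: the device exposes only part of its ranked list, the host picks a zone, and the device opens it. All names below are hypothetical, and the host policy of taking the first suggestion is an assumption; a real host could weigh its own priorities and expected workloads.

```python
def hinted_zones(ranked_zone_ids, limit):
    """Device side: expose only the best-ranked candidates to steer the host."""
    return ranked_zone_ids[:limit]


def host_select(hinted):
    """Host side (assumed policy): take the first, best-ranked suggestion."""
    return hinted[0]


def open_zone(open_zones, free_zones, zone_id):
    """Device side: move the chosen zone from the free list to the open set."""
    free_zones.remove(zone_id)
    open_zones.add(zone_id)


free = [5, 3, 7, 9]  # already ranked from least to most utilized
opened = set()
choice = host_select(hinted_zones(free, limit=2))
open_zone(opened, free, choice)
```

After the zone is opened and written, the device would re-rank the remaining free zones before offering the next map.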
  • the data storage device can hint or suggest to the host device which zone should be opened next.
  • memory device wear leveling can be achieved which improves memory device lifetime and quality of service (QoS).
  • a data storage device comprises: a memory device; and a controller coupled to the memory device, wherein the controller is configured to: create an empty zone map, wherein the empty zone map ranks zones within the empty zone map from least utilized to most utilized; send the empty zone map to a host device; and receive a zone open command from the host device.
  • the empty zone map comprises at least one zone that has die interleaving.
  • the empty zone map comprises zones made from worn-out blocks of the memory device.
  • the controller is further configured to receive a set of priorities and expected workloads from the host device.
  • the controller is further configured to create the empty zone map after receiving the set of priorities and expected workloads from the host device.
  • the empty zone map is additionally ranked based upon number of program-erase cycles for memory blocks within the memory device.
  • a data storage device comprises: a memory device; and a controller coupled to the memory device, wherein the controller is configured to: create a plurality of zones in a backend of the memory device; rank the zones in a list from least utilized area of the backend to most utilized area of the backend; forward at least a part of the list to a host device; and receive a zone open command from the host device.
  • the controller is configured to receive zone append commands, zone read commands, and the zone open command, wherein the commands are evaluated for priority of processing. The evaluation occurs at a flash translation layer (FTL).
  • the zone open command is a zone command with zero offset.
  • the controller is further configured to update a zone to logical block address table after opening a new zone.
  • the controller is further configured to route zone read commands and zone append commands to the backend prior to opening a new zone.
  • the part of the list comprises a zone from the least utilized area of the backend.
  • a data storage device comprises: memory means; a controller coupled to the memory means, wherein the controller is configured to: recommend a zone to be opened to a host device, wherein recommending a zone comprises providing a list of zones that can be opened to the host device, wherein the list comprises the recommended zone and other zones; and receive a zone open command from the host device to open the recommended zone.
  • the recommended zone has a lower workload history compared to the other zones. The lower workload history is based upon die workload, flash interface module (FIM) workload, cache workload, and/or parity check engine workload.
  • the recommended zone has a same workload history compared to at least one other zone of the other zones.
  • the recommended zone has undergone fewer program-erase cycles compared to the at least one other zone.
  • the controller is further configured to maintain wear leveling across physical blocks of the memory means. The recommending occurs prior to the receiving.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The present disclosure generally relates to creating new zones in a data storage device in a manner that ensures substantially even workload of the memory device storage locations. The data storage device can guide a host device to select a particular zone to open in zone namespace (ZNS) systems where the host device selects which zone to open. The data storage device tracks the workload of the various storage locations and creates zones. The data storage device then provides selected zones having the least used storage locations with the idea of guiding the host device to select the zone having the least used storage locations. Thus, rather than utilizing a randomly selected unopened zone, the host will select, based upon guidance from the data storage device, zones that contain the least utilized storage location. In so doing, generally even workload of the memory device storage locations is achieved.

Description

  • BACKGROUND OF THE DISCLOSURE
  • Field of the Disclosure
  • Embodiments of the present disclosure generally relate to efficient utilization of dies in a zone namespace (ZNS) device.
  • Description of the Related Art
  • In data storage devices, logical blocks are formed across physical blocks to increase the hardware (HW) workload efficiency during sequential performances. In ZNS devices, the logical blocks are not formed across an entire set of dies, but rather, a logical zone itself could be a single physical block or a few physical blocks, unlike full-fledged interleaving. Multiple such smaller logical blocks constitute a zoned namespace within the ZNS device. The zoned namespaces in a ZNS device are helpful for data segregation, but may impact performance if all dies/flash interface modules (FIMs) are not used in parallel.
  • The host device typically has substantial control over the data storage device. Typically, the host device selects the NVMe set/endurance group for zone creation (i.e., zone open and zone append). However, within a NVMe set/endurance group, the host device may not be aware of the workload on physical resources of the data storage device such as dies, flash channel(s), and other storage specific resources such as parity check engines and cache buffers.
  • Additionally, in a typical ZNS system, the data storage device may act upon the zone commands according to the submission queue, and may not be able to bias the zone commands according to the state of the data storage device itself. Biasing the zone commands according to the data storage device state may optimize resource utilization, thereby increasing the quality of service (QoS) of the system.
  • Thus, there is a need in the art to efficiently utilize the data storage device resources while maintaining the data segregation benefits of ZNS.
  • SUMMARY OF THE DISCLOSURE
  • The present disclosure generally relates to creating new zones in a data storage device in a manner that ensures substantially even workload of the memory device storage locations. The data storage device can guide a host device to select a particular zone to open in zone namespace (ZNS) systems where the host device selects which zone to open. The data storage device tracks the workload of the various storage locations and creates zones. The data storage device then provides selected zones having the least used storage locations with the idea of guiding the host device to select the zone having the least used storage locations. Thus, rather than utilizing a randomly selected unopened zone, the host will select, based upon guidance from the data storage device, zones that contain the least utilized storage location. In so doing, generally even workload of the memory device storage locations is achieved.
  • In one embodiment, a data storage device comprises: a memory device; and a controller coupled to the memory device, wherein the controller is configured to: create an empty zone map, wherein the empty zone map ranks zones within the empty zone map from least utilized to most utilized; send the empty zone map to a host device; and receive a zone open command from the host device.
  • In another embodiment, a data storage device comprises: a memory device; and a controller coupled to the memory device, wherein the controller is configured to: create a plurality of zones in a backend of the memory device; rank the zones in a list from least utilized area of the backend to most utilized area of the backend; forward at least a part of the list to a host device; and receive a zone open command from the host device.
  • In another embodiment, a data storage device comprises: memory means; a controller coupled to the memory means, wherein the controller is configured to: recommend a zone to be opened to a host device, wherein recommending a zone comprises providing a list of zones that can be opened to the host device, wherein the list comprises the recommended zone and other zones; and receive a zone open command from the host device to open the recommended zone.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • So that the manner in which the above recited features of the present disclosure can be understood in detail, a more particular description of the disclosure, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this disclosure and are therefore not to be considered limiting of its scope, for the disclosure may admit to other equally effective embodiments.
  • FIG. 1 is a schematic block diagram illustrating a storage system, according to one embodiment.
  • FIG. 2A illustrates zoned namespaces (ZNSs) utilized in a storage device, according to one embodiment.
  • FIG. 2B illustrates a state diagram for the ZNSs of the storage device of FIG. 2A, according to one embodiment.
  • FIG. 3 illustrates a NVMe set/endurance group in a ZNS system according to one embodiment.
  • FIG. 4 illustrates a ZNS system for efficient backend utilization according to one embodiment.
  • FIG. 5 is a flowchart illustrating efficient backend utilization of a ZNS device.
  • To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements disclosed in one embodiment may be beneficially utilized on other embodiments without specific recitation.
  • DETAILED DESCRIPTION
  • In the following, reference is made to embodiments of the disclosure. However, it should be understood that the disclosure is not limited to specific described embodiments. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the disclosure. Furthermore, although embodiments of the disclosure may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the disclosure. Thus, the following aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, reference to “the disclosure” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).
  • The present disclosure generally relates to creating new zones in a data storage device in a manner that ensures substantially even workload of the memory device storage locations. The data storage device can guide a host device to select a particular zone to open in zone namespace (ZNS) systems where the host device selects which zone to open. The data storage device tracks the workload of the various storage locations and creates zones. The data storage device then provides selected zones having the least used storage locations with the idea of guiding the host device to select the zone having the least used storage locations. Thus, rather than utilizing a randomly selected unopened zone, the host will select, based upon guidance from the data storage device, zones that contain the least utilized storage location. In so doing, generally even workload of the memory device storage locations is achieved.
  • FIG. 1 is a schematic block diagram illustrating a storage system 100 in which data storage device 106 may function as a storage device for a host device 104, in accordance with one or more techniques of this disclosure. For instance, the host device 104 may utilize non-volatile memory 110 included in data storage device 106 to store and retrieve data. The host device 104 includes a host DRAM 138. In some examples, the storage system 100 may include a plurality of storage devices, such as the data storage device 106, which may operate as a storage array. For instance, the storage system 100 may include a plurality of data storage devices 106 configured as a redundant array of inexpensive/independent disks (RAID) that collectively function as a mass storage device for the host device 104.
  • The storage system 100 includes the host device 104 which may store and/or retrieve data to and/or from one or more storage devices, such as the data storage device 106. As illustrated in FIG. 1 , the host device 104 may communicate with the storage device 106 via an interface 114. The host device 104 may comprise any of a wide range of devices, including computer servers, network attached storage (NAS) units, desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, so-called “smart” pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, and the like.
  • The data storage device 106 includes a controller 108, non-volatile memory 110 (NVM 110), a power supply 111, volatile memory 112, and an interface 114. The controller 108 comprises an internal memory 120 or buffer. In some examples, the data storage device 106 may include additional components not shown in FIG. 1 for sake of clarity. For example, the data storage device 106 may include a printed board (PB) to which components of the data storage device 106 are mechanically attached and which includes electrically conductive traces that electrically interconnect components of the data storage device 106, or the like. In some examples, the physical dimensions and connector configurations of the data storage device 106 may conform to one or more standard form factors. Some example standard form factors include, but are not limited to, 3.5″ data storage device (e.g., an HDD or SSD), 2.5″ data storage device, 1.8″ data storage device, peripheral component interconnect (PCI), PCI-extended (PCI-X), PCI Express (PCIe) (e.g., PCIe x1, x4, x8, x16, PCIe Mini Card, MiniPCI, etc.). In some examples, the data storage device 106 may be directly coupled (e.g., directly soldered) to a motherboard of the host device 104.
  • The interface 114 of the data storage device 106 may include one or both of a data bus for exchanging data with the host device 104 and a control bus for exchanging commands with the host device 104. The interface 114 may operate in accordance with any suitable protocol. For example, the interface 114 may operate in accordance with one or more of the following protocols: advanced technology attachment (ATA) (e.g., serial-ATA (SATA) and parallel-ATA (PATA)), Fibre Channel Protocol (FCP), small computer system interface (SCSI), serially attached SCSI (SAS), PCI, and PCIe, non-volatile memory express (NVMe), OpenCAPI, GenZ, Cache Coherent Interface Accelerator (CCIX), Open Channel SSD (OCSSD), or the like. The electrical connection of the interface 114 (e.g., the data bus, the control bus, or both) is electrically connected to the controller 108, providing electrical connection between the host device 104 and the controller 108, allowing data to be exchanged between the host device 104 and the controller 108. In some examples, the electrical connection of the interface 114 may also permit the data storage device 106 to receive power from the host device 104. For example, as illustrated in FIG. 1 , the power supply 111 may receive power from the host device 104 via the interface 114.
  • The data storage device 106 includes NVM 110, which may include a plurality of media units or memory devices. NVM 110 may be configured to store and/or retrieve data. For instance, a media unit of NVM 110 may receive data and a message from the controller 108 that instructs the memory device to store the data. Similarly, the media unit of NVM 110 may receive a message from the controller 108 that instructs the memory device to retrieve data. In some examples, each of the media units may be referred to as a die. In some examples, a single physical chip may include a plurality of dies (i.e., a plurality of memory devices). In some examples, each memory device may be configured to store relatively large amounts of data (e.g., 128MB, 256MB, 512MB, 1GB, 2GB, 4GB, 8GB, 16GB, 32GB, 64GB, 128GB, 256GB, 512GB, 1TB, etc.).
  • In some examples, each media unit of NVM 110 may include any type of non-volatile memory devices, such as flash memory devices, phase-change memory (PCM) devices, resistive random-access memory (ReRAM) devices, magnetoresistive random-access memory (MRAM) devices, ferroelectric random-access memory (F-RAM), holographic memory devices, and any other type of non-volatile memory devices.
  • The NVM 110 may comprise a plurality of flash memory devices. Flash memory devices may include NAND or NOR based flash memory devices, and may store data based on a charge contained in a floating gate of a transistor for each flash memory cell. In NAND flash memory devices, the flash memory device may be divided into a plurality of blocks which may be divided into a plurality of pages. Each block of the plurality of blocks within a particular memory device may include a plurality of NAND cells. Rows of NAND cells may be electrically connected using a word line to define a page of a plurality of pages. Respective cells in each of the plurality of pages may be electrically connected to respective bit lines. Furthermore, NAND flash memory devices may be 2D or 3D devices, and may be single level cell (SLC), multi-level cell (MLC), triple level cell (TLC), or quad level cell (QLC). The controller 108 may write data to and read data from NAND flash memory devices at the page level and erase data from NAND flash memory devices at the block level.
  • The data storage device 106 includes a power supply 111, which may provide power to one or more components of the data storage device 106. When operating in a standard mode, the power supply 111 may provide power to the one or more components using power provided by an external device, such as the host device 104. For instance, the power supply 111 may provide power to the one or more components using power received from the host device 104 via the interface 114. In some examples, the power supply 111 may include one or more power storage components configured to provide power to the one or more components when operating in a shutdown mode, such as where power ceases to be received from the external device. In this way, the power supply 111 may function as an onboard backup power source. Some examples of the one or more power storage components include, but are not limited to, capacitors, super capacitors, batteries, and the like. In some examples, the amount of power that may be stored by the one or more power storage components may be a function of the cost and/or the size (e.g., area/volume) of the one or more power storage components. In other words, as the amount of power stored by the one or more power storage components increases, the cost and/or the size of the one or more power storage components also increases.
  • The data storage device 106 also includes volatile memory 112, which may be used by controller 108 to store information. Volatile memory 112 may be comprised of one or more volatile memory devices. In some examples, the controller 108 may use volatile memory 112 as a cache. For instance, the controller 108 may store cached information in volatile memory 112 until cached information is written to non-volatile memory 110. As illustrated in FIG. 1 , volatile memory 112 may consume power received from the power supply 111. Examples of volatile memory 112 include, but are not limited to, random-access memory (RAM), dynamic random access memory (DRAM), static RAM (SRAM), and synchronous dynamic RAM (SDRAM (e.g., DDR1, DDR2, DDR3, DDR3L, LPDDR3, DDR4, LPDDR4, and the like)).
  • The data storage device 106 includes a controller 108, which may manage one or more operations of the data storage device 106. For instance, the controller 108 may manage the reading of data from and/or the writing of data to the NVM 110. In some embodiments, when the data storage device 106 receives a write command from the host device 104, the controller 108 may initiate a data storage command to store data to the NVM 110 and monitor the progress of the data storage command. The controller 108 may determine at least one operational characteristic of the storage system 100 and store the at least one operational characteristic to the NVM 110. In some embodiments, when the data storage device 106 receives a write command from the host device 104, the controller 108 temporarily stores the data associated with the write command in the internal memory 120 before sending the data to the NVM 110.
  • FIG. 2A illustrates a zoned namespace (ZNS) 202 view utilized in a storage device 200, according to one embodiment. The storage device 200 may present the ZNS 202 view to a host device. FIG. 2B illustrates a state diagram 250 for the ZNS 202 of the storage device 200, according to one embodiment. The storage device 200 may be the data storage device 106 of the storage system 100 of FIG. 1 . The storage device 200 may have one or more ZNS 202, and each ZNS 202 may be different sizes. The storage device 200 may further comprise one or more conventional namespaces in addition to the one or more ZNS 202. Moreover, the ZNS 202 may be a zoned block command (ZBC) for SAS and/or a zoned-device ATA command set (ZAC) for SATA.
  • In the storage device 200, the ZNS 202 is the quantity of NVM that can be formatted into logical blocks such that the capacity is divided into a plurality of zones 206 a-206 n (collectively referred to as zones 206). Each of the zones 206 comprises a plurality of physical or erase blocks (not shown) of a media unit or NVM 204, and each of the erase blocks is associated with a plurality of logical blocks (not shown). When the controller 208 receives a command, such as from a host device (not shown) or the submission queue of a host device, the controller 208 can read data from and write data to the plurality of logical blocks associated with the plurality of erase blocks of the ZNS 202. Each of the logical blocks is associated with a unique LBA or sector.
  • In one embodiment, the NVM 204 is a NAND device. The NAND device comprises one or more dies. Each of the one or more dies comprises one or more planes. Each of the one or more planes comprises one or more erase blocks. Each of the one or more erase blocks comprises one or more wordlines (e.g., 256 wordlines). Each of the one or more wordlines may be addressed in one or more pages. For example, an MLC NAND die may use an upper page and a lower page to reach the two bits in each cell of the full wordline (e.g., 16 kB per page). Furthermore, each page can be accessed at a granularity equal to or smaller than the full page. A controller can frequently access NAND in user data granularity LBA sizes of 512 bytes. Thus, as referred to in the below description, NAND locations are equal to a granularity of 512 bytes. As such, an LBA size of 512 bytes and a page size of 16 kB for two pages of an MLC NAND results in about 16 NAND locations per wordline. However, the NAND location size is not intended to be limiting, and is merely used as an example.
  • When data is written to an erase block, one or more logical blocks are correspondingly updated within a zone 206 to track where the data is located within the NVM 204. Data may be written to one zone 206 at a time until a zone 206 is full, or to multiple zones 206 such that multiple zones 206 may be partially full. Similarly, when writing data to a particular zone 206, data may be written to the plurality of erase blocks one block at a time, in sequential order of NAND locations, page-by-page, or wordline-by-wordline, until moving to an adjacent block (i.e., write to a first erase block until the first erase block is full before moving to the second erase block), or to multiple blocks at once, in sequential order of NAND locations, page-by-page, or wordline-by-wordline, to partially fill each block in a more parallel fashion (i.e., writing the first NAND location or page of each erase block before writing to the second NAND location or page of each erase block).
  • Each of the zones 206 is associated with a zone starting logical block address (ZSLBA). The ZSLBA is the first available LBA in the zone 206. For example, the first zone 206 a is associated with ZaSLBA, the second zone 206 b is associated with ZbSLBA, the third zone 206 c is associated with ZcSLBA, the fourth zone 206 d is associated with ZdSLBA, and the nth zone 206 n (i.e., the last zone) is associated with ZnSLBA. Each zone 206 is identified by its ZSLBA, and is configured to receive sequential writes (i.e., writing data to the NVM 110 in the order the write commands are received).
  • As data is written to a zone 206, a write pointer 210 is advanced or updated to point to or to indicate the next available block in the zone 206 to write data to in order to track the next write starting point (i.e., the completion point of the prior write equals the starting point of a subsequent write). Thus, the write pointer 210 indicates where the subsequent write to the zone 206 will begin. Subsequent write commands are ‘zone append’ commands, where the data associated with the subsequent write command appends to the zone 206 at the location the write pointer 210 is indicating as the next starting point. An ordered list of LBAs within the zone 206 may be stored for write ordering. Each zone 206 may have its own write pointer 210. Thus, when a write command is received, a zone is identified by its ZSLBA, and the write pointer 210 determines where the write of the data begins within the identified zone.
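  • The write pointer behavior described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the `Zone` class, its field names, and the example ZSLBA and capacity values are all hypothetical. The sketch shows that each append lands at the current write pointer, that the completion point of one write becomes the starting point of the next, and that a reset returns the write pointer to the ZSLBA.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Hypothetical model of one zone; all names here are illustrative."""
    zslba: int                  # zone starting logical block address
    capacity: int               # zone capacity (ZCAP) in logical blocks
    write_pointer: int = 0      # offset from the ZSLBA of the next writable block
    blocks: list = field(default_factory=list)

    def append(self, data_blocks):
        """Write data at the write pointer and return the LBA where it started."""
        if self.write_pointer + len(data_blocks) > self.capacity:
            raise ValueError("write exceeds zone capacity")
        start_lba = self.zslba + self.write_pointer
        self.blocks.extend(data_blocks)
        # The completion point of this write is the start of the next one.
        self.write_pointer += len(data_blocks)
        return start_lba

    def reset(self):
        """Return the zone to the empty state (WP=0)."""
        self.blocks.clear()
        self.write_pointer = 0


zone_c = Zone(zslba=2048, capacity=1024)   # assumed example values
first_lba = zone_c.append(["a", "b", "c"])  # starts at the ZSLBA
second_lba = zone_c.append(["d"])           # starts where the first write ended
```

  • In this sketch the host only names the zone; the starting LBA of each write is decided by the write pointer, which is the property that makes writes within a zone sequential.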
  • FIG. 2B illustrates a state diagram 250 for the ZNS 202 of FIG. 2A. In the state diagram 250, each zone may be in a different state, such as empty, active, full, or offline. When a zone is empty, the zone is free of data (i.e., none of the erase blocks in the zone are currently storing data) and the write pointer is at the ZSLBA (i.e., WP=0). An empty zone switches to an open and active zone once a write is scheduled to the zone or if a zone open command is issued by the host. Zone management (ZM) commands can be used to move a zone between zone open and zone closed states, which are both active states. If a zone is active, the zone comprises open blocks that may be written to, and the host may be provided a description of recommended time in the active state. The controller may comprise the ZM.
  • The term “written to” includes programming user data on 0 or more word lines in an erase block, erasure, and/or partially filled word lines in an erase block when user data has not filled all of the available word lines. The term “written to” may further include closing a zone due to internal drive handling needs (open block data retention concerns because the bits in error accumulate more quickly on open erase blocks), the storage device 200 closing a zone due to resource constraints, like too many open zones to track or discovered defect state, among others, or a host device closing the zone for concerns such as there being no more data to send the drive, computer shutdown, error handling on the host, limited host resources for tracking, among others.
  • The active zones may be either open or closed. An open zone is an empty or partially full zone that is ready to be written to and has resources currently allocated. The data received from the host device with a write command or zone append command may be programmed to an open erase block that is not currently filled with prior data. New data pulled-in from the host device or valid data being relocated may be written to an open zone. Valid data may be moved from one zone (e.g. the first zone 206 a) to another zone (e.g. the third zone 206 c) for garbage collection purposes. A closed zone is an empty or partially full zone that is not currently receiving writes from the host on an ongoing basis. The movement of a zone from an open state to a closed state allows the controller 208 to reallocate resources to other tasks. These tasks may include, but are not limited to, other zones that are open, other conventional non-zone regions, or other controller needs.
  • In both the open and closed zones, the write pointer is pointing to a place in the zone somewhere between the ZSLBA and the end of the last LBA of the zone (i.e., WP>0). Active zones may switch between the open and closed states per designation by the ZM, or if a write is scheduled to the zone. Additionally, the ZM may reset an active zone to clear or erase the data stored in the zone such that the zone switches back to an empty zone. Once an active zone is full, the zone switches to the full state. A full zone is one that is completely filled with data, and has no more available blocks to write data to (i.e., WP=zone capacity (ZCAP)). Read commands of data stored in full zones may still be executed.
  • The ZM may reset a full zone, scheduling an erasure of the data stored in the zone such that the zone switches back to an empty zone. When a full zone is reset, the zone may not be immediately cleared of data, though the zone may be marked as an empty zone ready to be written to. However, the reset zone must be erased prior to switching to an active zone. A zone may be erased any time between a ZM reset and a ZM open. An offline zone is a zone that is unavailable to write data to. An offline zone may be in the full state, the empty state, or in a partially full state without being active.
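  • The zone states and transitions described above can be summarized as a small state machine. The following is a simplified sketch under stated assumptions: the event names ("open", "write", "close", "fill", "reset") are hypothetical labels chosen for illustration, and transitions into and out of the offline state are omitted because the text does not spell them out.

```python
from enum import Enum


class ZoneState(Enum):
    EMPTY = "empty"
    OPEN = "open"        # active
    CLOSED = "closed"    # active
    FULL = "full"
    OFFLINE = "offline"  # listed for completeness; no transitions modeled


# Simplified transition table based on the description of state diagram 250.
TRANSITIONS = {
    (ZoneState.EMPTY, "open"): ZoneState.OPEN,     # zone open command from host
    (ZoneState.EMPTY, "write"): ZoneState.OPEN,    # a write is scheduled
    (ZoneState.OPEN, "close"): ZoneState.CLOSED,   # ZM closes the zone
    (ZoneState.CLOSED, "open"): ZoneState.OPEN,    # ZM reopens the zone
    (ZoneState.CLOSED, "write"): ZoneState.OPEN,   # a write is scheduled
    (ZoneState.OPEN, "fill"): ZoneState.FULL,      # WP reaches zone capacity
    (ZoneState.OPEN, "reset"): ZoneState.EMPTY,    # ZM reset of an active zone
    (ZoneState.CLOSED, "reset"): ZoneState.EMPTY,
    (ZoneState.FULL, "reset"): ZoneState.EMPTY,    # ZM reset of a full zone
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise if not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event!r}") from None


state = ZoneState.EMPTY
for event in ("write", "close", "open", "fill", "reset"):
    state = next_state(state, event)
```

  • The table does not model the erase that must complete between a reset and the next open; in the sketch, a reset zone is simply marked empty.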
  • Since resetting a zone clears or schedules an erasure of the data stored in the zone, the need for garbage collection of individual erase blocks is eliminated, improving the overall garbage collection process of the storage device 200. The storage device 200 may mark one or more erase blocks for erasure. When a new zone is going to be formed and the storage device 200 anticipates a ZM open, the one or more erase blocks marked for erasure may then be erased. The storage device 200 may further decide and create the physical backing of the zone upon erase of the erase blocks. Thus, once the new zone is opened and erase blocks are being selected to form the zone, the erase blocks will have been erased. Moreover, each time a zone is reset, a new order for the LBAs and the write pointer 210 for the zone 206 may be selected, enabling the zone 206 to be tolerant to receive commands out of sequential order. The write pointer 210 may optionally be turned off such that a command may be written to whatever starting LBA is indicated for the command.
  • Referring back to FIG. 2A, when the controller 208 initiates or pulls-in a write command, the controller 208 may select an empty zone 206 to write the data associated with the command to, and the empty zone 206 switches to an active zone 206. As used herein, the controller 208 initiating or pulling-in a write command comprises receiving a write command or direct memory access (DMA) reading the write command. The write command may be a command to write new data, or a command to move valid data to another zone for garbage collection purposes. The controller 208 is configured to DMA read or pull-in new commands from a submission queue populated by a host device.
  • In an empty zone 206 just switched to an active zone 206, the data is written to the zone 206 starting at the ZSLBA, as the write pointer 210 is indicating the logical block associated with the ZSLBA as the first available logical block. The data may be written to one or more erase blocks or NAND locations that have been allocated for the physical location of the zone 206. After the data associated with the write command has been written to the zone 206, the write pointer 210 is updated to point to the next available block in the zone 206 to track the next write starting point (i.e., the completion point of the first write). Alternatively, the controller 208 may select an active zone to write the data to. In an active zone, the data is written to the logical block indicated by the write pointer 210 as the next available block.
  • In some embodiments, a NAND location may be equal to a wordline. In such an embodiment, if the write command is smaller than a wordline, the controller may optionally aggregate several write commands in another memory location such as DRAM or SRAM prior to programming a full wordline composed of multiple write commands. Write commands that are longer than a wordline will be able to program and fill a complete wordline with some of the data, and the excess data beyond a wordline will be used to fill the next wordline. However, a NAND location is not limited to being equal to a wordline, and may have a larger or smaller size than a wordline. For example, in some embodiments, a NAND location may be equal to the size of a page.
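  • The aggregation of small write commands into full wordlines can be sketched as follows. This is an illustrative model only: the `WordlineAggregator` class is hypothetical, the staging buffer stands in for DRAM or SRAM, and the 8-byte wordline size is chosen purely to keep the example short.

```python
class WordlineAggregator:
    """Hypothetical staging buffer that programs only complete wordlines."""

    def __init__(self, wordline_size):
        self.wordline_size = wordline_size
        self.buffer = bytearray()   # stands in for DRAM/SRAM staging memory
        self.programmed = []        # full wordlines "programmed" to NAND

    def write(self, data):
        """Stage data and program every complete wordline that results."""
        self.buffer.extend(data)
        while len(self.buffer) >= self.wordline_size:
            self.programmed.append(bytes(self.buffer[:self.wordline_size]))
            # Excess data beyond a wordline stays staged for the next wordline.
            del self.buffer[:self.wordline_size]


agg = WordlineAggregator(wordline_size=8)   # assumed tiny size for illustration
agg.write(b"abc")          # smaller than a wordline: held in the buffer
agg.write(b"defgh")        # completes the first wordline
agg.write(b"0123456789")   # longer than a wordline: fills one, 2 bytes remain
```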
  • For example, the controller 208 may receive, pull-in, or DMA read a first write command to a third zone 206 c, or a first zone append command. The host identifies sequentially which logical block of the zone 206 to write the data associated with the first command to. The data associated with the first command is then written to the first or next available LBA(s) in the third zone 206 c as indicated by the write pointer 210, and the write pointer 210 is advanced or updated to point to the next available LBA available for a host write (i.e., WP>0). If the controller 208 receives or pulls-in a second write command to the third zone 206 c, the data associated with the second write command is written to the next available LBA(s) in the third zone 206 c identified by the write pointer 210. Once the data associated with the second command is written to the third zone 206 c, the write pointer 210 once again advances or updates to point to the next available LBA available for a host write. Resetting the third zone 206 c moves the write pointer 210 back to the ZcSLBA (i.e., WP=0), and the third zone 206 c switches to an empty zone.
  • FIG. 3 illustrates an NVMe set/endurance group in a ZNS system according to one embodiment. As shown in FIG. 3, a plurality of applications 302 a-302 c connect to the data storage device firmware 304 through respective flash translation layers (FTLs) to access the memory devices 306 a-306 c that each comprise a plurality of dies 308. In operation, each application, which could be different hosts or virtual hosts, has access to the same data storage device 310 and hence, memory devices 306 a-306 c and dies 308. The zones for a ZNS device can be set up across the dies 308 or within any die 308.
  • In ZNS, a zone open command occurs when the first write triggers the logic that allocates a new zone. The zone write occurs when the start LBA (SLBA) is equal to the zone start LBA (ZSLBA). Similarly, a zone append/write command is where the SLBA does not equal ZSLBA such that the writes are added to an already opened zone.
  • It is to be understood that backend units include physical blocks in the ZNS NVMe set/endurance group. In general, a ZNS device looks for zone append and zone read commands in a submission queue and prioritizes those commands to the backend without any conditions. However, for zone open commands, the zones are created using backend units that are not currently in use by other zones. If those zones are created randomly or utilizing the same dies over time, then the dies do not receive an even workload and hence, the ZNS device has uneven workload, impacting QoS. It is desirable to create a zone where the workload in terms of die/FIM is least utilized so that each time a new zone is created, the zone utilizes the least utilized die/FIM so that the device has an even workload and even wear over time. The intention is to make the zone open command use the leftover backend resources due to non-dependency on exact hardware (HW). In other words, the disclosure involves dissimilar prioritization for writes into zones based on the zone offset. The controller utilizes the non-attachment of the device resources until the zone start for writing. Subsequently, the zone mapping is updated for the new zone with allocated physical blocks as in a typical ZNS system.
  • The zone append and zone read commands are non-negotiable candidates in terms of a particular resource workload. That is, when a zone read is issued, the zone read can only be read from the particular die/FIM (physical block in that ZNS NVMe set/endurance group) where the data is written. Similarly, for a zone append, the zone can be appended to only that particular zone where the write was previously started (already open physical block is in use).
  • Zone open commands, on the other hand, are cases where the ZNS host wants to write data from zero zone offset (zone create to start with) to the device, and the host does not care where those data physically reside, as long as the zone is part of the NVMe set/endurance group and that the zone to physical (Z2P) mapping is intact. The ZNS controller can leverage the condition.
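  • The dissimilar prioritization described above can be sketched as follows. This is a minimal sketch under stated assumptions: commands are modeled as plain dictionaries with hypothetical `op`, `slba`, and `zslba` keys, and the only scheduling rule modeled is that zone reads and zone appends, which are tied to specific physical blocks, are forwarded ahead of zone opens, which can be placed on whatever backend resources are left over.

```python
def classify(cmd):
    """Label a command as a zone read, zone open, or zone append.

    A write whose start LBA equals the zone start LBA is treated as a
    zone open (zero zone offset); any other write is a zone append.
    """
    if cmd["op"] == "read":
        return "zone_read"
    return "zone_open" if cmd["slba"] == cmd["zslba"] else "zone_append"


def schedule(queue):
    """Forward reads and appends first, in arrival order, then zone opens."""
    bound = [c for c in queue if classify(c) != "zone_open"]
    opens = [c for c in queue if classify(c) == "zone_open"]
    return bound + opens


# Hypothetical submission queue contents, in arrival order.
queue = [
    {"id": 1, "op": "write", "slba": 4096, "zslba": 4096},  # zero offset: open
    {"id": 2, "op": "read", "slba": 100, "zslba": 0},
    {"id": 3, "op": "write", "slba": 2060, "zslba": 2048},  # nonzero offset: append
]
order = [c["id"] for c in schedule(queue)]
```

  • The sketch keeps arrival order within each group; how the zone open is then mapped to physical blocks is a separate step.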
  • In certain situations, the host device not only instructs the data storage device to open a new zone, but also specifically instructs which zone to open. As discussed herein, the data storage device can recommend or suggest which zone to open. More specifically, the data storage device triggers an automatic zone selection process where the host device gives the data storage device a set of priorities and expected workloads and the data storage device in turn provides an empty zone map as to which future zones will be ideal rather than selecting one on its own. For example, in a high priority write request, the host device may choose an appropriate zone from a list of zones provided by the data storage device for corresponding use cases.
  • The ZNS device (i.e., data storage device) populates an empty zone map that ranks, from least utilized, the backend memory devices and/or memory device areas having zero write offsets (i.e., potential new zones). In one embodiment, a program-erase parameter is used to determine which zone to open. The program-erase parameter is used when there is a ‘tie’ in terms of workload history such that there are multiple zones that could be opened that have the same workload history in terms of die, FIM, cache, and/or error correction engine workload. In so doing, the zone with the fewest program-erase cycles is chosen. Thus, wear leveling across the blocks is achieved. The mapping for a zone open command is populated with those backend units (i.e., physical blocks in the ZNS NVMe set/endurance group) where the workload in terms of die/FIM is least utilized in sync with the provided flowchart. The ZNS device selects the appropriate empty zone from the set of available zones in the zone map and issues a zone open to the device.
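  • The ranking with a program-erase tiebreaker can be sketched as follows. The `CandidateZone` type, its fields, and the numeric workload scores are hypothetical; the disclosure does not define how the workload history is quantified, so a single integer score is assumed here.

```python
from typing import NamedTuple


class CandidateZone(NamedTuple):
    """Hypothetical entry for a potential new zone (zero write offset)."""
    zone_id: int
    workload: int    # assumed aggregate score for die/FIM/cache/ECC workload
    pe_cycles: int   # program-erase cycles of the backing blocks


def build_empty_zone_map(candidates):
    """Rank candidates from least to most utilized.

    Workload is the primary key; program-erase cycles break ties so that
    the zone with the fewest cycles ranks first among equals.
    """
    return sorted(candidates, key=lambda z: (z.workload, z.pe_cycles))


candidates = [
    CandidateZone(zone_id=7, workload=30, pe_cycles=120),
    CandidateZone(zone_id=3, workload=10, pe_cycles=400),
    CandidateZone(zone_id=9, workload=10, pe_cycles=250),
]
zone_map = build_empty_zone_map(candidates)
ranked_ids = [z.zone_id for z in zone_map]
```

  • Zones 3 and 9 share the same workload score, so zone 9 ranks first because it has fewer program-erase cycles.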
  • FIG. 4 illustrates a ZNS system 400 for efficient backend utilization according to one embodiment. As shown in FIG. 4 , the ZNS data storage device 404 is coupled to a ZNS host device 402. The ZNS data storage device 404 includes an empty zone map creation module 406, and the ZNS host device 402 comprises an empty zone map analyzing module 408. The empty zone map creation module 406 creates and maintains an empty zone map based on the write offsets and backend loads. The empty zone map may include different maps for different configurations. The ZNS data storage device 404 shares the maps 410 with the ZNS host device 402 where the empty zone map analyzing module 408 determines which zone from the maps would be best for the next zone open command. The ZNS host device 402 then sends to the ZNS data storage device 404 a zone open command with wordline request in the appropriate backend based on the ZNS data storage device 404 hinted zone map 412. It is to be understood that the ZNS host device 402 may not have an empty zone map analyzing module 408 and thus, simply instructs the ZNS data storage device 404 to open a specific zone from the zone map.
  • ZNS, as a technology, does not differentiate between die layouts, but rather, on the lines of higher and lower performance. If some zones were allocated to a specific die or die group, then the die layouts can be a factor in empty zone map. For example, multiple empty zone maps can be populated for different interleaving configurations, such as 1 DIL or 2 DIL (die interleaving), which itself is created for different performances. In some cases, zone maps can also be populated with zones made up only of worn-out old blocks collected from various dies, which allows the host device a holistic view of the data storage device to enable the host device to make better decisions for zone creation for variable purposes (i.e., input/output, garbage collection, or word lines). Along with the write-offsets based mapping, the device may also incorporate program-erase cycles of different blocks in the backend logic to populate the empty zone map such that wear leveling of those zones is also honored. Such a logic minimizes unwanted data transfers from zone to zone at a later point.
  • The workload of the storage backend unit (i.e., die/FIM) in any ZNS NVMe set is associated with a credit point according to the read/write activity, and a moving window to decide on a zone open at any point in time. Any known methods, either firmware (FW) or HW may be employed by the data storage device to evaluate such workloads. Counter based FW methods for different resources is one example. Stated another way, wear leveling schemes may be accommodated during the process wherein if the controller determines that multiple backend units are eligible for zone open in terms of least workload, the unit offering the physical block with a lower program-erase count may be the winner for selection as the new zone to open. Over a long run, such a strategy will provide better sustained mixed load performance as well as better wear leveling. The same logic is applicable to read scrub operations, failure management, and other storage backend management techniques. The controller chooses a destination block in the backend unit of a NVMe set of a ZNS device which has the least workload for HW and lower program-erase block attached to that set. The controller does the selection after all the host issued ZNS appends and ZNS reads are prioritized.
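  • A counter-based approach to the credit points and moving window mentioned above can be sketched as follows. The credit values per operation, the window length, and the `BackendUnit` class are assumptions made for illustration; the disclosure leaves the evaluation method open to any known FW or HW technique.

```python
from collections import deque

# Assumed credit costs per operation; the disclosure does not specify values.
CREDITS = {"read": 1, "write": 2}


class BackendUnit:
    """Hypothetical die/FIM workload tracker using a fixed-length moving window."""

    def __init__(self, name, pe_count, window=4):
        self.name = name
        self.pe_count = pe_count            # program-erase count of the offered block
        self.recent = deque(maxlen=window)  # oldest activity falls out automatically

    def record(self, op):
        self.recent.append(CREDITS[op])

    @property
    def workload(self):
        return sum(self.recent)


def pick_unit_for_zone_open(units):
    """Choose the least loaded unit; a lower program-erase count wins ties."""
    return min(units, key=lambda u: (u.workload, u.pe_count))


die0 = BackendUnit("die0", pe_count=100)
die1 = BackendUnit("die1", pe_count=50)
die2 = BackendUnit("die2", pe_count=70)
for op in ("write", "write"):
    die0.record(op)          # workload 4
for op in ("read", "read"):
    die1.record(op)          # workload 2
die2.record("write")         # workload 2, ties with die1
winner = pick_unit_for_zone_open([die0, die1, die2])
```

  • In this sketch die1 and die2 tie on workload, and die1 is selected because its block has the lower program-erase count.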
  • As noted above, uneven wear in a data storage device can be an issue. Wear leveling ensures efficient use of the device, longer device lifetime, and good QoS. As discussed herein, for a ZNS device, a way to ensure wear leveling is processing a zone open command to open zones that have the least utilized memory devices. The zone open command differs from the zone read command and the zone append command. Both the zone read command and the zone append command are commands that are performed on already opened zones. Already opened zones have memory devices or areas of memory devices that are already allocated thereto. New zones, on the other hand, can be created from memory devices or areas of memory devices that are not currently allocated to a zone. The devices or areas not currently allocated to a zone can be evaluated to determine which devices or areas have the least amount of workload history. New zones can be created from the areas or devices that have the least amount of workload history. Number of program-erase cycles can be used as a tiebreaker for any areas or devices that have the same amount of workload history.
  • FIG. 5 is a flowchart 500 illustrating efficient backend utilization of a ZNS device. Initially, the data storage device analyzes the backend of the data storage device 502 to determine workload of the memory devices, memory dies, and memory areas of the memory dies. It is to be understood that the memory areas include pages, planes, and blocks of the memory dies.
  • After analyzing the backend of the data storage device, the empty zone map creation module creates one or more maps 504 based upon the available memory locations. The maps contain potential zones that may be opened upon instruction from the host device. The maps may include not only the potential zones, but also the particular configurations for which the potential zones may be ideal. For the potential zones, the potential zones are created based upon historical workload. For the particular configurations, such configurations may include high performance levels and low performance levels. In one embodiment, the maps do not include all potential zones that can be created. As the maps are to hint or suggest to the host device the specific zone to open, in order to influence the host device's selection, the data storage device can include less than all potential zones in the maps.
  • Once the maps are created, the maps are delivered to the host device 506. The host device then selects a particular zone to create 508 and instructs the data storage device 510. The data storage device then opens the zone 512 and writes data to the new zone 514. The data storage device then reanalyzes the backend and repeats the process.
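  • The loop of flowchart 500 can be sketched as follows. This is a toy simulation under stated assumptions: backend units are reduced to named dies with an integer workload, each zone open adds one unit of workload to the chosen die, the map hints only the two least utilized dies, and the host always follows the top-ranked hint. All function names and values are hypothetical.

```python
def create_zone_map(workload, hint_size):
    """Rank backend units from least utilized and keep only a hinted subset."""
    ranked = sorted(workload, key=lambda unit: (workload[unit], unit))
    return ranked[:hint_size]


def host_select(zone_map):
    """A host that follows the hint picks the top-ranked entry."""
    return zone_map[0]


def run_cycle(workload, hint_size=2, write_cost=1):
    """One pass: analyze backend, build and deliver map, open zone, write."""
    zone_map = create_zone_map(workload, hint_size)  # analyze and create maps
    chosen = host_select(zone_map)                   # host selects and instructs
    workload[chosen] += write_cost                   # device opens zone and writes
    return chosen


workload = {"die0": 5, "die1": 1, "die2": 3}   # assumed starting workloads
history = [run_cycle(workload) for _ in range(6)]
```

  • Because the backend is reanalyzed on every pass, the lightly used dies absorb the new zones until the workloads converge, which is the evening-out effect the flowchart aims for.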
  • By taking into consideration the backend workload and creating potential zone creation maps, the data storage device can hint or suggest to the host device which zone should be opened next. By hinting or suggesting to the host device, and having the host device follow the suggestion or hint, memory device wear leveling can be achieved, which improves memory device lifetime and quality of service (QoS).
  • In one embodiment, a data storage device comprises: a memory device; and a controller coupled to the memory device, wherein the controller is configured to: create an empty zone map, wherein the empty zone map ranks zones within the empty zone map from least utilized to most utilized; send the empty zone map to a host device; and receive a zone open command from the host device. The empty zone map comprises at least one zone that has die interleaving. The empty zone map comprises zones made from worn-out blocks of the memory device. The controller is further configured to receive a set of priorities and expected workloads from the host device. The controller is further configured to create the empty zone map after receiving the set of priorities and expected workloads from the host device. The empty zone map is additionally ranked based upon number of program-erase cycles for memory blocks within the memory device.
  • In another embodiment, a data storage device comprises: a memory device; and a controller coupled to the memory device, wherein the controller is configured to: create a plurality of zones in a backend of the memory device; rank the zones in a list from least utilized area of the backend to most utilized area of the backend; forward at least a part of the list to a host device; and receive a zone open command from the host device. The controller is configured to receive zone append commands, zone read commands, and the zone open command, wherein the commands are evaluated for priority of processing. The evaluation occurs at a flash translation layer (FTL). The zone open command is a zone command with zero offset. The controller is further configured to update a zone to logical block address table after opening a new zone. The controller is further configured to route zone read commands and zone append commands to the backend prior to opening a new zone. The part of the list comprises a zone from the least utilized area of the backend.
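The priority evaluation in this embodiment can be illustrated with a short sketch. The `ZoneCommand` record and both helper functions are hypothetical. The classification rule (a non-read command at offset zero that targets a zone not yet open counts as a zone open) is one possible reading of "a zone command with zero offset", not a rule stated in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ZoneCommand:
    kind: str     # "read" or "append" (hypothetical encoding)
    zone_id: int
    offset: int   # offset within the zone


def is_zone_open(cmd, open_zones):
    """Assumed rule: a non-read command with zero offset that targets
    a zone which is not yet open is treated as a zone open command."""
    return cmd.kind != "read" and cmd.offset == 0 and cmd.zone_id not in open_zones


def order_for_processing(commands, open_zones):
    """Route reads and appends for already-open zones to the backend
    first; defer zone open commands until after them."""
    ready, opens = [], []
    for cmd in commands:
        (opens if is_zone_open(cmd, open_zones) else ready).append(cmd)
    return ready + opens


queue = [
    ZoneCommand("append", zone_id=7, offset=0),    # targets an unopened zone
    ZoneCommand("read", zone_id=1, offset=64),
    ZoneCommand("append", zone_id=2, offset=128),
]
ordered = order_for_processing(queue, open_zones={1, 2})
print([(c.kind, c.zone_id) for c in ordered])
# [('read', 1), ('append', 2), ('append', 7)]
```

Deferring the zone open gives the controller a chance to see how the pending reads and appends load the backend before it decides where the new zone should be placed.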
  • In another embodiment, a data storage device comprises: memory means; a controller coupled to the memory means, wherein the controller is configured to: recommend a zone to be opened to a host device, wherein recommending a zone comprises providing a list of zones that can be opened to the host device, wherein the list comprises the recommended zone and other zones; and receive a zone open command from the host device to open the recommended zone. The recommended zone has a lower workload history compared to the other zones. The lower workload history is based upon die workload, flash interface module (FIM) workload, cache workload, and/or parity check engine workload. The recommended zone has a same workload history compared to at least one other zone of the other zones. The recommended zone has undergone fewer program-erase cycles compared to the at least one other zone. The controller is further configured to maintain wear leveling across physical blocks of the memory means. The recommending occurs prior to the receiving.
  • While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (20)

1. A data storage device, comprising:
a memory device; and
a controller coupled to the memory device, wherein the controller is configured to:
create an empty zone map, wherein the empty zone map ranks zones within the empty zone map from least utilized to most utilized;
send the empty zone map to a host device;
receive a zone open command from the host device; and
receive a set of priorities and expected workloads from a host device.
2. The data storage device of claim 1, wherein the empty zone map comprises at least one zone that has die interleaving.
3. The data storage device of claim 1, wherein the empty zone map comprises zones made from worn-out blocks of the memory device.
4. The data storage device of claim 1, wherein the controller is further configured to exclude one or more potential zones that can be created in the zone map.
5. The data storage device of claim 1, wherein the controller is further configured to create the empty zone map after receiving the set of priorities and expected workloads from the host device.
6. The data storage device of claim 1, wherein the empty zone map is additionally ranked based upon number of program-erase cycles for memory blocks within the memory device.
7. A data storage device, comprising:
a memory device; and
a controller coupled to the memory device, wherein the controller is configured to:
create a plurality of zones in a backend of the memory device;
rank the zones in a list from least utilized area of the backend to most utilized area of the backend;
forward at least a part of the list to a host device;
receive a zone open command from the host device; and
receive a set of priorities and expected workloads from the host device.
8. A data storage device, comprising:
a memory device; and
a controller coupled to the memory device, wherein the controller is configured to:
create a plurality of zones in a backend of the memory device;
rank the zones in a list from least utilized area of the backend to most utilized area of the backend;
forward at least a part of the list to a host device;
receive a zone open command from the host device; and
receive zone append commands, zone read commands, and the zone open command, wherein the commands are evaluated for priority of processing.
9. The data storage device of claim 8, wherein the evaluation occurs at a flash translation layer (FTL).
10. The data storage device of claim 7, wherein the zone open command is a zone command with zero offset.
11. The data storage device of claim 8, wherein the controller is further configured to update a zone to logical block address table after opening a new zone.
12. The data storage device of claim 7, wherein the controller is further configured to route zone read commands and zone append commands to the backend prior to opening a new zone.
13. The data storage device of claim 7, wherein the part of the list comprises a zone from the least utilized area of the backend.
14. A data storage device, comprising:
memory means;
a controller coupled to the memory means, wherein the controller is configured to:
recommend a zone to be opened to a host device, wherein recommending a zone comprises providing a list of zones that can be opened to the host device, wherein the list comprises the recommended zone and other zones;
receive a zone open command from the host device to open the recommended zone; and
receive a set of priorities and expected workloads from the host device.
15. The data storage device of claim 14, wherein the recommended zone has a lower workload history compared to the other zones.
16. The data storage device of claim 15, wherein the lower workload history is based upon die workload, flash interface module (FIM) workload, cache workload, and/or parity check engine workload.
17. The data storage device of claim 14, wherein the recommended zone has a same workload history compared to at least one other zone of the other zones.
18. The data storage device of claim 17, wherein the recommended zone has undergone a fewer program-erase cycles compared to the at least one other zone.
19. The data storage device of claim 14, wherein the controller is further configured to maintain wear leveling across physical blocks of the memory means.
20. The data storage device of claim 14, wherein the recommending occurs prior to the receiving.
US17/338,487 2021-06-03 2021-06-03 Dissimilar write prioritization in ZNS devices Active US11537303B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/338,487 US11537303B1 (en) 2021-06-03 2021-06-03 Dissimilar write prioritization in ZNS devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US17/338,487 US11537303B1 (en) 2021-06-03 2021-06-03 Dissimilar write prioritization in ZNS devices

Publications (2)

Publication Number Publication Date
US20220391115A1 (en) 2022-12-08
US11537303B1 US11537303B1 (en) 2022-12-27

Family

ID=84284101

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/338,487 Active US11537303B1 (en) 2021-06-03 2021-06-03 Dissimilar write prioritization in ZNS devices

Country Status (1)

Country Link
US (1) US11537303B1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170220264A1 (en) * 2016-02-01 2017-08-03 Seagate Technology Llc Zone forward drive management
US10379742B2 (en) * 2015-12-28 2019-08-13 Netapp, Inc. Storage zone set membership
US20200241804A1 (en) * 2019-01-29 2020-07-30 EMC IP Holding Company LLC Affinity Sensitive Data Convolution for Data Storage Systems
US20210255803A1 (en) * 2020-02-14 2021-08-19 Kioxia Corporation Memory system and method of controlling nonvolatile memory
US20210318820A1 (en) * 2020-04-09 2021-10-14 SK Hynix Inc. Data storage device and operating method thereof
US20220035561A1 (en) * 2020-07-28 2022-02-03 SK Hynix Inc. Data storage device and method of operating the same
US20220137858A1 (en) * 2020-10-30 2022-05-05 SK Hynix Inc. Memory system and method of operating memory controller included therein

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11113006B2 (en) 2019-05-06 2021-09-07 Micron Technology, Inc. Dynamic data placement for collision avoidance among concurrent write streams
CN111124305B (en) 2019-12-20 2021-08-31 浪潮电子信息产业股份有限公司 Solid state disk wear leveling method and device and computer readable storage medium
CN111240601B (en) 2020-01-19 2022-07-22 苏州浪潮智能科技有限公司 Method, device, equipment and storage medium for determining superblock of partitioned space
CN111694515B (en) 2020-05-23 2023-01-10 苏州浪潮智能科技有限公司 Zone writing distribution method and system based on ZNS solid state disk
US20200393974A1 (en) 2020-08-27 2020-12-17 Intel Corporation Method of detecting read hotness and degree of randomness in solid-state drives (ssds)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230342028A1 (en) * 2020-09-29 2023-10-26 Microsoft Technology Licensing, Llc Zone hints for zoned namespace storage devices
US12073079B2 (en) * 2020-09-29 2024-08-27 Microsoft Technology Licensing, Llc Zone hints for zoned namespace storage devices
US12131042B2 (en) * 2022-06-29 2024-10-29 SK Hynix Inc. Memory system for managing namespace using write pointer and write count, memory controller, and method for operating memory system
US20240069722A1 (en) * 2022-08-31 2024-02-29 Nvidia Corporation Dynamically assigning namespace type to memory devices

Also Published As

Publication number Publication date
US11537303B1 (en) 2022-12-27

Similar Documents

Publication Publication Date Title
US11126378B1 (en) Rate limit on the transitions of zones to open
US11416161B2 (en) Zone formation for zoned namespaces
US11294827B2 (en) Non-sequential zoned namespaces
US11520660B2 (en) Storage devices hiding parity swapping behavior
US11537305B1 (en) Dissimilar write prioritization in ZNS devices
US11500727B2 (en) ZNS parity swapping to DRAM
US11194521B1 (en) Rate limit on the transitions of streams to open
US11537303B1 (en) Dissimilar write prioritization in ZNS devices
US20240220155A1 (en) Solution for Super Device Imbalance in ZNS SSD
CN114730290A (en) Moving change log tables to align with partitions
US11520523B2 (en) Data integrity protection of ZNS needs
WO2023027782A1 (en) Purposeful super device imbalance for zns ssd efficiency
US11537293B2 (en) Wear leveling methods for zoned namespace solid state drive
US11853565B2 (en) Support higher number of active zones in ZNS SSD
US20230075329A1 (en) Super Block Allocation Across Super Device In ZNS SSD
WO2023101719A1 (en) Full die recovery in zns ssd
US11409459B2 (en) Data parking for SSDs with zones
US11640254B2 (en) Controlled imbalance in super block allocation in ZNS SSD

Legal Events

Date Code Title Description
AS Assignment

Owner name: WESTERN DIGITAL TECHNOLOGIES, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MUTHIAH, RAMANATHAN;BALAKRISHNAN, RAKESH;PETER, ELDHOSE;AND OTHERS;REEL/FRAME:056434/0631

Effective date: 20210601

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., AS AGENT, ILLINOIS

Free format text: SECURITY INTEREST;ASSIGNOR:WESTERN DIGITAL TECHNOLOGIES, INC.;REEL/FRAME:057651/0296

Effective date: 20210907

AS Assignment

Owner name: WESTERN DIGITAL TECHNOLOGIES, INC., CALIFORNIA

Free format text: RELEASE OF SECURITY INTEREST AT REEL 057651 FRAME 0296;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:058981/0958

Effective date: 20220203

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., ILLINOIS

Free format text: PATENT COLLATERAL AGREEMENT - A&R LOAN AGREEMENT;ASSIGNOR:WESTERN DIGITAL TECHNOLOGIES, INC.;REEL/FRAME:064715/0001

Effective date: 20230818

Owner name: JPMORGAN CHASE BANK, N.A., ILLINOIS

Free format text: PATENT COLLATERAL AGREEMENT - DDTL LOAN AGREEMENT;ASSIGNOR:WESTERN DIGITAL TECHNOLOGIES, INC.;REEL/FRAME:067045/0156

Effective date: 20230818

AS Assignment

Owner name: SANDISK TECHNOLOGIES, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:WESTERN DIGITAL TECHNOLOGIES, INC.;REEL/FRAME:067567/0682

Effective date: 20240503

AS Assignment

Owner name: SANDISK TECHNOLOGIES, INC., CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:SANDISK TECHNOLOGIES, INC.;REEL/FRAME:067982/0032

Effective date: 20240621

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., AS THE AGENT, ILLINOIS

Free format text: PATENT COLLATERAL AGREEMENT;ASSIGNOR:SANDISK TECHNOLOGIES, INC.;REEL/FRAME:068762/0494

Effective date: 20240820