US20200264978A1 - Method and system for facilitating a distributed storage system with a total cost of ownership reduction for multiple available zones - Google Patents

Method and system for facilitating a distributed storage system with a total cost of ownership reduction for multiple available zones Download PDF

Info

Publication number
US20200264978A1
US20200264978A1 US16/277,708 US201916277708A US2020264978A1 US 20200264978 A1 US20200264978 A1 US 20200264978A1 US 201916277708 A US201916277708 A US 201916277708A US 2020264978 A1 US2020264978 A1 US 2020264978A1
Authority
US
United States
Prior art keywords
storage drive
data
storage
requested data
drive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US16/277,708
Other versions
US10970212B2 (en)
Inventor
Shu Li
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to US16/277,708 priority Critical patent/US10970212B2/en
Assigned to ALIBABA GROUP HOLDING LIMITED reassignment ALIBABA GROUP HOLDING LIMITED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LI, SHU
Publication of US20200264978A1 publication Critical patent/US20200264978A1/en
Application granted granted Critical
Publication of US10970212B2 publication Critical patent/US10970212B2/en
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0868Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0611Improving I/O performance in relation to response time
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0617Improving the reliability of storage systems in relation to availability
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • G06F3/0635Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065Replication mechanisms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0685Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/15Use in a specific computing environment
    • G06F2212/154Networked environment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/21Employing a record carrier using a specific recording technology
    • G06F2212/217Hybrid disk, e.g. using both magnetic and solid state storage devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/26Using a specific storage system architecture
    • G06F2212/261Storage comprising a plurality of storage devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/28Using a specific disk cache architecture
    • G06F2212/283Plural cache memories
    • G06F2212/284Plural cache memories being distributed
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/31Providing disk cache in a specific location of a storage system
    • G06F2212/311In host system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/31Providing disk cache in a specific location of a storage system
    • G06F2212/313In storage device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/72Details relating to flash memory management
    • G06F2212/7208Multiple device management, e.g. distributing data over multiple flash devices

Definitions

  • This disclosure is generally related to the field of data storage. More specifically, this disclosure is related to a method and system for facilitating a distributed storage system with a total cost of ownership reduction for multiple available zones.
  • One example of a distributed storage system is a hyperscale storage system, which facilitates achieving a massive scale in computing, e.g., for big data or cloud computing.
  • a hyperscale infrastructure must ensure both high availability and data reliability for the corresponding massive scale in computing.
  • One way to ensure the high availability and data reliability in a hyperscale infrastructure is to use multiple available zones, which are constructed to synchronize data and provide service in a consistent manner. Each available zone may include multiple storage clusters, and each storage cluster may be deployed with a distributed file system which maintains multiple replicas of given data.
  • One embodiment facilitates data placement in a storage device.
  • the system receives, from a host, a request to read data.
  • the system determines that the data is not available in a read cache.
  • the system issues the read request to a first storage drive and a second storage drive of a different type than the first storage drive.
  • In response to unsuccessfully reading the requested data from the first storage drive and successfully reading the requested data from the second storage drive, the system sends the requested data to the host.
  • In response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive, the system issues the read request to a third storage drive and sends the requested data to the host.
  • the system identifies, based on previously stored path information, the first storage drive, the second storage drive, and the third storage drive.
  • the system selects, from a plurality of storage drives on which the data is stored, the second storage drive.
  • the system selects, from the plurality of storage drives, the third storage drive.
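  • As an illustration of the identification and selection described above, the following Python sketch shows one way previously stored path information could be kept and used; the table layout, the field names, and the random choice among the remaining replicas are assumptions for illustration (following the one-SSD, two-HDD configuration described later in relation to FIGS. 3 and 4) and are not prescribed by this disclosure.

      import random
      from dataclasses import dataclass

      @dataclass
      class ReplicaLocation:
          node: str        # storage node hosting this replica (hypothetical name)
          drive: str       # drive identifier on that node (hypothetical name)
          drive_type: str  # e.g., "ssd" or "hdd"

      # Hypothetical path table: data key -> locations of its replicas in one available zone.
      PATH_TABLE = {
          "object-42": [
              ReplicaLocation("node-1", "ssd-0", "ssd"),
              ReplicaLocation("node-2", "hdd-3", "hdd"),
              ReplicaLocation("node-3", "hdd-7", "hdd"),
          ],
      }

      def identify_drives(key):
          """Identify the first drive (one type), select a second drive of a
          different type from the remaining replicas, and keep the last replica
          as the third, backup drive."""
          replicas = PATH_TABLE[key]
          first = next(r for r in replicas if r.drive_type == "ssd")
          others = [r for r in replicas if r is not first]
          second = random.choice(others)
          third = next(r for r in others if r is not second)
          return first, second, third

      first, second, third = identify_drives("object-42")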
  • In response to successfully reading the requested data from the first storage drive, the system sends the requested data to the host and drops data read from the second storage drive.
  • In response to unsuccessfully reading the requested data from the first storage drive, the system reports a fault associated with the first storage drive. In response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive, the system reports a fault associated with the second storage drive.
  • the third storage drive is of a same or a different type as the second storage drive, and a type for the first storage drive, the second storage drive, and the third storage drive comprises one or more of: a solid state drive; a hard disk drive; and a storage medium which comprises one or more of: magnetoresistive random-access memory (MRAM); resistive RAM (ReRAM); phase change memory (PCM); nano-RAM (NRAM); and ferroelectric RAM (FRAM).
  • In response to unsuccessfully reading the requested data from the third storage drive, the system reports a fault associated with the third storage drive, and the system generates a notification indicating that the requested data is not available from a first available zone comprising the first storage drive, the second storage drive, and the third storage drive, wherein the notification further indicates to recover the requested data from a second available zone.
  • the first storage drive, the second storage drive, and the third storage drive comprise a first available zone of a plurality of available zones, and replicas of the requested data are stored in a respective available zone.
  • Prior to receiving the request to read the data, the system receives a request to write the data to the first storage drive, the second storage drive, and the third storage drive, which involves: simultaneously writing the data to a write cache of each of the first storage drive, the second storage drive, and the third storage drive; and committing the write request upon successfully writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive.
  • Subsequent to writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive, the system writes the data asynchronously from the write cache to a non-volatile memory of each of the first storage drive, the second storage drive, and the third storage drive.
  • FIG. 1 illustrates an exemplary environment which demonstrates data placement in a distributed storage system with multiple available zones, in accordance with the prior art.
  • FIG. 2 illustrates an exemplary environment which facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • FIG. 3 illustrates an exemplary environment which facilitates data placement in a distributed storage system with multiple available zones, including exemplary write operations, in accordance with an embodiment of the present application.
  • FIG. 4 illustrates an exemplary environment which facilitates data placement in a distributed storage system with multiple available zones, including exemplary read operations, in accordance with an embodiment of the present application.
  • FIG. 5 illustrates an exemplary hierarchy for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • FIG. 6A presents a flowchart illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • FIG. 6B presents a flowchart illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • FIG. 7 illustrates an exemplary computer system that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • FIG. 8 illustrates an exemplary apparatus that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • the embodiments described herein solve the problem of increasing the efficiency and performance of a distributed storage system by providing a hierarchy of access layers, which can ensure the high availability of service and data while reducing the total cost of ownership.
  • As described above, one example of a distributed storage system is a hyperscale storage system, which facilitates achieving a massive scale in computing, e.g., for big data or cloud computing.
  • a hyperscale infrastructure must ensure both high availability and data reliability for the corresponding massive scale in computing.
  • One way to ensure the high availability and data reliability in a hyperscale infrastructure is to use multiple available zones, which are constructed to synchronize data and provide service in a consistent manner.
  • Each available zone may include multiple storage clusters, and each storage cluster may be deployed with a distributed file system which maintains multiple replicas of given data. For example, in a hyperscale infrastructure with three available zones, where each available zone has three replicas, the same data is stored nine times.
  • the embodiments described herein address these challenges by providing a system which increases the efficiency and performance of a distributed storage system by providing a hierarchy of access layers.
  • the system can use “low-cost SSDs” and “low-cost HDDs” in a layered hierarchy to store the multiple replicas of data required by a given application.
  • An example of a low-cost SSD is a quad-level cell (QLC) SSD, which includes multi-level cell (MLC) memory elements which can store four bits of information, in contrast with a single-level cell (SLC) memory element which can only store a single bit of information.
  • An MLC-based SSD can refer to a multi-level cell (MLC) which stores two bits, while a triple-level cell (TLC) stores three bits and a quad-level cell (QLC) stores four bits.
  • In general, an SLC-based SSD has features which include a high capacity and a high endurance (e.g., it can endure a large number of program/erase cycles), and SLC-based SSDs are currently the most expensive SSDs.
  • As the number of bits per cell increases (e.g., MLC → TLC → QLC), the cost of the associated SSD decreases, as does its endurance.
  • the QLC SSD is an example of a low-cost SSD.
  • An example of a low-cost HDD is a shingled magnetic recording (SMR) drive, which writes data sequentially to overlapping or “shingled” tracks.
  • An SMR HDD generally has a higher read latency than the read latency of a conventional magnetic recording (CMR) drive, but the SMR HDD is generally a lower-cost alternative, and can be useful for sequential workloads where large amounts of data can be written sequentially, followed by random reads for processing and archive retrieval (e.g., video surveillance, object storage, and cloud services).
  • an SMR drive is an example of a low-cost HDD.
  • the system can construct available zones (or storage clusters) which include low-cost SSDs and low-cost HDDs.
  • an available zone can include one low-cost SSD and two low-cost HDDs, as described below in relation to FIGS. 3 and 4 .
  • the system can ensure the high availability of service and data without a loss in performance (e.g., in access latency and throughput), and can also provide a reduction of the total cost of ownership (TCO).
  • Each available zone or storage cluster can include multiple storage nodes, and each storage node can include multiple low-cost SSDs and multiple low-cost HDDs.
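  • A minimal Python sketch of this topology is shown below; the zone, node, and drive names are hypothetical, and only the containment relationships described here (available zones contain storage nodes, which contain low-cost SSDs and low-cost HDDs) are captured.

      from dataclasses import dataclass, field
      from typing import List

      @dataclass
      class StorageNode:
          name: str
          ssds: List[str] = field(default_factory=list)  # low-cost SSDs (e.g., QLC)
          hdds: List[str] = field(default_factory=list)  # low-cost HDDs (e.g., SMR)

      @dataclass
      class AvailableZone:
          name: str
          nodes: List[StorageNode] = field(default_factory=list)

      # Hypothetical deployment: three available zones, each keeping one replica on a
      # low-cost SSD and two replicas on low-cost HDDs spread across its storage nodes.
      zones = [
          AvailableZone(f"az-{i}", nodes=[
              StorageNode(f"az-{i}-node-1", ssds=["qlc-ssd-0"]),
              StorageNode(f"az-{i}-node-2", hdds=["smr-hdd-0"]),
              StorageNode(f"az-{i}-node-3", hdds=["smr-hdd-0"]),
          ])
          for i in range(1, 4)
      ]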
  • each storage node can deploy a write cache.
  • In an available zone which can include heterogeneous storage devices (e.g., one low-cost SSD and two low-cost HDDs on different storage nodes), each write cache associated with a specific storage node can be written simultaneously, and can thus provide execution of a low-latency write, followed by a commit to the host.
  • the data in the write cache can be written asynchronously from the write cache to the non-volatile memory of the respective low-cost SSD or low-cost HDD.
  • Data is thus written in a layered, hierarchical manner, which can result in an improved distributed storage system that can support a growing hyperscale infrastructure.
  • An exemplary write operation is described below in relation to FIG. 3
  • an exemplary read operation is described below in relation to FIG. 4 .
  • the embodiments described herein provide a distributed storage system which can ensure both high availability and data reliability to meet the increasing needs of current applications.
  • a “storage drive” refers to a device or a drive with a non-volatile memory which can provide persistent storage of data, e.g., a solid state drive (SSD) or a hard disk drive (HDD).
  • a “storage server” or a “storage node” refers to a computing device which can include multiple storage drives.
  • a distributed storage system can include multiple storage servers or storage nodes.
  • a “compute node” refers to a computing device which can perform as a client device or a host device.
  • a distributed storage system can include multiple compute nodes.
  • a “storage cluster” or an “available zone” is a grouping of storage servers, storage nodes, or storage drives in a distributed storage system.
  • A “low-cost SSD” refers to an SSD which has a lower cost compared to currently available SSDs, and may have a lower endurance than other currently available SSDs.
  • An example of a low-cost SSD is a QLC SSD.
  • A “low-cost HDD” refers to an HDD which has a lower cost compared to currently available HDDs, and may have a higher read or access latency than other currently available HDDs.
  • An example of a low-cost HDD is an SMR HDD. While the embodiments and Figures described herein refer to low-cost SSDs and low-cost HDDs, in some embodiments, the storage drives depicted as low-cost SSDs and low-cost HDDs can include other types of storage drives, including but not limited to: magnetoresistive random-access memory (MRAM); resistive RAM (ReRAM); phase change memory (PCM); nano-RAM (NRAM); and ferroelectric RAM (FRAM).
  • FIG. 1 illustrates an exemplary environment 100 which demonstrates data placement in a distributed storage system with multiple available zones, in accordance with the prior art.
  • Environment 100 includes three available zones (AZ): an available zone 1 102 ; an available zone 2 104 ; and an available zone 3 106 .
  • the distributed storage system can use the multiple available zones to synchronize data and provide service in a consistent manner, by storing multiple replicas of given data on each available zone.
  • each available zone can indicate a storage cluster, and each cluster can be deployed with a distributed file system which maintains three replicas on each cluster.
  • available zone 3 106 can include a compute cluster 112 and a storage cluster 114 .
  • Storage cluster 114 can include three storage drives, and each storage drive can include a copy of given data.
  • a storage drive 122 can include a data_copy 1 124 ;
  • a storage drive 126 can include a data_copy 2 128 ; and
  • a storage drive 130 can include a data_copy 3 132 .
  • the distributed storage system with three storage clusters (e.g., AZs) depicted in environment 100 stores the data nine separate times.
  • each storage drive can be a standard solid state drive (SSD). These standard SSDs are expensive, which can lead to a high overall total cost of ownership (TCO).
  • FIG. 2 illustrates an exemplary environment 200 which facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • Environment 200 can be a distributed storage system which includes compute nodes and storage nodes, which communicate with each other over a data center network 202 (e.g., via communications 204 and 206 ).
  • Each compute node can include a read cache, and each storage node can include a write cache.
  • a compute node 212 can include a read cache 214
  • a compute node 216 can include a read cache 218
  • a compute node 220 can include a read cache 222 .
  • a storage node 232 can include a write cache 234 .
  • storage nodes 242 , 252 , and 262 can include, respectively, write caches 244 , 254 , and 264 .
  • each storage node can include multiple storage drives, including a low-cost SSD and a low-cost HDD. This is in contrast to conventional storage nodes which include only the conventional high-cost SSDs, as described above in relation to FIG. 1 .
  • storage node 232 can include a low-cost SSD 236 and a low-cost HDD 238 .
  • storage nodes 242 , 252 , and 262 can include, respectively, low-cost SSDs ( 246 , 256 , and 266 ) as well as low-cost HDDs ( 248 , 258 , and 268 ).
  • Each storage node can maintain the low-cost SSD to ensure the shorter (e.g., faster) read latency as compared to an HDD. While each storage node in environment 200 is depicted as including only one low-cost SSD and one low-cost HDD, each storage node can include any number of low-cost SSDs and low-cost HDDs. Furthermore, the exemplary environments of FIGS. 3 and 4 depict one embodiment of the present application, in which each storage node, storage cluster, or available zone includes one low-cost SSD and two low-cost HDDs.
  • the system can use the read cache of a compute node to decrease the average latency of a read operation, and can also use the write cache of a storage node to decrease the average latency of a write operation.
  • Exemplary environments for facilitating communications via communications 204 and 206 are described below in relation to FIG. 3 (write operation) and FIG. 4 (read operation).
  • FIG. 3 illustrates an exemplary environment 300 which facilitates data placement in a distributed storage system with multiple available zones, including exemplary write operations, in accordance with an embodiment of the present application.
  • Environment 300 can include a storage node, storage cluster, or available zone which includes multiple drives, such as a low-cost SSD 236 , a low-cost HDD 248 , and a low-cost HDD 268 .
  • These multiple drives can be part of the same storage cluster or available zone, but can reside on different storage nodes/servers.
  • Alternatively, these multiple drives (236, 248, and 268) can reside on the same storage node/server.
  • the multiple drives can communicate with compute nodes or other client devices via a data center network 302 .
  • the system can receive a user write operation 312 via a communication 304 .
  • the system can write data corresponding to user write operation 312 to multiple write caches simultaneously (e.g., via a communication 306 ).
  • the system can: write data 372 to write cache 1 234 associated with low-cost SSD 236 ; write data 382 to write cache 2 244 associated with low-cost HDD 248 ; and write data 392 to write cache 3 264 associated with low-cost HDD 268 .
  • Upon successfully writing the data to each write cache, the system can commit the current write operation.
  • the system can send an acknowledgement to the host confirming the successful write (e.g., via acknowledgments 374 , 384 , and 394 ).
  • the system can perform an asynchronous write operation by writing the data stored in the write cache to the non-volatile memory of the storage drive.
  • the system can perform an asynchronous write 342 to write the data from write cache 1 234 to the non-volatile memory of low-cost SSD 236 .
  • the system can perform an asynchronous write 352 to write the data from write cache 2 244 to the non-volatile memory of low-cost HDD 248 .
  • the system can perform an asynchronous write 362 to write the data from write cache 3 264 to the non-volatile memory of low-cost HDD 268 .
  • The asynchronous write can be part of a background operation, and can remain invisible to a front-end user. That is, the asynchronous write operation can be performed without affecting the front-end user.
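  • The write flow of FIG. 3 can be summarized with the short Python sketch below; the executor, the cache and drive objects, and the helper names are illustrative assumptions rather than the actual implementation, but the ordering follows the description above: the write caches are written simultaneously, the write is committed (acknowledged to the host) only after every write cache reports success, and the flush to each drive's non-volatile memory happens asynchronously in the background.

      from concurrent.futures import ThreadPoolExecutor

      def acknowledge_host():
          # Assumed helper: sends the commit acknowledgment back to the host.
          print("write committed to host")

      def handle_user_write(data, write_caches, drives, executor):
          """write_caches: one write cache per replica drive (e.g., one SSD, two HDDs);
          drives: the corresponding low-cost SSD/HDD objects (assumed to expose
          write(data) on the caches and flush_from_cache(cache) on the drives)."""
          # 1. Write the data to every replica's write cache simultaneously.
          futures = [executor.submit(cache.write, data) for cache in write_caches]
          results = [f.result() for f in futures]

          # 2. Commit the current write operation only after all write caches succeed.
          if not all(results):
              raise IOError("a write cache update failed; write not committed")
          acknowledge_host()

          # 3. Asynchronously write the cached data to each drive's non-volatile
          #    memory as a background operation that is invisible to the host.
          for cache, drive in zip(write_caches, drives):
              executor.submit(drive.flush_from_cache, cache)

  • In this sketch a single ThreadPoolExecutor stands in for whatever concurrency mechanism the storage nodes actually use; the essential point is that the host-visible latency covers only the parallel write-cache step, not the later flush to the drives.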
  • FIG. 4 illustrates an exemplary environment 400 which facilitates data placement in a distributed storage system with multiple available zones, including exemplary read operations, in accordance with an embodiment of the present application.
  • environment 400 can include an available zone which includes low-cost SSD 236 , low-cost HDD 248 , and low-cost HDD 268 , along with their respective write caches (e.g., 234 , 244 , and 264 ).
  • These drives can communicate with compute nodes or other client devices via a data center network 402 (via communications 426 and 430 ).
  • the system can receive a user read operation 412 .
  • the system can initially check the read cache of the corresponding compute node (or another compute node, depending on the configuration of the compute nodes). For example, the system can check, via a communication 422 , whether read cache 414 stores the requested data. If it does, the system can return the requested data via a communication 424 . If it does not, the system can return a message, via communication 424 , indicating that the requested data is not stored in the read cache.
  • the system can then issue the read request to both a low-cost SSD and a low-cost HDD of an available zone.
  • the system can pick the low-cost HDD randomly. For example, the system can identify low-cost SSD 236 and low-cost HDD 268 of an available zone.
  • the system can issue the read request to both low-cost SSD 236 (via an operation 432 ) and low-cost HDD 268 (via an operation 436 ). If the requested data can be obtained from (i.e., is stored on) low-cost SSD 236 , the system can read the requested data from low-cost SSD 236 , and return the requested data to the host (via a communication 434 ).
  • If the requested data is successfully read from low-cost SSD 236, the system can drop the data obtained, if any, from low-cost HDD 268 in response to the read request (via communication 436). If the requested data cannot be obtained from low-cost SSD 236, the system can report a fault associated with low-cost SSD 236.
  • In that case, the system can read the requested data from low-cost HDD 268, and return the requested data to the host (via a communication 438). If the requested data cannot be obtained from either low-cost SSD 236 or low-cost HDD 268, the system can report a fault associated with low-cost HDD 268, and can also identify another low-cost HDD on which the data is stored (e.g., low-cost HDD 248, which is part of the same available zone as low-cost SSD 236 and low-cost HDD 268). The system can then issue the read request to low-cost HDD 248. If the requested data can be obtained from (i.e., is stored on) low-cost HDD 248, the system can read the requested data from low-cost HDD 248, and return the requested data to the host (via a communication, not shown).
  • If the requested data cannot be obtained from low-cost HDD 248, the system can report a fault associated with low-cost HDD 248, and can also generate a message or notification indicating that the requested data is not available from the available zone comprising low-cost SSD 236, low-cost HDD 248, and low-cost HDD 268.
  • the notification can further indicate to recover the requested data from another available zone.
  • On a read cache miss, the read request is always issued to both the low-cost SSD and the (randomly selected) low-cost HDD.
  • This hierarchy allows the system to provide control over the long-tail latency, such that if the read operation from the low-cost SSD encounters an error, the system can proceed with the simultaneously issued read operation from the first low-cost HDD.
  • the second low-cost HDD provides an additional backup layer.
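  • A compact Python sketch of this layered read path is given below; the drive objects and their read() method, the fault-reporting helper, and the cross-zone recovery signal are hypothetical names used only to illustrate the order of the layers described above (read cache, then the low-cost SSD together with a randomly selected low-cost HDD, then the remaining low-cost HDD, and finally recovery from another available zone). For brevity the two simultaneous reads are shown sequentially.

      def report_fault(drive):
          # Assumed helper: records a fault for the given drive.
          print(f"fault reported for drive {drive}")

      def layered_read(key, read_cache, ssd, hdd_first, hdd_backup):
          """Return the requested data, trying each access layer in turn.
          Drives are assumed to expose read(key) -> bytes or None."""
          # Layer 1: read cache on the compute node.
          data = read_cache.get(key)
          if data is not None:
              return data

          # Layer 2: issue the read to the low-cost SSD and one low-cost HDD
          # (chosen at random); in the description both reads are issued together.
          ssd_data = ssd.read(key)
          hdd_data = hdd_first.read(key)
          if ssd_data is not None:
              return ssd_data            # SSD succeeded: the HDD copy is dropped.
          report_fault(ssd)
          if hdd_data is not None:
              return hdd_data
          report_fault(hdd_first)

          # Layer 3: fall back to the remaining low-cost HDD in the same zone.
          data = hdd_backup.read(key)
          if data is not None:
              return data
          report_fault(hdd_backup)

          # Layer 4: notify that the data is not available from this available
          # zone and should be recovered from another available zone.
          raise LookupError("requested data unavailable in this zone; recover from another zone")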
  • The embodiments of the system described herein can thus provide a more efficient distributed storage system and a reduced total cost of ownership for multiple available zones.
  • FIG. 5 illustrates an exemplary hierarchy 500 for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • Hierarchy 500 can include a read cache layer 502 , a write cache layer 504 , a normal read layer (from low-cost SSD) 506 , and a backup read layer (from low-cost HDD) 508 .
  • Read cache layer 502 can include features such as a large capacity, local access, and a short read latency (e.g., read cache 214 of FIGS. 2 and 4 ).
  • Write cache layer 504 can include features such as a small capacity, global access, storage of multiple replicas, a short write latency, and a high endurance (e.g., write cache 234 of FIGS. 2 and 3 ).
  • Normal read layer (from low-cost SSD) 506 can include features such as a large capacity, global access, storage for multiple replicas, a short read latency, and a low endurance (e.g., via communications 432 / 434 with low-cost SSD 236 of FIG. 4 ).
  • Backup read layer (from low-cost HDD) 508 can include features such as a large capacity, global access, storage for multiple replicas, a long read latency, and a high endurance (e.g., via communications 436 / 438 with low-cost HDD 268 of FIG. 4 ).
  • the low-cost high-capacity read cache can improve the overall performance of a distributed storage system by reducing the average read latency.
  • The write cache, which has a small capacity and a low latency, can lead to an improvement in the overall performance of the distributed storage system while maintaining a limited cost. For example, assuming that the dollars per Gigabyte ($/GB) comparison between a conventional high-cost SSD and an HDD is 10 to 1, by storing two of the three copies of data on a low-cost HDD rather than on a conventional high-cost SSD, the system can provide an improved cost ratio of (10+10+10) to (10+1+1), which is 30 to 12, or 5 to 2, a significant improvement over the conventional all-SSD approach, thus resulting in a significant cost savings and overall efficiency of the distributed storage system.
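  • The cost comparison in the preceding paragraph can be checked with the small Python calculation below; the 10-to-1 $/GB figures are the assumed illustrative numbers from the example (with the SSD replica still counted at the conventional $/GB), not measured prices.

      SSD_COST_PER_GB = 10  # assumed relative $/GB of a conventional high-cost SSD
      HDD_COST_PER_GB = 1   # assumed relative $/GB of a low-cost HDD

      # Conventional layout: all three replicas in a zone stored on high-cost SSDs.
      conventional = 3 * SSD_COST_PER_GB                 # 10 + 10 + 10 = 30
      # Proposed layout: one replica on an SSD, two replicas on low-cost HDDs.
      proposed = SSD_COST_PER_GB + 2 * HDD_COST_PER_GB   # 10 + 1 + 1 = 12

      print(conventional, proposed)    # 30 12, i.e., a 30-to-12 (5-to-2) cost ratio
      print(conventional / proposed)   # 2.5x reduction in per-zone storage cost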
  • FIG. 6A presents a flowchart 600 illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • the system receives, from a host, a request to read data (operation 602 ).
  • the system determines whether the data is available in a read cache (operation 604 ). If the system determines that the data is available in the read cache (i.e., a read cache hit) (decision 606 ), the operation continues at operation 640 of FIG. 6B .
  • the system determines paths to replicas of the requested data stored on at least a solid state drive (SSD), a first hard disk drive (HDD1), and a second hard disk drive (HDD2) (operation 608 ).
  • the system can store replicas on one or more available zones, and a first available zone can include the SSD, HDD1, and HDD2.
  • the system can select HDD1 randomly from a plurality of HDDs in the first available zone.
  • the system issues the read request to the solid state drive (SSD) and the first hard disk drive (HDD1) (operation 610 ).
  • the system reads the data from the solid state drive and the first hard disk drive (operation 612 ). If the system successfully reads the requested data from the solid state drive (decision 614 ), the operation continues at operation 640 of FIG. 6B . If the system unsuccessfully reads the requested data from the solid state drive (decision 614 ), the system reports a fault associated with the solid state drive (operation 616 ), and the operation continues at Label A of FIG. 6B .
  • FIG. 6B presents a flowchart 620 illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • If the system successfully reads the requested data from the first hard disk drive (decision 622), the operation continues at operation 640. If the system unsuccessfully reads the requested data from the first hard disk drive (decision 622), the system reports a fault associated with the first hard disk drive (operation 624). The system issues the read request to and reads the data from the second hard disk drive (operation 626).
  • If the system successfully reads the requested data from the second hard disk drive (decision 628), the operation continues at operation 640 of FIG. 6B.
  • the system sends the requested data to the host (operation 640 ), and the operation returns.
  • If the system unsuccessfully reads the requested data from the second hard disk drive (decision 628), the system reports a fault associated with the second hard disk drive (operation 630).
  • the system generates a notification that the requested data is not available from a first storage cluster (or a first available zone) comprising the SSD, HDD1, and HDD2, wherein the notification indicates to recover the requested data from a second storage cluster or a second available zone (operation 632 ), and the operation returns.
  • FIG. 7 illustrates an exemplary computer system 700 that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • Computer system 700 includes a processor 702 , a controller 704 , a volatile memory 706 , and a storage device 708 .
  • Volatile memory 706 can include, e.g., random access memory (RAM), that serves as a managed memory, and can be used to store one or more memory pools.
  • Storage device 708 can include persistent storage which can be managed or accessed via controller 704 .
  • computer system 700 can be coupled to peripheral input/output user devices 710 , such as a display device 711 , a keyboard 712 , and a pointing device 714 .
  • Storage device 708 can store an operating system 716 , a content-processing system 718 , and data 732 .
  • Content-processing system 718 can include instructions, which when executed by computer system 700 , can cause computer system 700 to perform methods and/or processes described in this disclosure. Specifically, content-processing system 718 can include instructions for receiving and transmitting data packets, including data to be read or written, a read request, and a write request (communication module 720 ).
  • Content-processing system 718 can further include instructions for receiving, from a host, a request to read data (communication module 720 ).
  • Content-processing system 718 can include instructions for determining that the data is not available in a read cache (read cache-managing module 722 ).
  • Content-processing system 718 can include instructions for issuing the read request to a solid state drive and a first hard disk drive (request-issuing module 724 ).
  • Content-processing system 718 can include instructions for, in response to unsuccessfully reading the requested data from the solid state drive (SSD-managing module 726 ) and successfully reading the requested data from the first hard disk drive (HDD-managing module 728 ), sending the requested data to the host (communication module 720 ).
  • Content-processing system 718 can include instructions for, in response to unsuccessfully reading the requested data from both the solid state drive and the first hard disk drive (SSD-managing module 726 and HDD-managing module 728 ): issuing the read request to a second hard disk drive (request-issuing module 724 ); and sending the requested data to the host (communication module 720 ).
  • Content-processing system 718 can include instructions for identifying, based on previously stored path information, the solid state drive, the first hard disk drive, and the second hard disk drive (zone-managing module 730 ). Content-processing system 718 can include instructions for selecting, from a plurality of hard disk drives on which the data is stored, the first hard drive (zone-managing module 730 ). Content-processing system 718 can include instructions for selecting, from the plurality of hard disk drives, the second hard disk drive (zone-managing module 730 ).
  • Data 732 can include any data that is required as input or that is generated as output by the methods and/or processes described in this disclosure. Specifically, data 732 can store at least: data; a request; a read request; a write request; data associated with a read cache or a write cache; path information for drives in a storage cluster or available zone; an identification or indicator of an available zone, a solid state drive, a hard disk drive, or other storage device; an indicator of a fault; a message; a notification; a replica; a copy of data; and an indicator of multiple available zones.
  • FIG. 8 illustrates an exemplary apparatus 800 that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • Apparatus 800 can comprise a plurality of units or apparatuses which may communicate with one another via a wired, wireless, quantum light, or electrical communication channel.
  • Apparatus 800 may be realized using one or more integrated circuits, and may include fewer or more units or apparatuses than those shown in FIG. 8 .
  • apparatus 800 may be integrated in a computer system, or realized as a separate device which is capable of communicating with other computer systems and/or devices.
  • Apparatus 800 can comprise units 802-812 which perform functions or operations similar to modules 720-730 of computer system 700 of FIG. 7, including: a communication unit 802; a read cache-managing unit 804; a request-issuing unit 806; an SSD-managing unit 808; an HDD-managing unit 810; and a zone-managing unit 812.
  • the data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system.
  • the computer-readable storage medium includes, but is not limited to, volatile memory, non-volatile memory, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing computer-readable media now known or later developed.
  • the methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above.
  • a computer system reads and executes the code and/or data stored on the computer-readable storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the computer-readable storage medium.
  • the methods and processes described above can be included in hardware modules.
  • the hardware modules can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), and other programmable-logic devices now known or later developed.
  • When the hardware modules are activated, the hardware modules perform the methods and processes included within the hardware modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

One embodiment facilitates data placement in a storage device. During operation, the system receives, from a host, a request to read data. The system determines that the data is not available in a read cache. The system issues the read request to a solid state drive and a first hard disk drive. In response to unsuccessfully reading the requested data from the solid state drive and successfully reading the requested data from the first hard disk drive, the system sends the requested data to the host. In response to unsuccessfully reading the requested data from both the solid state drive and the first hard disk drive: the system issues the read request to a second hard disk drive; and the system sends the requested data to the host.

Description

    BACKGROUND
  • Field
  • This disclosure is generally related to the field of data storage. More specifically, this disclosure is related to a method and system for facilitating a distributed storage system with a total cost of ownership reduction for multiple available zones.
  • Related Art
  • The proliferation of the Internet and e-commerce continues to create a vast amount of digital content. Various distributed storage systems have been created to access and store such digital content. One example of a distributed storage system is a hyperscale storage system, which facilitates achieving a massive scale in computing, e.g., for big data or cloud computing. A hyperscale infrastructure must ensure both high availability and data reliability for the corresponding massive scale in computing. One way to ensure the high availability and data reliability in a hyperscale infrastructure is to use multiple available zones, which are constructed to synchronize data and provide service in a consistent manner. Each available zone may include multiple storage clusters, and each storage cluster may be deployed with a distributed file system which maintains multiple replicas of given data. For example, in a hyperscale infrastructure with three available zones, where each available zone has three replicas, the same data is stored nine times. As current applications continue to pursue and require fast access, all nine of the replicas may be stored on high-speed solid state drives (SSDs). However, these high-speed SSDs can be expensive. Meeting the needs of current applications in this way can thus result in a high cost for the overall storage system. As the hyperscale infrastructure scales out and grows, the ability to provide an efficient system which can both scale out and perform at a reasonable pace becomes critical. Furthermore, the total cost of ownership (TCO) can become a critical factor.
  • SUMMARY
  • One embodiment facilitates data placement in a storage device. During operation, the system receives, from a host, a request to read data. The system determines that the data is not available in a read cache. The system issues the read request to a first storage drive and a second storage drive of a different type than the first storage drive. In response to unsuccessfully reading the requested data from the first storage drive and successfully reading the requested data from the second storage drive, the system sends the requested data to the host. In response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive: the system issues the read request to a third storage drive; and the system sends the requested data to the host.
  • In some embodiments, the system identifies, based on previously stored path information, the first storage drive, the second storage drive, and the third storage drive. The system selects, from a plurality of storage drives on which the data is stored, the second storage drive. The system selects, from the plurality of storage drives, the third storage drive.
  • In some embodiments, in response to successfully reading the requested data from the first storage drive, the system sends the requested data to the host, and drops data read from the second storage drive.
  • In some embodiments, in response to unsuccessfully reading the requested data from the first storage drive, the system reports a fault associated with the first storage drive. In response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive, the system reports a fault associated with the second storage drive.
  • In some embodiments, the third storage drive is of a same or a different type as the second storage drive, and a type for the first storage drive, the second storage drive, and the third storage drive comprises one or more of: a solid state drive; a hard disk drive; and a storage medium which comprises one or more of: magnetoresistive random-access memory (MRAM); resistive RAM (ReRAM); phase change memory (PCM); nano-RAM (NRAM); and ferroelectric RAM (FRAM).
  • In some embodiments, in response to unsuccessfully reading the requested data from the third storage drive, the system reports a fault associated with the third storage drive, and the system generates a notification indicating that the requested data is not available from a first available zone comprising the first storage drive, the second storage drive, and the third storage drive, wherein the notification further indicates to recover the requested data from a second available zone.
  • In some embodiments, the first storage drive, the second storage drive, and the third storage drive comprise a first available zone of a plurality of available zones, and replicas of the requested data are stored in a respective available zone.
  • In some embodiments, prior to receiving the request to read the data, the system receives a request to write the data to the first storage drive, the second storage drive, and the third storage drive, which involves: simultaneously writing the data to a write cache of each of the first storage drive, the second storage drive, and the third storage drive; and committing the write request upon successfully writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive.
  • In some embodiments, subsequent to writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive, the system writes the data asynchronously from the write cache to a non-volatile memory of each of the first storage drive, the second storage drive, and the third storage drive.
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 illustrates an exemplary environment which demonstrates data placement in a distributed storage system with multiple available zones, in accordance with the prior art.
  • FIG. 2 illustrates an exemplary environment which facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • FIG. 3 illustrates an exemplary environment which facilitates data placement in a distributed storage system with multiple available zones, including exemplary write operations, in accordance with an embodiment of the present application.
  • FIG. 4 illustrates an exemplary environment which facilitates data placement in a distributed storage system with multiple available zones, including exemplary read operations, in accordance with an embodiment of the present application.
  • FIG. 5 illustrates an exemplary hierarchy for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • FIG. 6A presents a flowchart illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • FIG. 6B presents a flowchart illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • FIG. 7 illustrates an exemplary computer system that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • FIG. 8 illustrates an exemplary apparatus that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
  • In the figures, like reference numerals refer to the same figure elements.
  • DETAILED DESCRIPTION
  • The following description is presented to enable any person skilled in the art to make and use the embodiments, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the embodiments described herein are not limited to the embodiments shown, but are to be accorded the widest scope consistent with the principles and features disclosed herein.
  • Overview
  • The embodiments described herein solve the problem of increasing the efficiency and performance of a distributed storage system by providing a hierarchy of access layers, which can ensure the high availability of service and data while reducing the total cost of ownership.
  • As described above, one example of a distributed storage system is a hyperscale storage system, which facilitates achieving a massive scale in computing, e.g., for big data or cloud computing. A hyperscale infrastructure must ensure both high availability and data reliability for the corresponding massive scale in computing. One way to ensure the high availability and data reliability in a hyperscale infrastructure is to use multiple available zones, which are constructed to synchronize data and provide service in a consistent manner. Each available zone may include multiple storage clusters, and each storage cluster may be deployed with a distributed file system which maintains multiple replicas of given data. For example, in a hyperscale infrastructure with three available zones, where each available zone has three replicas, the same data is stored nine times. As current applications continue to pursue and require fast access, all nine of the replicas may be stored on high-speed solid state drives (SSDs). However, these high-speed SSDs can be expensive. Meeting the needs of current applications in this way can thus result in a high cost for the overall storage system. As the hyperscale infrastructure scales out and grows, the ability to provide an efficient system which can both scale out and perform at a reasonable pace becomes critical. Furthermore, the total cost of ownership (TCO) can become a critical factor.
  • The embodiments described herein address these challenges by providing a system which increases the efficiency and performance of a distributed storage system through a hierarchy of access layers. The system can use "low-cost SSDs" and "low-cost HDDs" in a layered hierarchy to store the multiple replicas of data required by a given application. An example of a low-cost SSD is a quad-level cell (QLC) SSD, whose memory cells can each store four bits of information, in contrast with a single-level cell (SLC) memory element, which can only store a single bit of information. A multi-level cell (MLC) stores two bits, a triple-level cell (TLC) stores three bits, and a quad-level cell (QLC) stores four bits. In general, an SLC-based SSD provides a high performance and a high endurance (e.g., it can endure a large number of program/erase cycles), and is currently the most expensive type of SSD. As the number of bits per cell increases (e.g., MLC→TLC→QLC), both the cost and the endurance of the associated SSD decrease. Hence, the QLC SSD is an example of a low-cost SSD.
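  • Purely for illustration, the general relationship between bits per cell, endurance, and cost can be summarized as in the short sketch below; the relative rankings it encodes are broad industry generalizations assumed for this example, not figures taken from this disclosure.

```python
# Illustrative summary of NAND cell types; the relative endurance and cost
# rankings are assumed generalizations, not figures from this disclosure.
NAND_CELL_TYPES = {
    # name: (bits per cell, relative endurance, relative cost per GB)
    "SLC": (1, "highest", "highest"),
    "MLC": (2, "high",    "high"),
    "TLC": (3, "medium",  "medium"),
    "QLC": (4, "lowest",  "lowest"),
}

if __name__ == "__main__":
    for name, (bits, endurance, cost) in NAND_CELL_TYPES.items():
        print(f"{name}: {bits} bit(s)/cell, {endurance} endurance, {cost} cost per GB")
```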
  • An example of a low-cost HDD is a shingled magnetic recording (SMR) drive, which writes data sequentially to overlapping or “shingled” tracks. An SMR HDD generally has a higher read latency than the read latency of a conventional magnetic recording (CMR) drive, but the SMR HDD is generally a lower cost alternative, and can be useful for sequential workloads where large amounts of data can be written sequentially, followed by random reads for processing and archive retrieval (e.g., video surveillance, object storage, and cloud services). Hence, an SMR drive is an example of a low-cost HDD.
  • In the embodiments described herein, the system can construct available zones (or storage clusters) which include low-cost SSDs and low-cost HDDs. In one embodiment, an available zone can include one low-cost SSD and two low-cost HDDs, as described below in relation to FIGS. 3 and 4. The system can ensure the high availability of service and data without a loss in performance (e.g., in access latency and throughput), and can also provide a reduction of the total cost of ownership (TCO).
  • Each available zone or storage cluster can include multiple storage nodes, and each storage node can include multiple low-cost SSDs and multiple low-cost HDDs. In addition, each storage node can deploy a write cache. When a replica of given data is to be written to an available zone (which can include heterogeneous storage devices, e.g., one low-cost SSD and two low-cost HDDs on different storage nodes), each write cache associated with a specific storage node can be written simultaneously, which provides execution of a low-latency write, followed by a commit to the host. Subsequently, the data in the write cache can be written asynchronously from the write cache to the non-volatile memory of the respective low-cost SSD or low-cost HDD. Data is thus written in a layered, hierarchical manner, which can result in an improved distributed storage system that can support a growing hyperscale infrastructure. An exemplary write operation is described below in relation to FIG. 3, and an exemplary read operation is described below in relation to FIG. 4.
  • Thus, by constructing multiple available zones and by using the hierarchical layers of access, the embodiments described herein provide a distributed storage system which can ensure both high availability and data reliability to meet the increasing needs of current applications.
  • A “storage drive” refers to a device or a drive with a non-volatile memory which can provide persistent storage of data, e.g., a solid state drive (SSD) or a hard disk drive (HDD).
  • A “storage server” or a “storage node” refers to a computing device which can include multiple storage drives. A distributed storage system can include multiple storage servers or storage nodes.
  • A “compute node” refers to a computing device which can perform as a client device or a host device. A distributed storage system can include multiple compute nodes.
  • A “storage cluster” or an “available zone” is a grouping of storage servers, storage nodes, or storage drives in a distributed storage system.
  • A “low-cost SSD” refers to an SSD which has a lower cost compared to currently available SSDs, and may have a lower endurance than other currently available SSDs. An example of a low-cost SSD is a QLC SSD.
  • A “low-cost HDD” refers to an HDD which has a lower cost compared to currently available HDDs, and may have a higher read or access latency than other currently available HDDs. An example of a low-cost HDD is an SMR HDD. While the embodiments and Figures described herein refer to low-cost SSDs and low-cost HDDs, in some embodiments, the storage drives depicted as low-cost SSDs and low-cost HDDs can include other types of storage drives, including but not limited to: magnetoresistive random-access memory (MRAM); resistive RAM (ReRAM); phase change memory (PCM); nano-RAM (NRAM); and ferroelectric RAM (FRAM).
  • Inefficiency of Data Placement in a Distributed Storage System with Multiple Available Zones in the Prior Art
  • FIG. 1 illustrates an exemplary environment 100 which demonstrates data placement in a distributed storage system with multiple available zones, in accordance with the prior art. Environment 100 includes three available zones (AZ): an available zone 1 102; an available zone 2 104; and an available zone 3 106. The distributed storage system can use the multiple available zones to synchronize data and provide service in a consistent manner, by storing multiple replicas of given data on each available zone. In environment 100, each available zone can indicate a storage cluster, and each cluster can be deployed with a distributed file system which maintains three replicas on each cluster.
  • For example, available zone 3 106 can include a compute cluster 112 and a storage cluster 114. Storage cluster 114 can include three storage drives, and each storage drive can include a copy of given data. A storage drive 122 can include a data_copy1 124; a storage drive 126 can include a data_copy2 128; and a storage drive 130 can include a data_copy3 132. Thus, the distributed storage system with three storage clusters (e.g., AZs) depicted in environment 100 stores the data nine separate times. As current applications require fast access to stored data, each storage drive can be a standard solid state drive (SSD). These standard SSDs are expensive, which can lead to a high overall total cost of ownership (TCO).
  • Exemplary Environment and Architecture for Facilitating Data Placement in a Distributed Storage System with Multiple Available Zones
  • FIG. 2 illustrates an exemplary environment 200 which facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application. Environment 200 can be a distributed storage system which includes compute nodes and storage nodes, which communicate with each other over a data center network 202 (e.g., via communications 204 and 206). Each compute node can include a read cache, and each storage node can include a write cache. For example, a compute node 212 can include a read cache 214, a compute node 216 can include a read cache 218, and a compute node 220 can include a read cache 222. A storage node 232 can include a write cache 234. Similarly, storage nodes 242, 252, and 262 can include, respectively, write caches 244, 254, and 264.
  • In addition to a write cache, each storage node can include multiple storage drives, including a low-cost SSD and a low-cost HDD. This is in contrast to conventional storage nodes which include only the conventional high-cost SSDs, as described above in relation to FIG. 1. For example, storage node 232 can include a low-cost SSD 236 and a low-cost HDD 238. Similarly, storage nodes 242, 252, and 262 can include, respectively, low-cost SSDs (246, 256, and 266) as well as low-cost HDDs (248, 258, and 268).
  • Each storage node can maintain a low-cost SSD to provide a shorter (i.e., faster) read latency compared to an HDD. While each storage node in environment 200 is depicted as including only one low-cost SSD and one low-cost HDD, each storage node can include any number of low-cost SSDs and low-cost HDDs. Furthermore, the exemplary environments of FIGS. 3 and 4 depict one embodiment of the present application, in which each storage node, storage cluster, or available zone includes one low-cost SSD and two low-cost HDDs.
  • The system can use the read cache of a compute node to decrease the average latency of a read operation, and can also use the write cache of a storage node to decrease the average latency of a write operation. Exemplary environments for facilitating communications via communications 204 and 206 are described below in relation to FIG. 3 (write operation) and FIG. 4 (read operation).
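  • A minimal sketch of this topology is shown below, in which plain in-memory dictionaries stand in for the caches and drives; all class and field names are illustrative assumptions rather than identifiers from this disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Minimal sketch of the topology of FIG. 2: compute nodes carry a read cache,
# and storage nodes carry a write cache plus a low-cost SSD and a low-cost HDD.
# All names are illustrative assumptions; dictionaries stand in for devices.

@dataclass
class ComputeNode:
    node_id: str
    read_cache: Dict[str, bytes] = field(default_factory=dict)    # key -> data

@dataclass
class StorageNode:
    node_id: str
    write_cache: Dict[str, bytes] = field(default_factory=dict)
    low_cost_ssd: Dict[str, bytes] = field(default_factory=dict)  # e.g., a QLC SSD
    low_cost_hdd: Dict[str, bytes] = field(default_factory=dict)  # e.g., an SMR HDD

@dataclass
class AvailableZone:
    zone_id: str
    compute_nodes: List[ComputeNode] = field(default_factory=list)
    storage_nodes: List[StorageNode] = field(default_factory=list)
```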
  • Exemplary Environment for Facilitating Write Operations
  • FIG. 3 illustrates an exemplary environment 300 which facilitates data placement in a distributed storage system with multiple available zones, including exemplary write operations, in accordance with an embodiment of the present application. Environment 300 can include a storage node, storage cluster, or available zone which includes multiple drives, such as a low-cost SSD 236, a low-cost HDD 248, and a low-cost HDD 268. Note that these multiple drives can be part of a same storage cluster or available zone, but can reside on different storage nodes/servers. In some embodiments, these multiple drives (236, 248, and 268) can reside on the same storage node/server. The multiple drives can communicate with compute nodes or other client devices via a data center network 302.
  • During operation, the system can receive a user write operation 312 via a communication 304. The system can write data corresponding to user write operation 312 to multiple write caches simultaneously (e.g., via a communication 306). For example, at the same time or in a parallel manner, the system can: write data 372 to write cache 1 234 associated with low-cost SSD 236; write data 382 to write cache 2 244 associated with low-cost HDD 248; and write data 392 to write cache 3 264 associated with low-cost HDD 268. Once the data (372, 382, and 392) is successfully written to the respective write cache, the system can commit the current write operation. That is, the system can send an acknowledgement to the host confirming the successful write (e.g., via acknowledgments 374, 384, and 394). At a subsequent time, the system can perform an asynchronous write operation by writing the data stored in the write cache to the non-volatile memory of the storage drive.
  • For example, after data 372 is successfully stored in write cache 1 234, the system can perform an asynchronous write 342 to write the data from write cache 1 234 to the non-volatile memory of low-cost SSD 236. Similarly, after data 382 is successfully stored in write cache 2 244, the system can perform an asynchronous write 352 to write the data from write cache 2 244 to the non-volatile memory of low-cost HDD 248. Additionally, after data 392 is successfully stored in write cache 3 264, the system can perform an asynchronous write 362 to write the data from write cache 3 264 to the non-volatile memory of low-cost HDD 268.
  • The asynchronous write can be part of a background operation, and can remain invisible to the front-end user. That is, the asynchronous write operation can be performed without affecting the front-end user. Furthermore, an asynchronous write (e.g., asynchronous write 342) can be performed subsequent to the successful write of data to a respective write cache (e.g., data 372 to write cache 1 234), or upon sending the acknowledgment to the host (e.g., acknowledgment 374). Thus, the system can efficiently serve the front-end user with a low write latency via the low-capacity, low-latency write cache.
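  • A minimal sketch of this write path is shown below, under the same illustrative assumptions (in-memory dictionaries stand in for the write caches and the non-volatile memory of each drive, and all function names are assumptions rather than identifiers from this disclosure).

```python
import concurrent.futures

# Minimal sketch of the write path of FIG. 3: write all write caches in
# parallel, commit once every cache holds the data, then flush each cache to
# its drive's non-volatile memory (an asynchronous background operation in the
# described system; performed inline here only to keep the sketch short).

def replicated_write(key, data, replica_targets):
    """replica_targets: list of (write_cache, drive) pairs, e.g., one low-cost
    SSD and two low-cost HDDs belonging to the same available zone."""
    # 1. Write the data to every write cache simultaneously (in parallel).
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(cache.__setitem__, key, data)
                   for cache, _drive in replica_targets]
        concurrent.futures.wait(futures)

    # 2. Commit the write (acknowledge the host) once every write cache holds the data.
    committed = all(cache.get(key) == data for cache, _drive in replica_targets)

    # 3. Flush from each write cache to the drive's non-volatile memory.
    if committed:
        for cache, drive in replica_targets:
            drive[key] = cache[key]
    return committed

# Example usage with one SSD and two HDDs:
# ssd_cache, hdd1_cache, hdd2_cache = {}, {}, {}
# ssd, hdd1, hdd2 = {}, {}, {}
# replicated_write("object-1", b"payload",
#                  [(ssd_cache, ssd), (hdd1_cache, hdd1), (hdd2_cache, hdd2)])
```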
  • Exemplary Environment for Facilitating Read Operations
  • FIG. 4 illustrates an exemplary environment 400 which facilitates data placement in a distributed storage system with multiple available zones, including exemplary read operations, in accordance with an embodiment of the present application. Similar to environment 300 of FIG. 3, environment 400 can include an available zone which includes low-cost SSD 236, low-cost HDD 248, and low-cost HDD 268, along with their respective write caches (e.g., 234, 244, and 264). These drives can communicate with compute nodes or other client devices via a data center network 402 (via communications 426 and 430).
  • During operation, the system can receive a user read operation 412. The system can initially check the read cache of the corresponding compute node (or another compute node, depending on the configuration of the compute nodes). For example, the system can check, via a communication 422, whether read cache 414 stores the requested data. If it does, the system can return the requested data via a communication 424. If it does not, the system can return a message, via communication 424, indicating that the requested data is not stored in the read cache.
  • The system can then issue the read request to both a low-cost SSD and a low-cost HDD of an available zone. The system can pick the low-cost HDD randomly. For example, the system can identify low-cost SSD 236 and low-cost HDD 268 of an available zone. The system can issue the read request to both low-cost SSD 236 (via an operation 432) and low-cost HDD 268 (via an operation 436). If the requested data can be obtained from (i.e., is stored on) low-cost SSD 236, the system can read the requested data from low-cost SSD 236, return the requested data to the host (via a communication 434), and drop the data obtained, if any, from low-cost HDD 268 in response to the read request (via communication 436).
  • If the requested data cannot be obtained from (i.e., is not stored on) low-cost SSD 236, the system can report a fault associated with low-cost SSD 236, read the requested data from low-cost HDD 268, and return the requested data to the host (via a communication 438). If the requested data cannot be obtained from either low-cost SSD 236 or low-cost HDD 268, the system can report a fault associated with low-cost HDD 268, and can also identify another low-cost HDD on which the data is stored (e.g., low-cost HDD 248, which is part of the same available zone as low-cost SSD 236 and low-cost HDD 268). The system can then issue the read request to low-cost HDD 248. If the requested data can be obtained from (i.e., is stored on) low-cost HDD 248, the system can read the requested data from low-cost HDD 248, and return the requested data to the host (via a communication, not shown).
  • If the requested data cannot be obtained from (i.e., is not stored on) low-cost HDD 248, the system can report a fault associated with low-cost HDD 248, and can also generate a message or notification indicating that the requested data is not available from the available zone comprising low-cost SSD 236, low-cost HDD 248, and low-cost HDD 268. The notification can further indicate to recover the requested data from another available zone.
  • Note that in environment 400, upon a read cache miss, the read request is always issued to both the low-cost SSD and the (randomly selected) low-cost HDD. This hierarchy allows the system to provide control over the long-tail latency, such that if the read operation from the low-cost SSD encounters an error, the system can proceed with the simultaneously issued read operation from the first low-cost HDD. The second low-cost HDD provides an additional backup layer.
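  • A minimal sketch of this layered read path is shown below, under the same illustrative assumptions (in-memory dictionaries stand in for the read cache and the drives; all names are assumptions rather than identifiers from this disclosure).

```python
import random

# Minimal sketch of the layered read path of FIG. 4: read cache, then the
# low-cost SSD, then a randomly selected low-cost HDD, then the remaining
# low-cost HDD, and finally recovery from another available zone.

def layered_read(key, read_cache, ssd, hdds, report_fault):
    """hdds: the low-cost HDDs of the same available zone as the low-cost SSD."""
    # Layer 1: read cache on the compute node.
    if key in read_cache:
        return read_cache[key]

    # Layer 2: normal read from the low-cost SSD. In the described system the
    # read is also issued to a randomly picked HDD simultaneously, and the HDD
    # data, if any, is dropped when the SSD read succeeds; this sketch simply
    # checks the SSD first.
    hdd1 = random.choice(hdds)
    if key in ssd:
        return ssd[key]
    report_fault("low-cost SSD")

    # Layer 3: backup read from the randomly selected low-cost HDD.
    if key in hdd1:
        return hdd1[key]
    report_fault("first low-cost HDD")

    # Layer 4: the remaining low-cost HDD of the same available zone.
    remaining = [h for h in hdds if h is not hdd1]
    if remaining and key in remaining[0]:
        return remaining[0][key]
    report_fault("second low-cost HDD")

    # Not available in this available zone; recover from another zone.
    raise KeyError(f"{key!r} must be recovered from another available zone")
```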
  • Thus, by providing the distributed layers in the manner described in FIGS. 4 and 5, the embodiments of the system described herein can provide a more efficient and distributed storage system and a reduced total cost of ownership for multiple available zones.
  • Exemplary Hierarchy of Distributed Layers
  • FIG. 5 illustrates an exemplary hierarchy 500 for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application. Hierarchy 500 can include a read cache layer 502, a write cache layer 504, a normal read layer (from low-cost SSD) 506, and a backup read layer (from low-cost HDD) 508. Read cache layer 502 can include features such as a large capacity, local access, and a short read latency (e.g., read cache 214 of FIGS. 2 and 4). Write cache layer 504 can include features such as a small capacity, global access, storage of multiple replicas, a short write latency, and a high endurance (e.g., write cache 234 of FIGS. 2 and 3).
  • Normal read layer (from low-cost SSD) 506 can include features such as a large capacity, global access, storage for multiple replicas, a short read latency, and a low endurance (e.g., via communications 432/434 with low-cost SSD 236 of FIG. 4). Backup read layer (from low-cost HDD) 508 can include features such as a large capacity, global access, storage for multiple replicas, a long read latency, and a high endurance (e.g., via communications 436/438 with low-cost HDD 268 of FIG. 4).
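  • Purely for illustration, the layer attributes listed above can be summarized as the following mapping; the dictionary structure and key names are assumptions made for this sketch.

```python
# Summary of the access layers of FIG. 5; the attribute values restate the
# features listed above (the endurance of the read cache is not specified),
# and the dictionary structure itself is an illustrative assumption.
ACCESS_LAYERS = {
    "read cache (502)":                {"capacity": "large", "access": "local",  "latency": "short read",  "endurance": "not specified"},
    "write cache (504)":               {"capacity": "small", "access": "global", "latency": "short write", "endurance": "high"},
    "normal read, low-cost SSD (506)": {"capacity": "large", "access": "global", "latency": "short read",  "endurance": "low"},
    "backup read, low-cost HDD (508)": {"capacity": "large", "access": "global", "latency": "long read",   "endurance": "high"},
}
```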
  • The low-cost, high-capacity read cache can improve the overall performance of a distributed storage system by reducing the average read latency. The write cache, which has a small capacity and a low latency, can improve the overall performance of the distributed storage system while maintaining a limited cost. For example, assuming that the cost per gigabyte ($/GB) of a conventional high-cost SSD is ten times that of an HDD (i.e., a 10-to-1 ratio), storing two of the three copies of data on low-cost HDDs rather than on conventional high-cost SSDs reduces the relative cost of storing the three copies from (10+10+10) to (10+1+1), i.e., from 30 to 12, or a ratio of 5 to 2. This 60% reduction results in a significant cost savings and an improved overall efficiency of the distributed storage system.
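  • The cost arithmetic above can be illustrated with the following short calculation, which assumes the example 10-to-1 $/GB ratio rather than measured prices.

```python
# Illustrative cost comparison, assuming the example 10-to-1 $/GB ratio between
# a conventional high-cost SSD and an HDD; the numbers are examples only.
SSD_COST_PER_GB = 10
HDD_COST_PER_GB = 1

conventional = 3 * SSD_COST_PER_GB               # three SSD copies -> 30
layered = SSD_COST_PER_GB + 2 * HDD_COST_PER_GB  # one SSD + two HDD copies -> 12

print(conventional, layered)        # 30 12, i.e., a 5-to-2 ratio
print(1 - layered / conventional)   # 0.6, i.e., a 60% cost reduction
```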
  • Exemplary Method for Facilitating Data Placement in a Distributed Storage System with Multiple Available Zones
  • FIG. 6A presents a flowchart 600 illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application. During operation, the system receives, from a host, a request to read data (operation 602). The system determines whether the data is available in a read cache (operation 604). If the system determines that the data is available in the read cache (i.e., a read cache hit) (decision 606), the operation continues at operation 640 of FIG. 6B.
  • If the system determines that the data is not available in the read cache (i.e., a read cache miss) (decision 606), the system determines paths to replicas of the requested data stored on at least a solid state drive (SSD), a first hard disk drive (HDD1), and a second hard disk drive (HDD2) (operation 608). The system can store replicas on one or more available zones, and a first available zone can include the SSD, HDD1, and HDD2. The system can select HDD1 randomly from a plurality of HDDs in the first available zone. The system issues the read request to the solid state drive (SSD) and the first hard disk drive (HDD1) (operation 610). The system reads the data from the solid state drive and the first hard disk drive (operation 612). If the system successfully reads the requested data from the solid state drive (decision 614), the operation continues at operation 640 of FIG. 6B. If the system unsuccessfully reads the requested data from the solid state drive (decision 614), the system reports a fault associated with the solid state drive (operation 616), and the operation continues at Label A of FIG. 6B.
  • FIG. 6B presents a flowchart 620 illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application. During operation, if the system successfully reads the requested data from the first hard disk drive (decision 622), the operation continues at operation 640. If the system unsuccessfully reads the requested data from the first hard disk drive (decision 622), the system reports a fault associated with the first hard disk drive (operation 624). The system issues the read request to and reads the data from the second hard disk drive (operation 626).
  • If the system successfully reads the requested data from the second hard disk drive (decision 628), the operation continues at operation 640 of FIG. 6B. The system sends the requested data to the host (operation 640), and the operation returns.
  • If the system unsuccessfully reads the requested data from the second hard disk drive (decision 628), the system reports a fault associated with the second hard disk drive (operation 630). The system generates a notification that the requested data is not available from a first storage cluster (or a first available zone) comprising the SSD, HDD1, and HDD2, wherein the notification indicates to recover the requested data from a second storage cluster or a second available zone (operation 632), and the operation returns.
  • Exemplary Computer System and Apparatus
  • FIG. 7 illustrates an exemplary computer system 700 that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application. Computer system 700 includes a processor 702, a controller 704, a volatile memory 706, and a storage device 708. Volatile memory 706 can include, e.g., random access memory (RAM), that serves as a managed memory, and can be used to store one or more memory pools. Storage device 708 can include persistent storage which can be managed or accessed via controller 704. Furthermore, computer system 700 can be coupled to peripheral input/output user devices 710, such as a display device 711, a keyboard 712, and a pointing device 714. Storage device 708 can store an operating system 716, a content-processing system 718, and data 732.
  • Content-processing system 718 can include instructions, which when executed by computer system 700, can cause computer system 700 to perform methods and/or processes described in this disclosure. Specifically, content-processing system 718 can include instructions for receiving and transmitting data packets, including data to be read or written, a read request, and a write request (communication module 720).
  • Content-processing system 718 can further include instructions for receiving, from a host, a request to read data (communication module 720). Content-processing system 718 can include instructions for determining that the data is not available in a read cache (read cache-managing module 722). Content-processing system 718 can include instructions for issuing the read request to a solid state drive and a first hard disk drive (request-issuing module 724). Content-processing system 718 can include instructions for, in response to unsuccessfully reading the requested data from the solid state drive (SSD-managing module 726) and successfully reading the requested data from the first hard disk drive (HDD-managing module 728), sending the requested data to the host (communication module 720). Content-processing system 718 can include instructions for, in response to unsuccessfully reading the requested data from both the solid state drive and the first hard disk drive (SSD-managing module 726 and HDD-managing module 728): issuing the read request to a second hard disk drive (request-issuing module 724); and sending the requested data to the host (communication module 720).
  • Content-processing system 718 can include instructions for identifying, based on previously stored path information, the solid state drive, the first hard disk drive, and the second hard disk drive (zone-managing module 730). Content-processing system 718 can include instructions for selecting, from a plurality of hard disk drives on which the data is stored, the first hard disk drive (zone-managing module 730). Content-processing system 718 can include instructions for selecting, from the plurality of hard disk drives, the second hard disk drive (zone-managing module 730).
  • Data 732 can include any data that is required as input or that is generated as output by the methods and/or processes described in this disclosure. Specifically, data 732 can store at least: data; a request; a read request; a write request; data associated with a read cache or a write cache; path information for drives in a storage cluster or available zone; an identification or indicator of an available zone, a solid state drive, a hard disk drive, or other storage device; an indicator of a fault; a message; a notification; a replica; a copy of data; and an indicator of multiple available zones.
  • FIG. 8 illustrates an exemplary apparatus 800 that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application. Apparatus 800 can comprise a plurality of units or apparatuses which may communicate with one another via a wired, wireless, quantum light, or electrical communication channel. Apparatus 800 may be realized using one or more integrated circuits, and may include fewer or more units or apparatuses than those shown in FIG. 8. Further, apparatus 800 may be integrated in a computer system, or realized as a separate device which is capable of communicating with other computer systems and/or devices. Specifically, apparatus 800 can comprise units 802-812 which perform functions or operations similar to modules 720-730 of computer system 700 of FIG. 7, including: a communication unit 802; a read cache-managing unit 804; a request-issuing unit 806; an SSD-managing unit 808; an HDD-managing unit 810; and a zone-managing unit 812.
  • The data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. The computer-readable storage medium includes, but is not limited to, volatile memory, non-volatile memory, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing code and/or data now known or later developed.
  • The methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above. When a computer system reads and executes the code and/or data stored on the computer-readable storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the computer-readable storage medium.
  • Furthermore, the methods and processes described above can be included in hardware modules. For example, the hardware modules can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), and other programmable-logic devices now known or later developed. When the hardware modules are activated, the hardware modules perform the methods and processes included within the hardware modules.
  • The foregoing embodiments described herein have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the embodiments described herein to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the embodiments described herein. The scope of the embodiments described herein is defined by the appended claims.

Claims (20)

What is claimed is:
1. A computer-implemented method for facilitating data placement, the method comprising:
receiving, from a host, a request to read data;
determining that the data is not available in a read cache;
issuing the read request to a first storage drive and a second storage drive of a different type than the first storage drive;
in response to unsuccessfully reading the requested data from the first storage drive and successfully reading the requested data from the second storage drive, sending the requested data to the host; and
in response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive:
issuing the read request to a third storage drive; and
sending the requested data to the host.
2. The method of claim 1, further comprising:
identifying, based on previously stored path information, the first storage drive, the second storage drive, and the third storage drive;
selecting, from a plurality of storage drives on which the data is stored, the second storage drive; and
selecting, from the plurality of storage drives, the third storage drive.
3. The method of claim 1, wherein in response to successfully reading the requested data from the first storage drive, the method further comprises:
sending the requested data to the host; and
dropping data read from the second storage drive.
4. The method of claim 1,
wherein in response to unsuccessfully reading the requested data from the first storage drive, the method further comprises reporting a fault associated with the first storage drive; and
wherein in response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive, the method further comprises reporting a fault associated with the second storage drive.
5. The method of claim 1,
wherein the third storage drive is of a same or a different type as the second storage drive, and
wherein a type for the first storage drive, the second storage drive, and the third storage drive comprises one or more of:
a solid state drive;
a hard disk drive; and
a storage medium which comprises one or more of: magnetoresistive random-access memory (MRAM); resistive RAM (ReRAM); phase change memory (PCM); nano-RAM (NRAM); and ferroelectric RAM (FRAM).
6. The method of claim 1, wherein in response to unsuccessfully reading the requested data from the third storage drive, the method further comprises:
reporting a fault associated with the third storage drive; and
generating a notification indicating that the requested data is not available from a first available zone comprising the first storage drive, the second storage drive, and the third storage drive,
wherein the notification further indicates to recover the requested data from a second available zone.
7. The method of claim 1, wherein the first storage drive, the second storage drive, and the third storage drive comprise a first available zone of a plurality of available zones, and wherein replicas of the requested data are stored in a respective available zone.
8. The method of claim 1, wherein prior to receiving the request to read the data, the method further comprises receiving a request to write the data to the first storage drive, the second storage drive, and the third storage drive, which involves:
simultaneously writing the data to a write cache of each of the first storage drive, the second storage drive, and the third storage drive; and
committing the write request upon successfully writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive.
9. The method of claim 8, wherein subsequent to writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive, the method further comprises:
writing the data asynchronously from the write cache to a non-volatile memory of each of the first storage drive, the second storage drive, and the third storage drive.
10. A computer system for facilitating data placement, the system comprising:
a processor; and
a memory coupled to the processor and storing instructions, which when executed by the processor cause the processor to perform a method, wherein the computer system is a storage device, the method comprising:
receiving, from a host, a request to read data;
determining that the data is not available in a read cache;
issuing the read request to a first storage drive and a second storage drive of a different type than the first storage drive;
in response to unsuccessfully reading the requested data from the first storage drive and successfully reading the requested data from the second storage drive, sending the requested data to the host; and
in response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive:
issuing the read request to a third storage drive; and
sending the requested data to the host.
11. The computer system of claim 10, wherein the method further comprises:
identifying, based on previously stored path information, the first storage drive, the second storage drive, and the third storage drive;
selecting, from a plurality of storage drives on which the data is stored, the second storage drive; and
selecting, from the plurality of storage drives, the third storage drive.
12. The computer system of claim 10, wherein in response to successfully reading the requested data from the first storage drive, the method further comprises:
sending the requested data to the host; and
dropping data read from the second storage drive.
13. The computer system of claim 10,
wherein in response to unsuccessfully reading the requested data from the first storage drive, the method further comprises reporting a fault associated with the first storage drive; and
wherein in response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive, the method further comprises reporting a fault associated with the second storage drive.
14. The computer system of claim 10,
wherein the third storage drive is of a same or a different type as the second storage drive, and
wherein a type for the first storage drive, the second storage drive, and the third storage drive comprises one or more of:
a solid state drive;
a hard disk drive; and
a storage medium which comprises one or more of: magnetoresistive random-access memory (MRAM); resistive RAM (ReRAM); phase change memory (PCM); nano-RAM (NRAM); and ferroelectric RAM (FRAM).
15. The computer system of claim 10, wherein in response to unsuccessfully reading the requested data from the third storage drive, the method further comprises:
reporting a fault associated with the third storage drive; and
generating a notification indicating that the requested data is not available from a first available zone comprising the first storage drive, the second storage drive, and the third storage drive,
wherein the notification further indicates to recover the requested data from a second available zone.
16. The computer system of claim 10, wherein the first storage drive, the second storage drive, and the third storage drive comprise a first available zone of a plurality of available zones, and wherein replicas of the requested data are stored in a respective available zone.
17. The computer system of claim 10, wherein prior to receiving the request to read the data, the method further comprises receiving a request to write the data to the first storage drive, the second storage drive, and the third storage drive, which involves:
simultaneously writing the data to a write cache of each of the first storage drive, the second storage drive, and the third storage drive; and
committing the write request upon successfully writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive.
18. The computer system of claim 17, wherein subsequent to writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive, the method further comprises:
writing the data asynchronously from the write cache to a non-volatile memory of each of the first storage drive, the second storage drive, and the third storage drive.
19. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method, the method comprising:
receiving, from a host, a request to read data;
determining that the data is not available in a read cache;
issuing the read request to a first storage drive and a second storage drive of a different type than the first storage drive;
in response to unsuccessfully reading the requested data from the first storage drive and successfully reading the requested data from the second storage drive, sending the requested data to the host; and
in response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive:
issuing the read request to a third storage drive; and
sending the requested data to the host.
20. The storage medium of claim 19, wherein prior to receiving the request to read the data, the method further comprises receiving a request to write the data to the first storage drive, the second storage drive, and the third storage drive, which involves:
simultaneously writing the data to a write cache of each of the first storage drive, the second storage drive, and the third storage drive; and
committing the write request upon successfully writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive; and
wherein subsequent to writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive, the method further comprises:
writing the data asynchronously from the write cache to a non-volatile memory of each of the first storage drive, the second storage drive, and the third storage drive.
US16/277,708 2019-02-15 2019-02-15 Method and system for facilitating a distributed storage system with a total cost of ownership reduction for multiple available zones Active 2039-03-02 US10970212B2 (en)


Family Cites Families (186)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US1001316A (en) 1910-12-27 1911-08-22 Frank Edward Smith Angle-gage for squares.
US3893071A (en) 1974-08-19 1975-07-01 Ibm Multi level error correction system for high density memory
NL8402411A (en) 1984-08-02 1986-03-03 Philips Nv DEVICE FOR CORRECTING AND MASKING ERRORS IN AN INFORMATION FLOW, AND DISPLAY FOR DISPLAYING IMAGES AND / OR SOUND PROVIDED WITH SUCH A DEVICE.
EP0681721B1 (en) 1993-02-01 2005-03-23 Sun Microsystems, Inc. Archiving file system for data servers in a distributed network environment
US5394382A (en) 1993-02-11 1995-02-28 International Business Machines Corporation Method for the organization of data on a CD-ROM
JP3215237B2 (en) 1993-10-01 2001-10-02 富士通株式会社 Storage device and method for writing / erasing storage device
US5732093A (en) 1996-02-08 1998-03-24 United Microelectronics Corporation Error correction method and apparatus on optical disc system
US6148377A (en) 1996-11-22 2000-11-14 Mangosoft Corporation Shared memory computer networks
US5930167A (en) 1997-07-30 1999-07-27 Sandisk Corporation Multi-state non-volatile flash memory capable of being its own two state write cache
US6098185A (en) 1997-10-31 2000-08-01 Stmicroelectronics, N.V. Header-formatted defective sector management system
US7200623B2 (en) 1998-11-24 2007-04-03 Oracle International Corp. Methods to perform disk writes in a distributed shared disk system needing consistency across failures
US6421787B1 (en) 1998-05-12 2002-07-16 Sun Microsystems, Inc. Highly available cluster message passing facility
US7966462B2 (en) 1999-08-04 2011-06-21 Super Talent Electronics, Inc. Multi-channel flash module with plane-interleaved sequential ECC writes and background recycling to restricted-write flash chips
US6457104B1 (en) 2000-03-20 2002-09-24 International Business Machines Corporation System and method for recycling stale memory content in compressed memory systems
US6658478B1 (en) * 2000-08-04 2003-12-02 3Pardata, Inc. Data storage system
US6694451B2 (en) 2000-12-07 2004-02-17 Hewlett-Packard Development Company, L.P. Method for redundant suspend to RAM
KR100856399B1 (en) 2002-01-23 2008-09-04 삼성전자주식회사 Decoding method and apparatus therefor
US20030163633A1 (en) 2002-02-27 2003-08-28 Aasheim Jered Donald System and method for achieving uniform wear levels in a flash memory device
US7533214B2 (en) 2002-02-27 2009-05-12 Microsoft Corporation Open architecture flash driver
US6988165B2 (en) 2002-05-20 2006-01-17 Pervasive Software, Inc. System and method for intelligent write management of disk pages in cache checkpoint operations
US7953899B1 (en) * 2002-08-21 2011-05-31 3Par Inc. Universal diagnostic hardware space access system for firmware
US7239605B2 (en) * 2002-09-23 2007-07-03 Sun Microsystems, Inc. Item and method for performing a cluster topology self-healing process in a distributed data system cluster
US7003620B2 (en) 2002-11-26 2006-02-21 M-Systems Flash Disk Pioneers Ltd. Appliance, including a flash memory, that is robust under power failure
US7173863B2 (en) 2004-03-08 2007-02-06 Sandisk Corporation Flash controller cache architecture
US7130957B2 (en) 2004-02-10 2006-10-31 Sun Microsystems, Inc. Storage system structure for storing relational cache metadata
US7676603B2 (en) 2004-04-20 2010-03-09 Intel Corporation Write combining protocol between processors and chipsets
JP4401895B2 (en) * 2004-08-09 2010-01-20 株式会社日立製作所 Computer system, computer and its program.
DE102005032061B4 (en) 2005-07-08 2009-07-30 Qimonda Ag Memory module, and memory module system
US7752382B2 (en) 2005-09-09 2010-07-06 Sandisk Il Ltd Flash memory storage system and method
US7631162B2 (en) 2005-10-27 2009-12-08 Sandisck Corporation Non-volatile memory with adaptive handling of data writes
JP2007305210A (en) 2006-05-10 2007-11-22 Toshiba Corp Semiconductor storage device
EP2033128A4 (en) 2006-05-31 2012-08-15 Ibm Method and system for transformation of logical data objects for storage
US7711890B2 (en) 2006-06-06 2010-05-04 Sandisk Il Ltd Cache control in a non-volatile memory device
US20080065805A1 (en) 2006-09-11 2008-03-13 Cameo Communications, Inc. PCI-Express multimode expansion card and communication device having the same
JP2008077810A (en) 2006-09-25 2008-04-03 Toshiba Corp Nonvolatile semiconductor storage device
US7761623B2 (en) 2006-09-28 2010-07-20 Virident Systems, Inc. Main memory in a system with a memory controller configured to control access to non-volatile memory, and related technologies
KR100858241B1 (en) 2006-10-25 2008-09-12 삼성전자주식회사 Hybrid-flash memory device and method for assigning reserved blocks therof
US8344475B2 (en) 2006-11-29 2013-01-01 Rambus Inc. Integrated circuit heating to effect in-situ annealing
US7958433B1 (en) 2006-11-30 2011-06-07 Marvell International Ltd. Methods and systems for storing data in memory using zoning
US7852654B2 (en) 2006-12-28 2010-12-14 Hynix Semiconductor Inc. Semiconductor memory device, and multi-chip package and method of operating the same
US7599139B1 (en) 2007-06-22 2009-10-06 Western Digital Technologies, Inc. Disk drive having a high performance access mode and a lower performance archive mode
US7917574B2 (en) 2007-10-01 2011-03-29 Accenture Global Services Limited Infrastructure for parallel programming of clusters of machines
IL187041A0 (en) 2007-10-30 2008-02-09 Sandisk Il Ltd Optimized hierarchical integrity protection for stored data
US8281061B2 (en) 2008-03-31 2012-10-02 Micron Technology, Inc. Data conditioning to improve flash memory reliability
US8195978B2 (en) 2008-05-16 2012-06-05 Fusion-IO. Inc. Apparatus, system, and method for detecting and replacing failed data storage
KR101497074B1 (en) 2008-06-17 2015-03-05 삼성전자주식회사 Non-volatile memory system and data manage method thereof
US9123422B2 (en) 2012-07-02 2015-09-01 Super Talent Technology, Corp. Endurance and retention flash controller with programmable binary-levels-per-cell bits identifying pages or blocks as having triple, multi, or single-level flash-memory cells
US8959280B2 (en) 2008-06-18 2015-02-17 Super Talent Technology, Corp. Super-endurance solid-state drive with endurance translation layer (ETL) and diversion of temp files for reduced flash wear
JP2010152704A (en) 2008-12-25 2010-07-08 Hitachi Ltd System and method for operational management of computer system
US20100217952A1 (en) 2009-02-26 2010-08-26 Iyer Rahul N Remapping of Data Addresses for a Large Capacity Victim Cache
US8166233B2 (en) 2009-07-24 2012-04-24 Lsi Corporation Garbage collection for solid state disks
US20100332922A1 (en) 2009-06-30 2010-12-30 Mediatek Inc. Method for managing device and solid state disk drive utilizing the same
US20110055471A1 (en) 2009-08-28 2011-03-03 Jonathan Thatcher Apparatus, system, and method for improved data deduplication
US8214700B2 (en) 2009-10-28 2012-07-03 Sandisk Technologies Inc. Non-volatile memory and method with post-write read and adaptive re-write to manage errors
US8144512B2 (en) 2009-12-18 2012-03-27 Sandisk Technologies Inc. Data transfer flows for on-chip folding
US8443263B2 (en) 2009-12-30 2013-05-14 Sandisk Technologies Inc. Method and controller for performing a copy-back operation
US8631304B2 (en) 2010-01-28 2014-01-14 Sandisk Il Ltd. Overlapping error correction operations
TWI409633B (en) 2010-02-04 2013-09-21 Phison Electronics Corp Flash memory storage device, controller thereof, and method for programming data
US8370297B2 (en) 2010-03-08 2013-02-05 International Business Machines Corporation Approach for optimizing restores of deduplicated data
US9401967B2 (en) 2010-06-09 2016-07-26 Brocade Communications Systems, Inc. Inline wire speed deduplication system
US8938624B2 (en) 2010-09-15 2015-01-20 Lsi Corporation Encryption key destruction for secure data erasure
US9244779B2 (en) 2010-09-30 2016-01-26 Commvault Systems, Inc. Data recovery operations, such as recovery from modified network data management protocol data
US20120089774A1 (en) 2010-10-12 2012-04-12 International Business Machines Corporation Method and system for mitigating adjacent track erasure in hard disk drives
US8429495B2 (en) 2010-10-19 2013-04-23 Mosaid Technologies Incorporated Error detection and correction codes for channels and memories with incomplete error characteristics
US9176794B2 (en) 2010-12-13 2015-11-03 Advanced Micro Devices, Inc. Graphics compute process scheduling
US10817421B2 (en) 2010-12-13 2020-10-27 Sandisk Technologies Llc Persistent data structures
US8793328B2 (en) * 2010-12-17 2014-07-29 Facebook, Inc. Distributed storage system
US8826098B2 (en) 2010-12-20 2014-09-02 Lsi Corporation Data signatures to determine successful completion of memory backup
US8819328B2 (en) 2010-12-30 2014-08-26 Sandisk Technologies Inc. Controller and method for performing background operations
US9612978B2 (en) 2010-12-31 2017-04-04 International Business Machines Corporation Encrypted flash-based data storage system with confidentiality mode
WO2012109679A2 (en) 2011-02-11 2012-08-16 Fusion-Io, Inc. Apparatus, system, and method for application direct virtual memory management
US9141527B2 (en) 2011-02-25 2015-09-22 Intelligent Intellectual Property Holdings 2 Llc Managing cache pools
CN102693168B (en) * 2011-03-22 2014-12-31 中兴通讯股份有限公司 A method, a system and a service node for data backup recovery
US20180107591A1 (en) 2011-04-06 2018-04-19 P4tents1, LLC System, method and computer program product for fetching data between an execution of a plurality of threads
US8832402B2 (en) 2011-04-29 2014-09-09 Seagate Technology Llc Self-initiated secure erasure responsive to an unauthorized power down event
US9235482B2 (en) * 2011-04-29 2016-01-12 International Business Machines Corporation Consistent data retrieval in a multi-site computing infrastructure
WO2012161659A1 (en) 2011-05-24 2012-11-29 Agency For Science, Technology And Research A memory storage device, and a related zone-based block management and mapping method
US9344494B2 (en) * 2011-08-30 2016-05-17 Oracle International Corporation Failover data replication with colocation of session state data
US8904158B2 (en) 2011-09-02 2014-12-02 Lsi Corporation Storage system with boot appliance for improving reliability/availability/serviceability in high density server environments
US8843451B2 (en) 2011-09-23 2014-09-23 International Business Machines Corporation Block level backup and restore
KR20130064518A (en) 2011-12-08 2013-06-18 삼성전자주식회사 Storage device and operation method thereof
US9088300B1 (en) 2011-12-15 2015-07-21 Marvell International Ltd. Cyclic redundancy check for out-of-order codewords
US8904061B1 (en) * 2011-12-30 2014-12-02 Emc Corporation Managing storage operations in a server cache
US9043545B2 (en) 2012-01-06 2015-05-26 Netapp, Inc. Distributing capacity slices across storage system nodes
US9251086B2 (en) 2012-01-24 2016-02-02 SanDisk Technologies, Inc. Apparatus, system, and method for managing a cache
US8880815B2 (en) 2012-02-20 2014-11-04 Avago Technologies General Ip (Singapore) Pte. Ltd. Low access time indirect memory accesses
US9362003B2 (en) 2012-03-09 2016-06-07 Sandisk Technologies Inc. System and method to decode data subject to a disturb condition
US9336340B1 (en) 2012-03-30 2016-05-10 Emc Corporation Evaluating management operations
US9208820B2 (en) 2012-06-29 2015-12-08 International Business Machines Corporation Optimized data placement for individual file accesses on deduplication-enabled sequential storage systems
US20140019650A1 (en) 2012-07-10 2014-01-16 Zhi Bin Li Multi-Write Bit-Fill FIFO
US9009402B2 (en) 2012-09-20 2015-04-14 Emc Corporation Content addressable storage in legacy systems
US8756237B2 (en) 2012-10-12 2014-06-17 Architecture Technology Corporation Scalable distributed processing of RDF data
US9141554B1 (en) 2013-01-18 2015-09-22 Cisco Technology, Inc. Methods and apparatus for data processing using data compression, linked lists and de-duplication techniques
US8751763B1 (en) 2013-03-13 2014-06-10 Nimbus Data Systems, Inc. Low-overhead deduplication within a block-based data storage
US9280472B1 (en) 2013-03-13 2016-03-08 Western Digital Technologies, Inc. Caching data in a high performance zone of a data storage system
US9747202B1 (en) 2013-03-14 2017-08-29 Sandisk Technologies Llc Storage module and method for identifying hot and cold data
US9195673B2 (en) 2013-03-15 2015-11-24 International Business Machines Corporation Scalable graph modeling of metadata for deduplicated storage systems
US10073626B2 (en) 2013-03-15 2018-09-11 Virident Systems, Llc Managing the write performance of an asymmetric memory system
KR102039537B1 (en) 2013-03-15 2019-11-01 삼성전자주식회사 Nonvolatile storage device and os image program method thereof
US9436595B1 (en) 2013-03-15 2016-09-06 Google Inc. Use of application data and garbage-collected data to improve write efficiency of a data storage device
US20140304452A1 (en) 2013-04-03 2014-10-09 Violin Memory Inc. Method for increasing storage media performance
KR101478168B1 (en) 2013-04-17 2014-12-31 주식회사 디에이아이오 Storage system and method of processing write data
US9785545B2 (en) 2013-07-15 2017-10-10 Cnex Labs, Inc. Method and apparatus for providing dual memory access to non-volatile memory
US9093093B2 (en) 2013-10-25 2015-07-28 Seagate Technology Llc Adaptive guard band for multiple heads of a data storage device
CA2881206A1 (en) 2014-02-07 2015-08-07 Andrew WARFIELD Methods, systems and devices relating to data storage interfaces for managing address spaces in data storage devices
US9542404B2 (en) 2014-02-17 2017-01-10 Netapp, Inc. Subpartitioning of a namespace region
US20150301964A1 (en) 2014-02-18 2015-10-22 Alistair Mark Brinicombe Methods and systems of multi-memory, control and data plane architecture
US9263088B2 (en) 2014-03-21 2016-02-16 Western Digital Technologies, Inc. Data management for a data storage device using a last resort zone
US9880859B2 (en) 2014-03-26 2018-01-30 Intel Corporation Boot image discovery and delivery
US9383926B2 (en) 2014-05-27 2016-07-05 Kabushiki Kaisha Toshiba Host-controlled garbage collection
US9015561B1 (en) 2014-06-11 2015-04-21 Sandisk Technologies Inc. Adaptive redundancy in three dimensional memory
GB2527296A (en) 2014-06-16 2015-12-23 Ibm A method for restoring data in a HSM system
US8868825B1 (en) 2014-07-02 2014-10-21 Pure Storage, Inc. Nonrepeating identifiers in an address space of a non-volatile solid-state storage
US10044795B2 (en) 2014-07-11 2018-08-07 Vmware Inc. Methods and apparatus for rack deployments for virtual computing environments
US9542327B2 (en) 2014-07-22 2017-01-10 Avago Technologies General Ip (Singapore) Pte. Ltd. Selective mirroring in caches for logical volumes
US20160041760A1 (en) 2014-08-08 2016-02-11 International Business Machines Corporation Multi-Level Cell Flash Memory Control Mechanisms
US10430328B2 (en) 2014-09-16 2019-10-01 Sandisk Technologies Llc Non-volatile cache and non-volatile storage medium using single bit and multi bit flash memory cells or different programming parameters
US9588977B1 (en) 2014-09-30 2017-03-07 EMC IP Holding Company LLC Data and metadata structures for use in tiering data to cloud storage
US10127157B2 (en) 2014-10-06 2018-11-13 SK Hynix Inc. Sizing a cache while taking into account a total bytes written requirement
US9129628B1 (en) 2014-10-23 2015-09-08 Western Digital Technologies, Inc. Data management for data storage device with different track density regions
CN105701028B (en) * 2014-11-28 2018-10-09 International Business Machines Corporation Disk management method and device in a distributed storage system
US9852076B1 (en) 2014-12-18 2017-12-26 Violin Systems Llc Caching of metadata for deduplicated LUNs
US20160179399A1 (en) 2014-12-23 2016-06-23 Sandisk Technologies Inc. System and Method for Selecting Blocks for Garbage Collection Based on Block Health
US10282211B2 (en) 2015-01-09 2019-05-07 Avago Technologies International Sales Pte. Limited Operating system software install and boot up from a storage area network device
US9916275B2 (en) 2015-03-09 2018-03-13 International Business Machines Corporation Preventing input/output (I/O) traffic overloading of an interconnect channel in a distributed data storage system
KR101927233B1 (en) 2015-03-16 2018-12-12 Electronics and Telecommunications Research Institute GPU power measurement method for a heterogeneous multi-core system
US9639282B2 (en) 2015-05-20 2017-05-02 Sandisk Technologies Llc Variable bit encoding per NAND flash cell to improve device endurance and extend life of flash-based storage devices
US10069916B2 (en) 2015-05-26 2018-09-04 Gluent, Inc. System and method for transparent context aware filtering of data requests
US9875053B2 (en) 2015-06-05 2018-01-23 Western Digital Technologies, Inc. Scheduling scheme(s) for a multi-die storage device
US9696931B2 (en) 2015-06-12 2017-07-04 International Business Machines Corporation Region-based storage for volume data and metadata
US9588571B2 (en) 2015-07-08 2017-03-07 Quanta Computer Inc. Dynamic power supply management
US10324832B2 (en) 2016-05-25 2019-06-18 Samsung Electronics Co., Ltd. Address based multi-stream storage device access
US10656838B2 (en) 2015-07-13 2020-05-19 Samsung Electronics Co., Ltd. Automatic stream detection and assignment algorithm
US9529601B1 (en) 2015-07-15 2016-12-27 Dell Products L.P. Multi-processor startup system
US10749858B2 (en) 2015-09-04 2020-08-18 Hewlett Packard Enterprise Development Lp Secure login information
US9952769B2 (en) 2015-09-14 2018-04-24 Microsoft Technology Licensing, LLC Data storage system with data storage devices operative to manage storage device functions specific to a particular data storage device
CN105278876B (en) 2015-09-23 2018-12-14 Huawei Technologies Co., Ltd. Data deletion method and device for a solid state drive
US10120811B2 (en) 2015-09-29 2018-11-06 International Business Machines Corporation Considering a frequency of access to groups of tracks and density of the groups to select groups of tracks to destage
US10031774B2 (en) 2015-10-15 2018-07-24 Red Hat, Inc. Scheduling multi-phase computing jobs
KR20170045806A (en) 2015-10-20 2017-04-28 Samsung Electronics Co., Ltd. Semiconductor memory device and method of operating the same
US20170147499A1 (en) 2015-11-25 2017-05-25 Sandisk Technologies Llc Multi-Level Logical to Physical Address Mapping Using Distributed Processors in Non-Volatile Storage Device
US20170162235A1 (en) 2015-12-02 2017-06-08 Qualcomm Incorporated System and method for memory management using dynamic partial channel interleaving
US20170161202A1 (en) 2015-12-02 2017-06-08 Samsung Electronics Co., Ltd. Flash memory device including address mapping for deduplication, and related methods
US20170177259A1 (en) 2015-12-18 2017-06-22 Intel Corporation Techniques to Use Open Bit Line Information for a Memory System
JP6517684B2 (en) 2015-12-22 2019-05-22 Toshiba Memory Corporation Memory system and control method
US10649681B2 (en) 2016-01-25 2020-05-12 Samsung Electronics Co., Ltd. Dynamic garbage collection P/E policies for redundant storage blocks and distributed software stacks
CN107037976B (en) 2016-02-03 2020-03-20 Toshiba Corporation Storage device and operating method thereof
US10235198B2 (en) 2016-02-24 2019-03-19 Samsung Electronics Co., Ltd. VM-aware FTL design for SR-IOV NVME SSD
US20170249162A1 (en) 2016-02-25 2017-08-31 Red Hat Israel, Ltd. Safe transmit packet processing for network function virtualization applications
US10303557B2 (en) * 2016-03-09 2019-05-28 Commvault Systems, Inc. Data transfer to a distributed storage environment
US10101939B2 (en) 2016-03-09 2018-10-16 Toshiba Memory Corporation Storage system having a host that manages physical data locations of a storage device
US20170286311A1 (en) 2016-04-01 2017-10-05 Dale J. Juenemann Repetitive address indirection in a memory
US10585809B2 (en) 2016-04-01 2020-03-10 Intel Corporation Convolutional memory integrity
US10866905B2 (en) 2016-05-25 2020-12-15 Samsung Electronics Co., Ltd. Access parameter based multi-stream storage device access
US10684795B2 (en) 2016-07-25 2020-06-16 Toshiba Memory Corporation Storage device and storage control method
US10283215B2 (en) 2016-07-28 2019-05-07 Ip Gem Group, Llc Nonvolatile memory system with background reference positioning and local reference positioning
US11644992B2 (en) 2016-11-23 2023-05-09 Samsung Electronics Co., Ltd. Storage system performing data deduplication, method of operating storage system, and method of operating data processing system
US10374885B2 (en) 2016-12-13 2019-08-06 Amazon Technologies, Inc. Reconfigurable server including a reconfigurable adapter device
US10496544B2 (en) 2016-12-29 2019-12-03 Intel Corporation Aggregated write back in a direct mapped two level memory
US10516760B2 (en) 2017-03-17 2019-12-24 Verizon Patent And Licensing Inc. Automatic bootstrapping and dynamic configuration of data center nodes
US10275170B2 (en) 2017-04-10 2019-04-30 Sandisk Technologies Llc Folding operations in memory systems with single address updates
US10613944B2 (en) * 2017-04-18 2020-04-07 Netapp, Inc. Systems and methods for backup and restore of distributed master-slave database clusters
TWI625620B (en) 2017-05-12 2018-06-01 VIA Technologies, Inc. Non-volatile memory apparatus and reading method thereof
US10474397B2 (en) 2017-06-13 2019-11-12 Western Digital Technologies, Inc. Unified indirection in a multi-device hybrid storage unit
US10521375B2 (en) 2017-06-22 2019-12-31 Macronix International Co., Ltd. Controller for a memory system
US10838902B2 (en) 2017-06-23 2020-11-17 Facebook, Inc. Apparatus, system, and method for performing hardware acceleration via expansion cards
US10275162B2 (en) 2017-06-23 2019-04-30 Dell Products L.P. Methods and systems for managing data migration in solid state non-volatile memory
US10564856B2 (en) 2017-07-06 2020-02-18 Alibaba Group Holding Limited Method and system for mitigating write amplification in a phase change memory-based storage device
TWI631570B (en) 2017-09-04 2018-08-01 VIA Technologies, Inc. Error checking and correcting decoding method and apparatus
US10642522B2 (en) 2017-09-15 2020-05-05 Alibaba Group Holding Limited Method and system for in-line deduplication in a storage drive based on a non-collision hash
US10956279B2 (en) * 2017-12-04 2021-03-23 International Business Machines Corporation Managing big data on document based NoSQL databases
US10229735B1 (en) 2017-12-22 2019-03-12 Intel Corporation Block management for dynamic single-level cell buffers in storage devices
US10606693B2 (en) 2017-12-28 2020-03-31 Micron Technology, Inc. Memory controller implemented error correction code memory
CN110058794B (en) 2018-01-19 2022-11-01 Shannon Systems Ltd. Data storage device for dynamically executing garbage collection and operating method
US10199066B1 (en) 2018-03-01 2019-02-05 Seagate Technology Llc Write management of physically coupled storage areas
US10585819B2 (en) 2018-03-05 2020-03-10 Samsung Electronics Co., Ltd. SSD architecture for FPGA based acceleration
US10649657B2 (en) 2018-03-22 2020-05-12 Western Digital Technologies, Inc. Log-based storage for different data types in non-volatile memory
JP7023384B2 (en) 2018-05-04 2022-02-21 Citrix Systems, Inc. Computer systems and related methods providing hierarchical display remoting optimized with user and system hints
US10437670B1 (en) 2018-05-24 2019-10-08 International Business Machines Corporation Metadata hardening and parity accumulation for log-structured arrays
KR20190139082A (en) 2018-06-07 2019-12-17 삼성전자주식회사 Memory device and method for equalizing bit error rates
US11599557B2 (en) * 2018-06-12 2023-03-07 Open Text Corporation System and method for persistence and replication of changes to a data store
US10921992B2 (en) 2018-06-25 2021-02-16 Alibaba Group Holding Limited Method and system for data placement in a hard disk drive based on access frequency for improved IOPS and utilization efficiency
US10776263B2 (en) 2018-06-27 2020-09-15 Seagate Technology Llc Non-deterministic window scheduling for data storage systems
US11150836B2 (en) 2018-06-28 2021-10-19 Seagate Technology Llc Deterministic optimization via performance tracking in a data storage system
US11086529B2 (en) 2018-09-26 2021-08-10 Western Digital Technologies, Inc. Data storage systems and methods for improved data relocation based on read-level voltages associated with error recovery
US11334521B2 (en) * 2018-12-21 2022-05-17 EMC IP Holding Company LLC System and method that determines a size of metadata-based system snapshots

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11960724B2 (en) * 2021-09-13 2024-04-16 SK Hynix Inc. Device for detecting zone parallelity of a solid state drive and operating method thereof

Also Published As

Publication number Publication date
US10970212B2 (en) 2021-04-06

Similar Documents

Publication Title
KR102370760B1 (en) Zone formation for zoned namespaces
KR101912596B1 (en) Non-volatile memory program failure recovery via redundant arrays
US10331345B2 (en) Method and apparatus for reducing silent data errors in non-volatile memory systems
US11200159B2 (en) System and method for facilitating efficient utilization of NAND flash memory
US11449386B2 (en) Method and system for optimizing persistent memory on data retention, endurance, and performance for host memory
CN113396566A (en) Resource allocation based on comprehensive I/O monitoring in distributed storage system
US10872622B1 (en) Method and system for deploying mixed storage products on a uniform storage infrastructure
US11922019B2 (en) Storage device read-disturb-based block read temperature utilization system
US10970212B2 (en) Method and system for facilitating a distributed storage system with a total cost of ownership reduction for multiple available zones
CN112346658B (en) Improving data heat trace resolution in a storage device having a cache architecture
EP4170499A1 (en) Data storage method, storage system, storage device, and storage medium
US11340989B2 (en) RAID storage-device-assisted unavailable primary data/Q data rebuild system
CN114730247A (en) Storage device with minimum write size of data
US11847337B2 (en) Data parking for ZNS devices
US9400748B2 (en) System and method for data inversion in a storage resource
US11487465B2 (en) Method and system for a local storage engine collaborating with a solid state drive controller
US11119855B2 (en) Selectively storing parity data in different types of memory
US11003391B2 (en) Data-transfer-based RAID data update system
CN114490726A (en) Automatic flexible mode detection and migration
US11989441B2 (en) Read-disturb-based read temperature identification system
US11928354B2 (en) Read-disturb-based read temperature determination system
US11922035B2 (en) Read-disturb-based read temperature adjustment system
US11983431B2 (en) Read-disturb-based read temperature time-based attenuation system
US11995340B2 (en) Read-disturb-based read temperature information access system
US11868223B2 (en) Read-disturb-based read temperature information utilization system

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: ALIBABA GROUP HOLDING LIMITED, CAYMAN ISLANDS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LI, SHU;REEL/FRAME:048484/0545

Effective date: 20190228

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE