US20200264978A1 - Method and system for facilitating a distributed storage system with a total cost of ownership reduction for multiple available zones - Google Patents
- Publication number
- US20200264978A1 (application Ser. No. 16/277,708)
- Authority
- US
- United States
- Prior art keywords
- storage drive
- data
- storage
- requested data
- drive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
- G06F12/0868—Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0611—Improving I/O performance in relation to response time
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0617—Improving the reliability of storage systems in relation to availability
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0635—Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/065—Replication mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0679—Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0685—Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/15—Use in a specific computing environment
- G06F2212/154—Networked environment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/21—Employing a record carrier using a specific recording technology
- G06F2212/217—Hybrid disk, e.g. using both magnetic and solid state storage devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/26—Using a specific storage system architecture
- G06F2212/261—Storage comprising a plurality of storage devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/28—Using a specific disk cache architecture
- G06F2212/283—Plural cache memories
- G06F2212/284—Plural cache memories being distributed
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/31—Providing disk cache in a specific location of a storage system
- G06F2212/311—In host system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/31—Providing disk cache in a specific location of a storage system
- G06F2212/313—In storage device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/72—Details relating to flash memory management
- G06F2212/7208—Multiple device management, e.g. distributing data over multiple flash devices
Definitions
- This disclosure is generally related to the field of data storage. More specifically, this disclosure is related to a method and system for facilitating a distributed storage system with a total cost of ownership reduction for multiple available zones.
- A distributed storage system can be a hyperscale storage system, which facilitates achieving a massive scale in computing, e.g., for big data or cloud computing.
- a hyperscale infrastructure must ensure both high availability and data reliability for the corresponding massive scale in computing.
- One way to ensure the high availability and data reliability in a hyperscale infrastructure is to use multiple available zones, which are constructed to synchronize data and provide service in a consistent manner. Each available zone may include multiple storage clusters, and each storage cluster may be deployed with a distributed file system which maintains multiple replicas of given data.
- One embodiment facilitates data placement in a storage device.
- the system receives, from a host, a request to read data.
- the system determines that the data is not available in a read cache.
- the system issues the read request to a first storage drive and a second storage drive of a different type than the first storage drive.
- the system sends the requested data to the host.
- the system issues the read request to a third storage drive; and the system sends the requested data to the host.
- the system identifies, based on previously stored path information, the first storage drive, the second storage drive, and the third storage drive.
- the system selects, from a plurality of storage drives on which the data is stored, the second storage drive.
- the system selects, from the plurality of storage drives, the third storage drive.
- in response to successfully reading the requested data from the first storage drive, the system sends the requested data to the host, and drops data read from the second storage drive.
- in response to unsuccessfully reading the requested data from the first storage drive, the system reports a fault associated with the first storage drive. In response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive, the system reports a fault associated with the second storage drive.
- the third storage drive is of a same or a different type as the second storage drive, and a type for the first storage drive, the second storage drive, and the third storage drive comprises one or more of: a solid state drive; a hard disk drive; and a storage medium which comprises one or more of: magnetoresistive random-access memory (MRAM); resistive RAM (ReRAM); phase change memory (PCM); nano-RAM (NRAM); and ferroelectric RAM (FRAM).
- in response to unsuccessfully reading the requested data from the third storage drive, the system reports a fault associated with the third storage drive, and generates a notification indicating that the requested data is not available from a first available zone comprising the first storage drive, the second storage drive, and the third storage drive, wherein the notification further indicates to recover the requested data from a second available zone.
- the first storage drive, the second storage drive, and the third storage drive comprise a first available zone of a plurality of available zones, and replicas of the requested data are stored in a respective available zone.
- prior to receiving the request to read the data, the system receives a request to write the data to the first storage drive, the second storage drive, and the third storage drive, which involves: simultaneously writing the data to a write cache of each of the first storage drive, the second storage drive, and the third storage drive; and committing the write request upon successfully writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive.
- subsequent to writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive, the system writes the data asynchronously from the write cache to a non-volatile memory of each of the first storage drive, the second storage drive, and the third storage drive.
- FIG. 1 illustrates an exemplary environment which demonstrates data placement in a distributed storage system with multiple available zones, in accordance with the prior art.
- FIG. 2 illustrates an exemplary environment which facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- FIG. 3 illustrates an exemplary environment which facilitates data placement in a distributed storage system with multiple available zones, including exemplary write operations, in accordance with an embodiment of the present application.
- FIG. 4 illustrates an exemplary environment which facilitates data placement in a distributed storage system with multiple available zones, including exemplary read operations, in accordance with an embodiment of the present application.
- FIG. 5 illustrates an exemplary hierarchy for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- FIG. 6A presents a flowchart illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- FIG. 6B presents a flowchart illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- FIG. 7 illustrates an exemplary computer system that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- FIG. 8 illustrates an exemplary apparatus that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- the embodiments described herein solve the problem of increasing the efficiency and performance of a distributed storage system by providing a hierarchy of access layers, which can ensure the high availability of service and data while reducing the total cost of ownership.
- a distributed storage system is a hyperscale storage system, which facilitates achieving a massive scale in computing, e.g., for big data or cloud computing.
- a hyperscale infrastructure must ensure both high availability and data reliability for the corresponding massive scale in computing.
- One way to ensure the high availability and data reliability in a hyperscale infrastructure is to use multiple available zones, which are constructed to synchronize data and provide service in a consistent manner.
- Each available zone may include multiple storage clusters, and each storage cluster may be deployed with a distributed file system which maintains multiple replicas of given data. For example, in a hyperscale infrastructure with three available zones, where each available zone has three replicas, the same data is stored nine times.
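The replica arithmetic above can be sketched as a quick back-of-the-envelope calculation. The per-terabyte prices below are hypothetical, chosen only to illustrate the direction of the cost comparison; they are not taken from this disclosure.

```python
# Three available zones, each maintaining three replicas, store the
# same data nine times.
AVAILABLE_ZONES = 3
REPLICAS_PER_ZONE = 3
total_copies = AVAILABLE_ZONES * REPLICAS_PER_ZONE  # 9

# Conventional layout: every replica on a standard (high-cost) SSD.
STANDARD_SSD_PER_TB = 100.0  # hypothetical $/TB
conventional_cost_per_tb = total_copies * STANDARD_SSD_PER_TB

# Layout described later in this disclosure: per zone, one low-cost
# (e.g., QLC) SSD replica plus two low-cost (e.g., SMR) HDD replicas.
QLC_SSD_PER_TB = 60.0  # hypothetical $/TB
SMR_HDD_PER_TB = 20.0  # hypothetical $/TB
proposed_cost_per_tb = AVAILABLE_ZONES * (QLC_SSD_PER_TB + 2 * SMR_HDD_PER_TB)

print(total_copies, conventional_cost_per_tb, proposed_cost_per_tb)
```

Under these assumed prices, the replica count stays the same (nine copies) while the storage media cost drops substantially, which is the direction of TCO reduction this disclosure targets.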
- the embodiments described herein address these challenges by providing a system which increases the efficiency and performance of a distributed storage system by providing a hierarchy of access layers.
- the system can use “low-cost SSDs” and “low-cost HDDs” in a layered hierarchy to store the multiple replicas of data required by a given application.
- An example of a low-cost SSD is a quad-level cell (QLC) SSD, whose memory elements can each store four bits of information, in contrast with a single-level cell (SLC) memory element which can store only a single bit of information.
- An MLC-based SSD uses multi-level cells (MLC) which store two bits per cell, while a triple-level cell (TLC) stores three bits and a quad-level cell (QLC) stores four bits.
- an SLC-based SSD has features which include high performance and high endurance (e.g., it can endure a large number of program/erase cycles), and SLC-based SSDs are currently the most expensive SSDs.
- As the number of bits per cell increases (e.g., MLC → TLC → QLC), the cost of the associated SSD decreases, as does its endurance.
- the QLC SSD is an example of a low-cost SSD.
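The cell-type relationships above can be captured in a small sketch. The per-cell bit counts are as stated above; the ordering reflects the general trend, not a guarantee for any specific product.

```python
# Bits per cell for each NAND cell type discussed above.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}

# General trend: as bits per cell increase, cost per bit and endurance
# both decrease, so sorting by bits per cell orders the types from
# highest endurance/cost (SLC) to lowest (QLC).
by_endurance = sorted(BITS_PER_CELL, key=BITS_PER_CELL.get)
print(by_endurance)
```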
- An example of a low-cost HDD is a shingled magnetic recording (SMR) drive, which writes data sequentially to overlapping or “shingled” tracks.
- SMR HDD generally has a higher read latency than the read latency of a conventional magnetic recording (CMR) drive, but the SMR HDD is generally a lower cost alternative, and can be useful for sequential workloads where large amounts of data can be written sequentially, followed by random reads for processing and archive retrieval (e.g., video surveillance, object storage, and cloud services).
- an SMR drive is an example of a low-cost HDD.
- the system can construct available zones (or storage clusters) which include low-cost SSDs and low-cost HDDs.
- an available zone can include one low-cost SSD and two low-cost HDDs, as described below in relation to FIGS. 3 and 4 .
- the system can ensure the high availability of service and data without a loss in performance (e.g., in access latency and throughput), and can also provide a reduction of the total cost of ownership (TCO).
- Each available zone or storage cluster can include multiple storage nodes, and each storage node can include multiple low-cost SSDs and multiple low-cost HDDs.
- each storage node can deploy a write cache.
- an available zone can include heterogeneous storage devices, e.g., one low-cost SSD and two low-cost HDDs on different storage nodes.
- each write cache associated with a specific storage node can be written simultaneously, and can thus execute a low-latency write, followed by a commit to the host.
- the data in the write cache can be written asynchronously from the write cache to the non-volatile memory of the respective low-cost SSD or low-cost HDD.
- Data is thus written in a layered, hierarchical manner, which can result in an improved distributed storage system that can support a growing hyperscale infrastructure.
- An exemplary write operation is described below in relation to FIG. 3
- an exemplary read operation is described below in relation to FIG. 4 .
- the embodiments described herein provide a distributed storage system which can ensure both high availability and data reliability to meet the increasing needs of current applications.
- a “storage drive” refers to a device or a drive with a non-volatile memory which can provide persistent storage of data, e.g., a solid state drive (SSD) or a hard disk drive (HDD).
- a “storage server” or a “storage node” refers to a computing device which can include multiple storage drives.
- a distributed storage system can include multiple storage servers or storage nodes.
- a “compute node” refers to a computing device which can perform as a client device or a host device.
- a distributed storage system can include multiple compute nodes.
- a “storage cluster” or an “available zone” is a grouping of storage servers, storage nodes, or storage drives in a distributed storage system.
- a “low-cost SSD” refers to an SSD which has a lower cost than other currently available SSDs, and may have a lower endurance than other currently available SSDs.
- An example of a low-cost SSD is a QLC SSD.
- a “low-cost HDD” refers to an HDD which has a lower cost than other currently available HDDs, and may have a higher read or access latency than other currently available HDDs.
- An example of a low-cost HDD is an SMR HDD. While the embodiments and Figures described herein refer to low-cost SSDs and low-cost HDDs, in some embodiments, the storage drives depicted as low-cost SSDs and low-cost HDDs can include other types of storage drives, including but not limited to: magnetoresistive random-access memory (MRAM); resistive RAM (ReRAM); phase change memory (PCM); nano-RAM (NRAM); and ferroelectric RAM (FRAM).
- FIG. 1 illustrates an exemplary environment 100 which demonstrates data placement in a distributed storage system with multiple available zones, in accordance with the prior art.
- Environment 100 includes three available zones (AZ): an available zone 1 102 ; an available zone 2 104 ; and an available zone 3 106 .
- the distributed storage system can use the multiple available zones to synchronize data and provide service in a consistent manner, by storing multiple replicas of given data on each available zone.
- each available zone can indicate a storage cluster, and each cluster can be deployed with a distributed file system which maintains three replicas on each cluster.
- available zone 3 106 can include a compute cluster 112 and a storage cluster 114 .
- Storage cluster 114 can include three storage drives, and each storage drive can include a copy of given data.
- a storage drive 122 can include a data_copy 1 124 ;
- a storage drive 126 can include a data_copy 2 128 ; and
- a storage drive 130 can include a data_copy 3 132 .
- the distributed storage system with three storage clusters (e.g., AZs) depicted in environment 100 stores the data nine separate times.
- each storage drive can be a standard solid state drive (SSD). These standard SSDs are expensive, which can lead to a high overall total cost of ownership (TCO).
- FIG. 2 illustrates an exemplary environment 200 which facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- Environment 200 can be a distributed storage system which includes compute nodes and storage nodes, which communicate with each other over a data center network 202 (e.g., via communications 204 and 206 ).
- Each compute node can include a read cache, and each storage node can include a write cache.
- a compute node 212 can include a read cache 214
- a compute node 216 can include a read cache 218
- a compute node 220 can include a read cache 222 .
- a storage node 232 can include a write cache 234 .
- storage nodes 242 , 252 , and 262 can include, respectively, write caches 244 , 254 , and 264 .
- each storage node can include multiple storage drives, including a low-cost SSD and a low-cost HDD. This is in contrast to conventional storage nodes which include only the conventional high-cost SSDs, as described above in relation to FIG. 1 .
- storage node 232 can include a low-cost SSD 236 and a low-cost HDD 238 .
- storage nodes 242 , 252 , and 262 can include, respectively, low-cost SSDs ( 246 , 256 , and 266 ) as well as low-cost HDDs ( 248 , 258 , and 268 ).
- Each storage node can maintain the low-cost SSD to provide a shorter (i.e., faster) read latency than an HDD. While each storage node in environment 200 is depicted as including only one low-cost SSD and one low-cost HDD, each storage node can include any number of low-cost SSDs and low-cost HDDs. Furthermore, the exemplary environments of FIGS. 3 and 4 depict one embodiment of the present application, in which each storage node, storage cluster, or available zone includes one low-cost SSD and two low-cost HDDs.
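The node composition described above can be modeled with a minimal sketch. Class names and drive identifiers are illustrative (not from the patent); the one-SSD-plus-two-HDD split follows the embodiment of FIGS. 3 and 4.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ComputeNode:
    # Compute nodes carry a read cache to reduce average read latency.
    read_cache: Dict[str, bytes] = field(default_factory=dict)

@dataclass
class StorageNode:
    # Storage nodes carry a write cache plus low-cost SSDs and HDDs.
    write_cache: Dict[str, bytes] = field(default_factory=dict)
    low_cost_ssds: List[str] = field(default_factory=list)
    low_cost_hdds: List[str] = field(default_factory=list)

@dataclass
class AvailableZone:
    storage_nodes: List[StorageNode] = field(default_factory=list)

    def drives(self):
        ssds = [d for n in self.storage_nodes for d in n.low_cost_ssds]
        hdds = [d for n in self.storage_nodes for d in n.low_cost_hdds]
        return ssds, hdds

# One embodiment: one low-cost SSD and two low-cost HDDs per zone,
# spread across different storage nodes.
zone = AvailableZone(storage_nodes=[
    StorageNode(low_cost_ssds=["qlc-ssd-236"]),
    StorageNode(low_cost_hdds=["smr-hdd-248"]),
    StorageNode(low_cost_hdds=["smr-hdd-268"]),
])
ssds, hdds = zone.drives()
print(len(ssds), len(hdds))  # 1 SSD, 2 HDDs
```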
- the system can use the read cache of a compute node to decrease the average latency of a read operation, and can also use the write cache of a storage node to decrease the average latency of a write operation.
- Exemplary environments for facilitating communications via communications 204 and 206 are described below in relation to FIG. 3 (write operation) and FIG. 4 (read operation).
- FIG. 3 illustrates an exemplary environment 300 which facilitates data placement in a distributed storage system with multiple available zones, including exemplary write operations, in accordance with an embodiment of the present application.
- Environment 300 can include a storage node, storage cluster, or available zone which includes multiple drives, such as a low-cost SSD 236 , a low-cost HDD 248 , and a low-cost HDD 268 .
- these multiple drives can be part of a same storage cluster or available zone, but can reside on different storage nodes/servers.
- these multiple drives ( 236 , 248 , and 268 ) can reside on the same storage node/server.
- the multiple drives can communicate with compute nodes or other client devices via a data center network 302 .
- the system can receive a user write operation 312 via a communication 304 .
- the system can write data corresponding to user write operation 312 to multiple write caches simultaneously (e.g., via a communication 306 ).
- the system can: write data 372 to write cache 1 234 associated with low-cost SSD 236 ; write data 382 to write cache 2 244 associated with low-cost HDD 248 ; and write data 392 to write cache 3 264 associated with low-cost HDD 268 .
- the system can commit the current write operation.
- the system can send an acknowledgement to the host confirming the successful write (e.g., via acknowledgments 374 , 384 , and 394 ).
- the system can perform an asynchronous write operation by writing the data stored in the write cache to the non-volatile memory of the storage drive.
- the system can perform an asynchronous write 342 to write the data from write cache 1 234 to the non-volatile memory of low-cost SSD 236 .
- the system can perform an asynchronous write 352 to write the data from write cache 2 244 to the non-volatile memory of low-cost HDD 248 .
- the system can perform an asynchronous write 362 to write the data from write cache 3 264 to the non-volatile memory of low-cost HDD 268 .
- the asynchronous write can be part of a background operation, and can remain invisible to the front-end user. That is, the asynchronous write operation can be performed without affecting the front-end user.
- For example, an asynchronous write (e.g., asynchronous write 342) occurs only after the data has been written to a respective write cache (e.g., data 372 to write cache 1 234) and the corresponding acknowledgment (e.g., acknowledgment 374) has been sent to the host.
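The write path just described (simultaneous writes to each zone member's write cache, a commit once every cache acknowledges, and a later background flush to non-volatile media) can be sketched as follows. The class and drive names are illustrative, and the flush is shown synchronously for clarity where a real system would run it asynchronously.

```python
class Drive:
    """Illustrative model of one zone member (names are hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.write_cache = {}
        self.nonvolatile = {}

    def cache_write(self, key, value):
        # Low-latency write into the write cache; returns an ack.
        self.write_cache[key] = value
        return True

    def flush(self):
        # Background flush to non-volatile media (synchronous here for
        # clarity; asynchronous and user-invisible in a real system).
        self.nonvolatile.update(self.write_cache)
        self.write_cache.clear()

def replicated_write(drives, key, value):
    # Write to every member's write cache "simultaneously" (a plain
    # loop here), then commit only once every cache has acknowledged.
    acks = [d.cache_write(key, value) for d in drives]
    return all(acks)

zone = [Drive("qlc-ssd-236"), Drive("smr-hdd-248"), Drive("smr-hdd-268")]
committed = replicated_write(zone, "obj-1", b"payload")
for d in zone:
    d.flush()
print(committed)  # commit reported to the host
```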
- FIG. 4 illustrates an exemplary environment 400 which facilitates data placement in a distributed storage system with multiple available zones, including exemplary read operations, in accordance with an embodiment of the present application.
- environment 400 can include an available zone which includes low-cost SSD 236 , low-cost HDD 248 , and low-cost HDD 268 , along with their respective write caches (e.g., 234 , 244 , and 264 ).
- These drives can communicate with compute nodes or other client devices via a data center network 402 (via communications 426 and 430 ).
- the system can receive a user read operation 412 .
- the system can initially check the read cache of the corresponding compute node (or another compute node, depending on the configuration of the compute nodes). For example, the system can check, via a communication 422 , whether read cache 414 stores the requested data. If it does, the system can return the requested data via a communication 424 . If it does not, the system can return a message, via communication 424 , indicating that the requested data is not stored in the read cache.
- the system can then issue the read request to both a low-cost SSD and a low-cost HDD of an available zone.
- the system can pick the low-cost HDD randomly. For example, the system can identify low-cost SSD 236 and low-cost HDD 268 of an available zone.
- the system can issue the read request to both low-cost SSD 236 (via an operation 432 ) and low-cost HDD 268 (via an operation 436 ). If the requested data can be obtained from (i.e., is stored on) low-cost SSD 236 , the system can read the requested data from low-cost SSD 236 , and return the requested data to the host (via a communication 434 ).
- If the requested data is successfully obtained from low-cost SSD 236 , the system can drop the data obtained, if any, from low-cost HDD 268 in response to the read request (via communication 436 ). If the requested data cannot be obtained from low-cost SSD 236 , the system can report a fault associated with low-cost SSD 236 .
- If the requested data can be obtained from low-cost HDD 268 , the system can read the requested data from low-cost HDD 268 , and return the requested data to the host (via a communication 438 ). If the requested data cannot be obtained from either low-cost SSD 236 or low-cost HDD 268 , the system can report a fault associated with low-cost HDD 268 , and can also identify another low-cost HDD on which the data is stored (e.g., low-cost HDD 248 , which is part of the same available zone as low-cost SSD 236 and low-cost HDD 268 ). The system can then issue the read request to low-cost HDD 248 . If the requested data can be obtained from (i.e., is stored on) low-cost HDD 248 , the system can read the requested data from low-cost HDD 248 , and return the requested data to the host (via a communication, not shown).
- the system can report a fault associated with low-cost HDD 248 , and can also generate a message or notification indicating that the requested data is not available from the available zone comprising low-cost SSD 236 , low-cost HDD 248 , and low-cost HDD 268 .
- the notification can further indicate to recover the requested data from another available zone.
- the read request is always issued to both the low-cost SSD and the (randomly selected) low-cost HDD.
- This hierarchy allows the system to provide control over the long-tail latency, such that if the read operation from the low-cost SSD encounters an error, the system can proceed with the simultaneously issued read operation from the first low-cost HDD.
- the second low-cost HDD provides an additional backup layer.
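- As an illustrative (non-limiting) sketch, the simultaneously issued reads described above can be modeled with concurrent futures: the SSD result is preferred, and the in-flight HDD result is either dropped or used as the fallback. Function names and the use of `IOError` to signal a drive fault are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def dual_issue_read(read_ssd, read_hdd):
    """Issue the read to the low-cost SSD and a low-cost HDD at the same
    time. On SSD success, use the SSD result and drop the HDD result; on an
    SSD fault, fall back to the already-in-flight HDD read, bounding the
    long-tail latency. (Sketch; names are assumptions.)"""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ssd_future = pool.submit(read_ssd)
        hdd_future = pool.submit(read_hdd)
        try:
            return ssd_future.result()    # normal path: short SSD latency
        except IOError:
            return hdd_future.result()    # backup path: HDD replica

def ssd_ok():  return "replica-from-ssd"
def ssd_bad(): raise IOError("SSD fault")
def hdd_ok():  return "replica-from-hdd"

assert dual_issue_read(ssd_ok, hdd_ok) == "replica-from-ssd"
assert dual_issue_read(ssd_bad, hdd_ok) == "replica-from-hdd"
```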
- the embodiments of the system described herein can provide a more efficient distributed storage system with a reduced total cost of ownership for multiple available zones.
- FIG. 5 illustrates an exemplary hierarchy 500 for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- Hierarchy 500 can include a read cache layer 502 , a write cache layer 504 , a normal read layer (from low-cost SSD) 506 , and a backup read layer (from low-cost HDD) 508 .
- Read cache layer 502 can include features such as a large capacity, local access, and a short read latency (e.g., read cache 214 of FIGS. 2 and 4 ).
- Write cache layer 504 can include features such as a small capacity, global access, storage of multiple replicas, a short write latency, and a high endurance (e.g., write cache 234 of FIGS. 2 and 3 ).
- Normal read layer (from low-cost SSD) 506 can include features such as a large capacity, global access, storage for multiple replicas, a short read latency, and a low endurance (e.g., via communications 432 / 434 with low-cost SSD 236 of FIG. 4 ).
- Backup read layer (from low-cost HDD) 508 can include features such as a large capacity, global access, storage for multiple replicas, a long read latency, and a high endurance (e.g., via communications 436 / 438 with low-cost HDD 268 of FIG. 4 ).
- the low-cost high-capacity read cache can improve the overall performance of a distributed storage system by reducing the average read latency.
- the write cache, which has a small capacity and a low latency, can improve the overall performance of the distributed storage system while maintaining a limited cost. For example, assume that the dollars-per-gigabyte ($/GB) ratio between a conventional high-cost SSD and an HDD is 10 to 1. By storing two of the three copies of data on low-cost HDDs rather than on conventional high-cost SSDs, the system reduces the relative cost of the three replicas from (10+10+10) to (10+1+1), i.e., from 30 to 12 (a ratio of 5 to 2), resulting in a significant cost savings and an improved overall efficiency of the distributed storage system.
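- The cost comparison above can be reproduced with a few lines of arithmetic, under the stated (assumed) 10-to-1 $/GB ratio between a conventional high-cost SSD and an HDD:

```python
# Assumed relative $/GB: conventional high-cost SSD = 10 units, HDD = 1 unit.
ssd_cost, hdd_cost = 10, 1

all_ssd_replicas = 3 * ssd_cost              # three copies on high-cost SSDs
hybrid_replicas = ssd_cost + 2 * hdd_cost    # one SSD copy + two HDD copies

assert (all_ssd_replicas, hybrid_replicas) == (30, 12)   # 30-to-12 = 5-to-2
assert hybrid_replicas / all_ssd_replicas == 0.4         # 60% cost reduction
```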
- FIG. 6A presents a flowchart 600 illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- the system receives, from a host, a request to read data (operation 602 ).
- the system determines whether the data is available in a read cache (operation 604 ). If the system determines that the data is available in the read cache (i.e., a read cache hit) (decision 606 ), the operation continues at operation 640 of FIG. 6B .
- the system determines paths to replicas of the requested data stored on at least a solid state drive (SSD), a first hard disk drive (HDD1), and a second hard disk drive (HDD2) (operation 608 ).
- the system can store replicas on one or more available zones, and a first available zone can include the SSD, HDD1, and HDD2.
- the system can select HDD1 randomly from a plurality of HDDs in the first available zone.
- the system issues the read request to the solid state drive (SSD) and the first hard disk drive (HDD1) (operation 610 ).
- the system reads the data from the solid state drive and the first hard disk drive (operation 612 ). If the system successfully reads the requested data from the solid state drive (decision 614 ), the operation continues at operation 640 of FIG. 6B . If the system unsuccessfully reads the requested data from the solid state drive (decision 614 ), the system reports a fault associated with the solid state drive (operation 616 ), and the operation continues at Label A of FIG. 6B .
- FIG. 6B presents a flowchart 620 illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- If the system successfully reads the requested data from the first hard disk drive (decision 622 ), the operation continues at operation 640 . If the system unsuccessfully reads the requested data from the first hard disk drive (decision 622 ), the system reports a fault associated with the first hard disk drive (operation 624 ). The system issues the read request to and reads the data from the second hard disk drive (operation 626 ).
- If the system successfully reads the requested data from the second hard disk drive (decision 628 ), the operation continues at operation 640 of FIG. 6B .
- the system sends the requested data to the host (operation 640 ), and the operation returns.
- If the system unsuccessfully reads the requested data from the second hard disk drive (decision 628 ), the system reports a fault associated with the second hard disk drive (operation 630 ).
- the system generates a notification that the requested data is not available from a first storage cluster (or a first available zone) comprising the SSD, HDD1, and HDD2, wherein the notification indicates to recover the requested data from a second storage cluster or a second available zone (operation 632 ), and the operation returns.
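- As an illustrative (non-limiting) sketch, the flow of FIGS. 6A/6B can be rendered sequentially in code. Note one simplification: in the embodiment the reads to the SSD and the first HDD are issued simultaneously, whereas this model tries the drives one after another; the names and the `IOError` fault convention are assumptions.

```python
def handle_read(key, read_cache, drives):
    """Sequential sketch of FIGS. 6A/6B. `drives` is an ordered list of
    (name, reader) pairs for the SSD, HDD1, and HDD2; a reader returns the
    data or raises IOError. Returns (data, faults)."""
    faults = []
    if key in read_cache:                      # decision 606: read cache hit
        return read_cache[key], faults         # operation 640: send to host
    for name, reader in drives:
        try:
            return reader(key), faults         # operation 640: send to host
        except IOError:
            faults.append(name)                # operations 616 / 624 / 630
    # Operation 632: data unavailable from this available zone; the caller
    # should recover the requested data from a second available zone.
    raise LookupError(f"not available in zone; faults: {faults}")

ok = lambda key: "data:" + key
def bad(key):
    raise IOError("drive fault")

data, faults = handle_read("k", {}, [("SSD", bad), ("HDD1", bad), ("HDD2", ok)])
assert data == "data:k" and faults == ["SSD", "HDD1"]
```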
- FIG. 7 illustrates an exemplary computer system 700 that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- Computer system 700 includes a processor 702 , a controller 704 , a volatile memory 706 , and a storage device 708 .
- Volatile memory 706 can include, e.g., random access memory (RAM), which serves as a managed memory and can be used to store one or more memory pools.
- Storage device 708 can include persistent storage which can be managed or accessed via controller 704 .
- computer system 700 can be coupled to peripheral input/output user devices 710 , such as a display device 711 , a keyboard 712 , and a pointing device 714 .
- Storage device 708 can store an operating system 716 , a content-processing system 718 , and data 732 .
- Content-processing system 718 can include instructions, which when executed by computer system 700 , can cause computer system 700 to perform methods and/or processes described in this disclosure. Specifically, content-processing system 718 can include instructions for receiving and transmitting data packets, including data to be read or written, a read request, and a write request (communication module 720 ).
- Content-processing system 718 can further include instructions for receiving, from a host, a request to read data (communication module 720 ).
- Content-processing system 718 can include instructions for determining that the data is not available in a read cache (read cache-managing module 722 ).
- Content-processing system 718 can include instructions for issuing the read request to a solid state drive and a first hard disk drive (request-issuing module 724 ).
- Content-processing system 718 can include instructions for, in response to unsuccessfully reading the requested data from the solid state drive (SSD-managing module 726 ) and successfully reading the requested data from the first hard disk drive (HDD-managing module 728 ), sending the requested data to the host (communication module 720 ).
- Content-processing system 718 can include instructions for, in response to unsuccessfully reading the requested data from both the solid state drive and the first hard disk drive (SSD-managing module 726 and HDD-managing module 728 ): issuing the read request to a second hard disk drive (request-issuing module 724 ); and sending the requested data to the host (communication module 720 ).
- Content-processing system 718 can include instructions for identifying, based on previously stored path information, the solid state drive, the first hard disk drive, and the second hard disk drive (zone-managing module 730 ). Content-processing system 718 can include instructions for selecting, from a plurality of hard disk drives on which the data is stored, the first hard disk drive (zone-managing module 730 ). Content-processing system 718 can include instructions for selecting, from the plurality of hard disk drives, the second hard disk drive (zone-managing module 730 ).
- Data 732 can include any data that is required as input or that is generated as output by the methods and/or processes described in this disclosure. Specifically, data 732 can store at least: data; a request; a read request; a write request; data associated with a read cache or a write cache; path information for drives in a storage cluster or available zone; an identification or indicator of an available zone, a solid state drive, a hard disk drive, or other storage device; an indicator of a fault; a message; a notification; a replica; a copy of data; and an indicator of multiple available zones.
- FIG. 8 illustrates an exemplary apparatus 800 that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- Apparatus 800 can comprise a plurality of units or apparatuses which may communicate with one another via a wired, wireless, quantum light, or electrical communication channel.
- Apparatus 800 may be realized using one or more integrated circuits, and may include fewer or more units or apparatuses than those shown in FIG. 8 .
- apparatus 800 may be integrated in a computer system, or realized as a separate device which is capable of communicating with other computer systems and/or devices.
- apparatus 800 can comprise units 802 - 812 which perform functions or operations similar to modules 720 - 730 of computer system 700 of FIG. 7 .
- These units can include: a communication unit 802 ; a read cache-managing unit 804 ; a request-issuing unit 806 ; an SSD-managing unit 808 ; an HDD-managing unit 810 ; and a zone-managing unit 812 .
- the data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system.
- the computer-readable storage medium includes, but is not limited to, volatile memory, non-volatile memory, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing computer-readable media now known or later developed.
- the methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above.
- a computer system reads and executes the code and/or data stored on the computer-readable storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the computer-readable storage medium.
- the methods and processes described above can be included in hardware modules.
- the hardware modules can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), and other programmable-logic devices now known or later developed.
- When the hardware modules are activated, the hardware modules perform the methods and processes included within the hardware modules.
Abstract
One embodiment facilitates data placement in a storage device. During operation, the system receives, from a host, a request to read data. The system determines that the data is not available in a read cache. The system issues the read request to a solid state drive and a first hard disk drive. In response to unsuccessfully reading the requested data from the solid state drive and successfully reading the requested data from the first hard disk drive, the system sends the requested data to the host. In response to unsuccessfully reading the requested data from both the solid state drive and the first hard disk drive: the system issues the read request to a second hard disk drive; and the system sends the requested data to the host.
Description
- This disclosure is generally related to the field of data storage. More specifically, this disclosure is related to a method and system for facilitating a distributed storage system with a total cost of ownership reduction for multiple available zones.
- The proliferation of the Internet and e-commerce continues to create a vast amount of digital content. Various distributed storage systems have been created to access and store such digital content. One example of a distributed storage system is a hyperscale storage system, which facilitates achieving a massive scale in computing, e.g., for big data or cloud computing. A hyperscale infrastructure must ensure both high availability and data reliability for the corresponding massive scale in computing. One way to ensure the high availability and data reliability in a hyperscale infrastructure is to use multiple available zones, which are constructed to synchronize data and provide service in a consistent manner. Each available zone may include multiple storage clusters, and each storage cluster may be deployed with a distributed file system which maintains multiple replicas of given data. For example, in a hyperscale infrastructure with three available zones, where each available zone has three replicas, the same data is stored nine times. As current applications continue to pursue and require fast access, all nine of the replicas may be stored on high-speed solid state drives (SSDs). However, these high-speed SSDs can be expensive. Meeting the needs of current applications in this way can thus result in a high cost for the overall storage system. As the hyperscale infrastructure scales out and grows, the ability to provide an efficient system which can both scale out and perform at a reasonable pace becomes critical. Furthermore, the total cost of ownership (TCO) can become a critical factor.
- One embodiment facilitates data placement in a storage device. During operation, the system receives, from a host, a request to read data. The system determines that the data is not available in a read cache. The system issues the read request to a first storage drive and a second storage drive of a different type than the first storage drive. In response to unsuccessfully reading the requested data from the first storage drive and successfully reading the requested data from the second storage drive, the system sends the requested data to the host. In response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive: the system issues the read request to a third storage drive; and the system sends the requested data to the host.
- In some embodiments, the system identifies, based on previously stored path information, the first storage drive, the second storage drive, and the third storage drive. The system selects, from a plurality of storage drives on which the data is stored, the second storage drive. The system selects, from the plurality of storage drives, the third storage drive.
- In some embodiments, in response to successfully reading the requested data from the first storage drive, the system sends the requested data to the host, and drops data read from the second storage drive.
- In some embodiments, in response to unsuccessfully reading the requested data from the first storage drive, the system reports a fault associated with the first storage drive. In response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive, the system reports a fault associated with the second storage drive.
- In some embodiments, the third storage drive is of a same or a different type as the second storage drive, and a type for the first storage drive, the second storage drive, and the third storage drive comprises one or more of: a solid state drive; a hard disk drive; and a storage medium which comprises one or more of: magnetoresistive random-access memory (MRAM); resistive RAM (ReRAM); phase change memory (PCM); nano-RAM (NRAM); and ferroelectric RAM (FRAM).
- In some embodiments, in response to unsuccessfully reading the requested data from the third storage drive, the system reports a fault associated with the third storage drive, and the system generates a notification indicating that the requested data is not available from a first available zone comprising the first storage drive, the second storage drive, and the third storage drive, wherein the notification further indicates to recover the requested data from a second available zone.
- In some embodiments, the first storage drive, the second storage drive, and the third storage drive comprise a first available zone of a plurality of available zones, and replicas of the requested data are stored in a respective available zone.
- In some embodiments, prior to receiving the request to read the data, the system receives a request to write the data to the first storage drive, the second storage drive, and the third storage drive, which involves: simultaneously writing the data to a write cache of each of the first storage drive, the second storage drive, and the third storage drive; and committing the write request upon successfully writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive.
- In some embodiments, subsequent to writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive, the system writes the data asynchronously from the write cache to a non-volatile memory of each of the first storage drive, the second storage drive, and the third storage drive.
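- As an illustrative (non-limiting) sketch, the simultaneous write to the three write caches and the commit upon success can be modeled with concurrent futures; the write caches are modeled as plain dicts, and the names are assumptions for illustration. The later asynchronous flush to non-volatile memory is omitted here.

```python
from concurrent.futures import ThreadPoolExecutor

def replicated_write(key, value, write_caches):
    """Write the data to every drive's write cache at the same time, and
    commit only once all cache writes succeed. (Sketch; the caches are
    modeled as dicts and the names are assumptions.)"""
    with ThreadPoolExecutor(max_workers=len(write_caches)) as pool:
        futures = [pool.submit(cache.__setitem__, key, value)
                   for cache in write_caches]
        for future in futures:
            future.result()        # surfaces any failed cache write
    return "committed"             # acknowledgment returned to the host

caches = [{}, {}, {}]              # write caches of the three storage drives
assert replicated_write("blk-1", "replica", caches) == "committed"
assert all(c["blk-1"] == "replica" for c in caches)
```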
- FIG. 1 illustrates an exemplary environment which demonstrates data placement in a distributed storage system with multiple available zones, in accordance with the prior art.
- FIG. 2 illustrates an exemplary environment which facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- FIG. 3 illustrates an exemplary environment which facilitates data placement in a distributed storage system with multiple available zones, including exemplary write operations, in accordance with an embodiment of the present application.
- FIG. 4 illustrates an exemplary environment which facilitates data placement in a distributed storage system with multiple available zones, including exemplary read operations, in accordance with an embodiment of the present application.
- FIG. 5 illustrates an exemplary hierarchy for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- FIG. 6A presents a flowchart illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- FIG. 6B presents a flowchart illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- FIG. 7 illustrates an exemplary computer system that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- FIG. 8 illustrates an exemplary apparatus that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.
- In the figures, like reference numerals refer to the same figure elements.
- The following description is presented to enable any person skilled in the art to make and use the embodiments, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the embodiments described herein are not limited to the embodiments shown, but are to be accorded the widest scope consistent with the principles and features disclosed herein.
- The embodiments described herein solve the problem of increasing the efficiency and performance of a distributed storage system by providing a hierarchy of access layers, which can ensure the high availability of service and data while reducing the total cost of ownership.
- As described above, a hyperscale infrastructure must ensure both high availability and data reliability, typically by maintaining multiple replicas of given data across multiple available zones. Storing every replica on high-speed SSDs meets the access-latency needs of current applications, but results in a high cost for the overall storage system, making the total cost of ownership (TCO) a critical factor as the infrastructure scales out.
- The embodiments described herein address these challenges by providing a system which increases the efficiency and performance of a distributed storage system by providing a hierarchy of access layers. The system can use "low-cost SSDs" and "low-cost HDDs" in a layered hierarchy to store the multiple replicas of data required by a given application. An example of a low-cost SSD is a quad-level cell (QLC) SSD, in which each memory cell stores four bits of information, in contrast with a single-level cell (SLC), which stores a single bit. Between these, a multi-level cell (MLC) stores two bits and a triple-level cell (TLC) stores three bits. In general, an SLC-based SSD has a high endurance (e.g., it can endure a large number of program/erase cycles), and is currently the most expensive type of SSD. As the number of bits per cell increases (e.g., MLC→TLC→QLC), both the cost and the endurance of the associated SSD decrease. Hence, the QLC SSD is an example of a low-cost SSD.
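- The bits-per-cell relationship above can be captured in a few lines. This is a restatement of the cell types named in the text; the function name is an assumption for illustration.

```python
# Bits stored per NAND cell for each cell type named above.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}

def raw_capacity_bits(num_cells, cell_type):
    """Raw capacity of a die with num_cells cells of the given type."""
    return num_cells * BITS_PER_CELL[cell_type]

# The same die holds four times the data as QLC as it would as SLC,
# which underlies the QLC SSD's lower cost per gigabyte.
assert raw_capacity_bits(1_000, "QLC") == 4 * raw_capacity_bits(1_000, "SLC")
```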
- An example of a low-cost HDD is a shingled magnetic recording (SMR) drive, which writes data sequentially to overlapping or “shingled” tracks. An SMR HDD generally has a higher read latency than the read latency of a conventional magnetic recording (CMR) drive, but the SMR HDD is generally a lower cost alternative, and can be useful for sequential workloads where large amounts of data can be written sequentially, followed by random reads for processing and archive retrieval (e.g., video surveillance, object storage, and cloud services). Hence, an SMR drive is an example of a low-cost HDD.
- In the embodiments described herein, the system can construct available zones (or storage clusters) which include low-cost SSDs and low-cost HDDs. In one embodiment, an available zone can include one low-cost SSD and two low-cost HDDs, as described below in relation to FIGS. 3 and 4. The system can ensure the high availability of service and data without a loss in performance (e.g., in access latency and throughput), and can also provide a reduction of the total cost of ownership (TCO).
- Each available zone or storage cluster can include multiple storage nodes, and each storage node can include multiple low-cost SSDs and multiple low-cost HDDs. In addition, each storage node can deploy a write cache. When a replica of given data is to be written to an available zone (which can include heterogeneous storage devices, e.g., one low-cost SSD and two low-cost HDDs on different storage nodes), each write cache associated with a specific storage node can be written simultaneously, which provides execution of a low-latency write, followed by a commit to the host. Subsequently, the data in the write cache can be written asynchronously from the write cache to the non-volatile memory of the respective low-cost SSD or low-cost HDD. Data is thus written in a layered, hierarchical manner, which can result in an improved distributed storage system that can support a growing hyperscale infrastructure. An exemplary write operation is described below in relation to FIG. 3, and an exemplary read operation is described below in relation to FIG. 4.
- Thus, by constructing multiple available zones and by using the hierarchical layers of access, the embodiments described herein provide a distributed storage system which can ensure both high availability and data reliability to meet the increasing needs of current applications.
- A “storage drive” refers to a device or a drive with a non-volatile memory which can provide persistent storage of data, e.g., a solid state drive (SSD) or a hard disk drive (HDD).
- A “storage server” or a “storage node” refers to a computing device which can include multiple storage drives. A distributed storage system can include multiple storage servers or storage nodes.
- A “compute node” refers to a computing device which can perform as a client device or a host device. A distributed storage system can include multiple compute nodes.
- A “storage cluster” or an “available zone” is a grouping of storage servers, storage nodes, or storage drives in a distributed storage system.
- A “low-cost SSD” refers to an SSD which has a lower cost compared to currently available SSDs, and may have a lower endurance than other currently available SSDS. An example of a low-cost SSD is a QLC SSD.
- A “low-cost HDD” refers to a HDD which has a lower cost compared to currently available HDDs, and may have a higher read or access latency than other currently available HDDs. An example of a low-cost HDD is an SMR HDD. While the embodiments and Figures described herein refer to low-cost SSDs and low-cost HDDs, in some embodiments, the storage drives depicted as low-cost SSDs and low-cost HDDs can include other types of storage drives, including but not limited to: magnetoresistive random-access memory (MRAM); resistive RAM (ReRAM); phase change memory (PCM); nano-RAM (NRAM); and ferroelectric RAM (FRAM).
- Inefficiency of Data Placement in a Distributed Storage System with Multiple Available Zones in the Prior Art
- FIG. 1 illustrates an exemplary environment 100 which demonstrates data placement in a distributed storage system with multiple available zones, in accordance with the prior art. Environment 100 includes three available zones (AZ): an available zone 1 102; an available zone 2 104; and an available zone 3 106. The distributed storage system can use the multiple available zones to synchronize data and provide service in a consistent manner, by storing multiple replicas of given data on each available zone. In environment 100, each available zone can indicate a storage cluster, and each cluster can be deployed with a distributed file system which maintains three replicas on each cluster.
- For example, available zone 3 106 can include a compute cluster 112 and a storage cluster 114. Storage cluster 114 can include three storage drives, and each storage drive can include a copy of given data. A storage drive 122 can include a data_copy1 124; a storage drive 126 can include a data_copy2 128; and a storage drive 130 can include a data_copy3 132. Thus, the distributed storage system with three storage clusters (e.g., AZs) depicted in environment 100 stores the data nine separate times. As current applications require fast access to stored data, each storage drive can be a standard solid state drive (SSD). These standard SSDs are expensive, which can lead to a high overall total cost of ownership (TCO).
- Exemplary Environment and Architecture for Facilitating Data Placement in a Distributed Storage System with Multiple Zones
-
FIG. 2 illustrates anexemplary environment 200 which facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application.Environment 200 can be a distributed storage system which includes compute nodes and storage nodes, which communicate with each other over a data center network 202 (e.g., viacommunications 204 and 206). Each compute node can include a read cache, and each storage node can include a write cache. For example, acompute node 212 can include aread cache 214, acompute node 216 can include aread cache 218, and acompute node 220 can include aread cache 222. Astorage node 232 can include awrite cache 234. Similarly,storage nodes caches - In addition to a write cache, each storage node can include multiple storage drives, including a low-cost SSD and a low-cost HDD. This is in contrast to conventional storage nodes which include only the conventional high-cost SSDs, as described above in relation to
FIG. 1. For example, storage node 232 can include a low-cost SSD 236 and a low-cost HDD 238. Similarly, the other storage nodes can each include a low-cost SSD and a low-cost HDD. - Each storage node can maintain the low-cost SSD to ensure the shorter (e.g., faster) read latency as compared to an HDD. While each storage node in
environment 200 is depicted as including only one low-cost SSD and one low-cost HDD, each storage node can include any number of low-cost SSDs and low-cost HDDs. Furthermore, the exemplary environments of FIGS. 3 and 4 depict one embodiment of the present application, in which each storage node, storage cluster, or available zone includes one low-cost SSD and two low-cost HDDs. - The system can use the read cache of a compute node to decrease the average latency of a read operation, and can also use the write cache of a storage node to decrease the average latency of a write operation. Exemplary environments for facilitating such communications are described below in relation to FIG. 3 (write operation) and FIG. 4 (read operation). -
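The division of labor above — read caches on compute nodes, write caches in front of low-cost drives on storage nodes — can be sketched with a minimal model (illustrative Python; the class and field names are assumptions for illustration, not terms from the disclosure):

```python
from dataclasses import dataclass, field

@dataclass
class ComputeNode:
    """A compute node holding a local read cache (cf. read cache 214)."""
    read_cache: dict = field(default_factory=dict)

@dataclass
class StorageNode:
    """A storage node holding a write cache in front of a low-cost drive
    (cf. write cache 234 in front of low-cost SSD 236)."""
    drive_type: str                               # "ssd" or "hdd"
    write_cache: dict = field(default_factory=dict)
    drive: dict = field(default_factory=dict)     # the drive's non-volatile medium

# One available zone in the depicted embodiment: one low-cost SSD
# plus two low-cost HDDs, each fronted by its own write cache.
zone = [StorageNode("ssd"), StorageNode("hdd"), StorageNode("hdd")]
compute_node = ComputeNode()
```

The model deliberately keeps the read cache on the compute side and the write caches on the storage side, mirroring the asymmetry described for environment 200.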
FIG. 3 illustrates an exemplary environment 300 which facilitates data placement in a distributed storage system with multiple available zones, including exemplary write operations, in accordance with an embodiment of the present application. Environment 300 can include a storage node, storage cluster, or available zone which includes multiple drives, such as a low-cost SSD 236, a low-cost HDD 248, and a low-cost HDD 268. Note that these multiple drives can be part of a same storage cluster or available zone, but can reside on different storage nodes/servers. In some embodiments, these multiple drives (236, 248, and 268) can reside on the same storage node/server. The multiple drives can communicate with compute nodes or other client devices via a data center network 302. - During operation, the system can receive a user write operation 312 via a
communication 304. The system can write data corresponding to user write operation 312 to multiple write caches simultaneously (e.g., via a communication 306). For example, at the same time or in a parallel manner, the system can: write data 372 to write cache 1 234 associated with low-cost SSD 236; write data 382 to write cache 2 244 associated with low-cost HDD 248; and write data 392 to write cache 3 264 associated with low-cost HDD 268. Once the data (372, 382, and 392) is successfully written to the respective write caches, the system can commit the current write operation. That is, the system can send an acknowledgment to the host confirming the successful write (e.g., via respective acknowledgments). - For example, after
data 372 is successfully stored in write cache 1 234, the system can perform an asynchronous write 342 to write the data from write cache 1 234 to the non-volatile memory of low-cost SSD 236. Similarly, after data 382 is successfully stored in write cache 2 244, the system can perform an asynchronous write 352 to write the data from write cache 2 244 to the non-volatile memory of low-cost HDD 248. Additionally, after data 392 is successfully stored in write cache 3 264, the system can perform an asynchronous write 362 to write the data from write cache 3 264 to the non-volatile memory of low-cost HDD 268. - The asynchronous write can be part of a background operation, and can remain invisible to a front-end user. That is, the asynchronous write operation can be performed without affecting the front-end user. Furthermore, an asynchronous write (e.g., asynchronous write 342) can be performed subsequent to the successful write of data to a respective write cache (e.g.,
data 372 to write cache 1 234), or upon sending the acknowledgment to the host (e.g., acknowledgment 374). Thus, the system can efficiently serve the front-end user with a low latency for the write operation, via the low-capacity, low-latency write cache. -
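As a rough sketch of this write path — parallel cache writes, commit, then background flush — consider the following Python model; the threading scheme, names, and in-memory dictionaries are illustrative assumptions, not the patented implementation:

```python
import threading

class CachedDrive:
    """A drive fronted by a small, low-latency write cache (illustrative)."""
    def __init__(self, name):
        self.name = name
        self.write_cache = {}
        self.medium = {}          # stands in for the drive's non-volatile memory
        self.lock = threading.Lock()
        self.flusher = None

    def cache_write(self, key, value):
        with self.lock:
            self.write_cache[key] = value

    def flush(self):
        """Asynchronous write from the write cache to the medium."""
        with self.lock:
            self.medium.update(self.write_cache)
            self.write_cache.clear()

def user_write(drives, key, value):
    """Write to every write cache simultaneously; commit once all succeed."""
    writers = [threading.Thread(target=d.cache_write, args=(key, value))
               for d in drives]
    for w in writers:
        w.start()
    for w in writers:
        w.join()                  # all write caches now hold the data
    # The write is committed; flushing proceeds in the background,
    # invisible to the front-end user.
    for d in drives:
        d.flusher = threading.Thread(target=d.flush)
        d.flusher.start()
    return "committed"            # acknowledgment to the host

drives = [CachedDrive("ssd"), CachedDrive("hdd-1"), CachedDrive("hdd-2")]
print(user_write(drives, "block-0", b"payload"))   # -> committed
```

The key property the sketch preserves is that the acknowledgment depends only on the fast cache writes, while the slower writes to the drive media happen afterward in the background.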
FIG. 4 illustrates an exemplary environment 400 which facilitates data placement in a distributed storage system with multiple available zones, including exemplary read operations, in accordance with an embodiment of the present application. Similar to environment 300 of FIG. 3, environment 400 can include an available zone which includes low-cost SSD 236, low-cost HDD 248, and low-cost HDD 268, along with their respective write caches (e.g., 234, 244, and 264). These drives can communicate with compute nodes or other client devices via a data center network 402 (via communications 426 and 430). - During operation, the system can receive a user read operation 412. The system can initially check the read cache of the corresponding compute node (or another compute node, depending on the configuration of the compute nodes). For example, the system can check, via a
communication 422, whether read cache 414 stores the requested data. If it does, the system can return the requested data via a communication 424. If it does not, the system can return a message, via communication 424, indicating that the requested data is not stored in the read cache. - The system can then issue the read request to both a low-cost SSD and a low-cost HDD of an available zone. The system can pick the low-cost HDD randomly. For example, the system can identify low-
cost SSD 236 and low-cost HDD 268 of an available zone. The system can issue the read request to both low-cost SSD 236 (via an operation 432) and low-cost HDD 268 (via an operation 436). If the requested data can be obtained from (i.e., is stored on) low-cost SSD 236, the system can read the requested data from low-cost SSD 236 and return the requested data to the host (via a communication 434). In this case, the system can drop the data obtained, if any, from low-cost HDD 268 in response to the read request (via communication 436). - If the requested data cannot be obtained from (i.e., is not stored on) low-cost SSD 236, the system can report a fault associated with low-cost SSD 236, read the requested data from low-cost HDD 268, and return the requested data to the host (via a communication 438). If the requested data cannot be obtained from either low-cost SSD 236 or low-cost HDD 268, the system can report a fault associated with low-cost HDD 268, and can also identify another low-cost HDD on which the data is stored (e.g., low-cost HDD 248, which is part of the same available zone as low-cost SSD 236 and low-cost HDD 268). The system can then issue the read request to low-cost HDD 248. If the requested data can be obtained from (i.e., is stored on) low-cost HDD 248, the system can read the requested data from low-cost HDD 248 and return the requested data to the host (via a communication, not shown). - If the requested data cannot be obtained from (i.e., is not stored on) low-
cost HDD 248, the system can report a fault associated with low-cost HDD 248, and can also generate a message or notification indicating that the requested data is not available from the available zone comprising low-cost SSD 236, low-cost HDD 248, and low-cost HDD 268. The notification can further indicate that the requested data is to be recovered from another available zone. -
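The read fallback chain described above can be summarized in a short sketch (illustrative Python; the dictionary-backed drives and the `report_fault` callback are assumptions used only to show the order of attempts):

```python
import random

def tiered_read(key, read_cache, ssd, hdds, report_fault=lambda m: None):
    """Read following the fallback order: read cache, SSD, a randomly
    chosen HDD, then the remaining HDD of the same available zone."""
    if key in read_cache:                 # read cache hit
        return read_cache[key]
    hdd1 = random.choice(hdds)            # the request goes to the SSD and hdd1 together
    if key in ssd:
        return ssd[key]                   # any result from hdd1 is dropped
    report_fault("ssd")
    if key in hdd1:
        return hdd1[key]
    report_fault("hdd1")
    hdd2 = next(h for h in hdds if h is not hdd1)
    if key in hdd2:
        return hdd2[key]
    report_fault("hdd2")
    raise LookupError("recover the requested data from another available zone")
```

For example, if the data is present only on the HDDs, the sketch reports an SSD fault and still returns the data from an HDD, mirroring the long-tail-latency control described for environment 400.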
environment 400, upon a read cache miss, the read request is always issued to both the low-cost SSD and the (randomly selected) low-cost HDD. This hierarchy allows the system to provide control over the long-tail latency, such that if the read operation from the low-cost SSD encounters an error, the system can proceed with the simultaneously issued read operation from the first low-cost HDD. The second low-cost HDD provides an additional backup layer. - Thus, by providing the distributed layers in the manner described in
FIGS. 4 and 5, the embodiments of the system described herein can provide a more efficient distributed storage system and a reduced total cost of ownership across multiple available zones. -
FIG. 5 illustrates an exemplary hierarchy 500 for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application. Hierarchy 500 can include a read cache layer 502, a write cache layer 504, a normal read layer (from low-cost SSD) 506, and a backup read layer (from low-cost HDD) 508. Read cache layer 502 can include features such as a large capacity, local access, and a short read latency (e.g., read cache 214 of FIGS. 2 and 4). Write cache layer 504 can include features such as a small capacity, global access, storage of multiple replicas, a short write latency, and a high endurance (e.g., write cache 234 of FIGS. 2 and 3). - Normal read layer (from low-cost SSD) 506 can include features such as a large capacity, global access, storage for multiple replicas, a short read latency, and a low endurance (e.g., via
communications 432/434 with low-cost SSD 236 of FIG. 4). Backup read layer (from low-cost HDD) 508 can include features such as a large capacity, global access, storage for multiple replicas, a long read latency, and a high endurance (e.g., via communications 436/438 with low-cost HDD 268 of FIG. 4). - The low-cost high-capacity read cache can improve the overall performance of a distributed storage system by reducing the average read latency. The write cache, which has a small capacity and a low latency, can lead to an improvement in the overall performance of the distributed storage system while maintaining a limited cost. For example, assume that the dollars-per-gigabyte ($/GB) ratio between a conventional high-cost SSD and an HDD is 10 to 1. By storing two of the three copies of data on low-cost HDDs rather than on conventional high-cost SSDs, the system can reduce the relative storage cost from (10+10+10) to (10+1+1), i.e., from 30 to 12, or a ratio of 5 to 2. The storage cost is thus reduced to 40% of that of the conventional all-SSD configuration, resulting in a significant cost savings and improved overall efficiency of the distributed storage system.
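The arithmetic behind this comparison can be checked directly (a sketch; the 10-to-1 SSD-to-HDD $/GB ratio is the assumption stated above):

```python
from fractions import Fraction

SSD_COST, HDD_COST = 10, 1              # relative $/GB, assuming SSD:HDD = 10:1

conventional = 3 * SSD_COST             # three replicas, all on high-cost SSDs
proposed = SSD_COST + 2 * HDD_COST      # one SSD replica plus two HDD replicas

ratio = Fraction(conventional, proposed)
print(conventional, proposed, ratio)    # -> 30 12 5/2
```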
-
FIG. 6A presents a flowchart 600 illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application. During operation, the system receives, from a host, a request to read data (operation 602). The system determines whether the data is available in a read cache (operation 604). If the system determines that the data is available in the read cache (i.e., a read cache hit) (decision 606), the operation continues at operation 640 of FIG. 6B. - If the system determines that the data is not available in the read cache (i.e., a read cache miss) (decision 606), the system determines paths to replicas of the requested data stored on at least a solid state drive (SSD), a first hard disk drive (HDD1), and a second hard disk drive (HDD2) (operation 608). The system can store replicas on one or more available zones, and a first available zone can include the SSD, HDD1, and HDD2. The system can select HDD1 randomly from a plurality of HDDs in the first available zone. The system issues the read request to the solid state drive (SSD) and the first hard disk drive (HDD1) (operation 610). The system reads the data from the solid state drive and the first hard disk drive (operation 612). If the system successfully reads the requested data from the solid state drive (decision 614), the operation continues at
operation 640 of FIG. 6B. If the system unsuccessfully reads the requested data from the solid state drive (decision 614), the system reports a fault associated with the solid state drive (operation 616), and the operation continues at Label A of FIG. 6B. -
FIG. 6B presents a flowchart 620 illustrating a method for facilitating data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application. During operation, if the system successfully reads the requested data from the first hard disk drive (decision 622), the operation continues at operation 640. If the system unsuccessfully reads the requested data from the first hard disk drive (decision 622), the system reports a fault associated with the first hard disk drive (operation 624). The system issues the read request to and reads the data from the second hard disk drive (operation 626). - If the system successfully reads the requested data from the second hard disk drive (decision 628), the operation continues at
operation 640 of FIG. 6B. The system sends the requested data to the host (operation 640), and the operation returns. -
-
FIG. 7 illustrates an exemplary computer system 700 that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application. Computer system 700 includes a processor 702, a controller 704, a volatile memory 706, and a storage device 708. Volatile memory 706 can include, e.g., random access memory (RAM), that serves as a managed memory, and can be used to store one or more memory pools. Storage device 708 can include persistent storage which can be managed or accessed via controller 704. Furthermore, computer system 700 can be coupled to peripheral input/output user devices 710, such as a display device 711, a keyboard 712, and a pointing device 714. Storage device 708 can store an operating system 716, a content-processing system 718, and data 732. - Content-
processing system 718 can include instructions, which when executed by computer system 700, can cause computer system 700 to perform methods and/or processes described in this disclosure. Specifically, content-processing system 718 can include instructions for receiving and transmitting data packets, including data to be read or written, a read request, and a write request (communication module 720). - Content-
processing system 718 can further include instructions for receiving, from a host, a request to read data (communication module 720). Content-processing system 718 can include instructions for determining that the data is not available in a read cache (read cache-managing module 722). Content-processing system 718 can include instructions for issuing the read request to a solid state drive and a first hard disk drive (request-issuing module 724). Content-processing system 718 can include instructions for, in response to unsuccessfully reading the requested data from the solid state drive (SSD-managing module 726) and successfully reading the requested data from the first hard disk drive (HDD-managing module 728), sending the requested data to the host (communication module 720). Content-processing system 718 can include instructions for, in response to unsuccessfully reading the requested data from both the solid state drive and the first hard disk drive (SSD-managing module 726 and HDD-managing module 728): issuing the read request to a second hard disk drive (request-issuing module 724); and sending the requested data to the host (communication module 720). - Content-
processing system 718 can include instructions for identifying, based on previously stored path information, the solid state drive, the first hard disk drive, and the second hard disk drive (zone-managing module 730). Content-processing system 718 can include instructions for selecting, from a plurality of hard disk drives on which the data is stored, the first hard disk drive (zone-managing module 730). Content-processing system 718 can include instructions for selecting, from the plurality of hard disk drives, the second hard disk drive (zone-managing module 730). -
Data 732 can include any data that is required as input or that is generated as output by the methods and/or processes described in this disclosure. Specifically, data 732 can store at least: data; a request; a read request; a write request; data associated with a read cache or a write cache; path information for drives in a storage cluster or available zone; an identification or indicator of an available zone, a solid state drive, a hard disk drive, or other storage device; an indicator of a fault; a message; a notification; a replica; a copy of data; and an indicator of multiple available zones. -
FIG. 8 illustrates an exemplary apparatus 800 that facilitates data placement in a distributed storage system with multiple available zones, in accordance with an embodiment of the present application. Apparatus 800 can comprise a plurality of units or apparatuses which may communicate with one another via a wired, wireless, quantum light, or electrical communication channel. Apparatus 800 may be realized using one or more integrated circuits, and may include fewer or more units or apparatuses than those shown in FIG. 8. Further, apparatus 800 may be integrated in a computer system, or realized as a separate device which is capable of communicating with other computer systems and/or devices. Specifically, apparatus 800 can comprise units 802-812 which perform functions or operations similar to modules 720-730 of computer system 700 of FIG. 7, including: a communication unit 802; a read cache-managing unit 804; a request-issuing unit 806; an SSD-managing unit 808; an HDD-managing unit 810; and a zone-managing unit 812. - The data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. The computer-readable storage medium includes, but is not limited to, volatile memory, non-volatile memory, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing code and/or data now known or later developed.
- The methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above. When a computer system reads and executes the code and/or data stored on the computer-readable storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the computer-readable storage medium.
- Furthermore, the methods and processes described above can be included in hardware modules. For example, the hardware modules can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), and other programmable-logic devices now known or later developed. When the hardware modules are activated, the hardware modules perform the methods and processes included within the hardware modules.
- The foregoing embodiments described herein have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the embodiments described herein to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the embodiments described herein. The scope of the embodiments described herein is defined by the appended claims.
Claims (20)
1. A computer-implemented method for facilitating data placement, the method comprising:
receiving, from a host, a request to read data;
determining that the data is not available in a read cache;
issuing the read request to a first storage drive and a second storage drive of a different type than the first storage drive;
in response to unsuccessfully reading the requested data from the first storage drive and successfully reading the requested data from the second storage drive, sending the requested data to the host; and
in response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive:
issuing the read request to a third storage drive; and
sending the requested data to the host.
2. The method of claim 1, further comprising:
identifying, based on previously stored path information, the first storage drive, the second storage drive, and the third storage drive;
selecting, from a plurality of storage drives on which the data is stored, the second storage drive; and
selecting, from the plurality of storage drives, the third storage drive.
3. The method of claim 1, wherein in response to successfully reading the requested data from the first storage drive, the method further comprises:
sending the requested data to the host; and
dropping data read from the second storage drive.
4. The method of claim 1,
wherein in response to unsuccessfully reading the requested data from the first storage drive, the method further comprises reporting a fault associated with the first storage drive; and
wherein in response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive, the method further comprises reporting a fault associated with the second storage drive.
5. The method of claim 1,
wherein the third storage drive is of a same or a different type as the second storage drive, and
wherein a type for the first storage drive, the second storage drive, and the third storage drive comprises one or more of:
a solid state drive;
a hard disk drive; and
a storage medium which comprises one or more of: magnetoresistive random-access memory (MRAM); resistive RAM (ReRAM); phase change memory (PCM); nano-RAM (NRAM); and ferroelectric RAM (FRAM).
6. The method of claim 1, wherein in response to unsuccessfully reading the requested data from the third storage drive, the method further comprises:
reporting a fault associated with the third storage drive; and
generating a notification indicating that the requested data is not available from a first available zone comprising the first storage drive, the second storage drive, and the third storage drive,
wherein the notification further indicates to recover the requested data from a second available zone.
7. The method of claim 1, wherein the first storage drive, the second storage drive, and the third storage drive comprise a first available zone of a plurality of available zones, and wherein replicas of the requested data are stored in a respective available zone.
8. The method of claim 1, wherein prior to receiving the request to read the data, the method further comprises receiving a request to write the data to the first storage drive, the second storage drive, and the third storage drive, which involves:
simultaneously writing the data to a write cache of each of the first storage drive, the second storage drive, and the third storage drive; and
committing the write request upon successfully writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive.
9. The method of claim 8, wherein subsequent to writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive, the method further comprises:
writing the data asynchronously from the write cache to a non-volatile memory of each of the first storage drive, the second storage drive, and the third storage drive.
10. A computer system for facilitating data placement, the system comprising:
a processor; and
a memory coupled to the processor and storing instructions, which when executed by the processor cause the processor to perform a method, wherein the computer system is a storage device, the method comprising:
receiving, from a host, a request to read data;
determining that the data is not available in a read cache;
issuing the read request to a first storage drive and a second storage drive of a different type than the first storage drive;
in response to unsuccessfully reading the requested data from the first storage drive and successfully reading the requested data from the second storage drive, sending the requested data to the host; and
in response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive:
issuing the read request to a third storage drive; and
sending the requested data to the host.
11. The computer system of claim 10, wherein the method further comprises:
identifying, based on previously stored path information, the first storage drive, the second storage drive, and the third storage drive;
selecting, from a plurality of storage drives on which the data is stored, the second storage drive; and
selecting, from the plurality of storage drives, the third storage drive.
12. The computer system of claim 10, wherein in response to successfully reading the requested data from the first storage drive, the method further comprises:
sending the requested data to the host; and
dropping data read from the second storage drive.
13. The computer system of claim 10,
wherein in response to unsuccessfully reading the requested data from the first storage drive, the method further comprises reporting a fault associated with the first storage drive; and
wherein in response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive, the method further comprises reporting a fault associated with the second storage drive.
14. The computer system of claim 10,
wherein the third storage drive is of a same or a different type as the second storage drive, and
wherein a type for the first storage drive, the second storage drive, and the third storage drive comprises one or more of:
a solid state drive;
a hard disk drive; and
a storage medium which comprises one or more of: magnetoresistive random-access memory (MRAM); resistive RAM (ReRAM); phase change memory (PCM); nano-RAM (NRAM); and ferroelectric RAM (FRAM).
15. The computer system of claim 10, wherein in response to unsuccessfully reading the requested data from the third storage drive, the method further comprises:
reporting a fault associated with the third storage drive; and
generating a notification indicating that the requested data is not available from a first available zone comprising the first storage drive, the second storage drive, and the third storage drive,
wherein the notification further indicates to recover the requested data from a second available zone.
16. The computer system of claim 10, wherein the first storage drive, the second storage drive, and the third storage drive comprise a first available zone of a plurality of available zones, and wherein replicas of the requested data are stored in a respective available zone.
17. The computer system of claim 10, wherein prior to receiving the request to read the data, the method further comprises receiving a request to write the data to the first storage drive, the second storage drive, and the third storage drive, which involves:
simultaneously writing the data to a write cache of each of the first storage drive, the second storage drive, and the third storage drive; and
committing the write request upon successfully writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive.
18. The computer system of claim 17, wherein subsequent to writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive, the method further comprises:
writing the data asynchronously from the write cache to a non-volatile memory of each of the first storage drive, the second storage drive, and the third storage drive.
19. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method, the method comprising:
receiving, from a host, a request to read data;
determining that the data is not available in a read cache;
issuing the read request to a first storage drive and a second storage drive of a different type than the first storage drive;
in response to unsuccessfully reading the requested data from the first storage drive and successfully reading the requested data from the second storage drive, sending the requested data to the host; and
in response to unsuccessfully reading the requested data from both the first storage drive and the second storage drive:
issuing the read request to a third storage drive; and
sending the requested data to the host.
20. The storage medium of claim 19, wherein prior to receiving the request to read the data, the method further comprises receiving a request to write the data to the first storage drive, the second storage drive, and the third storage drive, which involves:
simultaneously writing the data to a write cache of each of the first storage drive, the second storage drive, and the third storage drive; and
committing the write request upon successfully writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive; and
wherein subsequent to writing the data to the write cache of each of the first storage drive, the second storage drive, and the third storage drive, the method further comprises:
writing the data asynchronously from the write cache to a non-volatile memory of each of the first storage drive, the second storage drive, and the third storage drive.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/277,708 US10970212B2 (en) | 2019-02-15 | 2019-02-15 | Method and system for facilitating a distributed storage system with a total cost of ownership reduction for multiple available zones |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200264978A1 true US20200264978A1 (en) | 2020-08-20 |
US10970212B2 US10970212B2 (en) | 2021-04-06 |
Family
ID=72042102
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/277,708 Active 2039-03-02 US10970212B2 (en) | 2019-02-15 | 2019-02-15 | Method and system for facilitating a distributed storage system with a total cost of ownership reduction for multiple available zones |
Country Status (1)
Country | Link |
---|---|
US (1) | US10970212B2 (en) |
US7003620B2 (en) | 2002-11-26 | 2006-02-21 | M-Systems Flash Disk Pioneers Ltd. | Appliance, including a flash memory, that is robust under power failure |
US7173863B2 (en) | 2004-03-08 | 2007-02-06 | Sandisk Corporation | Flash controller cache architecture |
US7130957B2 (en) | 2004-02-10 | 2006-10-31 | Sun Microsystems, Inc. | Storage system structure for storing relational cache metadata |
US7676603B2 (en) | 2004-04-20 | 2010-03-09 | Intel Corporation | Write combining protocol between processors and chipsets |
JP4401895B2 (en) * | 2004-08-09 | 2010-01-20 | 株式会社日立製作所 | Computer system, computer and its program. |
DE102005032061B4 (en) | 2005-07-08 | 2009-07-30 | Qimonda Ag | Memory module, and memory module system |
US7752382B2 (en) | 2005-09-09 | 2010-07-06 | Sandisk Il Ltd | Flash memory storage system and method |
US7631162B2 (en) | 2005-10-27 | 2009-12-08 | Sandisk Corporation | Non-volatile memory with adaptive handling of data writes |
JP2007305210A (en) | 2006-05-10 | 2007-11-22 | Toshiba Corp | Semiconductor storage device |
EP2033128A4 (en) | 2006-05-31 | 2012-08-15 | Ibm | Method and system for transformation of logical data objects for storage |
US7711890B2 (en) | 2006-06-06 | 2010-05-04 | Sandisk Il Ltd | Cache control in a non-volatile memory device |
US20080065805A1 (en) | 2006-09-11 | 2008-03-13 | Cameo Communications, Inc. | PCI-Express multimode expansion card and communication device having the same |
JP2008077810A (en) | 2006-09-25 | 2008-04-03 | Toshiba Corp | Nonvolatile semiconductor storage device |
US7761623B2 (en) | 2006-09-28 | 2010-07-20 | Virident Systems, Inc. | Main memory in a system with a memory controller configured to control access to non-volatile memory, and related technologies |
KR100858241B1 (en) | 2006-10-25 | 2008-09-12 | 삼성전자주식회사 | Hybrid-flash memory device and method for assigning reserved blocks thereof |
US8344475B2 (en) | 2006-11-29 | 2013-01-01 | Rambus Inc. | Integrated circuit heating to effect in-situ annealing |
US7958433B1 (en) | 2006-11-30 | 2011-06-07 | Marvell International Ltd. | Methods and systems for storing data in memory using zoning |
US7852654B2 (en) | 2006-12-28 | 2010-12-14 | Hynix Semiconductor Inc. | Semiconductor memory device, and multi-chip package and method of operating the same |
US7599139B1 (en) | 2007-06-22 | 2009-10-06 | Western Digital Technologies, Inc. | Disk drive having a high performance access mode and a lower performance archive mode |
US7917574B2 (en) | 2007-10-01 | 2011-03-29 | Accenture Global Services Limited | Infrastructure for parallel programming of clusters of machines |
IL187041A0 (en) | 2007-10-30 | 2008-02-09 | Sandisk Il Ltd | Optimized hierarchical integrity protection for stored data |
US8281061B2 (en) | 2008-03-31 | 2012-10-02 | Micron Technology, Inc. | Data conditioning to improve flash memory reliability |
US8195978B2 (en) | 2008-05-16 | 2012-06-05 | Fusion-IO, Inc. | Apparatus, system, and method for detecting and replacing failed data storage |
KR101497074B1 (en) | 2008-06-17 | 2015-03-05 | 삼성전자주식회사 | Non-volatile memory system and data manage method thereof |
US9123422B2 (en) | 2012-07-02 | 2015-09-01 | Super Talent Technology, Corp. | Endurance and retention flash controller with programmable binary-levels-per-cell bits identifying pages or blocks as having triple, multi, or single-level flash-memory cells |
US8959280B2 (en) | 2008-06-18 | 2015-02-17 | Super Talent Technology, Corp. | Super-endurance solid-state drive with endurance translation layer (ETL) and diversion of temp files for reduced flash wear |
JP2010152704A (en) | 2008-12-25 | 2010-07-08 | Hitachi Ltd | System and method for operational management of computer system |
US20100217952A1 (en) | 2009-02-26 | 2010-08-26 | Iyer Rahul N | Remapping of Data Addresses for a Large Capacity Victim Cache |
US8166233B2 (en) | 2009-07-24 | 2012-04-24 | Lsi Corporation | Garbage collection for solid state disks |
US20100332922A1 (en) | 2009-06-30 | 2010-12-30 | Mediatek Inc. | Method for managing device and solid state disk drive utilizing the same |
US20110055471A1 (en) | 2009-08-28 | 2011-03-03 | Jonathan Thatcher | Apparatus, system, and method for improved data deduplication |
US8214700B2 (en) | 2009-10-28 | 2012-07-03 | Sandisk Technologies Inc. | Non-volatile memory and method with post-write read and adaptive re-write to manage errors |
US8144512B2 (en) | 2009-12-18 | 2012-03-27 | Sandisk Technologies Inc. | Data transfer flows for on-chip folding |
US8443263B2 (en) | 2009-12-30 | 2013-05-14 | Sandisk Technologies Inc. | Method and controller for performing a copy-back operation |
US8631304B2 (en) | 2010-01-28 | 2014-01-14 | Sandisk Il Ltd. | Overlapping error correction operations |
TWI409633B (en) | 2010-02-04 | 2013-09-21 | Phison Electronics Corp | Flash memory storage device, controller thereof, and method for programming data |
US8370297B2 (en) | 2010-03-08 | 2013-02-05 | International Business Machines Corporation | Approach for optimizing restores of deduplicated data |
US9401967B2 (en) | 2010-06-09 | 2016-07-26 | Brocade Communications Systems, Inc. | Inline wire speed deduplication system |
US8938624B2 (en) | 2010-09-15 | 2015-01-20 | Lsi Corporation | Encryption key destruction for secure data erasure |
US9244779B2 (en) | 2010-09-30 | 2016-01-26 | Commvault Systems, Inc. | Data recovery operations, such as recovery from modified network data management protocol data |
US20120089774A1 (en) | 2010-10-12 | 2012-04-12 | International Business Machines Corporation | Method and system for mitigating adjacent track erasure in hard disk drives |
US8429495B2 (en) | 2010-10-19 | 2013-04-23 | Mosaid Technologies Incorporated | Error detection and correction codes for channels and memories with incomplete error characteristics |
US9176794B2 (en) | 2010-12-13 | 2015-11-03 | Advanced Micro Devices, Inc. | Graphics compute process scheduling |
US10817421B2 (en) | 2010-12-13 | 2020-10-27 | Sandisk Technologies Llc | Persistent data structures |
US8793328B2 (en) * | 2010-12-17 | 2014-07-29 | Facebook, Inc. | Distributed storage system |
US8826098B2 (en) | 2010-12-20 | 2014-09-02 | Lsi Corporation | Data signatures to determine successful completion of memory backup |
US8819328B2 (en) | 2010-12-30 | 2014-08-26 | Sandisk Technologies Inc. | Controller and method for performing background operations |
US9612978B2 (en) | 2010-12-31 | 2017-04-04 | International Business Machines Corporation | Encrypted flash-based data storage system with confidentiality mode |
WO2012109679A2 (en) | 2011-02-11 | 2012-08-16 | Fusion-Io, Inc. | Apparatus, system, and method for application direct virtual memory management |
US9141527B2 (en) | 2011-02-25 | 2015-09-22 | Intelligent Intellectual Property Holdings 2 Llc | Managing cache pools |
CN102693168B (en) * | 2011-03-22 | 2014-12-31 | 中兴通讯股份有限公司 | A method, a system and a service node for data backup recovery |
US20180107591A1 (en) | 2011-04-06 | 2018-04-19 | P4tents1, LLC | System, method and computer program product for fetching data between an execution of a plurality of threads |
US8832402B2 (en) | 2011-04-29 | 2014-09-09 | Seagate Technology Llc | Self-initiated secure erasure responsive to an unauthorized power down event |
US9235482B2 (en) * | 2011-04-29 | 2016-01-12 | International Business Machines Corporation | Consistent data retrieval in a multi-site computing infrastructure |
WO2012161659A1 (en) | 2011-05-24 | 2012-11-29 | Agency For Science, Technology And Research | A memory storage device, and a related zone-based block management and mapping method |
US9344494B2 (en) * | 2011-08-30 | 2016-05-17 | Oracle International Corporation | Failover data replication with colocation of session state data |
US8904158B2 (en) | 2011-09-02 | 2014-12-02 | Lsi Corporation | Storage system with boot appliance for improving reliability/availability/serviceability in high density server environments |
US8843451B2 (en) | 2011-09-23 | 2014-09-23 | International Business Machines Corporation | Block level backup and restore |
KR20130064518A (en) | 2011-12-08 | 2013-06-18 | 삼성전자주식회사 | Storage device and operation method thereof |
US9088300B1 (en) | 2011-12-15 | 2015-07-21 | Marvell International Ltd. | Cyclic redundancy check for out-of-order codewords |
US8904061B1 (en) * | 2011-12-30 | 2014-12-02 | Emc Corporation | Managing storage operations in a server cache |
US9043545B2 (en) | 2012-01-06 | 2015-05-26 | Netapp, Inc. | Distributing capacity slices across storage system nodes |
US9251086B2 (en) | 2012-01-24 | 2016-02-02 | SanDisk Technologies, Inc. | Apparatus, system, and method for managing a cache |
US8880815B2 (en) | 2012-02-20 | 2014-11-04 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Low access time indirect memory accesses |
US9362003B2 (en) | 2012-03-09 | 2016-06-07 | Sandisk Technologies Inc. | System and method to decode data subject to a disturb condition |
US9336340B1 (en) | 2012-03-30 | 2016-05-10 | Emc Corporation | Evaluating management operations |
US9208820B2 (en) | 2012-06-29 | 2015-12-08 | International Business Machines Corporation | Optimized data placement for individual file accesses on deduplication-enabled sequential storage systems |
US20140019650A1 (en) | 2012-07-10 | 2014-01-16 | Zhi Bin Li | Multi-Write Bit-Fill FIFO |
US9009402B2 (en) | 2012-09-20 | 2015-04-14 | Emc Corporation | Content addressable storage in legacy systems |
US8756237B2 (en) | 2012-10-12 | 2014-06-17 | Architecture Technology Corporation | Scalable distributed processing of RDF data |
US9141554B1 (en) | 2013-01-18 | 2015-09-22 | Cisco Technology, Inc. | Methods and apparatus for data processing using data compression, linked lists and de-duplication techniques |
US8751763B1 (en) | 2013-03-13 | 2014-06-10 | Nimbus Data Systems, Inc. | Low-overhead deduplication within a block-based data storage |
US9280472B1 (en) | 2013-03-13 | 2016-03-08 | Western Digital Technologies, Inc. | Caching data in a high performance zone of a data storage system |
US9747202B1 (en) | 2013-03-14 | 2017-08-29 | Sandisk Technologies Llc | Storage module and method for identifying hot and cold data |
US9195673B2 (en) | 2013-03-15 | 2015-11-24 | International Business Machines Corporation | Scalable graph modeling of metadata for deduplicated storage systems |
US10073626B2 (en) | 2013-03-15 | 2018-09-11 | Virident Systems, Llc | Managing the write performance of an asymmetric memory system |
KR102039537B1 (en) | 2013-03-15 | 2019-11-01 | 삼성전자주식회사 | Nonvolatile storage device and os image program method thereof |
US9436595B1 (en) | 2013-03-15 | 2016-09-06 | Google Inc. | Use of application data and garbage-collected data to improve write efficiency of a data storage device |
US20140304452A1 (en) | 2013-04-03 | 2014-10-09 | Violin Memory Inc. | Method for increasing storage media performance |
KR101478168B1 (en) | 2013-04-17 | 2014-12-31 | 주식회사 디에이아이오 | Storage system and method of processing write data |
US9785545B2 (en) | 2013-07-15 | 2017-10-10 | Cnex Labs, Inc. | Method and apparatus for providing dual memory access to non-volatile memory |
US9093093B2 (en) | 2013-10-25 | 2015-07-28 | Seagate Technology Llc | Adaptive guard band for multiple heads of a data storage device |
CA2881206A1 (en) | 2014-02-07 | 2015-08-07 | Andrew WARFIELD | Methods, systems and devices relating to data storage interfaces for managing address spaces in data storage devices |
US9542404B2 (en) | 2014-02-17 | 2017-01-10 | Netapp, Inc. | Subpartitioning of a namespace region |
US20150301964A1 (en) | 2014-02-18 | 2015-10-22 | Alistair Mark Brinicombe | Methods and systems of multi-memory, control and data plane architecture |
US9263088B2 (en) | 2014-03-21 | 2016-02-16 | Western Digital Technologies, Inc. | Data management for a data storage device using a last resort zone |
US9880859B2 (en) | 2014-03-26 | 2018-01-30 | Intel Corporation | Boot image discovery and delivery |
US9383926B2 (en) | 2014-05-27 | 2016-07-05 | Kabushiki Kaisha Toshiba | Host-controlled garbage collection |
US9015561B1 (en) | 2014-06-11 | 2015-04-21 | Sandisk Technologies Inc. | Adaptive redundancy in three dimensional memory |
GB2527296A (en) | 2014-06-16 | 2015-12-23 | Ibm | A method for restoring data in a HSM system |
US8868825B1 (en) | 2014-07-02 | 2014-10-21 | Pure Storage, Inc. | Nonrepeating identifiers in an address space of a non-volatile solid-state storage |
US10044795B2 (en) | 2014-07-11 | 2018-08-07 | Vmware Inc. | Methods and apparatus for rack deployments for virtual computing environments |
US9542327B2 (en) | 2014-07-22 | 2017-01-10 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Selective mirroring in caches for logical volumes |
US20160041760A1 (en) | 2014-08-08 | 2016-02-11 | International Business Machines Corporation | Multi-Level Cell Flash Memory Control Mechanisms |
US10430328B2 (en) | 2014-09-16 | 2019-10-01 | Sandisk Technologies Llc | Non-volatile cache and non-volatile storage medium using single bit and multi bit flash memory cells or different programming parameters |
US9588977B1 (en) | 2014-09-30 | 2017-03-07 | EMC IP Holding Company LLC | Data and metadata structures for use in tiering data to cloud storage |
US10127157B2 (en) | 2014-10-06 | 2018-11-13 | SK Hynix Inc. | Sizing a cache while taking into account a total bytes written requirement |
US9129628B1 (en) | 2014-10-23 | 2015-09-08 | Western Digital Technologies, Inc. | Data management for data storage device with different track density regions |
CN105701028B (en) * | 2014-11-28 | 2018-10-09 | 国际商业机器公司 | Disk management method in distributed memory system and equipment |
US9852076B1 (en) | 2014-12-18 | 2017-12-26 | Violin Systems Llc | Caching of metadata for deduplicated LUNs |
US20160179399A1 (en) | 2014-12-23 | 2016-06-23 | Sandisk Technologies Inc. | System and Method for Selecting Blocks for Garbage Collection Based on Block Health |
US10282211B2 (en) | 2015-01-09 | 2019-05-07 | Avago Technologies International Sales Pte. Limited | Operating system software install and boot up from a storage area network device |
US9916275B2 (en) | 2015-03-09 | 2018-03-13 | International Business Machines Corporation | Preventing input/output (I/O) traffic overloading of an interconnect channel in a distributed data storage system |
KR101927233B1 (en) | 2015-03-16 | 2018-12-12 | 한국전자통신연구원 | Gpu power measuring method of heterogeneous multi-core system |
US9639282B2 (en) | 2015-05-20 | 2017-05-02 | Sandisk Technologies Llc | Variable bit encoding per NAND flash cell to improve device endurance and extend life of flash-based storage devices |
US10069916B2 (en) | 2015-05-26 | 2018-09-04 | Gluent, Inc. | System and method for transparent context aware filtering of data requests |
US9875053B2 (en) | 2015-06-05 | 2018-01-23 | Western Digital Technologies, Inc. | Scheduling scheme(s) for a multi-die storage device |
US9696931B2 (en) | 2015-06-12 | 2017-07-04 | International Business Machines Corporation | Region-based storage for volume data and metadata |
US9588571B2 (en) | 2015-07-08 | 2017-03-07 | Quanta Computer Inc. | Dynamic power supply management |
US10324832B2 (en) | 2016-05-25 | 2019-06-18 | Samsung Electronics Co., Ltd. | Address based multi-stream storage device access |
US10656838B2 (en) | 2015-07-13 | 2020-05-19 | Samsung Electronics Co., Ltd. | Automatic stream detection and assignment algorithm |
US9529601B1 (en) | 2015-07-15 | 2016-12-27 | Dell Products L.P. | Multi-processor startup system |
US10749858B2 (en) | 2015-09-04 | 2020-08-18 | Hewlett Packard Enterprise Development Lp | Secure login information |
US9952769B2 (en) | 2015-09-14 | 2018-04-24 | Microsoft Technology Licensing, Llc. | Data storage system with data storage devices operative to manage storage device functions specific to a particular data storage device |
CN105278876B (en) | 2015-09-23 | 2018-12-14 | 华为技术有限公司 | A kind of the data method for deleting and device of solid state hard disk |
US10120811B2 (en) | 2015-09-29 | 2018-11-06 | International Business Machines Corporation | Considering a frequency of access to groups of tracks and density of the groups to select groups of tracks to destage |
US10031774B2 (en) | 2015-10-15 | 2018-07-24 | Red Hat, Inc. | Scheduling multi-phase computing jobs |
KR20170045806A (en) | 2015-10-20 | 2017-04-28 | 삼성전자주식회사 | Semiconductor memory device and method of operating the same |
US20170147499A1 (en) | 2015-11-25 | 2017-05-25 | Sandisk Technologies Llc | Multi-Level Logical to Physical Address Mapping Using Distributed Processors in Non-Volatile Storage Device |
US20170162235A1 (en) | 2015-12-02 | 2017-06-08 | Qualcomm Incorporated | System and method for memory management using dynamic partial channel interleaving |
US20170161202A1 (en) | 2015-12-02 | 2017-06-08 | Samsung Electronics Co., Ltd. | Flash memory device including address mapping for deduplication, and related methods |
US20170177259A1 (en) | 2015-12-18 | 2017-06-22 | Intel Corporation | Techniques to Use Open Bit Line Information for a Memory System |
JP6517684B2 (en) | 2015-12-22 | 2019-05-22 | 東芝メモリ株式会社 | Memory system and control method |
US10649681B2 (en) | 2016-01-25 | 2020-05-12 | Samsung Electronics Co., Ltd. | Dynamic garbage collection P/E policies for redundant storage blocks and distributed software stacks |
CN107037976B (en) | 2016-02-03 | 2020-03-20 | 株式会社东芝 | Storage device and working method thereof |
US10235198B2 (en) | 2016-02-24 | 2019-03-19 | Samsung Electronics Co., Ltd. | VM-aware FTL design for SR-IOV NVME SSD |
US20170249162A1 (en) | 2016-02-25 | 2017-08-31 | Red Hat Israel, Ltd. | Safe transmit packet processing for network function virtualization applications |
US10303557B2 (en) * | 2016-03-09 | 2019-05-28 | Commvault Systems, Inc. | Data transfer to a distributed storage environment |
US10101939B2 (en) | 2016-03-09 | 2018-10-16 | Toshiba Memory Corporation | Storage system having a host that manages physical data locations of a storage device |
US20170286311A1 (en) | 2016-04-01 | 2017-10-05 | Dale J. Juenemann | Repetitive address indirection in a memory |
US10585809B2 (en) | 2016-04-01 | 2020-03-10 | Intel Corporation | Convolutional memory integrity |
US10866905B2 (en) | 2016-05-25 | 2020-12-15 | Samsung Electronics Co., Ltd. | Access parameter based multi-stream storage device access |
US10684795B2 (en) | 2016-07-25 | 2020-06-16 | Toshiba Memory Corporation | Storage device and storage control method |
US10283215B2 (en) | 2016-07-28 | 2019-05-07 | Ip Gem Group, Llc | Nonvolatile memory system with background reference positioning and local reference positioning |
US11644992B2 (en) | 2016-11-23 | 2023-05-09 | Samsung Electronics Co., Ltd. | Storage system performing data deduplication, method of operating storage system, and method of operating data processing system |
US10374885B2 (en) | 2016-12-13 | 2019-08-06 | Amazon Technologies, Inc. | Reconfigurable server including a reconfigurable adapter device |
US10496544B2 (en) | 2016-12-29 | 2019-12-03 | Intel Corporation | Aggregated write back in a direct mapped two level memory |
US10516760B2 (en) | 2017-03-17 | 2019-12-24 | Verizon Patent And Licensing Inc. | Automatic bootstrapping and dynamic configuration of data center nodes |
US10275170B2 (en) | 2017-04-10 | 2019-04-30 | Sandisk Technologies Llc | Folding operations in memory systems with single address updates |
US10613944B2 (en) * | 2017-04-18 | 2020-04-07 | Netapp, Inc. | Systems and methods for backup and restore of distributed master-slave database clusters |
TWI625620B (en) | 2017-05-12 | 2018-06-01 | 威盛電子股份有限公司 | Non-volatile memory apparatus and reading method thereof |
US10474397B2 (en) | 2017-06-13 | 2019-11-12 | Western Digital Technologies, Inc | Unified indirection in a multi-device hybrid storage unit |
US10521375B2 (en) | 2017-06-22 | 2019-12-31 | Macronix International Co., Ltd. | Controller for a memory system |
US10838902B2 (en) | 2017-06-23 | 2020-11-17 | Facebook, Inc. | Apparatus, system, and method for performing hardware acceleration via expansion cards |
US10275162B2 (en) | 2017-06-23 | 2019-04-30 | Dell Products L.P. | Methods and systems for managing data migration in solid state non-volatile memory |
US10564856B2 (en) | 2017-07-06 | 2020-02-18 | Alibaba Group Holding Limited | Method and system for mitigating write amplification in a phase change memory-based storage device |
TWI631570B (en) | 2017-09-04 | 2018-08-01 | 威盛電子股份有限公司 | Error checking and correcting decoding method and apparatus |
US10642522B2 (en) | 2017-09-15 | 2020-05-05 | Alibaba Group Holding Limited | Method and system for in-line deduplication in a storage drive based on a non-collision hash |
US10956279B2 (en) * | 2017-12-04 | 2021-03-23 | International Business Machines Corporation | Managing big data on document based NoSQL databases |
US10229735B1 (en) | 2017-12-22 | 2019-03-12 | Intel Corporation | Block management for dynamic single-level cell buffers in storage devices |
US10606693B2 (en) | 2017-12-28 | 2020-03-31 | Micron Technology, Inc. | Memory controller implemented error correction code memory |
CN110058794B (en) | 2018-01-19 | 2022-11-01 | 上海宝存信息科技有限公司 | Data storage device for dynamically executing garbage recovery and operation method |
US10199066B1 (en) | 2018-03-01 | 2019-02-05 | Seagate Technology Llc | Write management of physically coupled storage areas |
US10585819B2 (en) | 2018-03-05 | 2020-03-10 | Samsung Electronics Co., Ltd. | SSD architecture for FPGA based acceleration |
US10649657B2 (en) | 2018-03-22 | 2020-05-12 | Western Digital Technologies, Inc. | Log-based storage for different data types in non-volatile memory |
JP7023384B2 (en) | 2018-05-04 | 2022-02-21 | シトリックス・システムズ・インコーポレイテッド | Computer systems and related methods that provide hierarchical display remoting optimized with user and system hints |
US10437670B1 (en) | 2018-05-24 | 2019-10-08 | International Business Machines Corporation | Metadata hardening and parity accumulation for log-structured arrays |
KR20190139082A (en) | 2018-06-07 | 2019-12-17 | 삼성전자주식회사 | Memory device and method for equalizing bit error rates |
US11599557B2 (en) * | 2018-06-12 | 2023-03-07 | Open Text Corporation | System and method for persistence and replication of changes to a data store |
US10921992B2 (en) | 2018-06-25 | 2021-02-16 | Alibaba Group Holding Limited | Method and system for data placement in a hard disk drive based on access frequency for improved IOPS and utilization efficiency |
US10776263B2 (en) | 2018-06-27 | 2020-09-15 | Seagate Technology Llc | Non-deterministic window scheduling for data storage systems |
US11150836B2 (en) | 2018-06-28 | 2021-10-19 | Seagate Technology Llc | Deterministic optimization via performance tracking in a data storage system |
US11086529B2 (en) | 2018-09-26 | 2021-08-10 | Western Digital Technologies, Inc. | Data storage systems and methods for improved data relocation based on read-level voltages associated with error recovery |
US11334521B2 (en) * | 2018-12-21 | 2022-05-17 | EMC IP Holding Company LLC | System and method that determines a size of metadata-based system snapshots |
- 2019-02-15 US US16/277,708 patent/US10970212B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
US10970212B2 (en) | 2021-04-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102370760B1 (en) | Zone formation for zoned namespaces | |
KR101912596B1 (en) | Non-volatile memory program failure recovery via redundant arrays | |
US10331345B2 (en) | Method and apparatus for reducing silent data errors in non-volatile memory systems | |
US11200159B2 (en) | System and method for facilitating efficient utilization of NAND flash memory | |
US11449386B2 (en) | Method and system for optimizing persistent memory on data retention, endurance, and performance for host memory | |
CN113396566A (en) | Resource allocation based on comprehensive I/O monitoring in distributed storage system | |
US10872622B1 (en) | Method and system for deploying mixed storage products on a uniform storage infrastructure | |
US11922019B2 (en) | Storage device read-disturb-based block read temperature utilization system | |
US10970212B2 (en) | Method and system for facilitating a distributed storage system with a total cost of ownership reduction for multiple available zones | |
CN112346658B (en) | Improving data heat trace resolution in a storage device having a cache architecture | |
EP4170499A1 (en) | Data storage method, storage system, storage device, and storage medium | |
US11340989B2 (en) | RAID storage-device-assisted unavailable primary data/Q data rebuild system | |
CN114730247A (en) | Storage device with minimum write size of data | |
US11847337B2 (en) | Data parking for ZNS devices | |
US9400748B2 (en) | System and method for data inversion in a storage resource | |
US11487465B2 (en) | Method and system for a local storage engine collaborating with a solid state drive controller | |
US11119855B2 (en) | Selectively storing parity data in different types of memory | |
US11003391B2 (en) | Data-transfer-based RAID data update system | |
CN114490726A (en) | Automatic flexible mode detection and migration | |
US11989441B2 (en) | Read-disturb-based read temperature identification system | |
US11928354B2 (en) | Read-disturb-based read temperature determination system | |
US11922035B2 (en) | Read-disturb-based read temperature adjustment system | |
US11983431B2 (en) | Read-disturb-based read temperature time-based attenuation system | |
US11995340B2 (en) | Read-disturb-based read temperature information access system | |
US11868223B2 (en) | Read-disturb-based read temperature information utilization system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment |
Owner name: ALIBABA GROUP HOLDING LIMITED, CAYMAN ISLANDS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LI, SHU;REEL/FRAME:048484/0545 Effective date: 20190228 |
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |