US10067682B1 - I/O accelerator for striped disk arrays using parity - Google Patents
- Publication number
- US10067682B1 (application US15/185,522)
- Authority
- US
- United States
- Prior art keywords
- request
- write
- striped
- disk array
- computer
- Legal status
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1076—Parity data used in redundant arrays of independent storages, e.g. in RAID systems
- G06F11/1088—Reconstruction on already foreseen single or plurality of spare disks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1076—Parity data used in redundant arrays of independent storages, e.g. in RAID systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0611—Improving I/O performance in relation to response time
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0613—Improving I/O performance in relation to throughput
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0656—Data buffering arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0689—Disk arrays, e.g. RAID, JBOD
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2211/00—Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
- G06F2211/10—Indexing scheme relating to G06F11/10
- G06F2211/1002—Indexing scheme relating to G06F11/1076
- G06F2211/1019—Fast writes, i.e. signaling the host that a write is done before data is written to disk
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2211/00—Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
- G06F2211/10—Indexing scheme relating to G06F11/10
- G06F2211/1002—Indexing scheme relating to G06F11/1076
- G06F2211/1038—LFS, i.e. Log Structured File System used in RAID systems with parity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2211/00—Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
- G06F2211/10—Indexing scheme relating to G06F11/10
- G06F2211/1002—Indexing scheme relating to G06F11/1076
- G06F2211/104—Metadata, i.e. metadata associated with RAID systems with parity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2211/00—Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
- G06F2211/10—Indexing scheme relating to G06F11/10
- G06F2211/1002—Indexing scheme relating to G06F11/1076
- G06F2211/1066—Parity-small-writes, i.e. improved small or partial write techniques in RAID systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2211/00—Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
- G06F2211/10—Indexing scheme relating to G06F11/10
- G06F2211/1002—Indexing scheme relating to G06F11/1076
- G06F2211/1069—Phantom write, i.e. write were nothing is actually written on the disk of a RAID system
Definitions
- A RAID-5 disk array uses block-level striping (where a stripe is a concurrent series of blocks, one block for each disk in the array) with parity data distributed across all member disks. Data is written to each physical disk one block at a time. However, whenever a “random” block (or some portion thereof) is updated and needs to be written to the physical disk, the parity block (or some portion thereof) must also be recalculated and rewritten. Consequently, each random block-level write requires at least two reads and two writes to complete.
- While this is particularly costly for small write operations (i.e., operations involving a single block), larger sequential writes that span the entire width of the stripe (i.e., a “full-stripe write”) are much less costly because no read operations are required; instead, the new full-stripe write data (including the newly calculated parity block) can simply be written over the entire stripe (as four concurrent write operations in a four-disk array) without regard for the old data, which is no longer needed for any purpose.
- VM: volume manager
- I/O: input/output
- Various implementations are directed to accelerating “random writes” (writes comprising less than a complete stripe of data) by consolidating several random writes together to create a “sequential write” (a full-stripe write) to eliminate one or more read operations and/or increase the volume of new/updated data stored for each write operation.
- Several such implementations comprise functionality in the VM (volume manager) for identifying random write I/O requests, queuing them locally in a journal, and then periodically flushing the journal to the disk array as a sequential write request.
- FIG. 1 is an illustration of an exemplary network environment in which the numerous implementations disclosed herein may be utilized;
- FIG. 2A is a block diagram illustrating a typical storage device exposing a plurality of volumes (or logical disks) managed by a volume manager (VM) and backed by a disk array comprising a RAID controller and its associated plurality of physical disks;
- FIG. 2B is a block diagram of an exemplary RAID-5 physical disk array for which the numerous implementations disclosed herein may be applied;
- FIG. 3A is a block diagram of a journal comprising a journal table and journal data storage area, said journal representative of several implementations disclosed herein;
- FIG. 3B is a block diagram of an I/O write request processing from a snapshot module of a VM (represented by a snapshot volume table, SVT) to either the journal of FIG. 3A or to the logical disk storage expressed by the underlying RAID array;
- FIG. 3C is a block diagram of an I/O read request processing from a snapshot module of a VM (represented by a snapshot volume table, SVT) from either the journal of FIG. 3A or from the logical disk storage expressed by the underlying RAID array;
- SVT: snapshot volume table
- FIG. 4A is an operational flow diagram of I/O write request processing by the VM using the exemplary implementation of FIG. 3B;
- FIG. 4B is an operational flow diagram of I/O read request processing by the VM using the exemplary implementation of FIG. 3C;
- FIGS. 5A and 5B are block diagrams illustrating an exemplary embodiment of a hash-based search function suitable for certain implementations disclosed herein;
- FIG. 6A is a block diagram illustrating an exemplary storage system on which various implementations disclosed herein may execute;
- FIG. 6B shows an exemplary storage node computer environment (e.g., a computer server and/or NAS server) in which example implementations and aspects may be implemented.
- A disk array is a disk storage system that contains multiple disk drives.
- A Redundant Array of Independent/Inexpensive Disks (RAID) is the combination of multiple disk drive components into a single logical unit, where data is distributed across the drives in one of several approaches (referred to as “RAID levels”). “RAID” has also become an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical disk drives arranged in a “RAID array” addressed by the operating system as a single virtual disk comprising one or more volumes.
- Server system implementations typically provide volume management, which allows a system to present logical volumes for use.
- A volume is a single accessible storage area within a single file system that represents a single logical disk drive; a volume is thus the logical interface used by an operating system to access data stored in a file system that can be distributed over multiple physical devices.
- In storage systems such as RAID, a disk array controller (DAC) is used to manage the physical disk drives and present them as logical units or volumes to the computing system.
- The disk array controller can also be referred to as a RAID controller.
- The DAC provides both a back-end interface and a front-end interface.
- The back-end interface communicates with the controlled disks using a protocol such as, for example, ATA, SATA, SCSI, FC, or SAS.
- The front-end interface communicates with a computer system using one of the disk protocols such as, for example, ATA, SATA, SCSI, or FC (to transparently emulate a disk for the computer system) or specialized protocols such as FICON/ESCON, iSCSI, HyperSCSI, ATA over Ethernet, or InfiniBand.
- The DAC may use different protocols for back-end and front-end communication.
- External disk arrays, such as storage area networks (SANs) or network-attached storage (NAS) servers, are physically independent enclosures of disk arrays.
- A storage area network (SAN) is a dedicated storage network that provides access to consolidated block-level storage and is primarily used to make storage devices (such as disk arrays) accessible to servers so that the devices appear as locally attached to those servers.
- A SAN typically comprises its own intra-network of storage devices that are generally not directly accessible by regular devices.
- A SAN alone does not provide the “file” abstraction, only block-level operations on virtual blocks of data; however, file systems built on top of SANs do provide this abstraction and are known as SAN file systems or shared disk file systems.
- Virtual blocks, or “block virtualization,” are the abstraction (or separation) of logical storage from physical storage so that data may be accessed without regard to physical storage or heterogeneous structure; this gives the storage system greater flexibility in how it manages its physical storage.
- Network-attached storage (NAS), on the other hand, is file-level computer data storage connected to a computer network providing data access to heterogeneous clients.
- NAS systems typically comprise one or more hard drives, often arranged into logical redundant storage containers or RAID arrays.
- In contrast to a SAN, NAS does not attempt to appear as locally attached but instead uses file-based sharing protocols such as NFS, SMB/CIFS, or AFP to enable remote computers to request a portion of an abstract file (rather than a disk block).
- An NAS may comprise a SAN and/or a disk array.
- An “NAS gateway” can be added to a SAN to effectively convert it into an NAS, since NAS provides both storage and a file system, whereas a SAN provides only block-based storage and leaves file system concerns to the client.
- NAS can also refer to the enclosure containing one or more disk drives (which may be configured as a RAID array) along with the equipment necessary to make the storage available over a computer network (including a dedicated computer designed to operate over the network).
- Several non-RAID storage architectures are also available today, including, for example, the Single Large Expensive Drive (SLED), which, as the name implies, comprises a single drive, as well as disk arrays without any additional control (and thus accessed simply as independent drives), which are often referred to as the “Just a Bunch Of Disks” (JBOD) architecture.
- RAID or a RAID array can easily be substituted with one of the several non-RAID storage architectures; references to RAID or a RAID array are thus merely exemplary and are in no way intended to be limiting.
- FIG. 1 is an illustration of an exemplary networked computer environment 100 in which the numerous implementations disclosed herein may be utilized.
- The network environment 100 may include one or more clients 110 and 112 configured to communicate with each other or with one or more servers 121 and 122 through a network 120, which may be any of a variety of network types, including the public switched telephone network (PSTN), a cellular telephone network, and a packet-switched network (e.g., the Internet).
- a client such as client 110
- A client, such as client 112, may comprise an internal or non-removable storage device 184.
- A server, such as server 121, may also comprise a storage device 186 or a collection of storage devices.
- The network environment 100 may further comprise one or more NAS servers 140 and 144 configured to communicate with each other or with one or more clients 110 and 112 and/or one or more servers 121 and 122 through the network 120.
- NAS servers 140 and 144 may also comprise storage devices 192 and 194, respectively.
- The storage devices 182, 184, 186, 188, 192, and 194 may be a disk array (such as a RAID array), a SLED, a JBOD system, or any other storage system.
- The network environment 100 may also comprise one or more SANs 150, 152, and 154 that are operatively coupled to, for example, a server (such as SAN 150 coupled to server 121), an NAS server (such as SAN 154 coupled to NAS server 144), or an NAS gateway 142 that, together with its SAN 152, provides the functionality of an NAS server.
- A server or an NAS server, such as NAS server 144, may comprise both a storage device 194 and a SAN 154.
- While the clients 110 and 112, servers 121 and 122, NAS servers 140 and 144, and NAS gateway 142 are illustrated as being connected by the network 120, in some implementations it is contemplated that these systems may be directly connected to each other or even executed by the same computing system.
- Similarly, while the storage devices 182, 184, 186, 188, 192, and 194 are shown as connected to one of a client or a server, in some implementations it is contemplated that they may be connected to each other or to more than one client and/or server, and that such connections may be made over the network 120 as well as directly.
- The clients 110 and 112 may include a desktop personal computer, workstation, laptop, PDA, cell phone, smart phone, any WAP-enabled device, or any other computing device capable of interfacing directly or indirectly with the network 120.
- The clients 110 and 112 may run an HTTP client (e.g., a web-browsing program) or a WAP-enabled browser in the case of a cell phone, PDA, or other wireless device, allowing a user of the clients 110 and 112 to access information available at the servers 121 and 122 or to provide information to the servers 121 and 122.
- Other applications may also be used by the clients 110 and 112 to access or provide information to the servers 121 and 122.
- The servers 121 and 122 may be implemented using one or more general-purpose computing systems.
- FIG. 2A is a block diagram illustrating a typical storage system 200 exposing a plurality of volumes (or logical disks) 202 , 204 , and 206 managed by a volume manager (VM) 210 and backed by a disk array comprising, for example, a RAID controller 220 and its associated plurality of physical disks 232 , 234 , 236 , and 238 .
- Certain disk array implementations employ a technique known as data striping. Data striping is the technique of segmenting logically sequential data (such as a file) in a way that sequential segments are written to different physical storage devices. For example, a RAID-5 disk array uses block-level striping where each stripe is a concurrent series of blocks, one block for each disk in the array.
- FIG. 2B is a block diagram of an exemplary RAID-5 physical disk array for which the numerous implementations disclosed herein may be applied.
- In FIG. 2B, each letter (A, B, C, etc.) corresponds to a stripe and represents the group of blocks comprising that stripe (including a distributed parity block, discussed later herein).
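To make the stripe layout concrete, the following Python sketch maps a logical data-block number to its stripe, data disk, and parity disk for a four-disk RAID-5 array like the one in FIG. 2B. The text does not say how parity rotates in FIG. 2B, so the rotation used here (parity moves one disk to the left on each successive stripe) is an assumption; `locate_block` and `BlockLocation` are hypothetical names.

```python
from typing import NamedTuple


class BlockLocation(NamedTuple):
    stripe: int       # stripe index (0 for A, 1 for B, ...)
    data_disk: int    # physical disk holding the data block
    parity_disk: int  # physical disk holding that stripe's parity block


def locate_block(lba: int, num_disks: int = 4) -> BlockLocation:
    """Map a logical data-block number to its physical position.

    Assumed layout: stripe s keeps its parity on disk (num_disks - 1 - s) mod
    num_disks, and the stripe's data blocks fill the other disks in order.
    """
    data_per_stripe = num_disks - 1
    stripe, offset = divmod(lba, data_per_stripe)
    parity_disk = (num_disks - 1 - stripe) % num_disks
    # Data positions skip over the disk that holds this stripe's parity.
    data_disk = offset if offset < parity_disk else offset + 1
    return BlockLocation(stripe, data_disk, parity_disk)


for lba in range(6):
    print(lba, locate_block(lba))
```

Real controllers use several rotation variants; only the rule that a stripe's parity and data never share a disk is essential.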
- Striping is used to read and write data more quickly on I/O operations than is possible with a single physical storage device by performing read and write operations on multiple devices concurrently, thereby increasing throughput.
- In FIG. 2B, for example, the three data blocks (plus parity) corresponding to each letter can be written simultaneously.
- However, striping alone has a disadvantage: the failure of one physical disk can corrupt the full data sequence, so the failure rate of the disk array is the sum of the failure rates of the individual storage devices.
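The failure-rate statement follows from treating a striped array without redundancy as a series system: it fails as soon as any member disk fails. The sketch below assumes independent disks with constant failure rates (exponentially distributed lifetimes), the usual textbook reliability model; the per-disk rate is an illustrative figure, not one from the patent.

```python
import math


def array_failure_rate(disk_rates: list[float]) -> float:
    """Failure rate of a striped array with no redundancy (a series system).

    With independent disks and constant failure rates, the array fails as soon
    as any disk fails, so its rate is the sum of the per-disk rates.
    """
    return sum(disk_rates)


def survival_probability(rate: float, hours: float) -> float:
    """Probability of no failure within `hours` at a constant failure rate."""
    return math.exp(-rate * hours)


rates = [1e-6] * 4                      # illustrative: four disks, each with a 1,000,000-hour MTTF
array_rate = array_failure_rate(rates)  # about 4e-6 failures per hour
array_mttf = 1.0 / array_rate           # about 250,000 hours

# Cross-check: the product of per-disk survival probabilities matches the array's.
t = 10_000.0
per_disk = math.prod(survival_probability(r, t) for r in rates)
assert math.isclose(per_disk, survival_probability(array_rate, t))
```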
- This disadvantage of striping can be overcome by storing redundant information for error correction, and parity is one approach for doing so.
- Parity is data used to achieve redundancy such that, if a physical disk in the disk array fails, the remaining data on the other drives can be combined with the parity data to reconstruct the missing data.
- The parity data (in RAID-5, the bitwise XOR of the data blocks in the stripe) is stored on a different physical drive from its inputs; the parity information can be maintained on its own dedicated physical disk or, as shown in FIG. 2B, spread across all of the physical drives in the array (known as “distributed parity”).
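The sketch below shows XOR parity for one stripe with three toy data blocks, and how the block on a failed disk is rebuilt from the surviving blocks plus parity. `xor_blocks` is a hypothetical helper, and real arrays operate on full-size blocks rather than the 4-byte values used here.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks (RAID-5 style parity)."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same size")
    acc = 0
    for b in blocks:
        acc ^= int.from_bytes(b, "big")
    return acc.to_bytes(size, "big")


# Toy stripe "A": three data blocks (A1, A2, A3) and their parity block.
a1, a2, a3 = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"
parity = xor_blocks(a1, a2, a3)

# If the disk holding A2 fails, XOR of the surviving blocks and parity rebuilds it.
rebuilt_a2 = xor_blocks(a1, a3, parity)
assert rebuilt_a2 == a2
```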
- A RAID-5 disk array uses block-level striping (where a stripe is a concurrent series of blocks, one block for each disk in the array) with parity data distributed across all member disks, and data is written to each physical disk one block at a time. Whenever a block (or some portion thereof) is updated and needs to be written to the physical disk, the parity block (or some portion thereof) must also be recalculated and rewritten.
- Thus, each block-level write requires at least two reads and two writes to complete (although the two reads can be conducted in parallel, as can the two writes).
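The two-reads, two-writes cost comes from the read-modify-write parity update: read the old data and old parity, compute new parity as old parity XOR old data XOR new data, then write both. The in-memory `Stripe` class and its I/O counters below are hypothetical stand-ins for real disks, intended only as a sketch of that sequence.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same size")
    acc = 0
    for b in blocks:
        acc ^= int.from_bytes(b, "big")
    return acc.to_bytes(size, "big")


class Stripe:
    """One RAID-5 stripe held in memory, counting simulated disk I/Os."""

    def __init__(self, data_blocks: list[bytes]):
        self.data = list(data_blocks)
        self.parity = xor_blocks(*self.data)
        self.reads = 0
        self.writes = 0

    def small_write(self, index: int, new_data: bytes) -> None:
        """Read-modify-write update of a single data block."""
        old_data = self.data[index]      # read 1: old data block
        old_parity = self.parity         # read 2: old parity block
        self.reads += 2
        # Cancel the old data's contribution to parity and add the new data's.
        new_parity = xor_blocks(old_parity, old_data, new_data)
        self.data[index] = new_data      # write 1: new data block
        self.parity = new_parity         # write 2: new parity block
        self.writes += 2


stripe = Stripe([b"\x01\x01", b"\x02\x02", b"\x04\x04"])
stripe.small_write(1, b"\xf0\x0f")
assert stripe.parity == xor_blocks(*stripe.data)  # same as recomputing from scratch
print(stripe.reads, stripe.writes)                 # 2 2
```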
- In certain environments, random writes may be prolific.
- For example, iSCSI-based storage servers are often used as backend storage for database servers. Since I/O requests from database servers to the disks are typically 8 KB in size, these storage servers may receive numerous random 8 KB I/O write requests.
- Moreover, because certain VMs use an I/O tracking granularity of 64 KB, these 8 KB I/Os may also need to be converted to 64 KB I/Os via a read-modify-write sequence, which compounds the random write problem described above.
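The cost of the 64 KB tracking granularity can be seen by widening a write to whole tracking units: every byte in those units that the write does not cover must be read first. The sketch below assumes byte offsets into the volume and a unit-aligned tracking grid; `rmw_region` and `TRACKING_GRANULARITY` are hypothetical names.

```python
TRACKING_GRANULARITY = 64 * 1024  # 64 KB tracking unit, as in the example above


def rmw_region(offset: int, length: int, granularity: int = TRACKING_GRANULARITY):
    """Widen a write to whole tracking units.

    Returns (aligned_start, aligned_end, bytes_to_read): the bytes inside the
    aligned region that the write does not cover must be read first so the
    whole unit can be written back.
    """
    start = (offset // granularity) * granularity
    end = -(-(offset + length) // granularity) * granularity  # round up to a boundary
    return start, end, (end - start) - length


KB = 1024
print(rmw_region(72 * KB, 8 * KB))  # (65536, 131072, 57344): 56 KB read to write 8 KB
```

An 8 KB write that straddles a 64 KB boundary is worse still, since it touches two tracking units.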
- In contrast, larger sequential writes that span the entire width of the stripe (a “full-stripe write,” which in the example of FIG. 2B comprises three data blocks at a time, e.g., A1, A2, and A3) are much less costly because no read operations are required; instead, the new full-stripe write data (including the newly calculated parity block) can simply be written over the entire stripe (as four concurrent write operations) without regard for the old data, which is no longer needed for any purpose.
- Full-stripe writes are thus nearly as efficient as read operations, which do not require parity data (except to correct an error when one is detected).
- Various implementations disclosed herein are directed to accelerating “random writes” (writes comprising less than a complete stripe of data) by consolidating several random writes together to create a “sequential write” (a full-stripe write) to eliminate one or more read operations and/or increase the volume of new/updated data stored for each write operation.
- Several such implementations comprise functionality in the VM (volume manager) for identifying random write I/O requests, queuing them locally in a journal, and then periodically flushing the journal to the disk array as a sequential write request.
- To support this, the VM must track the journal, handle read/write I/O requests made to data cached in the journal, and periodically flush the journal to maintain adequate caching space for newer incoming random writes.
- FIG. 3A is a block diagram of a journal comprising a journal table and journal data storage area, said journal representative of several implementations disclosed herein.
- The journal table comprises a table of entries (Entry0, Entry1, ..., EntryN), each recording the logical block address (LBA) of a random block stored in the journal.
- The entry values correspond to locations in the journal data storage area.
- The size of the I/O data may be stored as a discrete type corresponding to the size of the I/O data held in the journal data storage area; for example, in some implementations this I/O may range from 4 KB to 64 KB.
- When directing a random write request to the journal, the VM records the data in the journal data storage area and marks an entry in the corresponding journal table.
- This journal table holds the metadata for every journal item, including the location where that data is supposed to reside in the logical disk expressed by the storage array.
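A minimal sketch of the journal table and data area follows, assuming an in-memory byte buffer in which each new item is appended directly after the previous one. `Journal`, `JournalEntry`, and `JournalFull` are hypothetical names, not from the patent.

```python
from dataclasses import dataclass, field


class JournalFull(Exception):
    """Raised when the journal data area has no room; a flush is needed."""


@dataclass
class JournalEntry:
    lba: int          # where the data belongs on the logical disk
    length: int       # size of the cached I/O in bytes (e.g., 4 KB to 64 KB)
    data_offset: int  # location of the data in the journal data storage area


@dataclass
class Journal:
    capacity: int                                             # bytes in the data area
    table: list[JournalEntry] = field(default_factory=list)   # Entry0 ... EntryN
    data_area: bytearray = field(default_factory=bytearray)

    def append(self, lba: int, data: bytes) -> int:
        """Cache a random write and mark its entry; return the entry index."""
        if len(self.data_area) + len(data) > self.capacity:
            raise JournalFull(f"{len(data)} bytes do not fit; flush the journal first")
        entry = JournalEntry(lba, len(data), len(self.data_area))
        self.data_area += data     # stored right after the previous item
        self.table.append(entry)   # metadata: target LBA, size, location
        return len(self.table) - 1

    def data_for(self, index: int) -> bytes:
        """Return the cached bytes for a given journal entry."""
        entry = self.table[index]
        return bytes(self.data_area[entry.data_offset:entry.data_offset + entry.length])


journal = Journal(capacity=16 * 1024)
journal.append(lba=4096, data=b"x" * 4096)
journal.append(lba=12, data=b"y" * 8192)
print(journal.table[1])  # JournalEntry(lba=12, length=8192, data_offset=4096)
```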
- The journal data storage area can optionally be a portion of the disk array.
- Random write requests directed to various portions spread over the disk array can be recorded in the journal data storage area. Then, a number of random write requests (that is, random write requests directed to the same stripe) can be consolidated to create a sequential, or full-stripe, write request.
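The consolidation step can be sketched by grouping journaled block writes by stripe: stripes whose data blocks are all present become full-stripe writes, and the rest still need read-modify-write. The sketch assumes block-granular LBAs, three data blocks per stripe as in FIG. 2B, and a simplified cost model in which each leftover block gets its own two reads and two writes; `plan_flush` and `disk_ios` are hypothetical names.

```python
from collections import defaultdict

DATA_BLOCKS_PER_STRIPE = 3  # FIG. 2B: three data blocks plus one parity block per stripe


def plan_flush(journaled_lbas):
    """Group journaled block writes by stripe.

    Returns (full, partial): lists of (stripe, sorted LBAs). Full stripes can be
    written without any reads; partial ones still need read-modify-write.
    """
    by_stripe = defaultdict(set)
    for lba in journaled_lbas:
        by_stripe[lba // DATA_BLOCKS_PER_STRIPE].add(lba)
    full, partial = [], []
    for stripe, lbas in sorted(by_stripe.items()):
        target = full if len(lbas) == DATA_BLOCKS_PER_STRIPE else partial
        target.append((stripe, sorted(lbas)))
    return full, partial


def disk_ios(full, partial):
    """Simplified cost model: return (reads, writes) needed to flush the plan."""
    reads = 0
    writes = len(full) * (DATA_BLOCKS_PER_STRIPE + 1)  # data blocks + parity, no reads
    for _stripe, lbas in partial:
        reads += 2 * len(lbas)   # old data + old parity per block
        writes += 2 * len(lbas)  # new data + new parity per block
    return reads, writes


full, partial = plan_flush([7, 0, 2, 1, 10])
print(full)                     # [(0, [0, 1, 2])]
print(partial)                  # [(2, [7]), (3, [10])]
print(disk_ios(full, partial))  # (4, 8), versus (10, 10) if all five went out one by one
```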
- The VM may keep recent I/O pattern history for each volume so that, when an I/O write request for the volume is received, the VM can determine whether the request appears sequential or random by comparing it to other recent I/Os to see if together they comprise substantially consecutive blocks indicative of sequential data. If the I/O write request seems unrelated to other recent I/O, the VM deems that I/O random and directs it to the journal.
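The per-volume history comparison can be sketched as below. The window size (`history`) and the tolerance for "substantially consecutive" (`slack`) are assumptions, since the text gives no values, and `IoPatternTracker` is a hypothetical name.

```python
from collections import deque


class IoPatternTracker:
    """Per-volume heuristic for telling sequential writes from random ones.

    A write counts as sequential if it starts within `slack` blocks of where
    one of the last `history` writes ended; otherwise it is random.
    """

    def __init__(self, history: int = 8, slack: int = 0):
        self.recent_ends = deque(maxlen=history)  # end LBAs of recent writes
        self.slack = slack

    def classify(self, lba: int, num_blocks: int) -> str:
        sequential = any(abs(lba - end) <= self.slack for end in self.recent_ends)
        self.recent_ends.append(lba + num_blocks)
        return "sequential" if sequential else "random"


trackers = {}  # volume name -> tracker, since history is kept per volume
vol = trackers.setdefault("vol0", IoPatternTracker())
print(vol.classify(100, 8))   # random (no history yet)
print(vol.classify(108, 8))   # sequential (continues the previous write)
print(vol.classify(5000, 1))  # random (unrelated to recent I/O)
```

With this heuristic the first write of a new sequential stream is classified as random, because there is no history to match yet.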
- The VM may also take advantage of data from a caching module (such as an Advanced Caching Module, or ACM) layered between the iSCSI module and the VM, in which case the VM consults the caching module to determine whether the I/O is random or sequential.
- Using valid-bitmap data maintained at sector granularity for each chunk, the caching module is aware of the valid bits adjacent to an incoming I/O; it can check whether those neighboring regions are already valid in the cache and, if they are not, conclude that the incoming I/O is random.
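The caching module's data structures are not given, so the sketch below assumes 64 KB chunks of 512-byte sectors with one integer bitmap per chunk, and treats an I/O as random when neither the sector just before it nor the sector just after it is already valid in the cache. `ChunkBitmaps` and its methods are hypothetical.

```python
SECTORS_PER_CHUNK = 128  # assumed: 64 KB chunks of 512-byte sectors


class ChunkBitmaps:
    """Per-chunk valid bitmaps at sector granularity (bit i = sector i cached)."""

    def __init__(self):
        self.valid = {}  # chunk index -> integer used as a bitmap

    def mark_valid(self, start_sector: int, count: int) -> None:
        for sector in range(start_sector, start_sector + count):
            chunk, bit = divmod(sector, SECTORS_PER_CHUNK)
            self.valid[chunk] = self.valid.get(chunk, 0) | (1 << bit)

    def is_valid(self, sector: int) -> bool:
        if sector < 0:
            return False
        chunk, bit = divmod(sector, SECTORS_PER_CHUNK)
        return bool((self.valid.get(chunk, 0) >> bit) & 1)

    def looks_random(self, start_sector: int, count: int) -> bool:
        """Treat an I/O as random when neither neighboring sector is valid in cache."""
        before = self.is_valid(start_sector - 1)
        after = self.is_valid(start_sector + count)
        return not (before or after)


cache = ChunkBitmaps()
cache.mark_valid(120, 8)            # sectors 120-127 of chunk 0 are cached
print(cache.looks_random(128, 8))   # False: sector 127 (end of chunk 0) is valid
print(cache.looks_random(4096, 8))  # True: nothing nearby is cached
```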
- FIG. 3B is a block diagram of an I/O write request processing from a snapshot module of a VM (represented by a snapshot volume table, SVT) to either the journal of FIG. 3A or to the logical disk storage expressed by the underlying RAID array.
- FIG. 3C is a block diagram of an I/O read request processing from a snapshot module of a VM (represented by a snapshot volume table, SVT) from either the journal of FIG. 3A or from the logical disk storage expressed by the underlying RAID array.
- FIG. 4A is an operational flow diagram 400 of I/O write request processing by the VM using the exemplary implementation of FIG. 3B.
- the VM here, processing from the snapshot module's SVT
- the VM first determines if the I/O write request represents a sequential or random write operation. If sequential, then at 404 the I/O write request is immediately forwarded to the logical disk expressed by the exemplary RAID-5 disk array. If random, however, then at 406 the I/O write request is instead marked into the journal table and, at 408 , the corresponding data is cached in the journal data store location corresponding to the entry location in the journal table.
- the journal is then evaluated against predefined criteria to determine whether flushing the data contained therein to the logical disk is necessary or desirable, and the VM acts accordingly. For example, the journal can be periodically flushed to maintain adequate caching space for newer incoming random writes. Alternatively or additionally, the journal can be periodically flushed at a time chosen to minimize impact on incoming I/O traffic (i.e., a time of lower I/O load on the disk array). In certain implementations, each subsequent random I/O entry made to the journal is placed adjacent to the one previously written, thus building a sequential I/O from the random I/Os.
- FIG. 4B is an operational flow diagram 450 of I/O read request processing by the VM using the exemplary implementation of FIG. 3C.
- the VM (again, processing from the snapshot module's SVT) first checks the journal table to see if there is an entry for the data being sought and, at 454, thereby determines if the I/O read request is in the journal or on the logical disk. If on the logical disk, then at 456 the I/O read request is immediately forwarded to the logical disk expressed by the exemplary RAID-5 disk array.
- each read request must first refer to the journal: if any part of the data being sought is in the journal area, that part must be read from the journal. In some instances (e.g., for sub-block-size elements), a corresponding read to the logical disk may also be necessary, with the journal entry used to update the block before returning it in response to the I/O read request, if the returned data is block-sized and comprises data not otherwise in the journal.
- FIG. 6A is a block diagram illustrating an exemplary storage system on which various implementations disclosed herein may execute.
- a volume may be either a data volume (in the case of iSCSI I/O requests) or a fileshare (in the case of the XFS I/O requests).
- SAN paths and NAS paths are thus shown, as well as additional functionality that may exist in a RAID controller.
- also shown is a view of the VM coupled to the RAID controller, as well as the iSCSI driver and cache (and possibly the XFS file system driver and cache).
- storage volumes or fileshares are exposed to the clients, and at the bottom of the storage stack are the physical disks that are utilized to store the data.
- the physical disks are, in turn, connected to a disk controller, such as a Serial ATA (SATA) controller or a hardware RAID controller.
- in the case of a SATA controller, a SATA driver may be utilized to access the hardware device, and a software RAID module may be utilized to provide RAID services in the absence of a hardware RAID controller.
- a unified RAID management layer may be utilized to simplify the utilization of RAID with either software or hardware implementations.
- FIG. 6B shows an exemplary storage node computer environment (e.g., a computer server and/or NAS server) in which example implementations and aspects may be implemented.
- the storage node computer 2 includes a baseboard, or “motherboard”, which is a printed circuit board to which a multitude of components or devices may be connected by way of a system bus or other electrical communication paths.
- a CPU 22 operates in conjunction with a chipset 52.
- the CPU 22 is a standard central processor that performs arithmetic and logical operations necessary for the operation of the computer.
- the storage node computer 2 may include a multitude of CPUs 22.
- the operating system 40 comprises the LINUX operating system. According to another embodiment of the invention, the operating system 40 comprises the WINDOWS SERVER operating system from MICROSOFT CORPORATION. According to another embodiment, the operating system 40 comprises the UNIX or SOLARIS operating system. It should be appreciated that other operating systems may also be utilized.
- the mass storage devices connected to the south bridge 26, and their associated computer-readable media, provide non-volatile storage for the computer 2.
- computer-readable media can be any available media that can be accessed by the computer 2.
- computer-readable media may comprise computer storage media and communication media.
- Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
- Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
- a low pin count (“LPC”) interface may also be provided by the south bridge 26 for connecting a “Super I/O” device 70.
- the Super I/O device 70 is responsible for providing a number of input/output ports, including a keyboard port, a mouse port, a serial interface 72, a parallel port, and other types of input/output ports.
- the LPC interface may also connect a computer storage medium such as a ROM or a flash memory such as an NVRAM 48 for storing the firmware 50 that includes program code containing the basic routines that help to start up the computer 2 and to transfer information between elements within the computer 2.
- the south bridge 26 may include a system management bus 64.
- the system management bus 64 may include a BMC 66.
- the BMC 66 is a microcontroller that monitors operation of the computer system 2.
- the BMC 66 monitors health-related aspects associated with the computer system 2, such as, but not limited to, the temperature of one or more components of the computer system 2, speed of rotational components (e.g., spindle motor, CPU Fan, etc.) within the system, the voltage across or applied to one or more components within the system 2, and the available or used capacity of memory devices within the system 2.
- the BMC 66 is communicatively connected to one or more components by way of the management bus 64.
- these components include sensor devices for measuring various operating and performance-related parameters within the computer system 2.
- the sensor devices may be either hardware or software based components configured or programmed to measure or detect one or more of the various operating and performance-related parameters.
- the BMC 66 functions as the master on the management bus 64 in most circumstances, but may also function as either a master or a slave in other circumstances.
- Each of the various components communicatively connected to the BMC 66 by way of the management bus 64 is addressed using a slave address.
- the management bus 64 is used by the BMC 66 to request and/or receive various operating and performance-related parameters from one or more components, which are also communicatively connected to the management bus 64.
- the computer 2 may comprise other types of computing devices, including hand-held computers, embedded computer systems, personal digital assistants, and other types of computing devices known to those skilled in the art. It is also contemplated that the computer 2 may not include all of the components shown in FIG. 6B, may include other components that are not explicitly shown in FIG. 6B, or may utilize an architecture completely different from that shown in FIG. 6B.
- although exemplary implementations may refer to utilizing aspects of the presently disclosed subject matter in the context of one or more stand-alone computer systems, the subject matter is not so limited, but rather may be implemented in connection with any computing environment, such as a network or distributed computing environment. Still further, aspects of the presently disclosed subject matter may be implemented in or across a plurality of processing chips or devices, and storage may similarly be effected across a plurality of devices. Such devices might include personal computers, network servers, and handheld devices, for example.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Disclosed herein is an enhanced volume manager (VM) for a storage system that accelerates input/output (I/O) performance for random write operations to a striped disk array using parity. More specifically, various implementations are directed to accelerating “random writes” (writes comprising less than a complete stripe of data) by consolidating several random writes together to create a “sequential write” (a full-stripe write) to eliminate one or more read operations and/or increase the volume of new/updated data stored for each write operation. Several such implementations comprise functionality in the VM (volume manager) for identifying random write I/O requests, queuing them locally in a journal, and then periodically flushing the journal to the disk array as a sequential write request.
Description
This application is a continuation of U.S. patent application Ser. No. 13/449,496, filed on Apr. 18, 2012, now U.S. Pat. No. 9,396,067, entitled “I/O Accelerator for Striped Disk Arrays Using Parity,” which claims the benefit of U.S. Provisional Patent Application No. 61/476,725, filed on Apr. 18, 2011, entitled “I/O Accelerator for Striped Disk Arrays Using Parity.” The disclosures of these applications are hereby incorporated by reference in their entireties.
A RAID-5 disk array uses block-level striping (where a stripe is a concurrent series of blocks, one block for each disk in the array) with parity data distributed across all member disks. Data is also written to each physical disk one block at a time. However, whenever a “random” block (or some portion thereof) is updated and needs to be written to the physical disk, the parity block (or some portion thereof) must also be recalculated and rewritten. Consequently, each random block-level write requires at least two reads and two writes to complete.
While this is particularly costly for small write operations (i.e., operations involving a single block), larger sequential writes that span the entire width of the stripe (i.e., a “full-stripe write”) are much less costly because no read operations are required; instead, the new full-stripe write data (including the newly calculated parity block) can simply be written over the entire stripe (as four concurrent write operations) without regard for the old data that is no longer needed for any purpose.
Various implementations disclosed herein are directed to an enhanced volume manager (VM) for a storage system that accelerates input/output (I/O) performance for random write operations to a striped disk array using parity. More specifically, various implementations are directed to accelerating “random writes” (writes comprising less than a complete stripe of data) by consolidating several random writes together to create a “sequential write” (a full-stripe write) to eliminate one or more read operations and/or increase the volume of new/updated data stored for each write operation. Several such implementations comprise functionality in the VM (volume manager) for identifying random write I/O requests, queuing them locally in a journal, and then periodically flushing the journal to the disk array as a sequential write request.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
To facilitate an understanding of and for the purpose of illustrating the present disclosure and various implementations, exemplary features and implementations are disclosed in, and are better understood when read in conjunction with, the accompanying drawings—it being understood, however, that the present disclosure is not limited to the specific methods, precise arrangements, and instrumentalities disclosed. Similar reference characters denote similar elements throughout the several views. In the drawings:
A disk array is a disk storage system which contains multiple disk drives. A Redundant Array of Independent/Inexpensive Disks (or RAID) is the combination of multiple disk drive components into a single logical unit where data is distributed across the drives in one of several approaches (referred to as “RAID levels”). “RAID” has also become an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical disk drives arranged in a “RAID array” addressed by the operating system as a single virtual disk comprising one or more volumes.
Many operating systems implement RAID in software as a layer that abstracts multiple physical storage devices to provide a single virtual device as a component of a file system or as a more generic logical volume manager (typical for server systems). Server system implementations typically provide volume management which allows a system to present logical volumes for use. As such, a volume is a single accessible storage area within a single file system that represents a single logical disk drive, and thus a volume is the logical interface used by an operating system to access data stored in a file system that can be distributed over multiple physical devices.
In storage systems such as RAID, a disk array controller (DAC) is used to manage the physical disk drives and present them as logical units or volumes to the computing system. When the physical disk drives comprise a RAID, the disk array controller can also be referred to as a RAID controller. The DAC provides both a back-end interface and a front-end interface. The back-end interface communicates with the controlled disks using a protocol such as, for example, ATA, SATA, SCSI, FC, or SAS. The front-end interface communicates with a computer system using one of the disk protocols such as, for example, ATA, SATA, SCSI, or FC (to transparently emulate a disk for the computer system) or specialized protocols such as FICON/ESCON, iSCSI, HyperSCSI, ATA over Ethernet, or InfiniBand. The DAC may use different protocols for back-end and front-end communication.
External disk arrays, such as storage area network (SAN) or network-attached storage (NAS) servers, are physically independent enclosures of disk arrays. A storage area network (SAN) is a dedicated storage network that provides access to consolidated block-level storage, and is primarily used to make storage devices (such as disk arrays) accessible to servers so that the devices appear as locally attached to those servers. A SAN typically comprises its own intra-network of storage devices that are generally not directly accessible by regular devices. A SAN alone does not provide the “file” abstraction, only block-level operations on virtual blocks of data; however, file systems built on top of SANs do provide this abstraction and are known as SAN file systems or shared disk file systems. Virtual blocks, or “block virtualization,” are the abstraction (or separation) of logical storage from physical storage so that data may be accessed without regard to physical storage or heterogeneous structure, which allows the storage system greater flexibility in how it manages its physical storage.
Network-attached storage (NAS), on the other hand, is file-level computer data storage connected to a computer network providing data access to heterogeneous clients. NAS systems typically comprise one or more hard drives often arranged into logical redundant storage containers or RAID arrays. Network-attached storage (NAS), in contrast to SAN, does not attempt to appear as locally attached but, instead, uses several file-based sharing protocols such as NFS, SMB/CIFS, or AFP to enable remote computers to request a portion of an abstract file (rather than a disk block). As such, a NAS may comprise a SAN and/or a disk array, and a “NAS gateway” can be added to a SAN to effectively convert it into a NAS, since NAS provides both storage and a file system whereas SAN provides only block-based storage and leaves file system concerns to the client. NAS can also be used to refer to the enclosure containing one or more disk drives (which may be configured as a RAID array) along with the equipment necessary to make the storage available over a computer network (including a dedicated computer designed to operate over the network).
Of course, there are also several non-RAID storage architectures available today, including, for example, the Single Large Expensive Drive (SLED) which, as the name implies, comprises a single drive, as well as disk arrays without any additional control (and thus accessed simply as independent drives), which are often referred to as the “Just a Bunch Of Disks” (JBOD) architecture. For the various implementations disclosed herein, the use of RAID or a RAID array can be easily substituted with one of the several non-RAID storage architectures, and thus references to RAID or a RAID array are merely exemplary and are in no way intended to be limiting.
While the clients 110 and 112, servers 121 and 122, NAS servers 140 and 144, and NAS gateway 142 are illustrated as being connected by the network 120, in some implementations it is contemplated that these systems may be directly connected to each other or even executed by the same computing system. Similarly, while the storage devices 182, 184, 186, 188, 192, and 194 are shown as connected to one of a client or a server, in some implementations it is contemplated that the storage devices 182, 184, 186, 188, 192, and 194 may be connected to each other or to more than one client and/or server, and that such connections may be made over the network 120 as well as directly. This is also true for the SANs 150, 152, and 154, although each SAN's own intra-network of storage devices is generally not directly accessible by regular devices.
In some implementations, the clients 110 and 112 may include a desktop personal computer, workstation, laptop, PDA, cell phone, smart phone, or any WAP-enabled device or any other computing device capable of interfacing directly or indirectly with the network 120. The clients 110 and 112 may run an HTTP client (e.g., a web-browsing program) or a WAP-enabled browser in the case of a cell phone, PDA or other wireless device, or the like, allowing a user of the clients 110 and 112 to access information available to it at the servers 121 and 122 or to provide information to the servers 121 and 122. Other applications may also be used by the clients 110 and 112 to access or provide information to the servers 121 and 122, for example. In some implementations, the servers 121 and 122 may be implemented using one or more general purpose computing systems.
Because different segments of data are kept on different physical disks in a striped disk array, the failure of one physical disk can result in the corruption of the full data sequence; consequently, the failure rate of the disk array is the sum of the failure rates of the individual storage devices. However, this disadvantage of striping can be overcome by storing redundant information for the purpose of error correction, and parity is one approach for doing so.
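Assuming the disks fail independently of one another, the statement that the array's failure rate is the sum of the per-disk failure rates follows from treating a plain striped array (no parity) as a series system. A short worked derivation, in which the symbols $R_i$, $\lambda_i$, $n$ and the example rate are illustrative assumptions rather than figures from the source:

```latex
% n disks striped without parity; disk i has reliability R_i(t) and failure rate \lambda_i(t)
R_{\text{array}}(t) = \prod_{i=1}^{n} R_i(t)
\quad\Longrightarrow\quad
\lambda_{\text{array}}(t) = -\frac{d}{dt}\ln R_{\text{array}}(t) = \sum_{i=1}^{n} \lambda_i(t)

% constant rates, e.g. n = 4 disks at \lambda = 10^{-6}\,\text{h}^{-1} each:
\lambda_{\text{array}} = 4 \times 10^{-6}\,\text{h}^{-1},
\qquad
\text{MTTF}_{\text{array}} = \frac{1}{\lambda_{\text{array}}} = 250{,}000\ \text{h}
```

With identical disks and constant rates, the array's mean time to failure therefore shrinks in inverse proportion to the number of disks, which is the disadvantage that parity is meant to offset.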
Parity is data used to achieve redundancy such that, if a physical disk in the disk array fails, the remaining data on the other drives can be combined with the parity data to reconstruct the missing data. To calculate parity data across multiple physical drives, a Boolean XOR (“exclusive or”) function is performed on the corresponding data bit-by-bit. Referring to FIG. 2B, for example, the parity of stripe A (Ap) is calculated as follows: Ap = A1 XOR A2 XOR A3. The resulting parity data is then stored on a separate physical drive from its inputs, and the parity information can be maintained on its own separate physical disk or, as shown in FIG. 2B, spread across all of the physical drives in the array (known as “distributed parity”). Then, should any one of the physical disks fail, the contents of the failed physical disk can be reconstructed by combining the data from the remaining physical drives with the parity data using the same XOR operation. For example, if “Disk 2” of FIG. 2B failed, A3 can be rebuilt by XORing the contents of the two remaining data disks, A1 (“Disk 0”) and A2 (“Disk 1”), with the parity data Ap (“Disk 3”) as follows: A3 = A1 XOR A2 XOR Ap. This same approach can be used to reconstruct the other data on the failed drive (e.g., C2, D2, etc.), including any lost parity data (Bp, etc.).
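The XOR arithmetic above can be checked with a minimal Python sketch. The block names follow FIG. 2B, but the 4-byte block contents and the helper `xor_blocks` are hypothetical, chosen only to show that the same XOR both computes Ap and rebuilds a lost block:

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Return the byte-wise XOR of equal-length blocks (RAID-5 style parity)."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks in a stripe must have the same length")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


# Stripe A from FIG. 2B, with arbitrary 4-byte contents for illustration.
A1 = bytes([0x0F, 0xF0, 0xAA, 0x55])  # Disk 0
A2 = bytes([0x33, 0x33, 0x00, 0xFF])  # Disk 1
A3 = bytes([0x01, 0x02, 0x03, 0x04])  # Disk 2
Ap = xor_blocks(A1, A2, A3)           # Disk 3: Ap = A1 XOR A2 XOR A3

# If Disk 2 fails, A3 is rebuilt from the surviving data blocks and the parity.
rebuilt_A3 = xor_blocks(A1, A2, Ap)
print(Ap.hex(), rebuilt_A3 == A3)  # prints: 3dc1a9ae True
```

Because XORing all four blocks of a stripe yields zero, any single missing block (data or parity) equals the XOR of the other three.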
A RAID-5 disk array, as illustrated in FIG. 2B , uses block-level striping (where a stripe is a concurrent series of blocks, one block for each disk in the array) with parity data distributed across all member disks. Data is also written to each physical disk one block at a time. However, whenever a block (or some portion thereof) is updated and needs to be written to the physical disk, the parity block (or some portion thereof) must also be recalculated and rewritten. Thus for example—and referring again to FIG. 2B —if a small portion of block A1 is to be rewritten (i.e., updated with new data in an I/O write), the entire block A1 must first be read (as an entire block) in order to update the block with the new information and perform a subsequent write (as an entire block). In addition, the corresponding parity block Ap must also be read (as an entire block) and, for each bit “flipped” (changed from a 0 to a 1 or vice versa) in the data block A1 due to the write operation, a corresponding bit in the parity block Ap must also be flipped before the parity block Ap can be rewritten to its physical disk. Consequently, each block-level write requires at least two reads and two writes to complete (although the reads can be conducted in parallel, as can the writes).
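The read-modify-write sequence described above can be sketched as follows. This is a toy model under stated assumptions: `disks` is a hypothetical in-memory list of per-disk block lists, and the parity update uses the identity new parity = old parity XOR old data XOR new data, which flips exactly the parity bits whose data bits changed:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def read_modify_write(disks, data_disk, parity_disk, stripe, offset, new_bytes):
    """Apply a partial-block update to one data block and its parity block.

    `disks` is a toy model: disks[d][stripe] holds the block stored on disk d.
    Returns the number of (reads, writes) issued.
    """
    old_data = disks[data_disk][stripe]      # read 1: entire old data block
    old_parity = disks[parity_disk][stripe]  # read 2: entire old parity block
    new_data = old_data[:offset] + new_bytes + old_data[offset + len(new_bytes):]
    if len(new_data) != len(old_data):
        raise ValueError("update must fit inside the block")
    # Flip in the parity exactly those bits that flipped in the data block.
    new_parity = xor_bytes(old_parity, xor_bytes(old_data, new_data))
    disks[data_disk][stripe] = new_data      # write 1: updated data block
    disks[parity_disk][stripe] = new_parity  # write 2: updated parity block
    return 2, 2


A1, A2, A3 = b"\x00\x11\x22\x33", b"\x44\x55\x66\x77", b"\x88\x99\xaa\xbb"
disks = [[A1], [A2], [A3], [xor_bytes(xor_bytes(A1, A2), A3)]]  # one stripe, parity on Disk 3
reads, writes = read_modify_write(disks, data_disk=0, parity_disk=3, stripe=0,
                                  offset=1, new_bytes=b"\xff")
```

The two reads are independent of each other, as are the two writes, which is why they can be issued in parallel as the text notes.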
For certain storage system implementations, random writes may be prolific. For example, iSCSI-based storage servers are often utilized as backend storage for database servers. Since I/O requests from database servers to the disks are typically 8 KB in size, these storage servers would be receiving numerous random 8 KB I/O write requests. Moreover, because certain VMs may utilize an I/O tracking granularity of 64 KB, these 8 KB I/Os may need to be converted to 64 KB I/Os with a read-modify-write sequence as well, thereby resulting in the random write I/O issue described above.
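A minimal sketch of the granularity mismatch, assuming byte offsets and the 64 KB tracking unit from the example above (the function name `rmw_extent` is hypothetical). It computes the aligned region a small write expands to and how much of that region the write does not itself supply:

```python
KB = 1024
TRACKING_GRANULARITY = 64 * KB  # assumed VM tracking unit, per the example in the text


def rmw_extent(offset: int, length: int, granularity: int = TRACKING_GRANULARITY):
    """Return (start, end, extra_bytes) for the granularity-aligned region that
    must be rewritten when `length` bytes at `offset` are updated.

    extra_bytes is the part of that region not supplied by the write itself,
    i.e. what must be read first unless it is already valid in cache.
    """
    start = (offset // granularity) * granularity
    end = -(-(offset + length) // granularity) * granularity  # round up
    return start, end, (end - start) - length


print(rmw_extent(72 * KB, 8 * KB))  # (65536, 131072, 57344): inside one 64 KB unit
print(rmw_extent(60 * KB, 8 * KB))  # (0, 131072, 122880): straddles two 64 KB units
```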
Yet while the random write I/O issue is particularly costly for small write operations (i.e., operations involving a single block), larger sequential writes that span the entire width of the stripe (i.e., a “full-stripe write”)—in the example of FIG. 2B , comprising three blocks at a time (e.g., A1, A2, and A3)—are much less costly because no read operations are required; instead, the new full-stripe write data (including the new calculated parity block) can simply be written over the entire stripe (as four concurrent write operations) without regard for the old data that is no longer needed for any purpose. Thus full-stripe writes are nearly as efficient as read operations that do not require parity data (except to correct for an error when detected).
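The difference in I/O counts can be tallied with a simple cost model. This is an illustrative sketch, not a performance model from the source: it assumes the four-disk layout of FIG. 2B and treats every small write as an independent read-modify-write, ignoring any parity reads that updates to the same stripe could share:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoCost:
    reads: int
    writes: int

    @property
    def total(self) -> int:
        return self.reads + self.writes


def small_write_cost(blocks: int) -> IoCost:
    """Each independent block update reads old data + old parity, then writes both."""
    return IoCost(reads=2 * blocks, writes=2 * blocks)


def full_stripe_cost(disks: int) -> IoCost:
    """A full-stripe write writes every data block plus the new parity; no reads."""
    return IoCost(reads=0, writes=disks)


# FIG. 2B-style array: 4 disks, so 3 data blocks per stripe.
as_random_writes = small_write_cost(3)
as_full_stripe = full_stripe_cost(4)
print(as_random_writes, as_random_writes.total)  # IoCost(reads=6, writes=6) 12
print(as_full_stripe, as_full_stripe.total)      # IoCost(reads=0, writes=4) 4
```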
Various implementations disclosed herein are directed to accelerating “random writes” (writes comprising less than a complete stripe of data) by consolidating several random writes together to create a “sequential write” (a full-stripe write) to eliminate one or more read operations and/or increase the volume of new/updated data stored for each write operation. Several such implementations comprise functionality in the VM (volume manager) for identifying random write I/O requests, queuing them locally in a journal, and then periodically flushing the journal to the disk array as a sequential write request. For data in the journal, the VM must track the journal, handle read/write I/O requests made to data cached in the journal, and periodically flush the journal to maintain adequate caching space for newer incoming random writes.
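One way the journal described above might be organized is sketched below. The class `RandomWriteJournal`, its slot-based layout, the capacity-triggered flush, and flushing in LBA order are all assumptions made for illustration; the source does not specify the VM's data structures. The point of the sketch is that random writes land in adjacent journal slots, so each group of `stripe_blocks` of them forms a full-stripe write:

```python
class RandomWriteJournal:
    """Hypothetical sketch of a VM-side journal for random writes.

    Random writes are appended to consecutive slots of a journal data store, so
    every `stripe_blocks` appended blocks add up to one full-stripe write.
    """

    def __init__(self, logical_disk: dict, stripe_blocks: int, capacity_stripes: int):
        self.logical_disk = logical_disk            # stand-in: LBA -> block data
        self.stripe_blocks = stripe_blocks
        self.capacity = stripe_blocks * capacity_stripes
        self.table: dict[int, int] = {}             # journal table: LBA -> slot
        self.store: list[bytes] = []                # journal data store, filled sequentially
        self.full_stripes_filled = 0

    def write(self, lba: int, data: bytes) -> None:
        if len(self.store) >= self.capacity:
            self.flush()                            # make room for newer random writes
        self.table[lba] = len(self.store)           # newest copy of an LBA wins
        self.store.append(data)                     # adjacent to the previous entry
        if len(self.store) % self.stripe_blocks == 0:
            self.full_stripes_filled += 1           # a complete stripe of journal slots

    def read(self, lba: int):
        slot = self.table.get(lba)
        return self.store[slot] if slot is not None else self.logical_disk.get(lba)

    def flush(self) -> int:
        """Copy the newest journalled copy of each LBA to the logical disk, in LBA order."""
        for lba in sorted(self.table):
            self.logical_disk[lba] = self.store[self.table[lba]]
        flushed = len(self.table)
        self.table.clear()
        self.store.clear()
        return flushed


disk = {}
journal = RandomWriteJournal(disk, stripe_blocks=3, capacity_stripes=2)
for lba in (900, 17, 4242):                         # three unrelated random writes
    journal.write(lba, f"block-{lba}".encode())
```

Reads consult the journal table first, so data remains readable while it sits in the journal; a real VM would also need to persist the journal and choose flush times against I/O load, which this sketch omits.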
For several implementations, the VM may have recent I/O pattern history for each volume so that, when an I/O write request for the volume is received, the VM is able to determine if the I/O write request is seemingly sequential or random by comparing the I/O write request to other recent I/Os to see if they together comprise substantially consecutive blocks indicative of sequential data. If the I/O write request seems unrelated to other recent I/O, however, then the VM will deem that I/O to be random and direct it to the journal.
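A minimal sketch of history-based classification, assuming the history is a bounded window of recent (start, end) LBA ranges per volume; the class name, window length, and `gap_tolerance` parameter are hypothetical:

```python
from collections import deque


class IoPatternTracker:
    """Hypothetical sketch: classify writes using a short per-volume I/O history."""

    def __init__(self, history: int = 32, gap_tolerance: int = 0):
        self.history = history                # how many recent I/Os to remember
        self.gap_tolerance = gap_tolerance    # blocks of slack still counted as consecutive
        self.recent: dict[str, deque] = {}    # volume -> deque of (start_lba, end_lba)

    def classify(self, volume: str, lba: int, length: int) -> str:
        window = self.recent.setdefault(volume, deque(maxlen=self.history))
        end = lba + length
        # Consecutive with a recent I/O if it begins where one ended or ends where one began.
        sequential = any(
            abs(lba - prev_end) <= self.gap_tolerance or abs(end - prev_start) <= self.gap_tolerance
            for prev_start, prev_end in window
        )
        window.append((lba, end))
        return "sequential" if sequential else "random"


tracker = IoPatternTracker()
results = [
    tracker.classify("vol0", 0, 8),
    tracker.classify("vol0", 8, 8),     # continues the previous I/O
    tracker.classify("vol0", 5000, 8),  # unrelated to recent I/O
    tracker.classify("vol1", 16, 8),    # separate history per volume
]
print(results)  # ['random', 'sequential', 'random', 'random']
```

Note that the first I/O of any new stream is necessarily classified as random, since there is nothing yet to compare it with.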
In addition, the VM may take advantage of data from a caching module (such as an Advanced Caching Module, or ACM) layered between the iSCSI module and the VM, in which case the VM consults the caching module to determine whether the I/O is random or sequential. For example, the caching module, using valid bitmap data maintained for each chunk at sector granularity, is aware of adjacent valid bits for incoming I/O and can check regions already valid in cache and, if they are not valid, conclude that the incoming I/O is random.
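The bitmap check can be sketched as below. The chunk size (128 sectors of 512 bytes, i.e. 64 KB), the class name, and the rule that a valid sector immediately before or after the I/O marks it as sequential are assumptions for illustration, not details of any actual ACM:

```python
class ChunkValidBitmap:
    """Hypothetical sketch of a cache's per-chunk valid bitmap at sector granularity."""

    def __init__(self, chunk_sectors: int = 128):  # assumed: 64 KB chunks of 512-byte sectors
        self.chunk_sectors = chunk_sectors
        self.bitmaps: dict[int, int] = {}           # chunk index -> bitmask of valid sectors

    def mark_valid(self, sector: int, count: int) -> None:
        for s in range(sector, sector + count):
            chunk, bit = divmod(s, self.chunk_sectors)
            self.bitmaps[chunk] = self.bitmaps.get(chunk, 0) | (1 << bit)

    def is_valid(self, sector: int) -> bool:
        if sector < 0:
            return False
        chunk, bit = divmod(sector, self.chunk_sectors)
        return bool((self.bitmaps.get(chunk, 0) >> bit) & 1)

    def classify(self, sector: int, count: int) -> str:
        """Valid sectors adjacent to the I/O suggest it continues a stream."""
        if self.is_valid(sector - 1) or self.is_valid(sector + count):
            return "sequential"
        return "random"


cache = ChunkValidBitmap()
cache.mark_valid(120, 8)        # sectors 120-127: tail of chunk 0
print(cache.classify(128, 8))   # sequential: sector 127 is valid
print(cache.classify(4096, 8))  # random: no valid neighbors
```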
In certain VM implementations, the VM may comprise a snapshot functionality, and thus an I/O request might be passed to the journal function after being processed for a snapshot. FIG. 3B is a block diagram of an I/O write request processing from a snapshot module of a VM (represented by a snapshot volume table, SVT) to either the journal of FIG. 3A or to the logical disk storage expressed by the underlying RAID array. Conversely, FIG. 3C is a block diagram of an I/O read request processing from a snapshot module of a VM (represented by a snapshot volume table, SVT) from either the journal of FIG. 3A or from the logical disk storage expressed by the underlying RAID array.
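The read path of FIG. 3C, including the sub-block case, can be sketched as follows. The toy block size, the dictionary layouts for the logical disk and the journal, and the function name `read_block` are hypothetical; the sketch reads the logical disk only when the journal entries do not cover the whole block:

```python
BLOCK_SIZE = 8  # toy block size in bytes, for illustration only


def read_block(lba: int, logical_disk: dict, journal_entries: dict) -> bytes:
    """Serve a block read, preferring journalled data over the logical disk.

    `journal_entries` maps LBA -> list of (offset_in_block, data), oldest first;
    entries may be smaller than a block (sub-block writes).
    """
    entries = journal_entries.get(lba)
    if not entries:
        return logical_disk[lba]              # not journalled: forward to the logical disk
    covered = set()
    for offset, data in entries:
        if offset < 0 or offset + len(data) > BLOCK_SIZE:
            raise ValueError("journal entry must lie within the block")
        covered.update(range(offset, offset + len(data)))
    if len(covered) == BLOCK_SIZE:
        block = bytearray(BLOCK_SIZE)         # journal alone supplies the whole block
    else:
        block = bytearray(logical_disk[lba])  # partial: read the disk block, then patch it
    for offset, data in entries:              # oldest to newest, so newer data wins
        block[offset:offset + len(data)] = data
    return bytes(block)


disk = {0: b"AAAAAAAA", 1: b"BBBBBBBB"}
journal_entries = {1: [(2, b"xy")], 2: [(0, b"12345678")]}
print(read_block(1, disk, journal_entries))  # b'BBxyBBBB'
```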
Since every read request has to search the journal table to determine whether the data is in the journal or on the logical disk, a linear search could be very expensive, especially when the journal table is very long. Therefore, certain implementations may use a hash-based search function to speed the search. FIGS. 5A and 5B are block diagrams illustrating an exemplary embodiment of just such a hash-based search function, characterized by the following text (italicized):
- Since the requests in question are random, an assumption can be made that there would be very few random requests per VM territory (or zone) Z0 . . . Zx during a single journal lifetime. On this assumption, if there are no journalled I/Os J0 . . . Jx in a VM territory, the corresponding VM SVT segment would point directly to just a physical territory. If there are journalled I/Os J0 . . . Jx, on the other hand, the SVT segment would point to another structure that, in addition to holding the physical territory, would also hold the journal locations of journalled LBAs within the territory. This would support multiple accelerations within the territory zone. For such an approach, a linked list can be built to hold the journalled I/Os J0 . . . Jx within the zone, or a linear array of such I/Os within the zone can be maintained. Approaches for supporting both (a) multiple accelerations per zone and (b) a single acceleration per zone are illustrated in the figures. In either case, when the journal is clean, the decision about the zone size may be dynamic and computed based on how many random I/Os have hit a particular zone in the past. Similarly, the Accelerated Zone size may be decided when the journal is clean and computed based on the logical disk (LD) address range, such that greater LD address ranges correspond to larger zone sizes (and vice versa).
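The zone scheme above can be sketched as follows. Here a Python `dict` keyed by zone number stands in for the SVT lookup, a plain integer stands in for a physical territory, and each accelerated zone keeps a linear array of (LBA, journal location) pairs, one of the two options the text mentions; all names are hypothetical, and dynamic zone sizing is not modeled:

```python
from dataclasses import dataclass, field


@dataclass
class AcceleratedZone:
    """SVT segment for a zone with journalled I/Os: the physical territory plus
    the journal locations of journalled LBAs (kept as a linear array)."""
    physical_territory: int
    journalled: list[tuple[int, int]] = field(default_factory=list)  # (LBA, journal location)


class ZoneIndex:
    """Hypothetical sketch: an SVT-like table indexed by zone number."""

    def __init__(self, zone_size_blocks: int):
        self.zone_size = zone_size_blocks
        self.segments: dict[int, object] = {}   # zone -> territory (int) or AcceleratedZone

    def map_zone(self, zone: int, territory: int) -> None:
        self.segments[zone] = territory         # clean zone: points straight at the territory

    def journal(self, lba: int, location: int) -> None:
        zone = lba // self.zone_size
        segment = self.segments[zone]
        if not isinstance(segment, AcceleratedZone):
            segment = AcceleratedZone(segment)  # first journalled I/O in this zone
            self.segments[zone] = segment
        segment.journalled.append((lba, location))

    def lookup(self, lba: int) -> tuple[str, int]:
        """Return ("journal", location) or ("disk", physical territory)."""
        segment = self.segments[lba // self.zone_size]
        if isinstance(segment, AcceleratedZone):
            # Few random I/Os per zone are assumed, so a short scan is cheap;
            # scanning newest-first makes the latest journalled copy win.
            for journalled_lba, location in reversed(segment.journalled):
                if journalled_lba == lba:
                    return "journal", location
            return "disk", segment.physical_territory
        return "disk", segment


index = ZoneIndex(zone_size_blocks=1024)
index.map_zone(0, 7)
index.map_zone(1, 9)
index.journal(1030, 42)
index.journal(1030, 43)  # a newer random write to the same LBA
```

Only zones that have actually received journalled I/Os pay for the extra structure; a lookup in a clean zone costs a single table access.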
Above the RAID management layer sits a combination device driver that implements additional functions as extensions to the VM. Above this combination device driver a number of software components are utilized depending upon the access mechanism employed to access the data stored on the physical disks. In particular, a SAN path is provided that utilizes a cache and an iSCSI driver, and a NAS path is also provided that utilizes a cache and the XFS high-performance journaling file system, for example. As such, volumes are exposed through the SAN path while fileshares are exposed through the NAS path, although both constitute “volumes” with regard to disclosures herein pertaining to the various implementations.
The chipset 52 includes a north bridge 24 and a south bridge 26. The north bridge 24 provides an interface between the CPU 22 and the remainder of the computer 2. The north bridge 24 also provides an interface to a random access memory (“RAM”) used as the main memory 54 in the computer 2 and, possibly, to an on-board graphics adapter 30. The north bridge 24 may also include functionality for providing networking functionality through a gigabit Ethernet adapter 28. The gigabit Ethernet adapter 28 is capable of connecting the computer 2 to another computer via a network. Connections which may be made by the network adapter 28 may include LAN or WAN connections. LAN and WAN networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the internet. The north bridge 24 is connected to the south bridge 26.
The south bridge 26 is responsible for controlling many of the input/output functions of the computer 2. In particular, the south bridge 26 may provide one or more universal serial bus (“USB”) ports 32, a sound adapter 46, an Ethernet controller 60, and one or more general purpose input/output (“GPIO”) pins 34. The south bridge 26 may also provide a bus for interfacing peripheral card devices such as a graphics adapter 62. In one embodiment, the bus comprises a peripheral component interconnect (“PCI”) bus. The south bridge 26 may also provide a system management bus 64 for use in managing the various components of the computer 2. Additional details regarding the operation of the system management bus 64 and its connected components are provided below.
The south bridge 26 is also operative to provide one or more interfaces for connecting mass storage devices to the computer 2. For instance, according to an embodiment, the south bridge 26 includes a serial advanced technology attachment (“SATA”) adapter for providing one or more serial ATA ports 36 and an ATA 100 adapter for providing one or more ATA 100 ports 44. The serial ATA ports 36 and the ATA 100 ports 44 may be, in turn, connected to one or more mass storage devices storing an operating system 40 and application programs, such as the SATA disk drive 38. As known to those skilled in the art, an operating system 40 comprises a set of programs that control operations of a computer and allocation of resources. An application program is software that runs on top of the operating system software, or other runtime environment, and uses computer resources to perform application specific tasks desired by the user.
According to one embodiment of the invention, the operating system 40 comprises the LINUX operating system. According to another embodiment of the invention, the operating system 40 comprises the WINDOWS SERVER operating system from MICROSOFT CORPORATION. According to another embodiment, the operating system 40 comprises the UNIX or SOLARIS operating system. It should be appreciated that other operating systems may also be utilized.
The mass storage devices connected to the south bridge 26, and their associated computer-readable media, provide non-volatile storage for the computer 2. Although the description of computer-readable media contained herein refers to a mass storage device, such as a hard disk or CD-ROM drive, it should be appreciated by those skilled in the art that computer-readable media can be any available media that can be accessed by the computer 2. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
A low pin count (“LPC”) interface may also be provided by the south bridge 26 for connecting a “Super I/O” device 70. The Super I/O device 70 is responsible for providing a number of input/output ports, including a keyboard port, a mouse port, a serial interface 72, a parallel port, and other types of input/output ports. The LPC interface may also connect a computer storage medium, such as a ROM or a flash memory such as an NVRAM 48, for storing the firmware 50. The firmware 50 includes program code containing the basic routines that help to start up the computer 2 and to transfer information between elements within the computer 2.
As described briefly above, the south bridge 26 may include a system management bus 64. The system management bus 64 may include a BMC 66. In general, the BMC 66 is a microcontroller that monitors operation of the computer system 2. In a more specific embodiment, the BMC 66 monitors health-related aspects associated with the computer system 2, such as, but not limited to, the temperature of one or more components of the computer system 2, speed of rotational components (e.g., spindle motor, CPU Fan, etc.) within the system, the voltage across or applied to one or more components within the system 2, and the available or used capacity of memory devices within the system 2. To accomplish these monitoring functions, the BMC 66 is communicatively connected to one or more components by way of the management bus 64. In an embodiment, these components include sensor devices for measuring various operating and performance-related parameters within the computer system 2. The sensor devices may be either hardware or software based components configured or programmed to measure or detect one or more of the various operating and performance-related parameters. The BMC 66 functions as the master on the management bus 64 in most circumstances, but may also function as either a master or a slave in other circumstances. Each of the various components communicatively connected to the BMC 66 by way of the management bus 64 is addressed using a slave address. The management bus 64 is used by the BMC 66 to request and/or receive various operating and performance-related parameters from one or more components, which are also communicatively connected to the management bus 64.
It should be appreciated that the computer 2 may comprise other types of computing devices, including hand-held computers, embedded computer systems, personal digital assistants, and other types of computing devices known to those skilled in the art. It is also contemplated that the computer 2 may not include all of the components shown in FIG. 6B, may include other components that are not explicitly shown in FIG. 6B, or may utilize an architecture completely different from that shown in FIG. 6B.
It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. Thus, the methods and apparatus of the presently disclosed subject matter, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium where, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the presently disclosed subject matter.
Although exemplary implementations may refer to utilizing aspects of the presently disclosed subject matter in the context of one or more stand-alone computer systems, the subject matter is not so limited, but rather may be implemented in connection with any computing environment, such as a network or distributed computing environment. Still further, aspects of the presently disclosed subject matter may be implemented in or across a plurality of processing chips or devices, and storage may similarly be effected across a plurality of devices. Such devices might include personal computers, network servers, and handheld devices, for example.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
Claims (20)
1. A non-transitory computer-readable medium having computer-executable instructions stored thereon for accelerating I/O performance for a striped-disk array that, when executed by a storage computer, cause the storage computer to:
receive a write I/O request directed to a portion of a stripe of the striped-disk array;
determine, prior to any forwarding of the received write I/O request to the striped-disk array, whether the write I/O request is random or sequential by comparing the write I/O request to a plurality of recent I/O requests;
in response to determining that the write I/O request is random, record the random write I/O request in a journal data storage area of the striped-disk array; and
periodically flush the journal data storage area by forming a sequential write I/O that spans a width of a stripe of the striped-disk array, the formed sequential write I/O comprising the random write I/O request and at least one other random write I/O request that is recorded in the journal data storage area.
2. The non-transitory computer-readable medium of claim 1, having further computer-executable instructions stored thereon that, when executed by the storage computer, cause the storage computer to forward the sequential write I/O request to the striped-disk array in response to determining that the write I/O request is sequential.
3. The non-transitory computer-readable medium of claim 1, having further computer-executable instructions stored thereon that, when executed by the storage computer, cause the storage computer to:
forward the formed sequential write I/O to the striped-disk array.
4. The non-transitory computer-readable medium of claim 3, wherein the journal data storage area is periodically flushed in order to maintain a predetermined amount of storage capacity in the journal data storage area to accommodate incoming write I/O requests.
5. The non-transitory computer-readable medium of claim 3, wherein the journal data storage area is periodically flushed at a time that minimizes impact on incoming I/O requests.
6. The non-transitory computer-readable medium of claim 1, having further computer-executable instructions stored thereon that, when executed by the storage computer, cause the storage computer to:
maintain a journal table including a plurality of entries; and
in response to determining that the write I/O request is random, update an entry in the journal table corresponding to the random write I/O operation to indicate a location in the journal data storage area and the portion of the stripe of the striped-disk array to which the random write I/O request is directed.
7. The non-transitory computer-readable medium of claim 6, having further computer-executable instructions stored thereon that, when executed by the storage computer, cause the storage computer to:
receive a read I/O request;
determine whether the journal table includes an entry corresponding to the read I/O request;
upon determining that the journal table includes an entry corresponding to the read I/O request, service at least a portion of the read I/O request from the journal data storage area; and
upon determining that the journal table does not include an entry corresponding to the read I/O request, forward the read I/O request to the striped-disk array.
8. The non-transitory computer-readable medium of claim 7, having further computer-executable instructions stored thereon that, when executed by the storage computer, cause the storage computer to:
partition a storage capacity of the striped-disk array into zones;
maintain an accelerated zone table comprising entries that relate the zones to corresponding entries in the journal table;
determine a zone of the striped-disk array to which the read I/O request is directed; and
determine whether the journal table includes an entry corresponding to the read I/O request by searching the accelerated zone table based on the zone of the striped-disk array to which the read I/O request is directed.
9. A method for accelerating I/O performance for a striped-disk array, comprising:
receiving a write I/O request directed to a portion of a stripe of the striped-disk array;
determining, prior to any forwarding of the received write I/O request to the striped-disk array, whether the write I/O request is random or sequential by comparing the write I/O request to a plurality of recent I/O requests;
in response to determining that the write I/O request is random, recording the random write I/O request in a journal data storage area of the striped-disk array; and
periodically flushing the journal data storage area by forming a sequential write I/O that spans a width of a stripe of the striped-disk array, the formed sequential write I/O comprising the random write I/O request and at least one other random write I/O request that is recorded in the journal data storage area.
10. The method of claim 9 further comprising forwarding the sequential write I/O request to the striped-disk array in response to determining that the write I/O request is sequential.
11. The method of claim 9 further comprising:
forwarding the formed sequential write I/O to the striped-disk array.
12. The method of claim 11, wherein the journal data storage area is periodically flushed in order to maintain a predetermined amount of storage capacity in the journal data storage area to accommodate incoming write I/O requests.
13. The method of claim 11, wherein the journal data storage area is periodically flushed at a time that minimizes impact on incoming I/O requests.
14. The method of claim 9, further comprising:
maintaining a journal table including a plurality of entries; and
in response to determining that the write I/O request is random, updating an entry in the journal table corresponding to the random write I/O operation to indicate a location in the journal data storage area and the portion of the stripe of the striped-disk array to which the random write I/O request is directed.
15. The method of claim 14, further comprising:
receiving a read I/O request;
determining whether the journal table includes an entry corresponding to the read I/O request;
upon determining that the journal table includes an entry corresponding to the read I/O request, servicing at least a portion of the read I/O request from the journal data storage area; and
upon determining that the journal table does not include an entry corresponding to the read I/O request, forwarding the read I/O request to the striped-disk array.
16. The method of claim 15, further comprising:
partitioning a storage capacity of the striped-disk array into zones;
maintaining an accelerated zone table comprising entries that relate the zones to corresponding entries in the journal table;
determining a zone of the striped-disk array to which the read I/O request is directed; and
determining whether the journal table includes an entry corresponding to the read I/O request further comprises searching the accelerated zone table based on the zone of the striped-disk array to which the read I/O request is directed.
17. A storage computer for accelerating I/O performance for a striped-disk array, comprising:
a processing unit; and
a memory communicatively connected to the processing unit that stores computer-executable instructions that, when executed by the processing unit, cause the storage computer to:
receive a write I/O request directed to a portion of a stripe of the striped-disk array;
determine, prior to any forwarding of the received write I/O request to the striped-disk array, whether the write I/O request is random or sequential by comparing the write I/O request to a plurality of recent I/O requests;
in response to determining that the write I/O request is random, record the random write I/O request in a journal data storage area of the striped-disk array; and
periodically flush the journal data storage area by forming a sequential write I/O that spans a width of a stripe of the striped-disk array, the formed sequential write I/O comprising the random write I/O request and at least one other random write I/O request that is recorded in the journal data storage area.
18. The storage computer of claim 17, wherein the memory stores further computer-executable instructions that, when executed by the processing unit, cause the storage computer to:
forward the formed sequential write I/O to the striped-disk array.
19. The storage computer of claim 17, wherein the memory stores further computer-executable instructions that, when executed by the processing unit, cause the storage computer to:
maintain a journal table including a plurality of entries; and
update an entry in the journal table corresponding to the random write I/O operation to indicate a location in the journal data storage area and the portion of the stripe of the striped-disk array to which the random write I/O request is directed.
20. The storage computer of claim 19, wherein the memory stores further computer-executable instructions that, when executed by the processing unit, cause the storage computer to:
receive a read I/O request;
determine whether the journal table includes an entry corresponding to the read I/O request;
upon determining that the journal table includes an entry corresponding to the read I/O request, service at least a portion of the read I/O request from the journal data storage area; and
upon determining that the journal table does not include an entry corresponding to the read I/O request, forward the read I/O request to the striped-disk array.
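The write, read and flush behavior recited in claims 1, 6 and 7 (and their method counterparts 9, 14 and 15) can be illustrated with a toy, in-memory Python model. This is a sketch under stated assumptions, not the claimed implementation. The following choices are all illustrative:

- the class name `JournalAccelerator`;
- treating a write as sequential when it continues one of the recent requests;
- one-block requests;
- single XOR (RAID 5-style) parity;
- flushing by grouping journalled blocks per destination stripe and filling the rest of each stripe from the array.

```python
from collections import deque
from functools import reduce
from typing import Deque, Dict, List, Tuple

BLOCK = 4  # bytes per block in this toy model


def xor_blocks(blocks: List[bytes]) -> bytes:
    # Single XOR parity across equal-length blocks (RAID 5-style assumption).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class JournalAccelerator:
    def __init__(self, data_disks: int, history: int = 8):
        self.width = data_disks                                   # data blocks per stripe
        self.array: Dict[int, bytes] = {}                         # lba -> block on the array
        self.parity: Dict[int, bytes] = {}                        # stripe -> parity block
        self.recent: Deque[Tuple[int, int]] = deque(maxlen=history)  # (start, length)
        self.journal: List[Tuple[int, bytes]] = []                # journal data storage area
        self.journal_table: Dict[int, int] = {}                   # lba -> journal index
        self.full_stripe_writes = 0

    def is_sequential(self, lba: int) -> bool:
        # Hypothetical test: compare against the recent requests before forwarding.
        return any(start + length == lba for start, length in self.recent)

    def write(self, lba: int, data: bytes) -> str:
        sequential = self.is_sequential(lba)
        self.recent.append((lba, 1))
        if sequential:
            self._write_through(lba, data)
            return "forwarded"
        self.journal_table[lba] = len(self.journal)  # journal table entry for this LBA
        self.journal.append((lba, data))
        return "journalled"

    def _write_through(self, lba: int, data: bytes) -> None:
        # Small write: read-modify-write of parity, the cost journalling avoids.
        stripe = lba // self.width
        old = self.array.get(lba, bytes(BLOCK))
        old_parity = self.parity.get(stripe, bytes(BLOCK))
        self.parity[stripe] = xor_blocks([old_parity, old, data])
        self.array[lba] = data
        self.journal_table.pop(lba, None)  # any journalled copy is now stale

    def read(self, lba: int) -> bytes:
        index = self.journal_table.get(lba)
        if index is not None:
            return self.journal[index][1]          # serviced from the journal
        return self.array.get(lba, bytes(BLOCK))   # forwarded to the array

    def flush(self) -> None:
        # Group the latest journalled block for each LBA by destination stripe.
        stripes: Dict[int, Dict[int, bytes]] = {}
        for lba, index in self.journal_table.items():
            stripes.setdefault(lba // self.width, {})[lba] = self.journal[index][1]
        for stripe, blocks in stripes.items():
            first = stripe * self.width
            # Fill uncovered blocks from the array so one write spans the stripe width.
            data = [blocks.get(lba, self.array.get(lba, bytes(BLOCK)))
                    for lba in range(first, first + self.width)]
            for offset, block in enumerate(data):
                self.array[first + offset] = block
            self.parity[stripe] = xor_blocks(data)  # old parity is never read back
            self.full_stripe_writes += 1
        self.journal.clear()
        self.journal_table.clear()
```

Each flushed stripe's parity is recomputed from the full set of data blocks being written. The old parity therefore never has to be read back, unlike the per-block read-modify-write in `_write_through`.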
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/185,522 US10067682B1 (en) | 2011-04-18 | 2016-06-17 | I/O accelerator for striped disk arrays using parity |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161476725P | 2011-04-18 | 2011-04-18 | |
US13/449,496 US9396067B1 (en) | 2011-04-18 | 2012-04-18 | I/O accelerator for striped disk arrays using parity |
US15/185,522 US10067682B1 (en) | 2011-04-18 | 2016-06-17 | I/O accelerator for striped disk arrays using parity |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/449,496 Continuation US9396067B1 (en) | 2011-04-18 | 2012-04-18 | I/O accelerator for striped disk arrays using parity |
Publications (1)
Publication Number | Publication Date |
---|---|
US10067682B1 (en) | 2018-09-04 |
Family
ID=56381625
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/449,496 Active 2032-10-04 US9396067B1 (en) | 2011-04-18 | 2012-04-18 | I/O accelerator for striped disk arrays using parity |
US15/185,522 Active 2032-07-02 US10067682B1 (en) | 2011-04-18 | 2016-06-17 | I/O accelerator for striped disk arrays using parity |
Country Status (1)
Country | Link |
---|---|
US (2) | US9396067B1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10169152B2 (en) * | 2016-09-12 | 2019-01-01 | International Business Machines Corporation | Resilient data storage and retrieval |
US10216660B1 (en) * | 2017-07-13 | 2019-02-26 | EMC IP Holding Company LLC | Method and system for input/output (IO) scheduling in a storage system |
US10282116B2 (en) * | 2017-07-19 | 2019-05-07 | Avago Technologies International Sales Pte. Limited | Method and system for hardware accelerated cache flush |
KR102697883B1 (en) * | 2018-09-27 | 2024-08-22 | 삼성전자주식회사 | Method of operating storage device, storage device performing the same and storage system including the same |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5557770A (en) | 1993-03-24 | 1996-09-17 | International Business Machines Corporation | Disk storage apparatus and method for converting random writes to sequential writes while retaining physical clustering on disk |
US5933834A (en) | 1997-10-16 | 1999-08-03 | International Business Machines Incorporated | System and method for re-striping a set of objects onto an exploded array of storage units in a computer system |
US6148368A (en) | 1997-07-31 | 2000-11-14 | Lsi Logic Corporation | Method for accelerating disk array write operations using segmented cache memory and data logging |
US6516380B2 (en) | 2001-02-05 | 2003-02-04 | International Business Machines Corporation | System and method for a log-based non-volatile write cache in a storage controller |
US20030182502A1 (en) | 2002-03-21 | 2003-09-25 | Network Appliance, Inc. | Method for writing contiguous arrays of stripes in a RAID storage system |
US20030225970A1 (en) | 2002-05-28 | 2003-12-04 | Ebrahim Hashemi | Method and system for striping spares in a data storage system including an array of disk drives |
US20040128470A1 (en) | 2002-12-27 | 2004-07-01 | Hetzler Steven Robert | Log-structured write cache for data storage devices and systems |
US7076606B2 (en) | 2002-09-20 | 2006-07-11 | Quantum Corporation | Accelerated RAID with rewind capability |
US20070283086A1 (en) * | 2006-06-06 | 2007-12-06 | Seagate Technology Llc | Write caching random data and sequential data simultaneously |
US20090249018A1 (en) | 2008-03-28 | 2009-10-01 | Hitachi Ltd. | Storage management method, storage management program, storage management apparatus, and storage management system |
US7606944B2 (en) | 2007-05-10 | 2009-10-20 | Dot Hill Systems Corporation | Dynamic input/output optimization within a storage controller |
US20100211736A1 (en) | 2009-02-18 | 2010-08-19 | Gang Chen | Method and system for performing i/o operations on disk arrays |
US7853751B2 (en) | 2008-03-12 | 2010-12-14 | Lsi Corporation | Stripe caching and data read ahead |
Non-Patent Citations (2)
Title |
---|
Rosenblum, M., et al., "The Design and Implementation of a Log-Structured File System," ACM Transactions on Computer Systems, vol. 10, No. 1, 1992, pp. 26-52. |
Stodolsky, D., et al., "Parity Logging, Overcoming the Small Write Problem in Redundant Disk Arrays," 20th Annual International Symposium on Computer Architecture, San Diego, CA, May 16-19, 1993, 12 pages. |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190146911A1 (en) * | 2017-11-13 | 2019-05-16 | SK Hynix Inc. | Memory system and operating method thereof |
US10997065B2 (en) * | 2017-11-13 | 2021-05-04 | SK Hynix Inc. | Memory system and operating method thereof |
Also Published As
Publication number | Publication date |
---|---|
US9396067B1 (en) | 2016-07-19 |
Legal Events
Code | Title | Description |
---|---|---|
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY; Year of fee payment: 4 |