WO2017079373A1 - Redundant disk array using heterogeneous disks - Google Patents

Redundant disk array using heterogeneous disks

Info

Publication number
WO2017079373A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage devices
read
disks
data
disk
Application number
PCT/US2016/060232
Other languages
French (fr)
Inventor
András Krisztián FEKETE
Elizabeth Varki
Original Assignee
University Of New Hampshire
Application filed by University Of New Hampshire filed Critical University Of New Hampshire
Publication of WO2017079373A1 publication Critical patent/WO2017079373A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061: Improving I/O performance
    • G06F 3/0611: Improving I/O performance in relation to response time
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0659: Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F 3/0668: Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671: In-line storage system
    • G06F 3/0683: Plurality of storage devices
    • G06F 3/0685: Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Definitions

  • Virtualization may be employed in the computing device 1000 so that infrastructure and resources in the computing device 1000 may be shared dynamically. For example, a virtual machine may be provided to handle a process running on multiple processors so that the process appears to be using only one computing resource rather than multiple computing resources. Multiple virtual machines may also be used with one processor.
  • a user may interact with the computing device 1000 through an output device 1040, such as a screen or monitor, which may display one or more user interfaces provided in accordance with some embodiments.
  • the output device 1040 may also display other aspects, elements and/or information or data associated with some embodiments.
  • the computing device 1000 may include other I/O devices 1050 for receiving input from a user, for example, a keyboard, a joystick, a game controller, a pointing device (e.g., a mouse, a user's finger interfacing directly with a display device, etc.), or any suitable user interface.
  • the computing device 1000 may include other suitable conventional I/O peripherals, such as a camera 1052.
  • the computing device 1000 can include and/or be operatively coupled to various suitable devices for performing one or more of the functions as variously described in this disclosure.
  • the computing device 1000 may run any operating system, such as any of the versions of Microsoft® Windows® operating systems, the different releases of the Unix and Linux operating systems, any version of the MacOS® for Macintosh computers, any embedded operating system, any real-time operating system, any open source operating system, any proprietary operating system, any operating systems for mobile computing devices, or any other operating system capable of running on the computing device 1000 and performing the operations described in this disclosure.
  • the operating system may be run on one or more cloud machine instances.
  • the functional components/modules may be implemented with hardware, such as gate level logic (e.g., FPGA) or a purpose-built semiconductor (e.g., ASIC). Still other embodiments may be implemented with a microcontroller having a number of input/output ports for receiving and outputting data, and a number of embedded routines for carrying out the functionality described in this disclosure. In a more general sense, any suitable combination of hardware, software, and firmware can be used, as will be apparent.
  • modules and components can be implemented in software, such as a set of instructions (e.g., HTML, XML, C, C++, object-oriented C, JavaScript, Java, BASIC, etc.) encoded on any computer readable medium or computer program product (e.g., hard drive, server, disc, or other suitable non-transient memory or set of memories), that when executed by one or more processors, cause the various methodologies provided in this disclosure to be carried out.
  • various functions performed by the user computing system can be performed by similar processors and/or databases in different configurations and arrangements, and that the depicted embodiments are not intended to be limiting.
  • Various components of this example embodiment, including the computing device 1000, can be integrated into, for example, one or more desktop or laptop computers, workstations, tablets, smart phones, game consoles, set-top boxes, or other such computing devices.
  • Other componentry and modules typical of a computing system such as processors (e.g., central processing unit and co-processor, graphics processor, etc.), input devices (e.g., keyboard, mouse, touch pad, touch screen, etc.), and operating system, are not shown but will be readily apparent.
  • An example embodiment provides a computer-implemented method including receiving, by a processor, a request to read data from at least one of a plurality of storage devices configured as a redundant array of independent disks (RAID). At least two of the storage devices on which the requested data are stored are heterogeneous and have different read speeds, such that the read speed of at least one of the storage devices is faster than the read speed of at least another one of the storage devices.
  • the method further includes identifying, by the processor and in response to receiving the request, which of the storage devices are idle and causing, by the processor, one or more of the idle storage devices to read the requested data.
  • the requested data is read from only a fastest one (or ones) of the idle storage devices. If none of the storage devices are idle, then the method includes waiting until at least one of the storage devices is idle before causing the respective storage device(s) to read the requested data. In some cases, two or more of the storage devices are simultaneously idle, and the method includes causing, by the processor, each of the idle storage devices to read different portions of the requested data from each of the idle storage devices synchronously. In this manner, one portion of the requested data is read from one of the idle storage devices and another portion of the requested data is read from a different one of the idle storage devices synchronously.
  • the method includes causing, by the processor, each of the idle storage devices to read different portions of the requested data from each of the idle storage devices asynchronously. In this manner, one portion of the requested data is read from one of the idle storage devices and another portion of the requested data is read from a different one of the idle storage devices asynchronously.
  • the method includes causing, by the processor, only a fastest one of the idle storage devices to read the requested data where less than a user-specified amount of data are requested, otherwise causing, by the processor, each of the idle storage devices to read different portions of the requested data from each of the idle storage devices asynchronously.
  • Another example embodiment provides a computer-implemented method including receiving, by a processor, a request to read data from at least one of a plurality of storage devices configured as a redundant array of independent disks (RAID). At least two of the storage devices are heterogeneous and have different read speeds, such that the read speed of at least one of the storage devices is faster than the read speed of at least another one of the storage devices. The method further includes identifying, by the processor and in response to receiving the request, which of the storage devices is fastest and causing, by the processor, the fastest storage device to read the requested data.
  • the method includes clearing a queue of write requests prior to causing the fastest storage device to read the requested data and restoring the queue of write requests subsequent to causing the fastest storage device to read the requested data. In this manner, any write requests that are pending (incomplete) for the fastest storage device are set aside until the read request has been satisfied and the requested data has been read from the fastest storage device, and then the pending write requests are restored so that the fastest disk may continue processing those write requests.
  • Another example embodiment provides a computer-implemented method including receiving, by a processor, a request to read data from at least one of a plurality of storage devices configured as a redundant array of independent disks (RAID). At least two of the storage devices are heterogeneous and have different read speeds, such that the read speed of at least one of the storage devices is faster than the read speed of at least another one of the storage devices. The method further includes waiting, by the processor and in response to receiving the request, until all pending requests to write data to the storage devices have completed and causing, by the processor, all of the storage devices to read the requested data after all pending requests to write the data to the storage devices have completed.

Abstract

Techniques for using heterogeneous disks of dissimilar access speeds in a redundant disk array can be used to support growing the file-system while it is on-line. In general, such techniques are for use in a system that contains identical data across heterogeneous disks. In a RAID1 system having two disks, there is a data disk and a copy disk. The copy disk contains a copy of the data stored on the data disk. When more than two disks are used in a RAID1 system, the additional disks are used as copy disks instead of additional storage. The disclosed techniques improve the throughput of a heterogeneous array that contains disks with different speeds, such as an SSD and HDD combination. The read speeds can be increased by splitting up each of the data requests and utilizing parallel reads across more than one of the disks in the array.

Description

REDUNDANT DISK ARRAY USING HETEROGENEOUS DISKS
RELATED APPLICATION
[0001] This application claims the benefit of U.S. Provisional Patent Application Serial No. 62/251,401, filed November 5, 2015, which is hereby incorporated by reference in its entirety.
FIELD OF THE DISCLOSURE
[0002] This disclosure relates to the field of electronic data storage devices, and more particularly, to a redundant disk array configuration that optimizes array disk throughput using heterogeneous disks having different access speeds.
BACKGROUND
[0003] RAID (redundant array of independent disks) is a data storage virtualization technology that combines multiple physical disk drive components into a single logical unit for the purposes of data redundancy, performance improvement, or both. Some existing RAID configurations utilize disks having equal physical size. Thus, when a disk in a RAID array fails, it is typically replaced with a similarly sized disk. However, due to advancements in technology, disks are increasing in speed. Therefore, replacement disks may be faster than the disks they are replacing. However, with existing techniques, if a RAID array has disks of unequal speed, such arrays are constrained by the speed of the slowest disk in the array.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] The accompanying drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral.
[0005] Fig. 1 shows an example redundant disk array, in accordance with an embodiment of the present disclosure.
[0006] Fig. 2 shows the experimental results of performance tests on a RAID, in accordance with embodiments of the present disclosure.
[0007] Fig. 3 shows the experimental results of performance tests on a RAID, in accordance with other embodiments of the present disclosure.
[0008] Fig. 4 shows an example redundant disk array, in accordance with another embodiment of the present disclosure.
[0009] Fig. 5 is a flow diagram of several example methodologies for reading data from a redundant disk array using heterogeneous disks, in accordance with embodiments of the present disclosure.
[0010] Fig. 6 is a flow diagram of several other example methodologies for reading data from a redundant disk array using heterogeneous disks, in accordance with embodiments of the present disclosure.
[0011] Fig. 7 is a flow diagram of another example methodology for reading data from a redundant disk array using heterogeneous disks, in accordance with an embodiment of the present disclosure.
[0012] Fig. 8 is a block diagram representing an example computing device that may be used in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0013] When a RAID system has been set up with drives of equal size and a disk fails after several years of use, the failed disk can be replaced with another disk. However, new developments in solid-state disk (SSD) technology have given rise to much faster disks. Seek times in rotating disks consume most of the time to access data, but this is not a problem for SSDs as there are no moving parts. For instance, flash devices are constructed such that each memory chip can be accessed in parallel, which gives higher data throughput.
[0014] The combination of RAID and SSDs has been studied with the goals of increased speed and decreased energy consumption. SSDs have been used as cache devices, as storage for the most commonly accessed blocks (called I-CASH), or for storing data on SSDs while the change logs are stored on hard disk drives (HDDs) (called E-HASH).
[0015] To that end, disclosed herein are techniques for solving the problem of using heterogeneous disks of dissimilar access speeds in a redundant disk array. Furthermore, such techniques can be used to support growing the file-system while it is on-line. In general, embodiments of this disclosure are directed to techniques for use in a system that contains identical data across heterogeneous disks. In a RAID1 system having two disks, such as shown in Fig. 1, there is a data disk and a copy disk. The copy disk contains a copy of the data stored on the data disk. When more than two disks are used in a RAID1 system, the additional disks are used as copy disks instead of additional storage. As such, the total storage capacity of the array is the same as the storage capacity of the data disk, even if the disks have different capacities. The disclosed techniques improve the throughput of a heterogeneous array that contains disks with different speeds, such as an SSD and HDD combination. The read speeds can be increased by splitting up each of the data requests and utilizing parallel reads across more than one of the disks in the array. Numerous configurations and variations will be apparent in view of the present disclosure.
[0016] Some methods exist for combining the MTTF of multiple disks to decrease the likelihood of data loss, as well as speed up I/O operations, by combining the speed of drives together. This is generally referred to as Redundant Arrays of Inexpensive Disks (RAID). Most RAID concepts and techniques rely on the fact that all disks are identical (e.g., the disks have identical capacities). In some existing RAID configurations, each disk is split up into stripe units (SU). The stripe units are generally based on the block size of a disk. Stripe units that share the same logical location across different disks are grouped into a stripe, as shown in Fig. 1. Depending on the number of disks in the array, it is possible to achieve certain so-called RAID levels. The simplest levels are RAID0 and RAID1, also known as striping and mirroring, respectively; these can be implemented with as few as two drives. More advanced RAID levels include RAID5 (single parity) and RAID6 (double parity). Parity refers to the degree of redundancy of the algorithm, which determines how many disks can fail before data loss occurs.
Mirroring RAID Algorithms
[0017] Various embodiments are implemented with a RAID1 heterogeneous disk array, where each disk contains an exact copy, or mirror, of a set of data, or in another disk array where some disks contain a subset of mirrored data while other disks contain a different subset of mirrored data. The total storage space available for data is equal to the size of the smallest disk in the array. With some existing techniques, the read and write accesses are spread evenly across all of the disks (e.g., the data disk and the copy disk of Fig. 1). When all the disks are identical, the algorithm that gives the best performance is one where each transaction is evenly divided among the disks in the array. However, in accordance with an embodiment of the present disclosure, when the RAID includes an assortment of different disks with different speeds and sizes (e.g., the data disk and the copy disk have different read speeds), disks with faster read operations (such as solid state disks or SSDs) can be used for reads, while the rest of the disks in the array are relieved of any read operations (i.e., write only except during data recovery). For example, if the copy disk has a faster read speed than the data disk, the copy disk can be used for at least some read operations even while the data disk is available for performing the read.
[0018] A RAID write algorithm in accordance with an embodiment of the present disclosure includes putting an incoming data packet on a queue. A write instruction is issued to each disk in the array so that the data is written to all of the disks in the RAID. Once the packet is on the queue, the system begins to process the next transaction. The length of the queue is configured to be equal to the number of disks in the array. This guarantees that at least one disk has written the data packet before another is accepted onto the queue. The queue can be shared among all the disks, as it is unnecessary to create and maintain a queue per disk for each transaction. This reduces memory requirements as well.
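The following is a minimal Python sketch of such a shared write queue, assuming hypothetical disk objects that expose a write(offset, data) method; it illustrates the bounded-queue idea rather than being a definitive implementation of the disclosed algorithm.

```python
import threading
from queue import Queue


class MirroredWriter:
    """Sketch of the shared write queue described above (illustrative only)."""

    def __init__(self, disks):
        self.disks = disks
        self._release_lock = threading.Lock()
        # One queue shared by all disks, with capacity equal to the number of
        # disks, so a new packet is only accepted once at least one disk has
        # finished writing an earlier packet and freed a slot.
        self.queue = Queue(maxsize=len(disks))

    def write(self, offset, data):
        self.queue.put((offset, data))        # blocks while the queue is full
        first_done = threading.Event()

        def mirror_to(disk):
            disk.write(offset, data)          # write the packet to this disk
            with self._release_lock:
                if not first_done.is_set():   # the first disk to finish frees
                    first_done.set()          # a queue slot for the next
                    self.queue.get()          # transaction
                    self.queue.task_done()

        # Issue the write instruction to every disk so the mirror stays consistent.
        for disk in self.disks:
            threading.Thread(target=mirror_to, args=(disk,), daemon=True).start()
        # The caller can now move on to the next transaction.
```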
[0019] While it is necessary to write the data to all disks to ensure data redundancy, it is only necessary to read the data from one of the disks. To this end, and in accordance with various embodiments, several different reading algorithms are disclosed below. These algorithms can increase the throughput of the RAID system by variously reading data from, for example, the fastest disk(s), from the idle disk(s), or the fastest idle disk(s). Asynchronous I/O is used where possible to further speed up the rate at which transactions complete.
Example Algorithms for Reading Data from a Heterogeneous Disk Array
[0020] At any given point in time during operation, each disk in the array can perform a read operation (reading data from the disk) or a write operation (writing data to the disk). If a disk is neither reading nor writing, the disk is considered idle. One example algorithm for reading data from a heterogeneous disk array, in accordance with an embodiment, uses only the fastest disk from the set of all idle disks for reads, provided that such a disk contains the requested data. If there are no idle disks available, the algorithm will wait for the first disk to become idle. A synchronous read is issued to the idle disk, since it is not necessary to incur the overhead of an asynchronous command that spawns a separate thread to handle the request and then waits for that thread to complete.
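A minimal sketch of this fastest-idle-disk read might look as follows, assuming hypothetical disk objects with read_speed, is_idle() and read(offset, length) attributes.

```python
import time


def read_fastest_idle(disks, offset, length, poll_interval=0.001):
    """Sketch of the fastest-idle-disk read; illustrative only."""
    while True:
        idle = [d for d in disks if d.is_idle()]
        if idle:
            # Use only the fastest of the currently idle disks and issue a
            # plain synchronous read (no extra thread is spawned).
            fastest = max(idle, key=lambda d: d.read_speed)
            return fastest.read(offset, length)
        # No disk is idle: wait for the first disk to become idle.
        time.sleep(poll_interval)
```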
[0021] Another example algorithm for reading data from a heterogeneous disk array, in accordance with an embodiment, makes use of any idle disks for a particular read, provided that such disks contain the requested data, by dividing up the packet into equal sizes and issuing each piece of the divided packet to the idle disks one at a time. When the read of each divided packet is complete, the next piece of the divided packet is issued to the next idle disk.
[0022] Another example algorithm for reading data from a heterogeneous disk array, in accordance with an embodiment, is similar to the preceding example algorithm except that it uses asynchronous reads so that all the pieces of the divided packet are read in parallel from the idle disks, rather than one at a time. This algorithm yields the best performance when the overhead of the asynchronous call is negligible compared to the actual read time. As the disks in the array span a larger range of speeds, performance begins to suffer and the asynchronous process becomes a significant cost. In such cases, it is best to revert to the one-read-at-a-time algorithm and use a synchronous read.
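The asynchronous split-read variant could be sketched as follows, with the parallelism approximated by a thread pool; the disk interface is again a hypothetical placeholder, and each disk is assumed to hold a full mirror of the data.

```python
from concurrent.futures import ThreadPoolExecutor


def read_split_parallel(idle_disks, offset, length):
    """Sketch of the asynchronous split read; illustrative only."""
    n = len(idle_disks)
    base = length // n
    pieces = []
    for i in range(n):
        off = offset + i * base
        ln = base if i < n - 1 else length - base * (n - 1)  # last piece takes the remainder
        pieces.append((off, ln))

    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(disk.read, off, ln)
                   for disk, (off, ln) in zip(idle_disks, pieces)]
        # Reassemble the pieces in request order once all reads complete.
        return b"".join(f.result() for f in futures)
```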
[0023] Another example algorithm for reading data from a heterogeneous disk array, in accordance with an embodiment, exclusively uses the fastest disk in the array for all read transactions, provided that such a disk contains the requested data. In the case that the fastest disk is busy doing a write, the algorithm will clear the disk's queue of the remaining writes, execute the read, and then re-queue the writes that were cleared from the queue. There is an overhead cost to pausing the queued writes, but in terms of throughput in an array whose disks differ in speed by orders of magnitude, the fastest disk will also likely have the fastest write speed, so it will process the transactions in time with the slower disks. Theoretically, this algorithm performs best when there is one disk that is much faster than all the rest.
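A rough sketch of this fastest-disk-only read with the write-queue pause is shown below; the deque representation of the pending writes is an assumption made for illustration.

```python
from collections import deque


def read_from_fastest(fastest_disk, pending_writes, offset, length):
    """Sketch of the fastest-disk read with paused writes; illustrative only.

    `pending_writes` is assumed to be a deque of (offset, data) writes queued
    for the fastest disk. A real implementation would also need to check the
    deferred writes for overlap with the read range to avoid stale data.
    """
    deferred = deque()
    while pending_writes:                     # clear the disk's queue of remaining writes
        deferred.append(pending_writes.popleft())

    data = fastest_disk.read(offset, length)  # execute the read right away

    while deferred:                           # re-queue the writes that were cleared
        pending_writes.append(deferred.popleft())
    return data
```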
[0024] Another example algorithm for reading data from a heterogeneous disk array, in accordance with an embodiment, emulates how RAID1 is implemented where all disks are identical (homogeneous). With this algorithm, all of the disks in the array are added to the queue of transactions for retrieving their respective piece of the data packet, provided that such disks contain the requested data. Since a particular disk may be busy with other writes, the read occurs after all the writes to that disk are complete. The transaction completes after the read finishes, which by proxy causes the queues to be empty at the end of each read. In a homogeneous system this is not a concern, but in a heterogeneous array it can cause great speed reductions.
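A sketch of this homogeneous-style read might look as follows, assuming a hypothetical flush_writes() method that drains a disk's pending writes before its piece of the packet is read.

```python
def read_all_disks(disks, offset, length):
    """Sketch of the homogeneous-style read; illustrative only."""
    n = len(disks)
    base = length // n
    parts = []
    for i, disk in enumerate(disks):
        off = offset + i * base
        ln = base if i < n - 1 else length - base * (n - 1)
        disk.flush_writes()   # the read occurs after all writes to this disk complete
        parts.append(disk.read(off, ln))
    # The queues are empty at the end of each transaction.
    return b"".join(parts)
```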
[0025] Another example algorithm for reading data from a heterogeneous disk array, in accordance with an embodiment, uses a combination of the techniques described above. When a small transaction (such as a block or two) is requested, it does not make sense to issue a read to multiple disks, each of which will pull the entire track into its cache; it is sufficient to have only the fastest disk do this. For example, when fewer than 8 blocks (32KB) or another user-specified amount of data is requested, then the fastest idle disk may be used; otherwise the data may be read from all idle disks in parallel.
[0026] Another example algorithm for reading data from a heterogeneous disk array, in accordance with an embodiment, includes issuing a read data request to multiple (but not necessarily all) idle storage devices. Whichever disk responds with the results first is the one used for reading the requested data. This technique may be effective, for example, in applications such as multiple servers processing read data requests.
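The size-threshold algorithm of paragraph [0025] could be sketched as follows, reusing the read_split_parallel helper from the earlier sketch; the 32KB threshold matches the example above but could be any user-specified value.

```python
SMALL_REQUEST_BYTES = 8 * 4096   # 8 blocks of 4 KB (32 KB); any user-specified threshold works


def read_adaptive(idle_disks, offset, length):
    """Sketch of the size-threshold read; illustrative only."""
    if length < SMALL_REQUEST_BYTES:
        # Small request: use only the fastest idle disk.
        fastest = max(idle_disks, key=lambda d: d.read_speed)
        return fastest.read(offset, length)
    # Larger request: split it across all idle disks and read in parallel.
    return read_split_parallel(idle_disks, offset, length)
```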
Experimental Setup
[0027] A test system consisted of a Dell PowerEdge 2900 with an Intel Xeon E5420 processor and 8GB of DDR2 RAM. The disks used in this test are shown in Table 1. The bandwidths in this table show the measured read and write speeds of each disk for random I/O using the techniques described herein. For sequential access, the speeds are much faster.
Disk   Read Bandwidth   Write Bandwidth   Size
sda    13.5 MB/s        12.0 MB/s         74 GB
sdb    17.1 MB/s        16.2 MB/s         148.5 GB
sdc    17.2 MB/s        15.9 MB/s         148.5 GB
sdd    25.0 MB/s        23.0 MB/s         297.5 GB
sde    17.2 MB/s        16.0 MB/s         148.5 GB
sdf    22.8 MB/s        21.0 MB/s         140.0 GB
sdg    17.5 MB/s        16.1 MB/s         148.5 GB
Table 1: Disks used in experiments
[0028] SSD storage was simulated using RAM. With Linux, it is possible to create files in RAM using the tmpfs kernel module. The file size used for experiments was 300MB to allow for multiple virtual disks. 2GB RAM files were tested in a small subset of experiments and gave similar results. Once these files are created, they can be mounted as a loopback filesystem to create a block device. The read and write bandwidths for such a setup are roughly 420MB/s and 390MB/s respectively.
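A rough sketch of this setup (to be run as root) is shown below using Python's subprocess module; the mount point, file name, and sizes are illustrative placeholders.

```python
import os
import subprocess


def make_ram_backed_block_device(mount_point="/mnt/ramdisk", size_mb=300):
    """Rough sketch of the tmpfs-plus-loopback setup described above."""
    os.makedirs(mount_point, exist_ok=True)
    # Mount a tmpfs filesystem, i.e. a filesystem backed entirely by RAM.
    subprocess.run(["mount", "-t", "tmpfs", "-o", f"size={size_mb + 32}m",
                    "tmpfs", mount_point], check=True)
    # Create a fixed-size file inside the RAM-backed filesystem.
    image = os.path.join(mount_point, "disk0.img")
    subprocess.run(["dd", "if=/dev/zero", f"of={image}", "bs=1M",
                    f"count={size_mb}"], check=True)
    # Attach the file to the first free loop device, yielding a block device
    # (e.g. /dev/loop0) that behaves like a very fast disk.
    result = subprocess.run(["losetup", "--find", "--show", image],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()
```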
[0029] For benchmarking, FIO was used as the test platform; FIO is a tool designed to exercise various kinds of I/O and can create precisely controlled sequences of I/O. In the conducted experiments, random requests were used to simulate disk requests from multiple sources. This is the type of workload that allows for the least amount of bandwidth, because on traditional hard drives the seek times are usually what takes the longest.
[0030] In the main experiments, a stream of random 1MB reads and writes from 4 separate threads was generated. Since 2011, disk sector sizes are typically 4096 bytes in length, as opposed to the 512-byte sectors of prior years. Since each read is 1MB in length, it allows for the possibility of small sequential data accesses to occur. To test large sequential access, the same test was run with 10MB data packet accesses. Another experiment was run with randomly sized data packets ranging from 1KB to 10MB in size.
[0031] The ratio of reads to writes is varied across the tests from 10% to 100% in increments of 10%. Each test is set up to run for 60 seconds, and FIO issues transactions continuously until the time has expired. The read/write bandwidth is determined by the data transferred in those 60 seconds. Issuing a heavy stream of random traffic for a minute ensures that any cache of the disks is exhausted and gives a benchmark of the true performance of the algorithm. In the results, the 0% read case was omitted from the graphs, as it is a constant value in all test cases because only the read algorithm is being modified.
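The sweep could be reproduced roughly as follows; the fio options shown only approximate the workload described above, and the target device path is a placeholder.

```python
import subprocess


def run_read_ratio_sweep(target="/dev/md0"):
    """Illustrative sketch of the benchmark sweep, not the exact FIO jobs used."""
    for read_pct in range(10, 101, 10):
        # 1 MB random reads/writes from 4 jobs for 60 seconds at each read ratio.
        subprocess.run([
            "fio", "--name=hetero-raid",
            f"--filename={target}",
            "--rw=randrw", f"--rwmixread={read_pct}",
            "--bs=1M", "--numjobs=4",
            "--time_based", "--runtime=60",
            "--direct=1", "--group_reporting",
        ], check=True)
```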
Experimental Results
[0032] In the experiments, the addition of a significantly faster device improved the bandwidth of the array. The more reads that were performed, the closer the throughput was to that of the fastest disk. Since all the disks had to finish writing the data, writes did not improve as much, though caching did give a boost because it could keep the write queue full for the slower disks.
Mirror with two disks
[0033] In Fig. 2, the experimental results of the six algorithms (discussed above) are compared, in accordance with embodiments of the present disclosure. In Fig. 2, these algorithms are labeled aio1 through aio6. The left set of graphs shows the stacked throughput of reads and writes using two slow disks. The disks used in the two-slow-disk test were sda and sdd, the slowest and fastest magnetic drives. From this it is possible to visually see the gain in total throughput. There is a boost in speed as the ratio of reads to writes approaches all reads. Caching two transactions made it possible to have a big speed boost for writes. This is indicated, for example, on aio5, where for 10% reads there is a boost, but by 40% the boost is diminished.
[0034] With techniques that use the fastest idle disk (e.g., aio1), the throughput increases with the number of reads. The fastest idle disk is not always the same disk for every transaction, so this algorithm provides a load leveling mechanism as a byproduct. In the rest of the algorithms, this is not as pronounced. The reason is that with some other techniques (e.g., aio2, aio3, and aio6), any on-disk write cache must be flushed before a read (in a different sector) can be executed. When all disks are read (e.g., aio5), the technique does not exhibit this behavior because transactions are synchronous across all disks, meaning that each disk has the same amount of load put on it.
[0035] Looking at the slow and fast disk case, it is noticeable that the addition of a disk that is orders of magnitude faster can improve disk access. The algorithm where all idle disks are read asynchronously (e.g., aio3) does not always reach full bandwidth potential because even though a disk is idle, it may not be beneficial to use it in a read. The algorithm where all disks are read (e.g., aio5) also splits up the request across the slower disks, which amounts to additional performance loss.
[0036] The bottleneck in aio2 is caused by issuing the reads sequentially, one piece at a time, so the speed of the reads is worse than that of the slowest disk in the array. In this case, asynchronous reads are preferable. Recall that writes are always asynchronous in all algorithms, so the only variance in an all-write load is the sequence of random addresses.
[0037] In tests where larger sequential reads/writes were done, there was a marginal increase (15-35MB/s, depending on the read-to-write ratio) in throughput in each of the tests with two slow disks and with one slow and one fast disk. While this is significant in the case of two slow disks, as it nearly doubled the throughput of some of the algorithms, any configuration that had at least one fast disk did not achieve a noticeable speedup because of the much greater speeds involved.
Mirror with seven HDDs
[0038] While RAID1 is generally implemented on two disk systems, tests were run on a larger array to determine how each algorithm scales.
[0039] Fig. 3 shows a comparison between a 7 HDD array and one with two additional RAM drives added, in accordance with embodiments of the present disclosure. This larger system performed differently compared to the one with only two disks for all algorithms.
[0040] When two RAM drives are added to the array, there is a larger spread across the algorithms. Aio5 and aio4 show the best performance. Aio2 has a bottleneck since the array contains more slow disks, bringing the average throughput further down.
[0041] Algorithms that use multiple disks benefit from the parallelism that the additional disks give even if the disks are slow. This can be seen, for instance, in aio3, aio5 and aio6.
Randomly sized accesses
[0042] The test runs with varying-sized data packets exhibited an increase in throughput for each of the algorithms tested, except for the 100% read load. The 100% reads were different because random non-sequential reads are slow. When writes dominate the disk access, the throughput approaches the throughput of the slowest disk in the array. When reads dominate, the speed approaches the combined speed of the faster disks in the array.
Overall results
[0043] The experiments show that it is possible to increase the throughput of an array with heterogeneous disks by modifying the access algorithm. In general, techniques that use the fastest disks in the array, whether idle or not, produce speed boosts when there is a much faster disk in the array; these two algorithms have superior performance even with two slow disks. When the seven disk array was examined, the algorithm where all disks are read performed best, as it is able to use all disks concurrently. Recall that in the RAID1 algorithm the extra disks provide additional striped storage instead of redundancy, therefore this case would not exist in a true RAID1 array. When a fast disk was added, the technique that uses the fastest disk in the array shows the greatest speeds, which suggests that there is some delay in waiting for reads to complete when using all disks in parallel.
[0044] Writes across the array show improvement because the disks that are slow are not utilized for the reads; thus they have fewer transactions to perform, which increases the overall array write bandwidth. This load balance minimizes the idle time of the disks in the array.
[0045] This work can be extended to other RAID levels that have redundancy, such as RAID10. Fig. 4 shows an example of such a layout where each group (Group A, B and C) is a RAID1 array, in accordance with an embodiment of the present disclosure. If the disks making up any group have different speeds, the overall speed of the array can be enhanced by strategically replacing disks. When each group contains a fast disk and a slow but reliable disk, it will create a faster and more practical system overall.
[0046] Another application of this work is for RAID5 and RAID6. In those cases, even though the data are not identical on all the disks, there is inherent redundancy which can be taken advantage of. Consider a four-disk RAID configuration, where there are three data disks and one parity disk. For simplicity, assume that the RAID is level 4. A choice can be made as to which three disks are used to retrieve the data. Since the parity calculation takes negligible time, reading two data disks and one parity disk may be considered instead of simply reading the three data disks to get an entire stripe of data. An algorithm such as this would further improve systems where the data is stored on SSDs and the parity on HDDs or NVRAM.
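A sketch of the parity-based reconstruction for a single-parity stripe is shown below; the function name and byte-string representation are illustrative.

```python
def reconstruct_missing_unit(read_units, parity_unit):
    """Sketch of recovering one stripe unit from the others plus parity; illustrative only.

    With single parity, the parity unit is the XOR of all data units in the
    stripe, so the one data unit that was not read equals the XOR of the units
    that were read and the parity unit. All units must be the same length.
    """
    result = bytearray(parity_unit)
    for unit in read_units:
        for i, byte in enumerate(unit):
            result[i] ^= byte
    return bytes(result)


# Example: with parity p = d0 ^ d1 ^ d2, reading d0, d1 and p (e.g. from the
# faster disks) recovers d2 without touching the third data disk:
#   d2 = reconstruct_missing_unit([d0, d1], p)
```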
Example Methodologies
[0047] Fig. 5 is a flow diagram of several example methodologies for reading data from a redundant disk array using heterogeneous disks, in accordance with embodiments of the present disclosure. Method 500 includes receiving 502, by a processor, a request to read data from at least one of a plurality of storage devices configured as a redundant array of independent disks (RAID), where at least two of the storage devices are heterogeneous and have different read speeds. The method 500 further includes identifying 504, by the processor and in response to receiving the request, which of the storage devices are idle. The method 500 further includes causing 506, by the processor, one or more of the idle storage devices to read the requested data.
[0048] In an embodiment, the causing 506 of the one or more idle storage devices to read the requested data includes causing 508 only a fastest one of the idle storage devices to read the requested data, such as described above with respect to the aio1 algorithm.
[0049] In another embodiment, where two or more of the storage devices are simultaneously idle, the causing 506 of the one or more idle storage devices to read the requested data includes causing 510 each of the idle storage devices to read different portions of the requested data from each of the idle storage devices synchronously, such as described above with respect to the aio2 algorithm.
[0050] In yet another embodiment, where two or more of the storage devices are simultaneously idle, the causing 506 of the one or more idle storage devices to read the requested data includes causing 512 each of the idle storage devices to read different portions of the requested data from each of the idle storage devices asynchronously, such as described above with respect to the aio3 algorithm.
[0051] In yet another embodiment, the causing 506 of the one or more idle storage devices to read the requested data includes causing 514 only a fastest one of the idle storage devices to read the requested data, where fewer than 32 kilobytes of data are requested, otherwise causing each of the idle storage devices to read different portions of the requested data from each of the idle storage devices asynchronously, such as described above with respect to the aio6 algorithm.
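For illustration, the selection logic shared by these variants can be sketched in C as follows. The names used here (struct disk, fastest_idle, plan_read, SMALL_READ_BYTES) are assumptions made for this sketch and do not appear in the disclosure, and the actual submission of the I/O, synchronous for the aio2-style behavior and asynchronous for the aio3- and aio6-style behaviors, is intentionally omitted.

#include <stddef.h>
#include <stdbool.h>

#define SMALL_READ_BYTES (32 * 1024)   /* threshold used by the aio6-style variant above */

struct disk {
    int    fd;          /* open descriptor for this member disk (unused in this sketch) */
    double read_speed;  /* measured or configured relative read speed                   */
    bool   busy;        /* true while an I/O is outstanding on this disk                */
};

/* Return the index of the fastest currently idle disk, or -1 if none is idle. */
static int fastest_idle(const struct disk *d, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (!d[i].busy && (best < 0 || d[i].read_speed > d[best].read_speed))
            best = i;
    return best;
}

/* Decide which mirrored disks should service a read of len bytes: small reads go
 * to the single fastest idle disk, larger reads are split across all idle disks.
 * Returns the number of disks chosen (0 means nothing is idle and the caller
 * should wait and retry).  The I/O submission itself is omitted. */
static int plan_read(const struct disk *d, int n, size_t len,
                     int *chosen, size_t *chunk)
{
    if (len < SMALL_READ_BYTES) {
        int f = fastest_idle(d, n);
        if (f < 0)
            return 0;
        chosen[0] = f;
        chunk[0]  = len;
        return 1;
    }
    int count = 0;
    for (int i = 0; i < n; i++)
        if (!d[i].busy)
            chosen[count++] = i;
    if (count == 0)
        return 0;
    size_t per = len / count;
    for (int i = 0; i < count; i++)
        chunk[i] = (i == count - 1) ? len - per * (size_t)(count - 1) : per;
    return count;
}

A caller would invoke plan_read for each incoming read request, wait briefly and retry when it returns zero (no member disk is idle), and otherwise issue one read per chosen disk for the corresponding chunk length.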
[0052] Fig. 6 is a flow diagram of several other example methodologies for reading data from a redundant disk array using heterogeneous disks, in accordance with embodiments of the present disclosure. Method 600 includes receiving 602, by a processor, a request to read data from at least one of a plurality of storage devices configured as a redundant array of independent disks (RAID), where at least two of the storage devices are heterogeneous and have different read speeds. The method 600 further includes identifying 604, by the processor and in response to receiving the request, which of the storage devices is fastest. The method 600 further includes causing 606, by the processor, the fastest storage device to read the requested data, such as described above with respect to the aio4 algorithm.
[0053] In another embodiment, the method 600 further includes clearing 608 a queue of write requests prior to causing the fastest storage device to read the requested data, and restoring 610 the queue of write requests subsequent to causing the fastest storage device to read the requested data, such as further described above with respect to the aio4 algorithm.
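A minimal sketch of this clear-and-restore behavior, assuming a simple singly linked per-disk queue of pending writes (struct write_req, struct disk_queue, and issue_read are illustrative names, not taken from the disclosure), might look like the following in C.

#include <stddef.h>

struct write_req {
    struct write_req *next;
    /* offset, buffer, length, and so on are elided in this sketch */
};

struct disk_queue {
    struct write_req *head;   /* pending writes for one disk, oldest first */
    struct write_req *tail;
};

/* Set aside the fastest disk's pending writes so the read is serviced first
 * (the "clear" step), then re-attach them unchanged (the "restore" step).
 * issue_read stands in for whatever read call the array actually uses. */
static void read_with_write_setaside(struct disk_queue *q, void (*issue_read)(void))
{
    struct disk_queue saved = *q;     /* clear: detach all queued writes        */
    q->head = q->tail = NULL;

    issue_read();                     /* the read sees an empty write queue     */

    if (saved.head) {                 /* restore: saved writes go back in front */
        saved.tail->next = q->head;   /* of any writes that arrived meanwhile   */
        q->head = saved.head;
        if (!q->tail)
            q->tail = saved.tail;
    }
}

Re-attaching the saved requests ahead of any writes that arrived while the read was in flight preserves the original write ordering for the fastest disk.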
[0054] Fig. 7 is a flow diagram of another example methodology for reading data from a redundant disk array using heterogeneous disks, in accordance with an embodiment of the present disclosure. Method 700 includes receiving 702, by a processor, a request to read data from at least one of a plurality of storage devices configured as a redundant array of independent disks (RAID), where at least two of the storage devices are heterogeneous and have different read speeds. The method 700 further includes waiting 704, by the processor and in response to receiving 702 the request, until all pending requests to write data to the storage devices have completed. The method 700 further includes causing 706, by the processor, all of the storage devices to read the requested data after all pending requests to write the data to the storage devices have completed, such as described above with respect to the aio5 algorithm.
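A rough C illustration of this drain-then-fan-out behavior follows; pending_writes and submit_read are hypothetical hooks standing in for whatever bookkeeping and I/O submission the surrounding array code actually provides.

#include <stddef.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hooks supplied by the surrounding array code (not from the disclosure). */
extern int  pending_writes(int disk_index);                                /* outstanding writes on one disk */
extern void submit_read(int disk_index, void *buf, size_t len, off_t off); /* queue a read on one disk       */

/* Wait until no member disk has a write outstanding, then issue the same read on
 * every disk; whichever completes first can satisfy the caller. */
void read_on_all_after_drain(int ndisks, void *buf, size_t len, off_t off)
{
    bool drained;
    do {
        drained = true;
        for (int i = 0; i < ndisks; i++)
            if (pending_writes(i) > 0)
                drained = false;
        if (!drained)
            usleep(1000);   /* back off briefly before rechecking */
    } while (!drained);

    for (int i = 0; i < ndisks; i++)
        submit_read(i, buf, len, off);
}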
Example Computing Device
[0055] Fig. 8 is a block diagram representing an example computing device 1000 that may be used to perform any of the techniques as variously described in this disclosure. For example, any of the algorithms (aio1 through aio6, described above) or methodologies 500, 600, 700, may be implemented in the computing device 1000. The computing device 1000 may be any computer system, such as a workstation, desktop computer, server, laptop, handheld computer, tablet computer (e.g., the iPad™ tablet computer), mobile computing or communication device (e.g., the iPhone™ mobile communication device, the Android™ mobile communication device, and the like), or other form of computing or telecommunications device that is capable of communication and that has sufficient processor power and memory capacity to perform the operations described in this disclosure. A distributed computational system may be provided comprising a plurality of such computing devices.
[0056] The computing device 1000 includes one or more storage devices 1010 and/or non-transitory computer-readable media 1020 having encoded thereon one or more computer-executable instructions or software for implementing techniques as variously described in this disclosure. The storage devices 1010 may include a computer system memory or random access memory, such as a durable disk storage (which may include any suitable optical, magnetic, or semiconductor-based durable storage device, e.g., RAM, ROM, Flash, or a USB drive), a hard drive, CD-ROM, or other computer readable media, for storing data and computer-readable instructions and/or software that implement various embodiments as taught in this disclosure. The storage device 1010 may include other types of memory as well, or combinations thereof. The storage device 1010 may be provided on the computing device 1000 or provided separately or remotely from the computing device 1000. The non-transitory computer-readable media 1020 may include, but are not limited to, one or more types of hardware memory, non-transitory tangible media (for example, one or more magnetic storage disks, one or more optical disks, one or more USB flash drives), and the like. The non-transitory computer-readable media 1020 included in the computing device 1000 may store computer-readable and computer-executable instructions or software for implementing various embodiments. The computer-readable media 1020 may be provided on the computing device 1000 or provided separately or remotely from the computing device 1000.
[0057] The computing device 1000 also includes at least one processor 1030 for executing computer-readable and computer-executable instructions or software stored in the storage device 1010 and/or non-transitory computer-readable media 1020 and other programs for controlling system hardware. Virtualization may be employed in the computing device 1000 so that infrastructure and resources in the computing device 1000 may be shared dynamically. For example, a virtual machine may be provided to handle a process running on multiple processors so that the process appears to be using only one computing resource rather than multiple computing resources. Multiple virtual machines may also be used with one processor.
[0058] A user may interact with the computing device 1000 through an output device 1040, such as a screen or monitor, which may display one or more user interfaces provided in accordance with some embodiments. The output device 1040 may also display other aspects, elements and/or information or data associated with some embodiments. The computing device 1000 may include other I/O devices 1050 for receiving input from a user, for example, a keyboard, a joystick, a game controller, a pointing device (e.g., a mouse, a user's finger interfacing directly with a display device, etc.), or any suitable user interface. The computing device 1000 may include other suitable conventional I/O peripherals, such as a camera 1052. The computing device 1000 can include and/or be operatively coupled to various suitable devices for performing one or more of the functions as variously described in this disclosure.
[0059] The computing device 1000 may run any operating system, such as any of the versions of Microsoft® Windows® operating systems, the different releases of the Unix and Linux operating systems, any version of the MacOS® for Macintosh computers, any embedded operating system, any real-time operating system, any open source operating system, any proprietary operating system, any operating systems for mobile computing devices, or any other operating system capable of running on the computing device 1000 and performing the operations described in this disclosure. In an embodiment, the operating system may be run on one or more cloud machine instances.
[0060] In other embodiments, the functional components/modules may be implemented with hardware, such as gate level logic (e.g., FPGA) or a purpose-built semiconductor (e.g., ASIC). Still other embodiments may be implemented with a microcontroller having a number of input/output ports for receiving and outputting data, and a number of embedded routines for carrying out the functionality described in this disclosure. In a more general sense, any suitable combination of hardware, software, and firmware can be used, as will be apparent.
[0061] As will be appreciated in light of this disclosure, various modules and components can be implemented in software, such as a set of instructions (e.g., HTML, XML, C, C++, Objective-C, JavaScript, Java, BASIC, etc.) encoded on any computer readable medium or computer program product (e.g., hard drive, server, disc, or other suitable non-transient memory or set of memories), that, when executed by one or more processors, cause the various methodologies provided in this disclosure to be carried out. As used in this disclosure, the terms "non-transient" and "non-transitory" exclude transitory forms of signal transmission. It will be appreciated that, in some embodiments, various functions performed by the user computing system, as described in this disclosure, can be performed by similar processors and/or databases in different configurations and arrangements, and that the depicted embodiments are not intended to be limiting. Various components of this example embodiment, including the computing device 1000, can be integrated into, for example, one or more desktop or laptop computers, workstations, tablets, smart phones, game consoles, set-top boxes, or other such computing devices. Other componentry and modules typical of a computing system, such as processors (e.g., central processing unit and co-processor, graphics processor, etc.), input devices (e.g., keyboard, mouse, touch pad, touch screen, etc.), and operating system, are not shown but will be readily apparent.
[0062] Numerous embodiments will be apparent in light of the present disclosure, and features described in this disclosure can be combined in any number of configurations. An example embodiment provides a computer-implemented method including receiving, by a processor, a request to read data from at least one of a plurality of storage devices configured as a redundant array of independent disks (RAID). At least two of the storage devices on which the requested data are stored are heterogeneous and have different read speeds, such that the read speed of at least one of the storage devices is faster than the read speed of at least another one of the storage devices. The method further includes identifying, by the processor and in response to receiving the request, which of the storage devices are idle and causing, by the processor, one or more of the idle storage devices to read the requested data. In some cases, the requested data is read from only a fastest one (or ones) of the idle storage devices. If none of the storage devices are idle, then the method includes waiting until at least one of the storage devices is idle before causing the respective storage device(s) to read the requested data. In some cases, two or more of the storage devices are simultaneously idle, and the method includes causing, by the processor, each of the idle storage devices to read different portions of the requested data from each of the idle storage devices synchronously. In this manner, one portion of the requested data is read from one of the idle storage devices and another portion of the requested data is read from a different one of the idle storage devices synchronously. In some other cases, two or more of the storage devices are simultaneously idle, and the method includes causing, by the processor, each of the idle storage devices to read different portions of the requested data from each of the idle storage devices asynchronously. In this manner, one portion of the requested data is read from one of the idle storage devices and another portion of the requested data is read from a different one of the idle storage devices asynchronously. In some cases, the method includes causing, by the processor, only a fastest one of the idle storage devices to read the requested data where less than a user-specified amount of data are requested, otherwise causing, by the processor, each of the idle storage devices to read different portions of the requested data from each of the idle storage devices asynchronously.
[0063] Another example embodiment provides a computer-implemented method including receiving, by a processor, a request to read data from at least one of a plurality of storage devices configured as a redundant array of independent disks (RAID). At least two of the storage devices are heterogeneous and have different read speeds, such that the read speed of at least one of the storage devices is faster than the read speed of at least another one of the storage devices. The method further includes identifying, by the processor and in response to receiving the request, which of the storage devices is fastest and causing, by the processor, the fastest storage device to read the requested data. In some cases, the method includes clearing a queue of write requests prior to causing the fastest storage device to read the requested data and restoring the queue of write requests subsequent to causing the fastest storage device to read the requested data. In this manner, any write requests that are pending (incomplete) for the fastest storage device are set aside until the read request has been satisfied and the requested data has been read from the fastest storage device, and then the pending write requests are restored so that the fastest disk may continue processing those write requests.
[0064] Another example embodiment provides a computer-implemented method including receiving, by a processor, a request to read data from at least one of a plurality of storage devices configured as a redundant array of independent disks (RAID). At least two of the storage devices are heterogeneous and have different read speeds, such that the read speed of at least one of the storage devices is faster than the read speed of at least another one of the storage devices. The method further includes waiting, by the processor and in response to receiving the request, until all pending requests to write data to the storage devices have completed and causing, by the processor, all of the storage devices to read the requested data after all pending requests to write the data to the storage devices have completed.
[0065] The foregoing description and drawings of various embodiments are presented by way of example only. These examples are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Alterations, modifications, and variations will be apparent in light of this disclosure and are intended to be within the scope of the invention as set forth in the claims.

Claims

What is claimed is:
1. A computer-implemented method comprising:
receiving, by a processor, a request to read data from at least one of a plurality of storage devices configured as a redundant array of independent disks (RAID), wherein at least two of the storage devices are heterogeneous and have different read speeds;
identifying, by the processor and in response to receiving the request, which of the storage devices are idle; and
causing, by the processor, one or more of the idle storage devices to read the requested data.
2. The method of claim 1, further comprising causing, by the processor, only a fastest one of the idle storage devices to read the requested data.
3. The method of claim 1, wherein two or more of the storage devices are simultaneously idle, and wherein the method further comprises causing, by the processor, each of the idle storage devices to read different portions of the requested data from each of the idle storage devices synchronously.
4. The method of claim 1, wherein two or more of the storage devices are simultaneously idle, and wherein the method further comprises causing, by the processor, each of the idle storage devices to read different portions of the requested data from each of the idle storage devices asynchronously.
5. The method of claim 1, further comprising causing, by the processor, only a fastest one of the idle storage devices to read the requested data where less than a user-specified amount of data are requested, otherwise causing, by the processor, each of the idle storage devices to read different portions of the requested data from each of the idle storage devices asynchronously.
6. A computer-implemented method comprising:
receiving, by a processor, a request to read data from at least one of a plurality of storage devices configured as a redundant array of independent disks (RAID), wherein at least two of the storage devices are heterogeneous and have different read speeds;
identifying, by the processor and in response to receiving the request, which of the storage devices is fastest; and
causing, by the processor, the fastest storage device to read the requested data.
7. The method of claim 6, further comprising:
clearing a queue of write requests prior to causing the fastest storage device to read the requested data; and
restoring the queue of write requests subsequent to causing the fastest storage device to read the requested data.
8. A computer-implemented method comprising:
receiving, by a processor, a request to read data from at least one of a plurality of storage devices configured as a redundant array of independent disks (RAID), wherein at least two of the storage devices are heterogeneous and have different read speeds;
waiting, by the processor and in response to receiving the request, until all pending requests to write data to the storage devices have completed; and
causing, by the processor, all of the storage devices to read the requested data after all pending requests to write the data to the storage devices have completed.
PCT/US2016/060232 2015-11-05 2016-11-03 Redundant disk array using heterogeneous disks WO2017079373A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562251401P 2015-11-05 2015-11-05
US62/251,401 2015-11-05

Publications (1)

Publication Number Publication Date
WO2017079373A1 true WO2017079373A1 (en) 2017-05-11

Family

ID=58662928

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/060232 WO2017079373A1 (en) 2015-11-05 2016-11-03 Redundant disk array using heterogeneous disks

Country Status (1)

Country Link
WO (1) WO2017079373A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11042324B2 (en) * 2019-04-29 2021-06-22 EMC IP Holding Company LLC Managing a raid group that uses storage devices of different types that provide different data storage characteristics
CN114063908A (en) * 2021-10-23 2022-02-18 苏州普福斯信息科技有限公司 Hard disk read-write processing method and device based on RAID and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050268062A1 (en) * 2002-01-21 2005-12-01 Hitachi, Ltd. Hierarchical storage apparatus and control apparatus thereof
US20040037120A1 (en) * 2002-08-23 2004-02-26 Mustafa Uysal Storage system using fast storage devices for storing redundant data
US20050086559A1 (en) * 2002-08-23 2005-04-21 Mustafa Uysal Storage system using fast storage devices for storing redundant data
US20040049643A1 (en) * 2002-09-06 2004-03-11 Guillermo Alavarez Storage system including a fast storage device for storing redundant data


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16862923

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16862923

Country of ref document: EP

Kind code of ref document: A1