WO2016190891A1 - Translate data operations based on data stripe size - Google Patents

Info

Publication number
WO2016190891A1
Authority
WO
WIPO (PCT)
Prior art keywords
array
data
operations
dataset
write
Prior art date
Application number
PCT/US2015/041217
Other languages
French (fr)
Inventor
Anil Kumar BOOGARAPU
Pranav SWAROOP
Subhakar VIPPARTI
James Michael Reuter
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp filed Critical Hewlett Packard Enterprise Development Lp
Publication of WO2016190891A1 publication Critical patent/WO2016190891A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0661 Format or protocol conversion arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Various examples described herein provide for translating a first set of input/output (IO) operations to write a dataset to an array of data storage devices, to a second set of IO operations to write the dataset to the array, based on a data stripe associated with the array. Each IO operation in the second set of IO operations may be such that it writes a portion of the dataset within boundaries of an individual data stripe on the array.

Description

TRANSLATE DATA OPERATIONS BASED ON DATA STRIPE SIZE
BACKGROUND
[1] For a computer system, the data storage subsystem underlying a file system can determine the read performance, write performance, and data redundancy of the file system. Oftentimes, a computer system may improve these (and other) aspects of a file system by using an array of data storage devices, such as a redundant array of independent disks (RAID) array, as part of its data storage subsystem.
BRIEF DESCRIPTION OF THE DRAWINGS
[2] Certain examples are described in the following detailed description in reference to the following drawings.
[3] FIGs. 1 and 2 are block diagrams illustrating example data operation systems for translating a data operation based on data stripe size according to the present disclosure.
[4] FIG. 3 is a block diagram illustrating an example computer system including an example data operation system according to the present disclosure.
[5] FIG. 4 is a flow diagram illustrating an example data flow in an example computing environment that includes an example data operation system according to the present disclosure.
[6] FIG. 5 is a block diagram illustrating an example computer system for translating a data operation based on data stripe size according to the present disclosure.
[7] FIGs. 6 and 7 are flowcharts illustrating example methods performed by an example computer system to facilitate data operation translation based on data stripe size according to the present disclosure.
DETAILED DESCRIPTION
[8] With an array of data storage devices, such as a redundant array of independent disks (hereinafter, "RAID") array, a computer system can aggregate a plurality of data storage devices into one or more logical storage devices while abstracting specific details relating to those data storage devices. In this way, a RAID array can permit an operating system (OS) or a file system on a computer system to perform data access (e.g., write or read) operations in generic terms, while the RAID array implements those operations as specific input/output data operations (hereinafter, also referred to as input/output [IO] operations or IOOPs) issued to an array of data storage devices associated with the RAID array. An example IOOP can include, without limitation, a data storage operation commonly referred to as an IOP. As a result, RAID arrays and the like can present a given array of data storage devices to an OS or a file system as an aggregated logical storage device while abstracting details of the actual data storage devices underlying the array. However, such data storage abstraction results in an OS or a file system working in generic terms with the array of data storage devices, which can cause issuance of IOOPs that may not be optimal for a given data storage device underlying the array. For instance, the data size of a particular IOOP issued to an array of data storage devices may not align with an optimal data size associated with the array, such as an optimal data stripe size (e.g., optimal RAID stripe size).
[9] As used herein, a data storage device can include a hard disk drive (HDD), a solid-state drive (SSD), or some other non-volatile memory device. An array of data storage devices can include one configured as a RAID array and having a RAID-level configuration, such as a RAID-level 5 configuration.
[10] Additionally, as used herein, with respect to an array of data storage devices, a data stripe can refer to a data unit of storage on the array, where the data unit is made of a plurality of data slices (also known as data strips) and each data slice is stored on an individual data storage device in the array. An example of a data stripe can include a data stripe of a RAID array (hereinafter, a RAID stripe). A data stripe size (also known as data stripe width) can refer to the data size of an individual data stripe stored on an array of data storage devices, and a data slice size (also known as data strip size) can refer to the data size of an individual data slice of an individual data stripe. For instance, a given data stripe may have a data stripe size of 256 kB, and data slices of the given data stripe may have a data slice size of 64 kB (e.g., where the data slices are stored across four individual data storage devices of the array). The data stripes associated with an array of data storage devices can share the same data stripe size, and the data slices of an individual data stripe can share the same data slice size, thereby dividing the individual data stripe into equal-sized data slices. As also used herein, an input/output operation (IOOP) may include a data write operation or a data read operation issued to an array of data storage devices for the array to perform.
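The relationship between slice size, stripe size, and stripe boundaries described above can be illustrated with a minimal Python sketch. The function names are hypothetical and not part of the disclosure, and the sketch assumes 1 kB = 1024 bytes and that stripes are laid out back to back starting at byte offset 0.

```python
KB = 1024  # assumption: 1 kB is taken as 1024 bytes


def stripe_size(slice_size: int, data_devices: int) -> int:
    """Data stripe size = data slice size x number of data-bearing devices."""
    return slice_size * data_devices


def stripe_index(offset: int, stripe: int) -> int:
    """Index of the data stripe that contains the given byte offset."""
    return offset // stripe


def crosses_boundary(offset: int, length: int, stripe: int) -> bool:
    """True if a write of `length` bytes at `offset` touches more than one stripe."""
    if length <= 0:
        return False
    return stripe_index(offset, stripe) != stripe_index(offset + length - 1, stripe)


# The example from the text: four 64 kB slices form one 256 kB stripe.
size = stripe_size(64 * KB, 4)
```

Under these assumptions, a 256 kB write at offset 0 stays inside one stripe, while a 128 kB write starting 192 kB into the same stripe spills into the next one.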
[11] Various examples described herein provide translation of a set of input/output (IO) operations for an array of data storage devices based on information regarding the array of data storage devices, such as a data stripe size utilized by the array (e.g., RAID data stripe size). According to some examples, a first set of IO operations (IOOPs), to write a dataset to the array of data storage devices, is translated to a second set of IOOPs to write the dataset to the array based on the data stripe size associated with the array. Based on the data stripe size, the translation may be such that each IOOP in the second set of IOOPs writes a portion of the dataset within the boundaries of an individual data stripe on the array. In this way, each IOOP in the second set of IOOPs can avoid writing data across a data stripe boundary, which can lead to improved data performance with respect to the array of data storage devices. For example, by not writing data across data stripe boundaries, some examples can assist an array of data storage devices, such as a RAID array (e.g., RAID-level 5 configuration, having data striping and parity), in avoiding read-modify-write penalties when writing data to the array.
[12] In addition to writing a portion of the data within the boundaries of an individual data stripe, each IOOP in the second set of IOOPs may have a data size that utilizes a full data stripe to the extent possible. Accordingly, an IOOP from the second set writing data to an individual data stripe may have a data size matching the size of the individual data stripe and may be aligned to write data within the boundaries of the individual data stripe, thereby ensuring the IOOP utilizes a full data stripe. Where an individual data stripe is partially utilized (i.e., partially empty), an IOOP from the second set writing data to the individual data stripe (e.g., writing from the middle of the individual data stripe) may have a data size matching the remaining data space within the individual data stripe and may be aligned to write data within the remaining data space, thereby ensuring the IOOP fully utilizes the remainder of the individual data stripe.
[13] Various examples achieve improved data performance of an array of data storage devices by a file system (e.g., one based on ext4 [fourth extended file system] or XFS) without needing to specify certain characteristics of the array, such as data stripe size, to the file system (e.g., when the file system is initially created). Accordingly, certain file systems capable of concurrently utilizing a plurality of different arrays of data storage devices (e.g., btrfs [B-tree file system], utilizing a large number of different RAID local devices organized in a single "pool") can issue input/output operations (IOOPs) to the different arrays as described herein, ensure that the issued IOOPs match the individual characteristics of the different arrays, and do so without needing to specify the data stripe size to the file system (e.g., during its initial creation). Additionally, some examples can re-determine characteristics of the array of data storage devices (e.g., periodically or under different conditions) to ensure that the IOOPs issued to the array account for any changes to the array (e.g., on-demand changes to the RAID configuration that change a data stripe size associated with the array). In this way, various examples can account for the sensitivity of different arrays to characteristics of the IOOPs they receive (e.g., IOOP data size in comparison to data stripe size). Such sensitivity for an array may depend on, for example, the caching algorithm utilized by that array.
[14] Some examples may be utilized in a computing environment where a software application requests a dataset be written to a file system. Without taking into account the data characteristics of an array of data storage devices (e.g., which may be configured to operate as a set of RAID logical storage devices), the file system may translate the request into a first set of IOOPs to write the dataset to an array of data storage devices. As a result, the first set of IOOPs may include an IOOP having a data size larger than a data stripe size associated with the array of data storage devices, or an IOOP that writes data across a data stripe boundary (e.g., does not align with the boundaries of an individual data stripe). According to some examples, information regarding characteristics of the array of data storage devices may be obtained from the array of data storage devices and, based on the obtained information, the first set of IOOPs may be translated into a second set of IOOPs to write the dataset to the array of data storage devices. As described herein, the information regarding characteristics of the array of data storage devices can describe a data stripe size associated with the array (e.g., data stripe size utilized by the RAID array). Accordingly, based on the information, each IOOP in the second set of IOOPs may be sized and aligned to match the data stripe of the array of data storage devices.
[15] The following provides a detailed description of examples illustrated by FIGs. 1-7.
[16] FIG. 1 is a block diagram illustrating an example data operation system 100 for translating a data operation based on data stripe size according to the present disclosure. As shown, the data operation system 100 includes an array characteristics module 102 and an operation translation module 104. Depending on the example, the data operation system 100 may be part of a computer system, such as a desktop, laptop, hand-held computing device (e.g., personal digital assistants, smartphones, tablets, etc.), workstation, server, or other device that includes a processor. In particular examples, the data operation system 100 may work in conjunction with and may be part of a file system operating on the computer system. In various examples, the components or the arrangement of components in the data operation system 100 may differ from what is depicted in FIG. 1.
[17] As used herein, modules and other components of various examples may comprise, in whole or in part, machine-readable instructions or electronic circuitry. For instance, a module may comprise computer-readable instructions executable by a processor to perform one or more functions in accordance with various examples described herein. Likewise, in another instance, a module may comprise electronic circuitry to perform one or more functions in accordance with various examples described herein. The elements of a module may be combined in a single package, maintained in several packages, or maintained separately.
[18] The array characteristics module 102 may facilitate determining a set of characteristics associated with an array of data storage devices, including a data stripe size, array configuration (e.g., RAID-level configuration), data slice size, and the like. For some examples, the array characteristics module 102 determines the data stripe size associated with the array of data storage devices, and may do so by directly querying the array for such information. For some examples, the determination of the data stripe size may occur before a first set of IOOPs is translated to a second set of IOOPs by the data operation system 100 and, more specifically, may occur before the first set of IOOPs is received by the data operation system 100. Depending on the example, the data stripe size may be determined by the array characteristics module 102 periodically, when a first set of IOOPs is detected by the array characteristics module 102 (e.g., detected as being generated by a file system or as being received by the data operation system 100), when the array characteristics module 102 detects a change to the array (e.g., configuration change), or when some other condition is satisfied.
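One way to picture the re-determination conditions above (periodically, or when a change to the array is detected) is a small cache around whatever query returns the stripe size. The following Python sketch is illustrative only: the class name `StripeSizeCache`, its parameters, and the injected `query` callable are assumptions, not elements of the disclosure.

```python
import time
from typing import Callable, Optional


class StripeSizeCache:
    """Caches a stripe size and re-queries it when stale or invalidated.

    `query` is a hypothetical callable that asks the array for its stripe
    size in bytes; `max_age_s` models periodic re-determination.
    """

    def __init__(self, query: Callable[[], int], max_age_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._query = query
        self._max_age_s = max_age_s
        self._clock = clock
        self._value: Optional[int] = None
        self._stamp = 0.0

    def invalidate(self) -> None:
        """Call when a configuration change to the array is detected."""
        self._value = None

    def get(self) -> int:
        """Return the cached stripe size, re-querying if missing or too old."""
        now = self._clock()
        if self._value is None or now - self._stamp >= self._max_age_s:
            self._value = self._query()
            self._stamp = now
        return self._value
```

Injecting the clock keeps the sketch testable; a real module could equally re-query each time a first set of IOOPs arrives.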
[19] The operation translation module 104 may facilitate translating a first set of IOOPs for an array of data storage devices to a second set of IOOPs for the array, where each of the first set of IO operations and the second set of IO operations is to write a dataset to the array. The first set of IOOPs may originate from a file system, which generates the first set of IOOPs in response to a software application or an operating system requesting the file system to write data to the array of data storage devices (e.g., page cache flush by the operating system). Additionally, the operation translation module 104 may translate the first set of IOOPs to the second set of IOOPs, based on the set of characteristics obtained by the array characteristics module 102 from the array of data storage devices. For some examples, translating the first set of IOOPs to the second set of IOOPs may be based on a data stripe size included in the set of characteristics associated with the array. Based on the data stripe size, the operation translation module 104 may generate the second set of IOOPs such that each IOOP in the second set writes a portion of a dataset within the boundaries of an individual data stripe on the array. Further, the operation translation module 104 may generate the second set of IOOPs in this manner while the first set of IOOPs includes at least one IOOP that writes a portion of the dataset across a boundary between two data stripes on the array. Using the data stripe size, the operation translation module 104 can further ensure that each IOOP in the second set of IOOPs aligns with (and thus utilizes) a full data stripe or whatever data space remains in a partially-utilized data stripe. Eventually, the second set of IOOPs resulting from the operation translation module 104 may be issued to the array of data storage devices, in place of the first set of IOOPs, for execution by the array (e.g., to cause the dataset to be written to the array).
[20] Depending on the example, the first set of IOOPs may be received by the operation translation module 104 in series and the second set of IOOPs may be generated in series. Where translation results in the second set of IOOPs being generated in series, individual IOOPs of the second set may be issued to the array of data storage devices as they are generated. For some examples, where the first set of IOOPs writes the dataset across a first set of data stripes on the array starting at a first data stripe, the second set of IO operations writes the dataset across a second set of data stripes on the array that also starts at the first data stripe, but where the first set of data stripes is different from the second set of data stripes.
[21] In certain examples, the operation translation module 104 translates the first set of IOOPs to the second set of IOOPs as follows. The operation translation module 104 first determines the first set of data stripes on the array of data storage devices to which the first set of IOOPs is to write the dataset and may further determine whether the first set of IOOPs will begin writing the dataset at the start of a data stripe (i.e., start at a data stripe boundary). If not, a first IOOP is generated for the second set of IOOPs such that the first IOOP will start writing the dataset at the same start point as the first set of IOOPs but will have a data size that ensures that it only writes data up to the next (nearest) data stripe boundary. Next, for any portion of the dataset remaining to be written by the second set of IOOPs after the first IOOP of the second set, such remaining portion is divided into as many intermediate IOOPs of the second set as possible, where each intermediate IOOP writes, and has a data size equal to, a full data stripe. Next, any portion of the dataset still remaining to be written by the second set of IOOPs after the first IOOP and the intermediate IOOPs of the second set would be smaller than a full data stripe and, thus, a last IOOP is generated for the second set to write the still remaining portion of the dataset to a single data stripe. This last IOOP would partially utilize the single data stripe, as its data size would be smaller than a data stripe size.
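The three steps above (a first IOOP up to the next stripe boundary, intermediate full-stripe IOOPs, and a last partial IOOP) can be sketched in Python as a generator. This is a minimal sketch under stated assumptions: the first set of IOOPs is treated as one contiguous byte range given by a hypothetical `offset` and `length`, stripes are assumed to start at byte offset 0, and the name `translate_write` is illustrative rather than taken from the disclosure.

```python
from typing import Iterator, Tuple


def translate_write(offset: int, length: int,
                    stripe_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, size) pairs that never cross a data stripe boundary.

    The first pair runs from `offset` to the next stripe boundary, the
    middle pairs each cover one full stripe, and the last pair covers
    whatever remains (smaller than a full stripe).
    """
    if stripe_size <= 0:
        raise ValueError("stripe_size must be positive")
    end = offset + length
    pos = offset
    while pos < end:
        # Next stripe boundary strictly after the current position.
        next_boundary = (pos // stripe_size + 1) * stripe_size
        chunk_end = min(next_boundary, end)
        yield pos, chunk_end - pos
        pos = chunk_end


# Illustrative numbers only: a 1000-byte write at offset 100, 256-byte stripes.
ops = list(translate_write(100, 1000, 256))
```

Because the function is a generator, each translated IOOP could be issued as soon as it is produced, which matches the in-series generation described in the text. With the illustrative numbers, `ops` is `[(100, 156), (256, 256), (512, 256), (768, 256), (1024, 76)]`.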
[22] FIG. 2 is a block diagram illustrating an example data operation system 200 for translating a data operation based on data stripe size according to the present disclosure. As shown, the data operation system 200 includes an operation receiver module 202, an array characteristics module 204, and an operation translation module 206. Depending on the example, the data operation system 200 may be part of a computer system. In particular examples, the data operation system 200 may work in conjunction with and may be part of a file system operating on the computer system. In various examples, the components or the arrangement of components in the data operation system 200 may differ from what is depicted in FIG. 2.
[23] The operation receiver module 202 may facilitate receiving a first set of IOOPs to write a dataset, such as a file or a portion of a file, to an array of data storage devices. The first set of IOOPs may be generated by a file system in response to a request, from an operating system or a software application (e.g., through the operating system), to write the dataset to the file system. A request to write the dataset to the file system may include a page cache flush to the file system by an operating system, which may be the result of a software application writing to the page cache of the operating system.
[24] The array characteristics module 204 and the operation translation module 206 may be respectively similar to the array characteristics module 102 and the operation translation module 104 described above with respect to the data operation system 100 of FIG. 1.
[25] FIG. 3 is a block diagram illustrating an example computer system 300 including the example data operation system 200 according to the present disclosure. The computer system 300 may be any computing device having a processor, such as a desktop, laptop, hand-held computing device (e.g., personal digital assistants, smartphones, tablets, etc.), workstation, or server. As shown, the computer system 300 includes an application module 302, an operating system module 304, a file system module 306 including the data operation system 200, and a data storage module 308. In various examples, the components or the arrangement of components in the computer system 300 may differ from what is depicted in FIG. 3.
[26] The application module 302 represents any firmware or software (e.g., software application or operating system) operable on the computer system 300 and capable of accessing (e.g., read from, write to, or modify) a file stored on a file system. For some examples, the application module 302 opens a file from a file system when the application module 302 wishes to access the file and, eventually, may close the file once the application module 302 has completed its access of the file. Additionally, the application module 302 may write a dataset to a file system when the application module 302 wishes to write some or all of a file to the file system, or modify some or all of a file stored on the file system.
[27] The operating system module 304 may comprise an operating system (OS), such as MICROSOFT WINDOWS, and the like. The operating system module 304 may manage resources of the computer system 300, and may provide the application module 302 with common computing services, such as page caching and memory mapped access of files. Through the operating system module 304, the application module 302 may read data from, write data to, or modify data stored on a file system accessible by the computer system 300.
[28] The file system module 306 may provide the computer system 300 and its various components, such as the application module 302 and the operating system module 304, with local access to a file system, which may be implemented using an array of data storage devices, such as a RAID array. The data storage devices included by the array may comprise a hard disk drive (HDD), a solid-state drive (SSD), or the like. A file system accessed through the file system module 306 may include one based on NTFS (New Technology File System), FAT (File Allocation Table), ext4 (fourth extended file system), XFS, btrfs (B-tree file system), or the like. Additionally, the file system accessed through the file system module 306 may be one locally implemented on the computer system 300, or remotely implemented on another computer system accessible to the computer system 300 (e.g., over a communication network, the Internet, a wireless network, a wired local area network, or the like). While a local file system may be implemented using a local array of data storage devices included by the computer system 300, a remote file system may be implemented using an array of data storage devices included by a remote computer system. In some examples, the file system module 306 implements a local file system on the computer system 300.
[29] As shown, the file system module 306 includes the data operation system 200, which performs various operations described herein on the computer system 300. For instance, with respect to a particular file system (local or remote) accessed through the file system module 306, the data operation system 200 may determine a set of characteristics associated with an array of data storage devices that implements the file system. The data operation system 200 may receive a first set of IOOPs generated by the file system module 306 in response to the application module 302 or the operating system module 304 accessing the particular file system through the file system module 306. As described herein, the first set of IOOPs may include at least one IOOP that writes data across a boundary between two data stripes of the array of data storage devices. The data operation system 200 may translate the first set of IOOPs to a second set of IOOPs based on the set of characteristics, where each IOOP in the second set of IOOPs writes data within boundaries of an individual data stripe. Subsequently, the second set of IOOPs may be issued to the array of data storage devices.
[30] The data storage module 308 may facilitate local access to an array of data storage devices accessible to the computer system 300, such as an array that is included by the computer system 300. A local file system accessed through the file system module 306 may cause the file system module 306 to generate a first set of IOOPs, the first set of IOOPs may be translated to a second set of IOOPs by the data operation system 200, and the second set of IOOPs may be issued to the local array of data storage devices through the data storage module 308.
[31] FIG. 4 is a flow diagram illustrating an example data flow in an example computing environment 400 that includes the data operation system 200 according to the present disclosure. As shown, the computing environment 400 includes an application module 402, an operating system module 404, a file system module 406, the data operation system 200, a data storage module 408, and an array of data storage devices 410. For some examples, the modules 402, 404, 406, and 408 may be respectively similar to modules 302, 304, 306, and 308 of the computer system 300 described with respect to FIG. 3.
[32] During an operation in the computing environment 400, the application module 402 may request the file system module 406 to write a dataset to a local file system implemented by the file system module 406 using the array of data storage devices 410. Initially, this request through the file system module 406 may cause the dataset to be written to a page cache being maintained by the operating system module 404. The dataset may remain in the page cache until the operating system module 404 decides to write out (flush) a dataset from the page cache (page cache dataset) to the array of data storage devices 410 (e.g., when the page cache is full). For some examples, the operating system module 404 is unaware of details regarding the array of data storage devices 410, and requests the file system module 406 to write the page cache dataset to the local file system.
[33] In response to the request from the operating system module 404 to write the page cache dataset to the local file system, the file system module 406 may generate a first set of IOOPs 412 that would cause the page cache dataset to be written to the array of data storage devices 410. Consider, for example, a case where the array of data storage devices 410 comprises a RAID array having a RAID-level 5 configuration with seven data storage devices used for data storage and one data storage device used for data parity. Consider further that the page cache dataset to be written to the array of data storage devices 410 may have a data size (416) of 3072 kB. In view of this example, the first set of IOOPs 412 generated by the file system module 406 may comprise six IOOPs each having a data size of 512 kB, where at least one of the six IOOPs may write the 512 kB across a boundary between two data stripes of the array of data storage devices 410.
[34] The data operation system 200 may determine a set of characteristics associated with the array of data storage devices 410, including a data stripe size 422 (e.g., 448 kB data stripe including 64 kB data slices) associated with data stripes 424 of the array of data storage devices 410. The data operation system 200 may achieve this determination by querying the array of data storage devices 410 for its associated data stripe size, and then receiving a response to the query. The data operation system 200 may submit the query and receive the response through the data storage module 408. For some examples, the data operation system 200 utilizes an interface function, such as bdev_io_opt() for LINUX, to fetch the data stripe size of the array of data storage devices 410, among other characteristics of the array.
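From user space on Linux, the block layer's optimal I/O size hint is exposed in sysfs as `/sys/block/<device>/queue/optimal_io_size`, reported in bytes. The Python sketch below reads that hint. Treating the hint as the array's data stripe size is an assumption that holds only when the RAID driver or controller reports it that way, and the function name and the device name `md0` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_optimal_io_size(device: str,
                         sysfs_root: str = "/sys/block") -> Optional[int]:
    """Return the optimal I/O size hint in bytes, or None if unavailable.

    A value of 0 means the device reports no hint, so it is mapped to None.
    Whether the hint equals the full data stripe size depends on the array.
    """
    hint = Path(sysfs_root) / device / "queue" / "optimal_io_size"
    try:
        value = int(hint.read_text().strip())
    except (OSError, ValueError):
        return None
    return value if value > 0 else None
```

A caller might use `read_optimal_io_size("md0")` and fall back to passing IOOPs through untranslated when the result is `None`.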
[35] Eventually, the data operation system 200 may receive the first set of IOOPs 412. The data operation system 200 may translate the first set of IOOPs 412 to a second set of IOOPs 414 that would cause the page cache data to be written to the array of data storage devices 410, where each IOOP of the second set of IOOPs writes a portion of the page cache dataset within the boundaries of an individual data stripe. Additionally, each IOOP in the second set of IOOPs 414 may have a data size that permits the IOOP to align with available space in a partially-utilized data stripe or align with a full data stripe.
[36] As shown in FIG. 4, the first IOOP in the second set of lOOPs 414 has a data size 418a that causes the first IOOP aligns with a starting position 420a in a series of data stripes and a boundary of a data stripe boundary 426a. As described herein, the second set of lOOPs may write the page cache dataset across a range of data stripes starting at the same data stripe as the range of data stripes on the array of data storage devices 410 that the first set of lOOPs is to write the page cache dataset.
[37] As further shown, middle lOOPs in the second set of lOOPs 414 have a data size 418b that causes each middle IOOP to align with a full data stripe. As also shown, the last IOOP in the second set of lOOPs 414 has a data size 418c sufficient to write the remainder of the page cache dataset to the array of data storage devices 410. In FIG 4, the individual lOOPs of the second set of lOOPs 414 are shown to write data within the boundaries of individual data stripes.
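The first, middle, and last IOOPs described with reference to FIG. 4 can be sketched as a simple splitting routine. This is a minimal illustration under stated assumptions, not the disclosed implementation: the dataset is treated as one contiguous byte range, and the function name `translate_ioops` and its parameters are hypothetical.

```python
def translate_ioops(start_offset, dataset_size, stripe_size):
    """Split one contiguous write into (offset, size) pieces.

    Each piece stays within a single data stripe: the first piece fills the
    rest of the starting stripe, middle pieces cover full stripes, and the
    last piece carries the remainder of the dataset.
    """
    if stripe_size <= 0:
        raise ValueError("stripe_size must be positive")
    ops = []
    offset = start_offset
    end = start_offset + dataset_size
    while offset < end:
        # Offset of the next stripe boundary strictly after `offset`.
        stripe_end = (offset // stripe_size + 1) * stripe_size
        size = min(stripe_end, end) - offset
        ops.append((offset, size))
        offset += size
    return ops
```

For the 3072 kB dataset and 448 kB stripe of the example, a write starting at offset 0 would become six 448 kB operations followed by one 384 kB operation. If the write instead started 128 kB into a stripe, the first operation would be 320 kB, followed by six full-stripe operations and a 64 kB remainder.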
[38] As described herein, the second set of lOOPs 414 may be generated in series by the data operation system 200 and then issued to the array of data storage devices 410 as generated.
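One way to picture "generated in series and issued as generated" is a generator whose output is handed to the array one operation at a time, without first building the whole second set. The sketch below assumes the dataset is an in-memory bytes object and that `issue` is a caller-supplied callback standing in for the data storage module; both function names are hypothetical.

```python
def generate_ioops(start_offset, dataset_size, stripe_size):
    """Lazily yield stripe-bounded (offset, size) pairs for one contiguous write."""
    offset = start_offset
    end = start_offset + dataset_size
    while offset < end:
        boundary = (offset // stripe_size + 1) * stripe_size
        size = min(boundary, end) - offset
        yield offset, size
        offset += size


def issue_as_generated(data, start_offset, stripe_size, issue):
    """Issue each translated operation as soon as it is produced.

    `issue(offset, chunk)` is a hypothetical callback that writes `chunk`
    at `offset` on the array. Returns the number of operations issued.
    """
    count = 0
    for offset, size in generate_ioops(start_offset, len(data), stripe_size):
        relative = offset - start_offset
        issue(offset, data[relative:relative + size])
        count += 1
    return count
```

Issuing from a generator keeps memory use independent of the number of translated operations, at the cost of not knowing the total count in advance.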
[39] FIG. 5 is a block diagram illustrating an example computer system 500 for translating a data operation based on data stripe size according to the present disclosure. As shown, the computer system 500 includes a computer-readable medium 502, a processor 504, and a data storage interface 506. In various examples, the components or the arrangement of components of the computer system 500 may differ from what is depicted in FIG. 5. For instance, the computer system 500 can include more or fewer components than those depicted in FIG. 5.
[40] The computer-readable medium 502 may be any electronic, magnetic, optical, or other physical storage device that stores executable instructions. For example, the computer-readable medium 502 may be a Random Access Memory (RAM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a storage drive, an optical disc, or the like. The computer-readable medium 502 can be encoded to store executable instructions that cause the processor 504 to perform operations in accordance with various examples described herein. In various examples, the computer-readable medium 502 is non-transitory. As shown in FIG. 5, the computer-readable medium 502 includes data stripe size determination instructions 510, first set of input/output operations translation instructions 512, and second set of input/output operations issuance instructions 514.
[41] The processor 504 may be one or more central processing units (CPUs), microprocessors, or other hardware devices suitable for retrieval and execution of one or more instructions stored in the computer-readable medium 502. The processor 504 may fetch, decode, and execute the instructions 510, 512 and 514 to enable the computer system 500 to perform operations in accordance with various examples described herein. For some examples, the processor 504 includes one or more electronic circuits comprising a number of electronic components for performing the functionality of one or more of the instructions 510, 512, and 514.
[42] The data storage interface 506 may facilitate access to an array of data storage devices by the computer system 500 and its various components. Depending on the example, the array of data storage devices may be configured as a RAID array and the data storage interface 506 may include a RAID controller that facilitates data operations with respect to the array. Additionally, the array of data storage devices may be included as part of the computer system 500 or may be separate from the computer system 500 (e.g., accessible to the computer system 500 over a communications network).
[43] The data stripe size determination instructions 510 may cause the processor 504 to determine a data stripe size associated with an array of data storage devices. The processor 504 may determine the data stripe size by querying the array of data storage devices through the data storage interface 506. The first set of input/output operations translation instructions 512 may cause the processor 504 to translate, based on the data stripe size, a first set of IOOPs for the array to a second set of IOOPs for the array, where each of the first set of IOOPs and the second set of IOOPs is to write a dataset to the array. At least one IOOP in the first set of IOOPs may write a portion of the dataset across a boundary between two data stripes on the array, while each IOOP in the second set of IOOPs may write a portion of the dataset within the boundaries of an individual data stripe on the array. The second set of input/output operations issuance instructions 514 may cause the processor 504 to issue the second set of IOOPs to the array in place of the first set of IOOPs. The processor 504 may issue the second set of IOOPs to the array of data storage devices through the data storage interface 506.
[44] FIG. 6 is a flowchart illustrating an example method 600 performed by an example computer system to facilitate data operation translation based on data stripe size according to the present disclosure. Although execution of the method 600 is described below with reference to the data operation system 200 of FIG. 2, execution of the method 600 by other suitable systems or devices may be possible. The method 600 may be implemented in the form of executable instructions stored on a computer-readable medium or in the form of electronic circuitry.
[45] In FIG. 6, the method 600 may begin at block 602, with the operation receiver module 202 receiving a first set of input/output (IO) operations to write a dataset to an array of data storage devices. The method 600 may continue to block 604, with the array characteristics module 204 determining a data stripe size associated with the array of data storage devices. The method 600 may continue to block 606 with the operation translation module 206 translating, based on the data stripe size determined at block 604, the first set of input/output (IO) operations to a second set of IO operations to write the dataset to the array of data storage devices.
[46] FIG. 7 is a flowchart illustrating an example method 700 performed by an example computer system to facilitate data operation translation based on data stripe size according to the present disclosure. Although execution of the method 700 is described below with reference to the data operation system 200 of FIG. 2, execution of the method 700 by other suitable systems or devices may be possible. The method 700 may be implemented in the form of executable instructions stored on a computer-readable medium or in the form of electronic circuitry.
[47] In FIG. 7, the method 700 may begin at block 702 and may continue to blocks 704 and 706. Blocks 702, 704, and 706 may be respectively similar to blocks 602, 604, and 606 of the method 600 as described above with respect to FIG. 6. The method 700 may continue to block 708, with the second set of input/output (IO) operations produced at block 706 being issued to the array of data storage devices (e.g., by the operation translation module 206). The method 700 may continue to block 710, with the operation receiver module 202 receiving a third set of input/output (IO) operations to write a second dataset to the array of data storage devices.
[48] The method 700 may continue to block 712, with the array characteristics module 204 re-determining the data stripe size associated with the array of data storage devices, thereby producing an updated data stripe size. The array characteristics module 204 may re-determine the data stripe size upon a configuration change of the array of data storage devices (e.g., change in RAID configuration) or some other condition being satisfied. The method 700 may continue to block 714 with the operation translation module 206 translating, based on the updated data stripe size re-determined at block 712, the third set of input/output (IO) operations to a fourth set of IO operations to write the second dataset to the array of data storage devices. Blocks 712 and 714 represent an instance where a data stripe size is automatically re-determined and accounted for in the IOOP translation process described herein.
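The flow of blocks 704 through 714 can be sketched as a small object that holds the last determined data stripe size and refreshes it on demand. This is only an illustration under assumptions: the class name `StripeAwareTranslator`, the `query_stripe_size` callback, and the explicit `redetermine()` call are hypothetical, and how a configuration change is detected is left to the caller.

```python
class StripeAwareTranslator:
    """Translate writes using the most recently determined data stripe size."""

    def __init__(self, query_stripe_size):
        # `query_stripe_size` is a hypothetical callable that asks the array
        # for its current data stripe size (compare blocks 704 and 712).
        self._query = query_stripe_size
        self.stripe_size = query_stripe_size()

    def redetermine(self):
        """Refresh the stripe size, e.g. after a RAID configuration change."""
        self.stripe_size = self._query()
        return self.stripe_size

    def translate(self, start_offset, dataset_size):
        """Return stripe-bounded (offset, size) pairs (compare blocks 706 and 714)."""
        ops = []
        offset = start_offset
        end = start_offset + dataset_size
        while offset < end:
            boundary = (offset // self.stripe_size + 1) * self.stripe_size
            size = min(boundary, end) - offset
            ops.append((offset, size))
            offset += size
        return ops
```

With a 448 kB stripe, a 1024 kB write at offset 0 would translate to 448 kB, 448 kB, and 128 kB operations. If the array were then reconfigured to a 320 kB stripe and `redetermine()` called, the same write would translate to three 320 kB operations and a 64 kB remainder.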
[49] In the foregoing description, numerous details are set forth to provide an understanding of the subject disclosed herein. However, various examples may be practiced without some or all of these details. Some examples may include modifications and variations from the details discussed above. It is intended that the appended claims cover such modifications and variations.

Claims

1. A data operation system, comprising:
an array characteristics module to determine a data stripe size associated with an array of data storage devices; and
an operation translation module to translate, based on the data stripe size, a first set of input/output (IO) operations for the array to a second set of IO operations for the array, wherein each of the first set of IO operations and the second set of IO operations is to write a dataset to the array, wherein at least one IO operation in the first set of IO operations is to write a portion of the dataset across a boundary between two data stripes on the array, and wherein each IO operation in the second set of IO operations is to write a portion of the dataset within boundaries of an individual data stripe on the array.
2. The data operation system of claim 1, wherein the array characteristics module is to determine the data stripe size associated with the array by obtaining from the array a set of characteristics of the array, and wherein the set of characteristics includes information regarding the data stripe size.
3. The data operation system of claim 1, wherein the array is configured as a redundant array of independent disks (RAID) array.
4. The data operation system of claim 1, wherein the individual data stripe is a partially-utilized data stripe.
5. The data operation system of claim 1, wherein the first set of IO operations is to write the dataset across a first set of data stripes on the array starting at a first data stripe, the second set of IO operations is to write the dataset across a second set of data stripes on the array starting at the first data stripe, and the first set is different from the second set.
6. The data operation system of claim 1, comprising an operation receiver module to receive the first set of IO operations, wherein the first set of IO operations is generated by a file system.
7. A non-transitory computer readable medium having instructions stored thereon, the instructions being executable by a processor of a computer system, the instructions causing the processor to:
determine a data stripe size associated with an array of data storage devices;
translate, based on the data stripe size, a first set of input/output (IO) operations for the array to a second set of IO operations for the array, wherein each of the first set of IO operations and the second set of IO operations is to write a dataset to the array, wherein at least one IO operation in the first set of IO operations is to write a portion of the dataset across a boundary between two data stripes on the array, and wherein each IO operation in the second set of IO operations is to write a portion of the dataset within the boundaries of an individual data stripe on the array; and
issue the second set of IO operations to the array in place of the first set of IO operations.
8. The non-transitory computer readable medium of claim 7, wherein the first set of IO operations is to write the dataset across a first set of data stripes on the array starting at a first data stripe, the second set of IO operations is to write the dataset across a second set of data stripes on the array starting at the first data stripe, and the first set is different from the second set.
9. A method, comprising:
receiving a first set of input/output (IO) operations for an array of data storage devices, wherein the first set of IO operations is to write a dataset to the array;
determining a data stripe size associated with the array; and
translating, based on the data stripe size, the first set of IO operations to a second set of IO operations to write the dataset to the array, wherein at least one IO operation in the first set of IO operations is to write a portion of the dataset across a boundary between two data stripes on the array, and wherein each IO operation in the second set of IO operations is to write a portion of the dataset within the boundaries of an individual data stripe on the array.
10. The method of claim 9, wherein receiving the first set of IO operations comprises:
receiving a request to write the dataset to the array, wherein the request comprises a page cache flush request; and
translating the request to the first set of IO operations.
11. The method of claim 9, wherein determining the data stripe size associated with the array comprises obtaining from the array a set of characteristics of the array, and wherein the set of characteristics includes information regarding the data stripe size.
12. The method of claim 9, comprising
issuing the second set of IO operations to the array;
receiving a third set of IO operations to write a second dataset to the array;
re-determining the data stripe size associated with the array to produce an updated data stripe size; and
translating, based on the updated data stripe size, the third set of IO operations to a fourth set of IO operations to write the second dataset to the array, wherein at least one IO operation in the third set of IO operations is to write a portion of the second dataset across a boundary between two data stripes on the array, and wherein each IO operation in the fourth set of IO operations is to write a portion of the second dataset within the boundaries of an individual data stripe on the array.
13. The method of claim 9, wherein the array is configured as a redundant array of independent disks (RAID) array.
14. The method of claim 9, wherein the individual data stripe is a partially-utilized empty data stripe.
15. The method of claim 9, wherein the first set of IO operations is to write the dataset across a first set of data stripes on the array starting at a first data stripe, the second set of IO operations is to write the dataset across a second set of data stripes on the array starting at the first data stripe, and the first set is different from the second set.
PCT/US2015/041217 2015-05-23 2015-07-21 Translate data operations based on data stripe size WO2016190891A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN2585CH2015 2015-05-23
IN2585/CHE/2015 2015-05-23

Publications (1)

Publication Number Publication Date
WO2016190891A1 (en) 2016-12-01

Family

ID=57392203

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/041217 WO2016190891A1 (en) 2015-05-23 2015-07-21 Translate data operations based on data stripe size

Country Status (1)

Country Link
WO (1) WO2016190891A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021238284A1 (en) * 2020-05-24 2021-12-02 苏州浪潮智能科技有限公司 Distributed file system input output pre-reading method and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030023818A1 (en) * 2001-07-30 2003-01-30 Archibald John E. System and method for improving file system transfer through the use of an intelligent geometry engine
EP1347369A2 (en) * 2002-03-21 2003-09-24 Network Appliance, Inc. Method for writing contiguous arrays of stripes in a raid storage system
US20070113008A1 (en) * 2003-04-26 2007-05-17 Scales William J Configuring Memory for a Raid Storage System
US20130091237A1 (en) * 2005-09-13 2013-04-11 Ambalavanar Arulambalam Aligned Data Storage for Network Attached Media Streaming Systems
US20140229658A1 (en) * 2013-02-14 2014-08-14 Lsi Corporation Cache load balancing in storage controllers

Similar Documents

Publication Publication Date Title
US10423361B2 (en) Virtualized OCSSDs spanning physical OCSSD channels
US8239584B1 (en) Techniques for automated storage management
US9229870B1 (en) Managing cache systems of storage systems
US8966476B2 (en) Providing object-level input/output requests between virtual machines to access a storage subsystem
JP7116381B2 (en) Dynamic relocation of data using cloud-based ranks
US8639876B2 (en) Extent allocation in thinly provisioned storage environment
US9323682B1 (en) Non-intrusive automated storage tiering using information of front end storage activities
US9256382B2 (en) Interface for management of data movement in a thin provisioned storage system
CN108604162B (en) Method and system for providing access to production data for application virtual machines
US20140279911A1 (en) Data storage and retrieval mediation system and methods for using same
US20190243758A1 (en) Storage control device and storage control method
US11199990B2 (en) Data reduction reporting in storage systems
US10936243B2 (en) Storage system and data transfer control method
US10678431B1 (en) System and method for intelligent data movements between non-deduplicated and deduplicated tiers in a primary storage array
US10346077B2 (en) Region-integrated data deduplication
US10089125B2 (en) Virtual machines accessing file data, object data, and block data
US11366601B2 (en) Regulating storage device rebuild rate in a storage system
US10635330B1 (en) Techniques for splitting up I/O commands in a data storage system
US10705733B1 (en) System and method of improving deduplicated storage tier management for primary storage arrays by including workload aggregation statistics
US11429318B2 (en) Redirect-on-write snapshot mechanism with delayed data movement
US20160077747A1 (en) Efficient combination of storage devices for maintaining metadata
WO2016190891A1 (en) Translate data operations based on data stripe size
US11513902B1 (en) System and method of dynamic system resource allocation for primary storage systems with virtualized embedded data protection
US11662949B2 (en) Storage server, a method of operating the same storage server and a data center including the same storage server
US11315028B2 (en) Method and apparatus for increasing the accuracy of predicting future IO operations on a storage system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15893526

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15893526

Country of ref document: EP

Kind code of ref document: A1