US20170228156A1 - Raid set initialization - Google Patents

Raid set initialization

Info

Publication number
US20170228156A1
Authority
US
United States
Prior art keywords
raid
data
format
raid set
command
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/438,561
Inventor
Larry Fenske
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Silicon Graphics International Corp
Original Assignee
Silicon Graphics International Corp
Application filed by Silicon Graphics International Corp
Priority to US15/438,561
Publication of US20170228156A1

Classifications

    • G06F3/061 Improving I/O performance
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F3/0632 Configuration or reconfiguration of storage systems by initialisation or re-initialisation of storage systems
    • G06F3/0665 Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems

Definitions

  • FIG. 3 is a flow chart of a method of the initialization of the RAID set as the initialization occurs incrementally over time. Initially, the uninitialized RAID set includes stripes 0-100. After a first background write, stripes 1-100 are uninitialized at step 301. After a first user write to RAID stripe 50, RAID stripes 1-49 & 51-100 remain uninitialized at step 302. After further background writes, the uninitialized stripes include stripes 25-49 & 51-100. After subsequent user writes (to stripes 41 and 61), stripes 25-40, 42-49, 51-60, and 62-100 are uninitialized. Once every stripe has been written, the initialization is complete at step 305, and the driver may stop intercepting commands to the RAID set.
  • Information that describes the initialization state of the RAID set may be stored in different types of memory and memory locations. For example, information may be stored in the uninitialized stripes of the RAID set, in a dedicated location within a data storage device, in a memory within a RAID box, in a data storage device that is external to the RAID set, or a combination thereof. Types of non-volatile memory where the information that describes the initialization state of the RAID set may be stored include, yet are not limited to: a disk drive, a flash drive, flash memory, phase change memory, resistive memory (RERAM), ferroelectric memory (FERAM), battery backed up dynamic random access memory, or racetrack memory. Data storage devices in the RAID set itself may also consist of these same types of memory.
  • In certain instances, the driver may be required to change where the information is stored. For example, when this information is stored in stripe 100, and stripe 100 is written to by a user, the driver will move the information to another stripe. In such an embodiment, the driver will typically track the stripe number where the information is stored, and may also maintain a pointer to that stripe in a small piece of non-volatile memory. When a write to the last remaining stripe is intercepted, the driver may assign the information the last remaining stripe number, and then write the data to the stripe. Alternatively, the driver may not allow a user level application program to use all stripes that the RAID set contains: for example, the RAID set may allow a user to access stripes 0-99 and use stripe 100 for storing the information. The present system may also change the mapping of the stripes of the RAID set; when changing the mapping, the driver could map stripes 1-100 as user stripes 0-99 and use the original stripe 0 to store the information, as in the sketch below.
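A hypothetical sketch of the remapping just described, in which physical stripe 0 holds the initialization information and user stripes 0-99 shift up by one (the function and constant names are invented for the example):

```c
/* Reserve physical stripe 0 for the initialization record and expose
 * physical stripes 1-100 to users as stripes 0-99. */
#define TOTAL_STRIPES 101   /* physical stripes 0-100 */
#define META_STRIPE   0     /* reserved for initialization information */

/* Returns 0 on success, -1 if the user stripe does not exist. */
static int user_to_physical(unsigned user_stripe, unsigned *physical)
{
    if (user_stripe >= TOTAL_STRIPES - 1)
        return -1;                   /* only user stripes 0-99 are visible */
    *physical = user_stripe + 1;     /* skip the reserved stripe */
    return 0;
}
```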
  • The information describing the initialization state of the RAID set may instead be stored in a reserved area of a data storage device. To create such a reserved area, the driver (or other software) may artificially change the maximum logical block of a data storage device in the RAID set, and may reserve the logical blocks above the artificial maximum logical block for storing the initialization information. Disk drives and FLASH drives can be programmed to report a maximum logical block number that is below the real maximum logical block in the drive. Logical blocks on these drives can still be accessed by software that is aware that the data storage device's capacity was truncated; the reserved area of the data storage device is accessed using special commands. Most user level programs will not be aware that a drive's capacity has been truncated, and such programs would not be able to access the logical blocks in the reserved area because they would not be aware that the reserved area exists. Truncating the capacity of a disk drive in this way is commonly referred to as de-stroking the drive.
  • An example of special commands used to truncate the capacity of a data storage device includes the Read Native Max Capacity and the Set Max Address commands from the data storage advanced technology attachment (ATA) specification. First, the total native capacity of a disk drive is determined using the Read Native Max Capacity command. Then, the capacity of the drive may be artificially set to a capacity less than the total capacity of the drive by sending the drive a Set Max Address command with a maximum LBA number. The ATA Set Max Address command causes the drive to enter the maximum LBA number provided with the command into a table inside of the drive. The drive will then service commands addressing LBA 0 through the maximum LBA number provided with the command, and conventional commands attempting to access an LBA beyond this artificial maximum LBA will cause the drive to respond with an error message. LBAs located above this artificial maximum LBA may be accessed by sending the drive another Set Max Address command with a maximum LBA corresponding to the Native Max Capacity of the drive. While the reserved area is exposed in this way, the driver or software will typically intercept and stall commands addressing the RAID set. After the driver or software accesses the LBAs above the artificial maximum LBA, it can once again truncate the capacity of the drive using the Set Max Address command. Thus, a driver or software consistent with embodiments of the invention may access information located above the artificial maximum LBA when other programs or initiators cannot. The example above is illustrative; it is not intended to limit the scope of the invention described herein.
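The Read Native Max Capacity / Set Max Address sequence can be modeled with a toy drive structure, as below. The helper names, capacities, and the 1024-block reserved size are assumptions for the example; a real implementation would issue the ATA commands through an operating-system pass-through interface.

```c
/* Toy model of a drive's max-address behavior; purely illustrative. */
#include <stdio.h>

typedef unsigned long long lba_t;

struct drive {
    lba_t native_max;    /* real last LBA (Read Native Max Capacity) */
    lba_t current_max;   /* last LBA the drive currently admits to */
};

static lba_t read_native_max_capacity(const struct drive *d) { return d->native_max; }
static void  set_max_address(struct drive *d, lba_t max)     { d->current_max = max; }

static int drive_access(const struct drive *d, lba_t lba)
{
    return (lba > d->current_max) ? -1 : 0;   /* -1: drive returns an error */
}

int main(void)
{
    struct drive d = { .native_max = 1999999, .current_max = 1999999 };

    /* De-stroke: hide the top 1024 LBAs for the initialization record. */
    set_max_address(&d, read_native_max_capacity(&d) - 1024);
    printf("LBA 1999000: %s\n", drive_access(&d, 1999000) ? "error" : "ok");

    /* Temporarily restore full capacity to reach the hidden area
     * (stalling RAID set commands meanwhile, per the text above). */
    set_max_address(&d, read_native_max_capacity(&d));
    printf("LBA 1999000: %s\n", drive_access(&d, 1999000) ? "error" : "ok");

    /* Re-truncate once the reserved area has been accessed. */
    set_max_address(&d, read_native_max_capacity(&d) - 1024);
    return 0;
}
```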
  • In certain instances, a first data storage device selected by the operator to be included in a RAID set may already store data that the operator wishes to save in a RAID configuration. In these instances, the data contained in the first data storage device will have to be converted from a non-RAID configuration to a RAID configuration. When converting to a RAID 1 level, the data conversion process will include copying data from the first data storage device to a second data storage device. When converting to a RAID level that uses striping, the conversion process will include a series of other background tasks. During a RAID 1 conversion, any write commands targeting the RAID set will be written to the first data storage device and to a second data storage device, and any read commands sent to the RAID set will be read from the first data storage device. At this time, the logical blocks read from the first data storage device may be written to the second data storage device, as in the sketch below. As the conversion proceeds, the driver will store RAID set initialization information to the non-volatile memory; this information may include information identifying the data areas or logical blocks that are different from, or identical to, data areas or logical blocks on the first data storage device.
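A minimal sketch of these RAID 1 conversion rules, with the two devices modeled as in-memory arrays (the sizes and names are invented for the example): writes go to both devices, reads are served from the first device, and data read may be opportunistically mirrored to the second.

```c
/* Toy RAID 1 conversion: device 1 holds valid uninitialized data,
 * device 2 starts empty, and the mirror fills in as I/O happens. */
#include <string.h>

#define NUM_LBAS  10
#define LBA_BYTES 512

static unsigned char dev1[NUM_LBAS][LBA_BYTES];  /* valid uninitialized data */
static unsigned char dev2[NUM_LBAS][LBA_BYTES];  /* initially no valid data */
static int mirrored[NUM_LBAS];                   /* 1 once both copies match */

static void raid1_write(unsigned lba, const unsigned char *buf)
{
    memcpy(dev1[lba], buf, LBA_BYTES);   /* writes go to both devices */
    memcpy(dev2[lba], buf, LBA_BYTES);
    mirrored[lba] = 1;
}

static void raid1_read(unsigned lba, unsigned char *buf)
{
    memcpy(buf, dev1[lba], LBA_BYTES);   /* reads come from the first device */
    if (!mirrored[lba]) {                /* mirror the LBA while we have it */
        memcpy(dev2[lba], buf, LBA_BYTES);
        mirrored[lba] = 1;
    }
}
```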
  • FIG. 4A shows two data storage devices in an un-initialized RAID 1 level (copy) configuration. Each of these data storage devices (401 & 402) includes logical blocks 0-9, where each logical block has a logical block address 0-9 (LBA 0-9). All of the LBAs (0-9) in data storage device 401 in FIG. 4A contain valid uninitialized data VUD stored in a non-RAID configuration. Since LBAs 0-9 on data storage device 402 in FIG. 4A do not contain valid data (initialized or uninitialized), they are depicted as containing no data ND. Copying data from one data storage device to another comprises reading one or more LBAs on a first data storage device 401, and writing that data to a second data storage device 402. The process of copying may be performed in a background task, or may be performed when a read command to the RAID set is received.
  • FIG. 4B shows the same two data storage devices shown in FIG. 4A, with data copied from the first data storage device to the second data storage device. Here, LBAs 0-1 have been copied from data storage device 401 to data storage device 402. FIG. 4B depicts LBAs 0-1 on data storage device 401 and data storage device 402 containing initialized data InD; these LBAs have been initialized to RAID 1 level. At this time, LBAs 2-9 on data storage device 401 contain valid uninitialized data VUD, and LBAs 2-9 on data storage device 402 contain no data ND.
  • FIG. 4C shows the two data storage devices of FIG. 4B after a write command has been received to an LBA. Here, LBA 6 is written to both data storage devices 401 & 402. LBAs 2-5 & 7-9 on data storage device 401 contain valid uninitialized data VUD, while LBAs 2-5 & 7-9 on data storage device 402 contain no data ND; these LBAs therefore do not contain initialized data. FIG. 4C also depicts LBAs 0-1 & 6 in both data storage devices 401 & 402 containing initialized data InD.
  • When converting valid uninitialized data to a RAID level that uses striping, background tasks include reading data on the first data storage device, and writing that data in a series of stripes to the drives in the RAID set. When a write occurs to an uninitialized portion of the RAID set, the driver will intercept the write command and write stripes of data to the RAID set. When such a write command does not include a complete stripe, additional data will be read from the first drive, and that data will be written to the plurality of data storage devices in the RAID set in complete stripes. After the data from the first data storage device has been migrated, the RAID set may still contain un-initialized areas; any such areas may be initialized using the read and write processes reviewed above, including processes like those reviewed in FIG. 2. During the conversion, read requests targeting data residing in uninitialized areas on the first data storage device will be read from the first data storage device; the data read from the first data storage device may then be written to the RAID set in complete stripes.
  • FIG. 5A depicts a first data storage device containing valid uninitialized data and two other data storage devices that do not contain valid data. The first data storage device 501 contains valid uninitialized data VUD in LBAs 0-8 (these LBAs contain valid data in an uninitialized RAID state). FIG. 5A also includes two other data storage devices 502 & 503; LBAs in data storage devices 502 & 503 do not contain valid data (initialized or uninitialized), so these LBAs are depicted as containing no data ND.
  • FIG. 5B depicts a RAID stripe written in stripes to three data storage devices. Here, RAID stripe 0 is written in stripes to data storage devices 501, 502, and 503. RAID stripe 0 was written after reading LBAs 0-2 on data storage device 501 in one or more background tasks. LBA 0 in data storage device 501 contains stripe 0A, LBA 0 in data storage device 502 contains stripe 0B, and LBA 0 in data storage device 503 contains stripe 0C. LBAs 3-8 in data storage device 501 contain valid uninitialized data VUD, and LBAs 1-8 on data storage devices 502 and 503 contain no data ND (these LBAs do not contain data belonging to the RAID set). LBAs 1-2 on data storage device 501 are depicted as containing moved data MD; data from these LBAs has already been moved (i.e., migrated) to RAID stripe 0.
  • FIG. 5C shows a second RAID stripe written in stripes to the same data storage devices shown in FIG. 5B after a write to a portion of a stripe in the RAID set. Here, RAID stripe 2 is written to LBA 6 on data storage devices 501, 502, & 503: at LBA 6, data storage device 501 contains stripe 2A, data storage device 502 contains stripe 2B, and data storage device 503 contains stripe 2C. Data stripe 2 was created after receiving a write to LBA 6, a portion of data stripe 2; the complete stripe includes data corresponding to uninitialized LBAs 6-8 on data storage device 501. The steps performed when creating RAID stripe 2 are essentially a read-modify-write. In this instance, these steps include: receiving a write command targeting LBA 6, reading LBAs 7-8 on data storage device 501, combining data from the write command to LBA 6 with data read from LBAs 7-8, and then writing stripe 2 to LBA 6 on each of the data storage devices 501, 502, & 503 (sketched in code below). LBAs in FIG. 5C that do not contain valid uninitialized data or initialized data are depicted as containing no data ND: LBAs 1-5 & 7-8 on data storage devices 502 & 503 contain no data ND, while LBAs 1-2 & 7-8 on data storage device 501 are depicted as containing moved data MD, as this data has already been moved (i.e., migrated) to RAID stripes 0 or 2.
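A sketch of the read-modify-write used to create RAID stripe 2, with the three devices modeled as in-memory buffers; the layout and names are illustrative only.

```c
/* Build stripe 2 from a write to LBA 6 plus the still-unmigrated LBAs 7-8
 * on the first device, then stripe it across all three devices at LBA 6. */
#include <string.h>

#define LBA_BYTES 512
#define NUM_LBAS  10

static unsigned char disk[3][NUM_LBAS][LBA_BYTES];   /* devices 501-503 */

static void read_lba(int dev, unsigned lba, unsigned char *buf)
{ memcpy(buf, disk[dev][lba], LBA_BYTES); }

static void write_lba(int dev, unsigned lba, const unsigned char *buf)
{ memcpy(disk[dev][lba], buf, LBA_BYTES); }

static void write_partial_stripe(const unsigned char *new_data)
{
    unsigned char chunk[3][LBA_BYTES];

    memcpy(chunk[0], new_data, LBA_BYTES);  /* chunk 2A: the write itself */
    read_lba(0, 7, chunk[1]);               /* chunk 2B: LBA 7 on device 501 */
    read_lba(0, 8, chunk[2]);               /* chunk 2C: LBA 8 on device 501 */

    for (int d = 0; d < 3; d++)             /* stripe 2 lands at LBA 6 on   */
        write_lba(d, 6, chunk[d]);          /* each of devices 501, 502, 503 */
}
```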
  • Mapping LBAs on a data storage device that contains valid uninitialized data may include sequentially associating LBA numbers with stripe numbers. An example of a simple sequential mapping of LBA to stripe number is one where the LBA number corresponds to the stripe number. The mapping of LBAs to stripes in FIG. 5C, however, is not sequential: stripe 0 is mapped to LBA 0 on each of the three data storage devices, while stripe 2 is mapped to LBA 6 on each of the three data storage devices. In such cases, translation information could be maintained or referenced by the driver (or other software) as the RAID set is initialized. This translation information, in some instances, may be stored in a translation table, as in the sketch below.
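One possible shape for such a translation table, populated to mirror FIG. 5C; the representation is an assumption, since the patent does not specify the table's layout.

```c
/* Per-stripe table recording where each stripe actually lives. */
#define NO_LBA ((unsigned)-1)

static unsigned stripe_to_lba[3] = {
    0,        /* stripe 0 was written at LBA 0 */
    NO_LBA,   /* stripe 1 has not been created yet */
    6,        /* stripe 2 was written at LBA 6 */
};

/* Returns 0 and the LBA when the stripe exists; -1 when it is not yet
 * initialized, which the driver would handle specially. */
static int lookup_stripe(unsigned stripe, unsigned *lba)
{
    if (stripe >= 3 || stripe_to_lba[stripe] == NO_LBA)
        return -1;
    *lba = stripe_to_lba[stripe];
    return 0;
}
```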
  • Yet other implementations may include a first data storage device containing valid uninitialized data that needs to be migrated to a RAID set that does not contain the first data storage device. In such implementations, background tasks may include reading data on the first data storage device and writing that data in complete stripes to the RAID set. When a write command targeting data on the first data storage device is received, that data will be written to the RAID set; when necessary, additional data may be read from the first drive such that complete stripes can be written to the RAID set. Any uninitialized areas of the RAID set may also be initialized using the read and write processes reviewed above, including processes like those reviewed in FIG. 2.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the presently claimed invention enable a RAID set to appear as if it were initialized immediately after a command to initialize a RAID set is initiated. Typically, a driver or other software in the software stack intercepts the command to initialize the RAID set. The driver then responds to user application programs as if the RAID set initialization is complete, even when it is not. After intercepting the RAID set initialization command, the driver will intercept and respond to data read or write commands as if the RAID set were initialized. The driver or other software will then typically initialize the RAID set using background tasks. In certain instances, data stored in a non-RAID configuration may be migrated to a RAID configuration during the initialization process.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • The present application is a continuation and claims the priority benefit of U.S. patent application Ser. No. 14/164,045 filed Jan. 24, 2014, the disclosure of which is incorporated herein by reference.
  • BACKGROUND OF THE INVENTION
  • Field of the Invention
  • The present invention generally relates to initializing a redundant array of independent disks (RAID). In particular, the invention enables a RAID set to service requests immediately after the RAID set is described.
  • Description of the Related Art
  • Redundant arrays of independent disks, commonly referred to as RAID, must be initialized prior to use. A RAID set includes two or more data storage devices used to store data redundantly and increase data storage performance.
  • Conventionally, the initialization of a RAID set requires zeros to be written to each data storage location contained within each of the data storage devices in the RAID set. Today, each individual data storage device in a RAID set commonly contains many terabytes of data. Since the process of writing zeros to a one-terabyte drive takes days, there is a large overhead associated with initializing a RAID set. Delays associated with waiting for RAID set initialization increase both direct and indirect costs of running a data center. Direct costs include the cost of space and power used during initialization. Indirect costs include not being able to use or lease data storage space on the RAID set while the RAID set is being initialized. Costs associated with initializing a RAID set would be reduced if the RAID set could act as an initialized RAID set immediately after it was described.
  • Examples of standard RAID levels include RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5, RAID 6, and RAID 10.
  • RAID 0 includes two or more data storage devices where each data storage device is written with the data evenly distributed across the data storage devices in equal-sized chunks, without redundancy. These chunks of data are referred to as stripes. Each consecutive stripe is written to a different data storage device until all of the data storage devices contain a first stripe. After each data storage device has been written with a first stripe, a second set of stripes is written to the RAID set. This process is repeated until all data is written to the RAID set. RAID 0 is typically used when one wishes to increase the input/output performance of a data storage system. This stripe placement is illustrated in the sketch below.
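As a concrete illustration of the striping arithmetic described above, here is a minimal sketch in C. The device count, stripe size, and function names are assumptions made for the example, not values taken from the patent.

```c
/* Minimal sketch of RAID 0 stripe placement; all constants are
 * illustrative assumptions, not values from the patent. */
#include <stdio.h>

#define NUM_DEVICES  3
#define STRIPE_BYTES (64 * 1024)

/* Map a logical byte offset to (device, row on device, offset in stripe). */
static void raid0_map(unsigned long long logical,
                      unsigned *device,
                      unsigned long long *row,
                      unsigned long long *offset)
{
    unsigned long long stripe = logical / STRIPE_BYTES;

    *device = (unsigned)(stripe % NUM_DEVICES);   /* round-robin placement */
    *row    = stripe / NUM_DEVICES;               /* which stripe on that device */
    *offset = logical % STRIPE_BYTES;
}

int main(void)
{
    /* The first six stripes land on devices 0,1,2,0,1,2: each device gets a
     * first stripe before any device receives a second one. */
    for (unsigned long long s = 0; s < 6; s++) {
        unsigned dev;
        unsigned long long row, off;
        raid0_map(s * STRIPE_BYTES, &dev, &row, &off);
        printf("stripe %llu -> device %u, row %llu\n", s, dev, row);
    }
    return 0;
}
```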
  • RAID 1 is a mirror, wherein two or more data storage devices contain identical data. Essentially, a second data storage device is a copy of a first data storage device. RAID 1 provides a simple form of data redundancy. RAID 2, like RAID 0, stripes data across a plurality of data storage devices. RAID 2, however, stripes at the bit level, not in larger chunks. RAID 2 also includes Hamming code error correction. RAID 3 combines data striping with parity, wherein parity data is stored on a dedicated data storage device. Parity allows the data to be reconstructed, using a simple XOR function, when a single data storage device fails. RAID 3 stripes data across the data storage devices at the byte level (8 bits).
  • RAID 4 uses data striping, and also uses a data storage device dedicated to storing parity data. RAID 4 stripes data at the block level. In most data storage devices today, data blocks, also known as logical blocks, contain 512 bytes. RAID 5 uses block level data striping combined with parity distributed across all of the data storage devices in the RAID set.
  • RAID 6 is similar to RAID 5, yet includes a second set of parity information distributed across the data storage devices in the RAID set. RAID 10 combines striping and mirroring; it is a combination of RAID 0 and RAID 1. The XOR parity used by RAID 3 through RAID 6 is illustrated in the sketch below.
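The XOR parity that RAID 3 through RAID 6 rely on can be demonstrated in a few lines. This is a generic sketch of the technique, not code from the patent; the chunk size and contents are invented for the example.

```c
/* Parity is the XOR of the data chunks in a stripe; any single lost
 * chunk equals the XOR of the surviving chunks with the parity. */
#include <stdio.h>
#include <string.h>

#define CHUNK 8   /* illustrative chunk size */

static void xor_into(unsigned char *dst, const unsigned char *src)
{
    for (int i = 0; i < CHUNK; i++)
        dst[i] ^= src[i];
}

int main(void)
{
    unsigned char d0[CHUNK] = "chunk-0";
    unsigned char d1[CHUNK] = "chunk-1";
    unsigned char parity[CHUNK] = {0};

    xor_into(parity, d0);            /* parity = d0 ^ d1 */
    xor_into(parity, d1);

    /* Simulate losing d1, then rebuild it from d0 and the parity. */
    unsigned char rebuilt[CHUNK];
    memcpy(rebuilt, parity, CHUNK);
    xor_into(rebuilt, d0);           /* parity ^ d0 == d1 */

    printf("rebuilt: %s\n", rebuilt);  /* prints "chunk-1" */
    return 0;
}
```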
  • SUMMARY OF THE CLAIMED INVENTION
  • The present system allows system administrators to immediately use a RAID set after describing it. Typically, a software driver is used to intercept commands targeting the RAID set, and to respond to those commands as if the RAID set were initialized, even when it is not. The invention described herein, in certain embodiments, supports any desired RAID level. The method of the invention will frequently begin after an operator describes a RAID configuration and initiates the initialization of the RAID set. Here, a RAID configuration includes a RAID level and a set of physical or virtual data storage devices. Once the operator has chosen a RAID level, the operator assigns data storage devices to the RAID set. Then the operator will initiate a command to initialize the RAID set. The command to initialize the RAID set will typically be intercepted by a driver (or other software) in the software stack of the computer system. Once the driver has intercepted the RAID initialization command, the driver will assemble and store information that describes the initialization state of the RAID set.
  • Next the driver will look (snoop) for commands targeting the RAID set. Commands targeting the RAID set will be intercepted when they are identified (snooped). When the command is a read command to an uninitialized portion of the RAID set, i.e., when the read command targets a portion of the RAID set that may not contain valid RAID set data, the data returned will be zeros. In certain instances, the driver (or other software) will respond to the command by sending zeros for all stripes or logical blocks requested by the read command to the initiator of the read command. The driver may also perform background tasks, such as writing zeros to the uninitialized portions of the RAID set. When the command is a write command to an uninitialized portion of the RAID set, the driver will intercept the write command and write that data to the RAID set. When the write command does not span all data areas or logical blocks in a particular RAID stripe, the driver will write zeros to the data areas or logical blocks in the RAID stripe that were not written by the write command. When the command is a read command or a write command to an initialized portion of the RAID set, the command will be passed through to the RAID set. Commands targeting the RAID set will typically be sent to the RAID set by an initiator; frequently the initiator is a computer writing data to or reading data from the RAID set. An initiator consistent with the invention is not limited to a computer, however. In certain instances the initiator can be another data storage device or set of data storage devices. For example, the small computer system interface (SCSI) standard allows a data storage device to act as an initiator of commands to another data storage device. Furthermore, the initiator in the presently claimed invention may be, yet is not limited to, a computer, a host, a data storage device, a set of data storage devices, a personal computer, a server, a mobile device, or a network appliance acting as an initiator.
  • Thus, the RAID set is incrementally initialized with each write to the RAID set, or by using writes in the background. As the driver performs the incremental initialization of the RAID set, the driver maintains a record of which stripes or logical blocks are not initialized. Typically, this record will be maintained in a data structure stored in a non-volatile memory. This non-volatile memory typically will reside in the uninitialized portions of the RAID set.
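The patent does not fix a layout for this record; one minimal sketch, assuming a one-bit-per-stripe bitmap (the stripe count of 101 mirrors the FIG. 3 example), might look like this:

```c
/* Illustrative one-bit-per-stripe record of uninitialized stripes.
 * Persisting the record to non-volatile memory is only stubbed here. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NUM_STRIPES 101   /* stripes 0-100, as in the FIG. 3 walkthrough */

struct init_record {
    unsigned char uninitialized[(NUM_STRIPES + 7) / 8];
};

static void record_reset(struct init_record *r)
{
    memset(r->uninitialized, 0xff, sizeof r->uninitialized); /* all dirty */
}

static bool stripe_is_uninitialized(const struct init_record *r, unsigned s)
{
    return r->uninitialized[s / 8] & (1u << (s % 8));
}

static void mark_initialized(struct init_record *r, unsigned s)
{
    r->uninitialized[s / 8] &= (unsigned char)~(1u << (s % 8));
    /* a real driver would flush the record to non-volatile memory here */
}

int main(void)
{
    struct init_record rec;
    record_reset(&rec);
    mark_initialized(&rec, 50);   /* e.g., a user write to stripe 50 */
    printf("stripe 50 uninitialized? %d\n", stripe_is_uninitialized(&rec, 50));
    printf("stripe 51 uninitialized? %d\n", stripe_is_uninitialized(&rec, 51));
    return 0;
}
```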
  • In certain instances, a first data storage device in the RAID set may already store data that the operator wishes to save in a RAID configuration. In these instances the data contained in the first data storage device will have to be converted from a non-RAID configuration to a RAID configuration. Data targeted for conversion from a non-RAID configuration to a RAID configuration may be referred to in this disclosure as “valid uninitialized data”, or “data to be converted” to a RAID level. The process of initializing such a RAID set varies depending on the selected RAID level, and whether the first data storage device is included in the new RAID set description.
  • In general, the instant initialization of the RAID set includes: creating information describing an initialization state of a RAID set corresponding to an intercepted RAID set initialization command, intercepting a data transfer command to the RAID set, and transmitting data to the initiator of the data transfer command when the command is a read command. When such a read command is a read to an uninitialized portion of the RAID set, and when the read command targets a portion of the RAID set that does not contain valid RAID set data, the data returned will include zeros.
  • In certain instances, when the data transfer command is a write command to an uninitialized portion of the RAID set, the driver (or other software) will write that data to two data storage devices. In other instances, data will be written in complete stripes to a plurality of data storage devices. The presently claimed invention is not limited to initializing a single stripe or single set of logical blocks; any number of stripes or sets of logical blocks may be initialized consistent with the presently claimed invention.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of a Linux software stack.
  • FIG. 2 is a flow chart of a method of the initialization of a RAID set.
  • FIG. 3 is a flow chart of a method of the initialization of the RAID set as the initialization occurs incrementally over time.
  • FIG. 4A illustrates two data storage devices in an un-initialized RAID 1 level.
  • FIG. 4B illustrates the two data storage devices shown in FIG. 4A, with data copied from the first data storage device to the second data storage device.
  • FIG. 4C illustrates the two data storage devices shown in FIG. 4B after a write command has been received to an LBA.
  • FIG. 5A illustrates a first data storage device containing valid uninitialized data and two other data storage devices that do not contain valid data.
  • FIG. 5B illustrates a RAID stripe written in stripes to three data storage devices.
  • FIG. 5C illustrates a second RAID stripe written in stripes to the same data storage devices shown in FIG. 5B after a write to a portion of a stripe in the RAID set.
  • DETAILED DESCRIPTION OF THE INVENTION
  • The present system allows system administrators to immediately use a RAID set after describing it. Typically, a software driver is used to intercept commands targeting the RAID set, and to respond to those commands as if the RAID set were initialized, even when it is not. Certain implementations are directed to a RAID set wherein each of the data storage devices is contained within a single enclosure. Other implementations of the invention are directed to a RAID set wherein the data storage devices may be located in a plurality of different physical locations. Software may be used to make a RAID set appear to consist of a set of data storage devices contained in a single enclosure, even when they are not.
  • The present system may initialize the RAID set using background tasks. Those of average skill in the art would recognize that a background task is an operation or process that occurs without an operator being aware that the operation or process is being performed. In some embodiments of the invention, certain background initialization tasks may be paused when the RAID set is operating above an operational threshold. Such an operational threshold includes, yet is not limited to, the number of input/output requests per second (IOPS) sent to a RAID set, as in the sketch below.
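A minimal sketch of such a pause check; the threshold value and the statistics structure are invented for the example:

```c
/* Pause background initialization while the RAID set is busier than an
 * operational threshold (here, measured in IOPS). */
#include <stdbool.h>

#define IOPS_PAUSE_THRESHOLD 10000UL   /* illustrative value */

struct raid_stats {
    unsigned long iops;   /* requests serviced over the last second */
};

static bool background_init_may_run(const struct raid_stats *st)
{
    return st->iops <= IOPS_PAUSE_THRESHOLD;
}
```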
  • In operation, the system may begin after an operator describes a RAID configuration and initiates the initialization of a RAID set. Typically, a RAID configuration includes a RAID level, and a set of physical or virtual data storage devices. Once the operator has chosen a RAID level, the operator assigns data storage devices to the RAID set. Then the operator will initiate the initialization of the RAID set.
  • The presently claimed invention is not limited to initializing a single stripe or single set of logical blocks; any number of stripes or sets of logical blocks may be initialized consistent with the presently claimed invention.
  • FIG. 1 is a block diagram of a Linux software stack. User space 1, including an application layer 1A, is depicted in the figure. User space 1 may include a collection of software programs. Typically, these software programs are applications that are not part of the operating system software. Kernel space 2 is located at a lower level of the software stack. Kernel space 2 contains multiple different software programs; when combined, these programs comprise the operating system software of a computer. Kernel space 2, as depicted in FIG. 1, includes a system call interface 2A, a protocol agnostic interface 2B, network protocols 2C, a device agnostic interface 2D, and device drivers 2E. The physical device hardware 3 exists below all of the software levels. Conventionally, an operator manipulates an application program in user space 1, the application program communicates with one or more of the software modules in kernel space 2, and the software modules in kernel space 2 communicate with physical device hardware 3 of the computer system or RAID set.
  • In certain instances, the initialization command may be intercepted by a driver in the computer system's software stack. While not depicted in FIG. 1, this driver may be located either in a lower level of user space 1 or within kernel space 2. The driver is located at a level below a RAID set configuration application program, and above certain software modules in kernel space 2. Conventionally, software that initializes RAID sets operates in kernel space 2. An example of such a software program in kernel space 2 is the multiple disk administrator utility (MDADM) in the Linux operating system. By intercepting RAID set initialization commands at a higher level, software programs like MDADM are prevented from receiving the initialization command, and are prevented from performing a conventional initialization of the RAID set.
  • FIG. 2 is a flow chart of a method of the initialization of a RAID set. A driver may look (i.e., snoop) for a RAID set initialization command at step 200. When the driver does not find a RAID set initialization command, the method stays in step 200. When the driver encounters a RAID set initialization command, the method proceeds to step 201 where the driver intercepts the command. Once the driver has intercepted a RAID initialization command, the driver will assemble information that describes the initialization state of the RAID set. Information describing the initialization state of the RAID set may include information identifying stripes that are currently not initialized, stripes that have already been initialized, or both. A driver or software consistent with the presently claimed invention uses this information to identify areas in a RAID set that need to be initialized. In certain instances these areas may be identified by stripe number; in other instances they may be identified by logical block address (LBA) number. The method then moves to step 202, where it assembles and stores RAID set initialization information in a non-volatile memory. The RAID set initialization information may indicate that the entire RAID set is not initialized.
  • The driver then begins looking for (i.e., intercepting) commands directed at the RAID set at step 203. When a command for the RAID set is detected, the method moves to step 204 where the driver determines if the command is a read command addressing an uninitialized portion of the RAID set. When the command is a read command addressing an uninitialized portion of the RAID set, the driver sends zeros to the initiator of the read command 205. Program flow then moves to step 206 where the RAID set initialization information is updated and stored in the non-volatile memory.
  • When the command is not a read command to an uninitialized portion of the RAID set, the driver evaluates if the command is a write command to an uninitialized area of the RAID set at step 207. When the command is a write command to an uninitialized portion of the RAID set, data corresponding to the write command is written to the RAID set at step 208. Sometimes, a write to a RAID set is not a write to an entire stripe. In these instances, the driver may also write zeros to any portions of the stripe not addressed by the write command. At step 208 data associated with the write command is written to data blocks addressed by the write command, and zeros are written to data blocks in the stripe not addressed by the write command. Thus, an entire stripe may be initialized when the write command writes to a portion of an uninitialized stripe. Program flow then updates and stores the RAID set initialization information 206 in the non-volatile memory.
  • After the RAID set initialization information is updated and stored at step 206, the driver checks whether the RAID set initialization process is complete at step 210. If initialization is complete, the method ends at step 211, where the driver or software typically stops intercepting commands addressing the RAID set. When the RAID set is not completely initialized, program flow moves from step 210 back to step 203, where the driver snoops for more RAID set commands.
  • When a command directed at the RAID set is not a read command to an uninitialized portion of the RAID set, program flow moves from step 204 to step 207. When the command is also not a write command to an uninitialized area of the RAID set, program flow moves from step 207 to step 209, where any command that is not related to an uninitialized area of the RAID set is performed. Commands performed at step 209 may relate to initialized areas of the RAID set, or may be other types of commands that may be sent to the RAID set. The driver performing the RAID set initialization may pass commands performed at step 209 through to another software module without change.
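  • The decision points of steps 203-209 can be summarized in a short dispatch routine. The sketch below simplifies under several assumptions: commands arrive as plain dictionaries, each command maps to exactly one stripe, and a backend object supplies write_stripe and passthrough operations; none of these names come from a real driver interface:

    def handle_command(cmd, uninitialized, stripe_size, backend):
        stripe = cmd.get("stripe")
        if cmd["op"] == "read" and stripe in uninitialized:
            # Step 205: reads of uninitialized stripes return zeros to the initiator.
            return bytes(stripe_size)
        if cmd["op"] == "write" and stripe in uninitialized:
            # Step 208: pad a partial write with zeros so the whole stripe
            # becomes initialized, then update the record (step 206).
            data = cmd["data"].ljust(stripe_size, b"\x00")
            backend.write_stripe(stripe, data)
            uninitialized.discard(stripe)
            return None
        # Step 209: everything else passes through unchanged.
        return backend.passthrough(cmd)

In a real driver, the record update at step 206 would also be flushed to the non-volatile memory before the write is acknowledged.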
  • FIG. 2 does not show background writes of zeros to uninitialized stripes of the RAID set. Embodiments of the invention that perform these background writes will also update and store RAID set initialization information in the non-volatile memory, typically after a background write of zeros completes or after an initiator writes to an uninitialized portion of the RAID set. The description of FIG. 2, above, is exemplary; it is not intended to limit the invention. While the use of a driver is described above, the steps of FIG. 2 may be performed by one or more other software programs.
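  • Under the same assumptions, one pass of the background zeroing that FIG. 2 omits might look like this sketch; write_stripe and persist are hypothetical callables supplied by the surrounding driver:

    def background_zero_pass(uninitialized, stripe_size, write_stripe, persist):
        # Pick the lowest-numbered uninitialized stripe, zero it, and update
        # the record in non-volatile memory before the next pass.
        if not uninitialized:
            return False          # nothing left to do; initialization is complete
        stripe = min(uninitialized)
        write_stripe(stripe, bytes(stripe_size))
        uninitialized.discard(stripe)
        persist(uninitialized)
        return True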
  • In general, methods consistent with the apparent immediate initialization of the RAID set incrementally initialize the RAID set using background writes of zeros and by initializing previously uninitialized portions of the RAID set when write commands to the RAID set are intercepted. As the incremental initialization proceeds, the driver or software maintains a record of which stripes or logical blocks are not initialized. Typically this record is maintained in a data structure stored in a non-volatile memory. The data structure may include, yet is not limited to, a B-tree, an array, or a linked list. When the data structure used to store this record has a tree structure, information created to describe the initialization state of the RAID set may be stored in a base node of the tree, and subsequent updates to this information may be stored in one or more child nodes of the tree.
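  • As one illustration of such a record, the sketch below keeps the uninitialized region as a sorted list of inclusive [low, high] stripe ranges, an array-like alternative to the B-tree mentioned above; splitting a range when one stripe inside it is initialized is the only operation needed:

    def mark_initialized(ranges, stripe):
        # Remove a single stripe from a sorted list of inclusive [low, high]
        # ranges, splitting a range in two when the stripe falls inside it.
        out = []
        for low, high in ranges:
            if stripe < low or stripe > high:
                out.append([low, high])          # range untouched by this write
            else:
                if low < stripe:
                    out.append([low, stripe - 1])
                if stripe < high:
                    out.append([stripe + 1, high])
        return out

    ranges = [[0, 100]]                    # entire RAID set uninitialized
    ranges = mark_initialized(ranges, 50)  # after a write to stripe 50
    # ranges is now [[0, 49], [51, 100]]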
  • In other implementations of the invention, the driver or software may instead maintain a record of which stripes or logical blocks are initialized. Since the driver or software will typically know the total number of stripes or logical blocks contained in the RAID set, it can determine which stripes or logical blocks are not initialized from a record of those that are initialized. Conversely, given the same total, it can determine which stripes or logical blocks are initialized from a record of those that are not.
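  • Deriving one record from the other is a set complement, as in this two-line sketch, assuming stripes are numbered 0 through total_stripes-1:

    def uninitialized_stripes(total_stripes, initialized):
        # Every stripe not in the initialized record must be uninitialized.
        return set(range(total_stripes)) - set(initialized)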
  • Over time, uninitialized portions of the RAID set are initialized using processes consistent with those reviewed above. Throughout this process, information describing the initialization of the RAID set is updated in the non-volatile memory, either incrementally or periodically, as the RAID set initialization proceeds. FIG. 3 is a flow chart of a method of the initialization of the RAID set as the initialization occurs incrementally over time. In the example shown in FIG. 3, at step 300 the uninitialized RAID set initially includes stripes 0-100. After a first background write, stripes 1-100 remain uninitialized at step 301. After a first user write to RAID stripe 50, RAID stripes 1-49 and 51-100 remain uninitialized at step 302. Next, after an additional 24 background writes, the uninitialized stripes include stripes 25-49 and 51-100 at step 303. After a plurality of additional user writes at step 304, stripes 25-40, 42-49, 51-60, and 62-100 remain uninitialized. When all uninitialized areas of the RAID set have been initialized, the initialization is complete at step 305. Once all stripes or logical blocks of the RAID set have been written, the driver may stop intercepting commands to the RAID set.
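  • The progression of FIG. 3 can be replayed with a plain set of uninitialized stripe numbers (a sketch; the real record would live in non-volatile memory):

    uninit = set(range(101))        # step 300: stripes 0-100 uninitialized
    uninit -= {0}                   # step 301: first background write zeros stripe 0
    uninit -= {50}                  # step 302: first user write hits stripe 50
    uninit -= set(range(1, 25))     # step 303: 24 more background writes (stripes 1-24)
    uninit -= {41, 61}              # step 304: additional user writes
    # ...background and user writes continue until the set is empty:
    if not uninit:
        pass                        # step 305: complete; stop intercepting commands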
  • Information that describes the initialization state of the RAID set may be stored in different types of memory and memory locations. For example, it may be stored in the uninitialized stripes of the RAID set, in a dedicated location within a data storage device, in a memory within a RAID box, in a data storage device external to the RAID set, or in a combination thereof. Types of non-volatile memory where this information may be stored include, yet are not limited to: a disk drive, a flash drive, flash memory, phase change memory, resistive memory (RERAM), ferroelectric memory (FERAM), battery-backed dynamic random access memory, or racetrack memory. Data storage devices in the RAID set itself may likewise consist of any of these types of memory.
  • When the information describing the initialization state of the RAID set is stored in a RAID stripe, the driver may need to change where the information is stored. For example, when this information is stored in stripe 100 and stripe 100 is written to by a user, the driver will move the information to another stripe. In such an embodiment, the driver will typically track the stripe number where the information is stored, and may also maintain a pointer to that stripe in a small piece of non-volatile memory. When a write to the last remaining stripe is intercepted, the driver may assign that stripe the last remaining stripe number and then write the data to it.
  • In certain instances, the driver may not allow a user-level application program to use all of the stripes that the RAID set contains. For example, the RAID set may allow a user to access stripes 0-99 and use stripe 100 for storing the information. In some of these instances, the present system may change the mapping of the stripes of the RAID set. When changing the mapping, the driver could map physical stripes 1-100 as user stripes 0-99 and use the original stripe 0 to store the information.
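  • A sketch of the remapping just described, where physical stripe 0 is hidden from the user; the reserved-stripe choice and the function name are illustrative assumptions:

    INFO_STRIPE = 0      # physical stripe holding the initialization information

    def user_to_physical(user_stripe, user_stripe_count=100):
        # Present user stripes 0-99 while physically using stripes 1-100,
        # leaving physical stripe 0 for the initialization information.
        if not 0 <= user_stripe < user_stripe_count:
            raise ValueError("user stripe out of range")
        return user_stripe + 1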
  • The information describing the initialization state of the RAID set may also be stored in a reserved area of a data storage device. In these embodiments the driver (or other software) may artificially lower the maximum logical block reported by a data storage device in the RAID set and reserve the logical blocks above the artificial maximum logical block for storing the initialization information.
  • Today, disk drives and flash drives can be programmed to report a maximum logical block number that is below the real maximum logical block in the drive. Logical blocks above the reported maximum can still be accessed by software that is aware that the data storage device's capacity was truncated; in certain embodiments of the invention the reserved area of the data storage device is accessed using special commands. Most user-level programs will not be aware that a drive's capacity has been truncated and therefore cannot access the logical blocks in the reserved area, because they do not know the reserved area exists. Truncating the capacity of a disk drive in this way is commonly referred to as de-stroking the drive.
  • Examples of special commands used to truncate the capacity of a data storage device are the Read Native Max Address and Set Max Address commands from the AT Attachment (ATA) data storage specification. First, the total native capacity of a disk drive is determined using the Read Native Max Address command. The capacity of the drive may then be artificially set below its total capacity by sending the drive a Set Max Address command with a maximum LBA number; the drive enters the maximum LBA number provided with the command into a table inside the drive. The drive will then service commands addressing LBA 0 through that maximum LBA number, and conventional commands attempting to access an LBA beyond this artificial maximum will cause the drive to respond with an error message. LBAs located above the artificial maximum LBA may be accessed by sending the drive another Set Max Address command with a maximum LBA corresponding to the native max address of the drive. While the maximum capacity of the drive equals its native capacity, the driver or software will typically intercept and stall commands addressing the RAID set. After the driver or software accesses LBAs above the artificial maximum LBA, it can once again truncate the capacity of the drive using the Set Max Address command. In this manner, a driver or software consistent with embodiments of the invention may access information located above the artificial maximum LBA when other programs or initiators cannot. The example above is for exemplary purposes; it is not intended to limit the scope of the invention described herein.
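  • The ordering of that sequence might be sketched as follows. The callables passed in (issuing the Read Native Max Address and Set Max Address commands, reading LBAs, and stalling/resuming I/O) are hypothetical stand-ins for whatever ATA plumbing the platform provides; this illustrates the ordering only, not a working ATA implementation:

    def access_reserved_area(dev, artificial_max,
                             read_native_max, set_max_address,
                             read_lbas, stall_io, resume_io):
        stall_io(dev)                          # hold commands addressing the RAID set
        try:
            native_max = read_native_max(dev)  # total native capacity of the drive
            set_max_address(dev, native_max)   # un-truncate: expose all LBAs
            data = read_lbas(dev, artificial_max + 1, native_max)  # reserved area
            set_max_address(dev, artificial_max)  # de-stroke the drive again
        finally:
            resume_io(dev)                     # let stalled RAID set commands proceed
        return data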
  • In certain instances, a first data storage device selected by the operator for inclusion in a RAID set may already store data that the operator wishes to preserve in a RAID configuration. In these embodiments the data contained in the first data storage device must be converted from a non-RAID configuration to a RAID configuration. When converting to a RAID 1 level, the conversion process includes copying data from the first data storage device to a second data storage device. When converting to a RAID level that uses striping, the conversion process includes a series of other background tasks.
  • When the conversion process includes copying the data stored on the first data storage device to another data storage device, any write commands targeting the RAID set are written to both the first data storage device and a second data storage device. Any read commands sent to the RAID set are serviced from the first data storage device; the logical blocks read from the first data storage device may then be written to the second data storage device. As data is written to the second data storage device, the driver stores RAID set initialization information in the non-volatile memory. Depending on the particular implementation, the RAID set initialization information may identify the data areas or logical blocks that are different from, or identical to, the corresponding data areas or logical blocks on the first data storage device.
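  • A sketch of this copy-based conversion, modeling each device as a dictionary from LBA to data and the initialization record as a set of LBAs known to be identical on both devices (all names are illustrative assumptions):

    def raid1_write(lba, data, dev1, dev2, identical):
        # Writes during conversion go to both devices, so this LBA is mirrored.
        dev1[lba] = data
        dev2[lba] = data
        identical.add(lba)          # update the initialization record

    def raid1_read(lba, dev1, dev2, identical):
        # Reads are serviced from the first device; the block just read can be
        # copied to the second device on the way out, advancing the conversion.
        data = dev1[lba]
        if lba not in identical:
            dev2[lba] = data
            identical.add(lba)
        return data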
  • FIG. 4A shows two data storage devices in an un-initialized RAID 1 level (copy) configuration. Here, the two data storage devices 401 & 402 each include logical blocks 0-9, where each logical block has a logical block address 0-9 (LBA 0-9). All of the LBAs (0-9) in data storage device 401 in FIG. 4A contain valid uninitialized data VUD stored in a non-RAID configuration. Since LBAs 0-9 on data storage device 402 in FIG. 4A do not contain valid data (initialized or uninitialized), they are depicted as containing no data ND.
  • When an operator initiates the conversion of data storage device 401 to a RAID 1 level (copy), data is copied from data storage device 401 to data storage device 402 using background tasks. Before any background initialization tasks are performed, all of the LBAs on these drives are uninitialized with respect to the RAID 1 level; thus LBAs 0-9 are not yet identical on the two data storage devices 401 & 402.
  • Copying data from one data storage device to another comprises reading one or more LBAs on a first data storage device 401 and writing that data to a second data storage device 402. As mentioned above, the copy (a read followed by a write) may be performed in a background task, or may be performed when a read command to the RAID set is received. FIG. 4B shows the same two data storage devices shown in FIG. 4A after data has been copied from the first data storage device to the second. Here, LBAs 0-1 have been copied from data storage device 401 to data storage device 402: data stored in LBA 0 of data storage device 401 is identical to data stored in LBA 0 of data storage device 402, and data stored in LBA 1 of data storage device 401 is identical to data stored in LBA 1 of data storage device 402. FIG. 4B therefore depicts LBAs 0-1 on data storage devices 401 and 402 as containing initialized data InD; these LBAs have been initialized to the RAID 1 level. At this time, LBAs 2-9 on data storage device 401 contain valid uninitialized data VUD, and LBAs 2-9 on data storage device 402 contain no data ND.
  • FIG. 4C shows the two data storage devices of FIG. 4B after a write command to an LBA has been received. In this instance, LBA 6 is written to both data storage devices 401 & 402. In this figure, LBAs 2-5 & 7-9 on data storage device 401 contain valid uninitialized data VUD, and LBAs 2-5 & 7-9 on data storage device 402 contain no data ND; LBAs 2-5 & 7-9 therefore do not contain initialized data. FIG. 4C also depicts LBAs 0-1 & 6 on both data storage devices 401 & 402 as containing initialized data InD.
  • The process for converting data from the first drive to a RAID configuration that uses stripes includes a series of different steps. In these embodiments, background tasks include reading data on the first data storage device and writing that data in a series of stripes to the drives in the RAID set. When a write occurs to an uninitialized portion of the RAID set, the driver will intercept the write command and write stripes of data to the RAID set. When such a write command does not cover a complete stripe, additional data will be read from the first drive so that complete stripes can be written to the plurality of data storage devices in the RAID set. At some point, when all data from the first drive has been moved to stripes in the RAID set, the RAID set may still contain uninitialized areas; any such areas may be initialized using the read and write processes reviewed above, including processes like those reviewed in FIG. 2. In this embodiment, however, read requests targeting data that still resides in uninitialized areas on the first data storage device are serviced from the first data storage device. After servicing the read command, the data read from the first data storage device may then be written to the RAID set in complete stripes.
  • FIG. 5A depicts a first data storage device containing valid uninitialized data and two other data storage devices that do not contain valid data. Here, the first data storage device 501 contains valid uninitialized data VUD in LBAs 0-8 (these LBAs contain valid data in an uninitialized RAID state). FIG. 5A also includes two other data storage devices 502 & 503. LBAs in data storage devices 502 & 503 do not contain valid data (initialized or uninitialized); these LBAs are depicted as containing no data ND.
  • FIG. 5B depicts a RAID stripe written in stripes to three data storage devices. Here, RAID stripe 0 is written in stripes to data storage devices 501, 502, and 503 after LBAs 0-2 on data storage device 501 were read in one or more background tasks. At this time, LBA 0 in data storage device 501 contains stripe 0A, LBA 0 in data storage device 502 contains stripe 0B, and LBA 0 in data storage device 503 contains stripe 0C. Furthermore, LBAs 3-8 in data storage device 501 contain valid uninitialized data VUD. LBAs 1-8 on data storage devices 502 and 503 contain no data ND (these LBAs do not contain data belonging to the RAID set). LBAs 1-2 on data storage device 501 are depicted as containing moved data MD; data from these LBAs has already been moved (i.e., migrated) to RAID stripe 0.
  • Since write commands are serviced while converting to an initialized RAID state, some implementations of the process may also include read-modify-writes. FIG. 5C shows a second RAID stripe written in stripes to the same data storage devices shown in FIG. 5B after a write to a portion of a stripe in the RAID set. In the figure, RAID stripe 2 is written to LBA 6 on data storage devices 501, 502, & 503: data storage device 501 contains stripe 2A, data storage device 502 contains stripe 2B, and data storage device 503 contains stripe 2C. In this instance, data stripe 2 was created after receiving a write to LBA 6, a portion of data stripe 2. Since the mapping of LBAs on data storage device 501 maps LBAs 6-8 to data stripe 2, data stripe 2 should include the data corresponding to uninitialized LBAs 6-8 on data storage device 501. The steps performed when creating RAID stripe 2 are essentially a read-modify-write: receiving a write command targeting LBA 6, reading LBAs 7-8 on data storage device 501, combining the data from the write command to LBA 6 with the data read from LBAs 7-8, and then writing stripe 2 to LBA 6 on each of the data storage devices 501, 502, & 503.
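  • The read-modify-write enumerated above reduces to a short routine; in this sketch, source models the LBA-to-data contents of device 501 as a dictionary, and write_strip is an assumed callable that places one strip on one device:

    def rmw_write_stripe(write_lba, write_data, stripe_lbas, source, devices, write_strip):
        # FIG. 5C example: write_lba=6, stripe_lbas=[6, 7, 8]; LBAs 7-8 are
        # read from the source device and combined with the new data for LBA 6.
        strips = [write_data if lba == write_lba else source[lba]
                  for lba in stripe_lbas]
        for device, strip in zip(devices, strips):
            # Each strip (2A, 2B, 2C) lands at the same LBA on its device; in
            # FIG. 5C that is LBA 6, the LBA of the triggering write.
            write_strip(device, write_lba, strip)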
  • LBAs in FIG. 5C that do not contain valid uninitialized data or initialized data are depicted as containing no data ND. Here, LBAs 1-2 & 7-8 on data storage device 501 are depicted as containing moved data MD; this data has already been moved (i.e., migrated) to RAID stripe 0 or 2. At this time, LBAs 1-5 & 7-8 on data storage devices 502 & 503 are depicted as containing no data ND.
  • The process of mapping LBAs on a data storage device that contains valid uninitialized data may include sequentially associating LBA numbers with stripe numbers. A simple sequential mapping is one where the LBA number corresponds to the stripe number: when there are three data storage devices in a RAID set, such a mapping could map LBA 0 on each of the three data storage devices to stripe 0, map LBA 1 on each of the three data storage devices to stripe 1, and map each subsequent LBA number on the data storage devices to subsequent stripes.
  • In contrast, the mapping of LBAs to stripes in FIG. 5C is not sequential: stripe 0 is mapped to LBA 0 on each of the three data storage devices, yet stripe 2 is mapped to LBA 6 on each of the three data storage devices. In such an instance, translation information could be maintained or referenced by the driver (or other software) as the RAID set is initialized. This translation information may, in some instances, be stored in a translation table.
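  • The translation table can be as simple as a mapping from stripe number to the LBA where that stripe was actually placed; the FIG. 5C state would look like this sketch:

    translation = {
        0: 0,    # stripe 0 was written at LBA 0 on each device
        2: 6,    # stripe 2 was written at LBA 6, out of sequential order
    }

    def lba_for_stripe(stripe):
        # Stripes absent from the table have not been placed yet.
        return translation.get(stripe)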
  • Yet other implementations may include a first data storage device containing valid uninitialized data that needs to be migrated to a RAID set that does not contain the first data storage device. Here again, background tasks may include reading data on the first data storage device and writing that data in complete stripes to the RAID set. When a write command targeting data on the first data storage device is received, that data will be written to the RAID set, and additional data may be read from the first drive so that complete stripes can be written to the RAID set. Any uninitialized areas of the RAID set may also be initialized using the read and write processes reviewed above, including processes like those reviewed in FIG. 2.
  • The above description is illustrative and not restrictive. Many variations of the invention will become apparent to those of average skill in the art upon review of this disclosure. While the present invention has been described in connection with a variety of embodiments, these descriptions are not intended to limit the scope of the invention to the particular forms set forth herein. To the contrary, the present descriptions are intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims and otherwise appreciated by one of ordinary skill in the art.

Claims (21)

1. (canceled)
2. A method for initializing data in a RAID set, the method comprising:
intercepting a RAID set initialization command, wherein the intercepted RAID set initialization command corresponds to a set of data currently stored in a non-RAID format;
converting a first portion of the data currently stored in the non-RAID format from the non-RAID format to a RAID format;
writing the first portion of data in the RAID format to one or more data storage locations associated with the first portion of the RAID set as one or more first background tasks;
storing information that identifies the initialization state of the RAID set based on the first portion of the RAID set being written by the one or more first background tasks, wherein the stored information that identifies the initialization state of the RAID set identifies that the first portion of the RAID set has been initialized into the RAID format;
receiving a read command relating to the RAID set from an application in user space;
identifying that the received read command maps to an un-initialized portion of the RAID set; and
servicing the received read command by reading data from the set of data stored in the non-RAID format and by providing the data read from the set of data stored in the non-RAID format to the application in the user space.
3. The method of claim 2, further comprising receiving a write command to the RAID set from the application in the user space and writing data from the received write command to the RAID set.
4. The method of claim 3, further comprising identifying that the received write command is to the un-initialized portion of the RAID set and storing updated information that identifies an updated initialization state of the RAID set based on data written according to the received write command to the un-initialized portion of the RAID set.
5. The method of claim 2, further comprising:
converting a second portion of the data currently stored in the non-RAID format from the non-RAID format to the RAID format;
writing the second portion of data in the RAID format to one or more data storage locations associated with the second portion of the RAID set as one or more second background tasks, wherein the application residing in user space is not aware of the one or more second background tasks; and
storing updated information that identifies an updated initialization state of the RAID set, wherein the stored information that identifies the updated initialization state of the RAID set identifies that the second portion of the RAID set has been initialized into the RAID format.
6. The method of claim 2, wherein the application in the user space provides commands to a driver according to the RAID format when the RAID set is not initialized.
7. The method of claim 2, wherein the RAID format corresponds to a copy of the data stored in the non-RAID format.
8. The method of claim 2, wherein the RAID format includes a plurality of stripes.
9. The method of claim 2, wherein the RAID format includes parity information.
10. The method of claim 6, wherein the driver is in kernel space.
11. A non-transitory computer readable storage medium having embodied thereon a program executable by a processor to perform a method for initializing data in a RAID set, the method comprising:
intercepting a RAID set initialization command, wherein the intercepted RAID set initialization command corresponds to a set of data currently stored in a non-RAID format;
converting a first portion of the data currently stored in the non-RAID format from the non-RAID format to a RAID format;
writing the first portion of data in the RAID format to one or more data storage locations associated with the first portion of the RAID set as one or more first background tasks;
storing information that identifies the initialization state of the RAID set based on the first portion of the RAID set being written by the one or more first background tasks, wherein the stored information that identifies the initialization state of the RAID set identifies that the first portion of the RAID set has been initialized into the RAID format;
receiving a read command relating to the RAID set from an application in user space;
identifying that the received read command maps to an un-initialized portion of the RAID set; and
servicing the received read command by reading data from the set of data stored in the non-RAID format and by providing the data read from the set of data stored in the non-RAID format to the application in the user space.
12. The non-transitory computer readable storage medium of claim 11, wherein the program is further executable to receive a write command to the RAID set from the application in the user space and to write data from the received write command to the RAID set.
13. The non-transitory computer readable storage medium of claim 12, wherein the program is further executable to identify that the received write command is to the un-initialized portion of the RAID set and to store updated information that identifies an updated initialization state of the RAID set based on data written according to the received write command to the un-initialized portion of the RAID set.
14. The non-transitory computer readable storage medium of claim 11, wherein the program is further executable to:
convert a second portion of the data currently stored in the non-RAID format from the non-RAID format to the RAID format;
write the second portion of data in the RAID format to one or more data storage locations associated with the second portion of the RAID set as one or more second background tasks, wherein the application residing in user space is not aware of the one or more second background tasks; and
store updated information that identifies an updated initialization state of the RAID set, wherein the stored information that identifies the updated initialization state of the RAID set identifies that the second portion of the RAID set has been initialized into the RAID format.
15. The non-transitory computer readable storage medium of claim 11, wherein the application in the user space provides commands to a driver according to the RAID format when the RAID set is not initialized.
16. The non-transitory computer readable storage medium of claim 11, wherein the RAID format corresponds to a copy of the data stored in the non-RAID format.
17. The non-transitory computer readable storage medium of claim 11, wherein the RAID format includes a plurality of stripes.
18. The non-transitory computer readable storage medium of claim 11, wherein the RAID format includes parity information.
19. The non-transitory computer readable storage medium of claim 15, wherein the driver is in kernel space.
20. An apparatus for initializing data in a RAID set, the apparatus comprising:
a memory; and
a processor executing instructions out of the memory that:
intercept a RAID set initialization command, wherein the intercepted RAID set initialization command corresponds to a set of data currently stored in a non-RAID format;
convert a first portion of the data currently stored in the non-RAID format from the non-RAID format to a RAID format;
write the first portion of data in the RAID format to one or more data storage locations associated with the first portion of the RAID set as one or more first background tasks;
control the storage of information that identifies the initialization state of the RAID set based on the first portion of the RAID set being written by the one or more first background tasks, wherein the stored information that identifies the initialization state of the RAID set identifies that the first portion of the RAID set has been initialized into the RAID format;
receive a read command relating to the RAID set from an application in user space;
identify that the received read command maps to an un-initialized portion of the RAID set; and
service the received read command by reading data from the set of data stored in the non-RAID format and by providing the data read from the set of data stored in the non-RAID format to the application in the user space.
21. The apparatus of claim 20, wherein a write command to the RAID set is received from the application in the user space and data from the received write command is written to the RAID set.
US15/438,561 2014-01-24 2017-02-21 Raid set initialization Abandoned US20170228156A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/438,561 US20170228156A1 (en) 2014-01-24 2017-02-21 Raid set initialization

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/164,045 US9612745B2 (en) 2014-01-24 2014-01-24 Raid set initialization
US15/438,561 US20170228156A1 (en) 2014-01-24 2017-02-21 Raid set initialization

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/164,045 Continuation US9612745B2 (en) 2014-01-24 2014-01-24 Raid set initialization

Publications (1)

Publication Number Publication Date
US20170228156A1 true US20170228156A1 (en) 2017-08-10

Family

ID=53679075

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/164,045 Expired - Fee Related US9612745B2 (en) 2014-01-24 2014-01-24 Raid set initialization
US15/438,561 Abandoned US20170228156A1 (en) 2014-01-24 2017-02-21 Raid set initialization

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US14/164,045 Expired - Fee Related US9612745B2 (en) 2014-01-24 2014-01-24 Raid set initialization

Country Status (1)

Country Link
US (2) US9612745B2 (en)

Also Published As

Publication number Publication date
US9612745B2 (en) 2017-04-04
US20150212736A1 (en) 2015-07-30

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE