US20220107871A1 - Creating and distributing spare capacity of a disk array
- Publication number: US20220107871A1 (application US 17/061,922)
- Authority: United States
- Legal status: Granted
Classifications
- G06F11/2094 - Redundant storage or storage space
- G06F11/1088 - Reconstruction on already foreseen single or plurality of spare disks
- G06F3/0604 - Improving or facilitating administration, e.g. storage management
- G06F3/0619 - Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
- G06F3/0631 - Configuration or reconfiguration of storage systems by allocating resources to storage systems
- G06F3/067 - Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
- G06F3/0689 - Disk arrays, e.g. RAID, JBOD
- G06F2201/85 - Active fault masking without idle spares
Description
- The subject matter of this disclosure is generally related to electronic data storage, and more particularly to providing scalable drive subsets with protection groups and spare capacity.
- Protection groups help to avoid data loss by enabling a failing or failed protection group member to be reconstructed. In a typical data storage system, individual disk drives are protection group members, e.g. members of a redundant array of independent drives (RAID) protection group. A RAID (D+P) protection group has D data members and P parity members. The data members store data. The parity members store parity information such as XORs of data values. The parity information enables reconstruction of data in the event that a data member fails, and parity information can be reconstructed from the data on the data members in the event that a parity member fails. A failed protection group member is typically reconstructed on a spare drive.
- It is sometimes necessary to increase the total storage capacity of a data storage system, e.g. when existing storage capacity becomes fully utilized. The storage capacity of a data storage system that uses individual drives as protection group members is increased by adding a new protection group, i.e. (W+1) drives for a RAID (D+P) protection group and spare drive, where W=(D+P). A storage system that implements RAID-5 (4+1), for example, may be scaled up in increments of five new drives plus one spare drive. Similarly, a RAID-5 (3+1) may be scaled up in increments of four new drives plus one spare drive. One drawback of scaling storage capacity in increments of (W+1) new drives is that it may introduce excess storage capacity that will not be utilized within a reasonable timeframe. This drawback is becoming more troublesome as the storage capacity of individual drives increases due to technological advancements. More specifically, as the storage capacity and cost of drives increase, the amount of excess storage capacity and cost associated with adding (W+1) drives to a storage system also increase, particularly for larger values of W.
- FIG. 1 illustrates a storage array with a drive manager configured to create and distribute spare capacity associated with scalable drive subsets on which protection groups are maintained.
- FIG. 2 illustrates layers of abstraction between the managed drives and the production volume of the storage array of FIG. 1.
- FIG. 3 illustrates creation of an initial drive subset of (W+1) drives on which RAID (D+P) protection groups are maintained.
- FIG. 4 is a matrix representation of the drive subset of FIG. 3.
- FIGS. 5 and 6 illustrate use of spare partitions in response to drive failure in the drive subset represented by FIG. 4.
- FIGS. 7 and 8 illustrate selection and movement of protection group members to a first new drive added to the drive subset represented by FIG. 4.
- FIG. 9 illustrates creation of a new protection group using the partitions freed by movement of existing protection group members to the first new drive.
- FIG. 10 illustrates selection and movement of protection group members to a second new drive added to the drive subset and creation of a new protection group using the partitions freed by movement of existing protection group members to the second new drive.
- FIG. 11 illustrates selection and movement of protection group members to a third new drive added to the drive subset and creation of a new protection group using the partitions freed by movement of existing protection group members to the third new drive.
- FIG. 12 illustrates selection and movement of protection group members to a fourth new drive added to the drive subset.
- FIG. 13 illustrates splitting the drive subset of FIG. 12 into two independent drive subsets.
- FIG. 14 illustrates selection and movement of protection group members to distribute spares and configure one of the split-away drive subsets for scaling.
- FIG. 15 illustrates use of spares of one of the split-away drive subsets for creation of a new protection group.
- FIG. 16 illustrates selection and movement of protection group members to a fifth new drive added to the drive subset of FIG. 12.
- FIG. 17 illustrates splitting the drive subset of FIG. 16 into two independent drive subsets with distributed spare capacity.
- FIG. 18 illustrates use of spare capacity from a first drive subset for rebuilding protection group members from a second drive subset in the event of drive failure.
- FIG. 19 illustrates steps associated with creation of an initial drive subset of (W+1) drives on which RAID (D+P) protection groups are maintained.
- FIG. 20 illustrates steps associated with creation and distribution of spare capacity associated with scalable drive subsets on which protection groups are maintained.
- The terminology used in this disclosure is intended to be interpreted broadly within the limits of subject matter eligibility. The terms "disk" and "drive" are used interchangeably herein and are not intended to refer to any specific type of non-volatile storage media. The terms "logical" and "virtual" are used to refer to features that are abstractions of other features, e.g. and without limitation abstractions of tangible features. The term "physical" is used to refer to tangible features that possibly include, but are not limited to, electronic hardware. For example, multiple virtual computers could operate simultaneously on one physical computer. The term "logic," if used herein, refers to special purpose physical circuit elements, firmware, software, computer instructions that are stored on a non-transitory computer-readable medium and implemented by multi-purpose tangible processors, alone or in any combination.
- Aspects of the inventive concepts are described as being implemented in a data storage system that includes host servers and a storage array. Such implementations should not be viewed as limiting. Those of ordinary skill in the art will recognize that there are a wide variety of implementations of the inventive concepts in view of the teachings of the present disclosure.
- Some aspects, features, and implementations described herein may include machines such as computers, electronic components, optical components, and processes such as computer-implemented procedures and steps. It will be apparent to those of ordinary skill in the art that the computer-implemented procedures and steps may be stored as computer-executable instructions on a non-transitory computer-readable medium. Furthermore, it will be understood by those of ordinary skill in the art that the computer-executable instructions may be executed on a variety of tangible processor devices, i.e. physical hardware. For practical reasons, not every step, device, and component that may be part of a computer or data storage system is described herein. Those of ordinary skill in the art will recognize such steps, devices, and components in view of the teachings of the present disclosure and the knowledge generally available to those of ordinary skill in the art. The corresponding machines and processes are therefore enabled and within the scope of the disclosure.
- FIG. 1 illustrates a storage array with a drive manager configured to create and distribute spare capacity associated with scalable drive subsets on which protection groups are maintained.
- Each drive subset managed by the drive manager 102 is scalable in single drive increments and can be split into multiple drive subsets when enough drives have been added.
- The storage array 100 is one example of a storage area network (SAN), which is one example of a data storage system in which the drive manager could be implemented. The storage array 100 is depicted in a simplified data center environment supporting two network server hosts 103 that run host applications. The hosts 103 include volatile memory, non-volatile storage, and one or more tangible processors.
- The storage array 100 includes one or more bricks 104. Each brick includes an engine 106 and one or more drive array enclosures (DAEs) 108. Each engine 106 includes a pair of interconnected compute nodes 112, 114 that are arranged in a failover relationship and may be referred to as "storage directors." Although it is known in the art to refer to the compute nodes of a SAN as "hosts," that naming convention is avoided in this disclosure to help distinguish the network server hosts 103 from the compute nodes 112, 114. Nevertheless, the host applications could run on the compute nodes, e.g. on virtual machines or in containers.
- Each compute node includes resources such as at least one multi-core processor 116 and local memory 118. The processor may include central processing units (CPUs), graphics processing units (GPUs), or both. The local memory 118 may include volatile media such as dynamic random-access memory (DRAM), non-volatile memory (NVM) such as storage class memory (SCM), or both.
- Each compute node includes one or more host adapters (HAs) 120 for communicating with the hosts 103. Each host adapter has resources for servicing input-output commands (IOs) from the hosts. The host adapter resources may include processors, volatile memory, and ports via which the hosts may access the storage array. Each compute node also includes a remote adapter (RA) 121 for communicating with other storage systems, and one or more drive adapters (DAs) 128 for communicating with managed drives 101 in the DAEs 108.
- Each drive adapter has processors, volatile memory, and ports via which the compute node may access the DAEs for servicing IOs.
- Each compute node may also include one or more channel adapters (CAs) 122 for communicating with other compute nodes via an interconnecting fabric 124.
- The managed drives 101 include non-volatile storage media such as, without limitation, solid-state drives (SSDs) based on electrically erasable programmable read-only memory (EEPROM) technology such as NAND and NOR flash memory, and hard disk drives (HDDs) with spinning disk magnetic storage media.
- Drive controllers may be associated with the managed drives as is known in the art.
- An interconnecting fabric 130 enables implementation of an N-way active-active backend.
- A backend connection group includes all drive adapters that can access the same drive or drives. In some implementations, every drive adapter 128 in the storage array can reach every DAE via the fabric 130. Further, in some implementations every drive adapter in the storage array can access every managed drive 101.
- Data associated with the host application instances running on the hosts 103 is maintained on the managed drives 101. The managed drives are not discoverable by the hosts, but the storage array creates a logical storage device referred to herein as a production volume 140 that can be discovered and accessed by the hosts.
- Without limitation, the production volume may also be referred to as a storage object, source device, production device, or production LUN, where the logical unit number (LUN) is a number used to identify logical storage volumes in accordance with the small computer system interface (SCSI) protocol.
- From the perspective of the hosts 103, the production volume 140 is a single drive having a set of contiguous fixed-size logical block addresses (LBAs) on which data used by the instances of the host application resides. However, the host application data is stored at non-contiguous addresses on various managed drives 101.
- The compute nodes maintain metadata that maps between the production volume 140 and the managed drives 101 in order to process IOs from the hosts.
- FIG. 2 illustrates layers of abstraction between the managed drives 101 and the production volume 140. The smallest unit of storage capacity that can be processed by a managed drive 101 is a sector. Different types of managed drives may be characterized by different sector sizes, but for context and without limitation the sector size of all managed drives may be 2 KB. IOs between the compute nodes and the managed drives may be in larger allocation units such as 128 KB tracks, which are a fixed size that may be an integer multiple of the sector size. For example, an IO may read or write the sectors of a track.
- The managed drives 101 are each organized into partitions 201 of equal storage capacity, i.e. every partition has the same fixed size. Selection of partition storage capacity is a design implementation and, for context and without limitation, may be some fraction or percentage of the capacity of a managed drive equal to an integer multiple of sectors greater than 1. Each partition may include a contiguous range of logical addresses.
- Groups of partitions that include partitions from different managed drives are used to create RAID protection groups 207. The RAID protection groups are distributed on data devices (TDATs) 203. A storage resource pool 205, also known as a "data pool" or "thin pool," is a collection of TDATs 203 of the same emulation and RAID protection group type, e.g. RAID-5. In some implementations all TDATs in a drive group are of a single RAID protection group type and all have the same size (storage capacity).
- Logical thin devices (TDEVs) 219 are created using TDATs. The TDATs and TDEVs are accessed using tracks as the allocation unit. Multiple TDEVs 219 are organized into a storage group 225. The production volume 140 is created from a single storage group 225. Host application data, which is stored in blocks on the production volume 140, is mapped to tracks of the TDEVs, which map to sectors of the managed drives. Regardless of the specific allocation unit capacities selected, a track is larger than both the sectors and the fixed-size blocks used in communications between the storage array and the hosts to access the production volume.
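The allocation-unit relationships above reduce to simple integer arithmetic. Here is a minimal sketch (Python; the 2 KB sector and 128 KB track sizes are the illustrative values from this passage, not requirements, and the helper name is a hypothetical):

```python
# Illustrative allocation-unit arithmetic. The constants mirror the example
# sizes named in the text; real sector, track, and partition sizes are
# implementation choices.
SECTOR_KB = 2     # smallest unit of capacity a managed drive can process
TRACK_KB = 128    # larger fixed-size allocation unit used for IOs

SECTORS_PER_TRACK = TRACK_KB // SECTOR_KB  # 64 sectors per track

def partition_capacity_kb(drive_capacity_kb: int, partitions_per_drive: int) -> int:
    """Equal-size partitions, each an integer multiple of sectors greater than 1."""
    raw = drive_capacity_kb // partitions_per_drive
    return raw - (raw % SECTOR_KB)  # round down to a whole number of sectors
```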
- FIG. 3 illustrates an implementation of RAID (D+P) protection groups on a subset 300 of the managed drives 101 (FIG. 1). In the present disclosure, W=(D+P) by definition. The storage array includes multiple drive subsets such as drive subset 300, each of which is created with W or (W+1) drives and W partitions. Using (W+1) drives provides spare capacity for use in response to drive failure.
- RAID (3+1) is implemented in the illustrated example and the drive subset includes (W+1) drives, so there are five drives D1-D5 and four partition indexes P1-P4. For purposes of explanation, all of the drives have the same storage capacity and all of the partitions have the same fixed size in terms of storage capacity.
- In accordance with RAID requirements, protection group members are located in the drive partitions such that no more than one member of a protection group is located on the same drive. Protection groups 1-4 represent data and/or parity members. The protection group members and spare capacity partitions S are distributed within the drive subset in a manner that facilitates failover and scaling, as will be described below.
- FIG. 4 is a matrix representation of the drive subset of FIG. 3. Rows 1-5 in the matrix represent drives D1-D5 and columns 1-4 in the matrix represent partitions P1-P4.
- Initially, a single protection group 4 is created in the lowest numbered partition index of the W lowest numbered drives. In the illustrated example W=4, so protection group 4 members are located in partition 1 of drives 1-4. Spares at drive X, partition index Y satisfy the equation (X+Y)=(W+2). Additional protection groups are created, and their members symmetrically distributed in the remaining free partitions, such that the RAID member at drive X, partition index Y belongs to RAID group N where: if (X+Y)<(W+2), then N=(X+Y-2); and if (X+Y)>(W+2), then N=(X+Y-W-2).
- For example, with W=4 a spare partition S is located at partition index P1 of drive D5 because (X+Y)=(5+1)=(4+2), and a member of protection group 2 is located at partition index 3 of drive D1 because (X+Y)=(1+3)=(2+2)=(N+2). The resulting distribution of spare partitions S lies along a diagonal of the matrix, with adjacent spare partitions located on incrementally decreasing drive numbers and incrementally increasing partitions. Apart from the single-partition protection group 4, the protection groups are symmetrically distributed. In contrast, it is typical in previous designs for protection group members to be located on single partitions and all spare capacity to be on a spare drive.
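To make the placement rules concrete, the following minimal sketch (Python; the function name and list-of-lists grid are illustrative assumptions, not the patent's implementation) builds the initial layout directly from the rules above:

```python
def build_initial_subset(w: int) -> list:
    """Build the (W+1)-drive by W-partition layout of FIG. 4.

    grid[x][y] holds the protection group number at drive x+1, partition y+1,
    or 'S' for a spare partition (0-based indexes here, 1-based in the text).
    """
    grid = [[None] * w for _ in range(w + 1)]
    for x in range(1, w + 2):            # drive numbers 1..W+1
        for y in range(1, w + 1):        # partition numbers 1..W
            if y == 1 and x <= w:
                grid[x - 1][y - 1] = w             # single group W in partition 1
            elif x + y == w + 2:
                grid[x - 1][y - 1] = 'S'           # diagonally distributed spares
            elif x + y < w + 2:
                grid[x - 1][y - 1] = x + y - 2     # N = (X+Y-2)
            else:
                grid[x - 1][y - 1] = x + y - w - 2 # N = (X+Y-W-2)
    return grid

# For W=4 this reproduces FIG. 4:
#   drive 1: [4, 1, 2, 3]
#   drive 2: [4, 2, 3, 'S']
#   drive 3: [4, 3, 'S', 1]
#   drive 4: [4, 'S', 1, 2]
#   drive 5: ['S', 1, 2, 3]
```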
- FIGS. 5 and 6 illustrate use of spare partitions in response to drive failure in the drive subset represented by FIG. 4. In the illustrated example, drive 1 fails or is failing, so the protection group members 4, 1, 2, 3 in partitions 1-4 of drive 1 must be relocated or rebuilt. The protection group members are relocated or rebuilt in the spare partitions S such that no more than one member of a protection group is located on the same drive. The distribution of the protection group members and spare partitions assures that at least one solution is available in which no more than one member of a protection group is located on the same drive. If multiple solutions are available, then any one of those solutions may be implemented.
- In the illustrated example, the member of protection group 4 at partition 1 of drive 1 is relocated or rebuilt in partition 1 of drive 5, the member of protection group 1 at partition 2 of drive 1 is relocated or rebuilt in partition 4 of drive 2, the member of protection group 2 at partition 3 of drive 1 is relocated or rebuilt in partition 3 of drive 3, and the member of protection group 3 at partition 4 of drive 1 is relocated or rebuilt in partition 2 of drive 4.
- The failed drive is removed from service and may eventually be replaced. Following replacement of the failed drive, the relocated protection group members may be returned to their original drive and partition locations, thereby restoring the diagonally distributed spare partitions.
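A sketch of the failover placement, reusing the grid representation from the previous example (0-based indexes; the backtracking search is one possible way to find a valid assignment, which the text says always exists for this distribution):

```python
def relocate_failed_drive(grid, failed):
    """Rebuild the members of a failed drive in spare partitions (FIGS. 5-6),
    keeping at most one member of any protection group per drive."""
    spares = [(x, y) for x, row in enumerate(grid)
              for y, v in enumerate(row) if v == 'S' and x != failed]
    members = [(y, g) for y, g in enumerate(grid[failed]) if g != 'S']

    def place(i, used):
        if i == len(members):
            return []
        y, g = members[i]
        for x, sy in spares:
            # A spare is eligible if unused and its drive holds no member of g.
            if (x, sy) not in used and g not in grid[x]:
                rest = place(i + 1, used | {(x, sy)})
                if rest is not None:
                    return [((failed, y), (x, sy))] + rest
        return None

    moves = place(0, set())
    if moves is None:
        raise RuntimeError("no valid spare assignment found")
    for (fx, fy), (sx, sy) in moves:
        grid[sx][sy] = grid[fx][fy]   # relocate or rebuild in the spare
        grid[fx][fy] = None           # failed drive is removed from service
    return moves
```

With W=4 and drive 1 failing (failed=0), this reproduces the moves described above: group 4 to drive 5, group 1 to drive 2, group 2 to drive 3, and group 3 to drive 4.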
- FIGS. 7 and 8 illustrate selection and movement of protection group members to a first new drive added to the drive subset represented by FIG. 4. The first new drive is sequentially numbered relative to existing drives 1-5 and is thus drive 6. The new drive is formatted with the same number and size of partitions as the existing drives and thus has W partitions numbered 1-4.
- A new drive is populated using a rotation technique in which a vertical column of partitions is "rotated" to a horizontal row of partitions on the new drive. The W protection group members in the lowest numbered unrotated partition, excluding the single-partition protection group initially created in the lowest numbered partition, are rotated from the first W drives in ascending order to the W partitions of the new drive in ascending order. For example, the protection group member on the first drive is moved to the first partition of the new drive, the protection group member on the second drive is moved to the second partition of the new drive, the protection group member on the third drive is moved to the third partition of the new drive, and so forth. Consequently, the drive number from which the member is moved becomes the partition number to which the member is moved.
- In the illustrated example W=4, protection group 4 is excluded, and none of the partitions have been rotated, so partition 2 of drives 1-4 is rotated onto new drive 6. Specifically, protection group member 1 is moved from drive 1, partition 2 to drive 6, partition 1; protection group member 2 is moved from drive 2, partition 2 to drive 6, partition 2; protection group member 3 is moved from drive 3, partition 2 to drive 6, partition 3; and the spare S is moved from drive 4, partition 2 to drive 6, partition 4. In order to maintain the original distribution of spares, a new spare partition S is created at partition 2 of drive 4.
- FIG. 9 illustrates creation of a new protection group using the partitions freed by movement of existing protection group members to the first new drive. The new protection group is sequentially numbered relative to the existing protection groups and includes the same number of members, i.e. W. In the illustrated example the new protection group is assigned the number 5 because the existing protection groups are numbered 1-4. The members of protection group 5 are located in the partitions that were made available by the rotation of protection group members 1, 2, 3 in partition 2 of drives 1-3. The rotated spare partition S at partition 4 of drive 6 is utilized for the fourth member of protection group 5 in order to maintain the original distribution of the spare partitions S.
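The rotation and new-group steps of FIGS. 7-9 (and the analogous steps for the second and third new drives below) can be sketched as one function, continuing the grid representation above. The helper and its bookkeeping set are assumptions; the diagonal rotation used for the fourth new drive (FIG. 12) is not covered:

```python
def add_drive_and_rotate(grid, w, rotated_partitions):
    """Add one new drive, rotate a column of members onto it, and create a
    new protection group in the freed partitions (FIGS. 7-9).

    rotated_partitions is the set of 0-based partition indexes already
    rotated; partition 0 (the single-partition group) is always excluded.
    """
    grid.append([None] * w)                  # new, sequentially numbered drive
    new = len(grid) - 1
    p = min(y for y in range(1, w) if y not in rotated_partitions)
    rotated_partitions.add(p)
    for x in range(w):                       # drive number becomes partition number
        grid[new][x] = grid[x][p]
        grid[x][p] = None
    new_group = 1 + max(v for row in grid for v in row if isinstance(v, int))
    for x in range(w):
        if grid[new][x] == 'S':
            grid[x][p] = 'S'                 # re-create the spare at the vacated slot
            grid[new][x] = new_group         # rotated spare slot joins the new group
        else:
            grid[x][p] = new_group           # freed partitions host the new group
    return new_group
```

Applied three times to the W=4 grid, this reproduces drives 6, 7, and 8 of FIGS. 7-11, including the re-created spares at drive 4 partition 2, drive 3 partition 3, and drive 2 partition 4.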
- FIG. 10 illustrates selection and movement of protection group members to a second new drive added to the drive subset and creation of a new protection group using the partitions freed by movement of existing protection group members to the second new drive. The second new drive is sequentially numbered relative to existing drives 1-6 and is thus drive 7.
- The W protection group members in the lowest numbered unrotated partition, excluding the single-partition protection group initially created in the lowest numbered partition, are in partition 3 of drives 1-4, so those members and the spare (2, 3, S, 1) are rotated to new drive 7.
- The new protection group is assigned the number 6 because the existing protection groups are numbered 1-5. The members of protection group 6 are located in the partitions that were made available by the rotation of protection group members 2, 3, 1 in partition 3 of drives 1, 2 and 4. The rotated spare partition S at partition 3 of drive 7 is utilized for the fourth member of protection group 6 in order to maintain the original distribution of the spare partitions S.
- FIG. 11 illustrates selection and movement of protection group members to a third new drive added to the drive subset and creation of a new protection group using the partitions freed by movement of existing protection group members to the third new drive. The third new drive is sequentially numbered relative to existing drives 1-7 and is thus drive 8.
- The W protection group members in the lowest numbered unrotated partition, excluding the single-partition protection group initially created in the lowest numbered partition, are in partition 4 of drives 1-4, so those members and the spare (3, S, 1, 2) are rotated to new drive 8.
- The new protection group is assigned the number 7 because the existing protection groups are numbered 1-6. The members of protection group 7 are located in the partitions that were made available by the rotation of protection group members 3, 1, 2 in partition 4 of drives 1, 3 and 4. The rotated spare partition S at partition 2 of drive 8 is utilized for the fourth member of protection group 7 in order to maintain the original distribution of the spare partitions S.
- FIG. 12 illustrates selection and movement of protection group members to a fourth new drive added to the drive subset.
- The fourth new drive is designated as drive 9. All of the partitions are excluded or already rotated, so protection group members located on a diagonal starting with partition 1 of drive 9, with adjacent partitions located on incrementally decreasing drive numbers and incrementally increasing partitions, are selected for rotation. In the illustrated example, protection group members 5, 6, 7 are rotated into partitions 2, 3, 4 of drive 9. The rotation and resulting vacated diagonal partitions help to prepare for a split.
- FIG. 13 illustrates splitting the drive subset of FIG. 12 into two independent drive subsets 350, 352. The first drive subset 350 includes drives 1, 2, 3, 4, and 9. The second drive subset 352 includes drives 5, 6, 7, and 8. Spare partitions S are created in the vacated partitions. The drive subsets are independent because they are managed separately, and all of the members of each protection group reside on only one drive subset.
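A sketch of the split check, under the same grid assumptions (hypothetical helper; the drive lists are 0-based indexes into the grid):

```python
def split_subset(grid, drives_a, drives_b):
    """Split one drive subset into two independent subsets (FIG. 13).

    The split is valid only if every protection group resides entirely on
    one side; the diagonal rotation of FIG. 12 arranges exactly that.
    """
    def groups(drives):
        return {v for x in drives for v in grid[x] if isinstance(v, int)}

    shared = groups(drives_a) & groups(drives_b)
    if shared:
        raise ValueError(f"protection groups {sorted(shared)} straddle the split")
    return [grid[x] for x in drives_a], [grid[x] for x in drives_b]

# FIG. 13: drives 1, 2, 3, 4, 9 split from drives 5, 6, 7, 8 (0-based below).
# subset_350, subset_352 = split_subset(grid, [0, 1, 2, 3, 8], [4, 5, 6, 7])
```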
- FIG. 14 illustrates selection and movement of protection group members of drive subset 352 to distribute spares and prepare for scaling. A new drive 10 is added to the drive subset, and the W members in the lowest numbered partition of the W lowest numbered drives are rotated onto the new drive 10. The spare partition S is relocated from partition 1 of drive 5 to partition 1 of drive 10, thereby creating the distribution of spares S along the diagonal as was done in the original drive subset. A new protection group 8 is created in the partitions vacated due to the rotation, thereby recreating the single-partition protection group, diagonally distributed spares, and symmetrically distributed protection group members of the original drive subset.
- FIG. 15 illustrates use of spares of drive subset 352 for creation of a new protection group. Rather than adding a new drive and redistributing protection group members and spares as described above, the four spare partitions S are used to create a new protection group 8. This variation may be used when the split-away drive subset does not need spare capacity and will not be scaled.
- FIG. 16 illustrates selection and movement of protection group members to a fifth new drive added to the drive subset of FIG. 12. The fifth new drive is designated as drive 10.
- The W protection group members in the lowest numbered partition, excluding the members of the single-partition protection group initially created in the lowest numbered partition, are rotated onto the new drive. Specifically, S, 1, 2, 3 in partition 1 of drives 5-8 are rotated to partitions 1-4 of new drive 10. A spare partition S is created at partition 1 of drive 5 to maintain the original diagonally distributed spares.
- A new protection group 8 may be created using the vacated locations at partition 1 of drives 5-8.
- FIG. 17 illustrates splitting the drive subset of FIG. 16 into two independent drive subsets 360, 362 with distributed spare capacity. Drive subset 360 is created using drives 1, 2, 3, 4, and 9. Drive subset 362 is created using drives 5, 6, 7, 8, and 10.
- Drive subset 362 is pre-configured for scaling and use of spare capacity because a single-partition protection group is located in partition 1, spare partitions S are distributed along a diagonal, and the other protection group members are symmetrically distributed. In other words, drive subset 362 is organized in the same way as the original drive subset. Drive subset 360 may be maintained in the illustrated form, reconfigured in the same way as the original drive subset, or used to create an additional protection group by using the spare partitions.
- FIG. 18 illustrates use of spare capacity from a first drive subset for rebuilding protection group members from a second drive subset in the event of drive failure.
- Drive subset 364 is organized in the same way as the original drive subset, including W spare partitions. Drive subset 366 has no spare partitions. This situation may occur when a drive subset is split and only one of the split-away drive subsets includes spares, e.g. because there were no spares at the time of the split or because spares were used to create a new protection group.
- When a drive of drive subset 366, e.g. drive 5, fails, the protection group members are rebuilt using the spare partitions of drive subset 364, which has spare capacity. This results in distribution of members of protection groups on multiple drive subsets, e.g. members of protection group 8 are on drive subset 364 and drive subset 366. However, when the failed drive 5 is replaced, the protection group members in the spare partitions S of drive subset 364 are relocated to the replaced drive 5 in drive subset 366.
- FIG. 19 illustrates steps associated with creation of an initial drive subset of (W+1) drives on which RAID (D+P) protection groups are maintained.
- Step 400 is creating W partitions on each drive of a drive subset of (W+1) drives.
- Step 402 is creating one protection group in the first partition index of the drive subset. The drives and partitions in the drive subset are sequentially ordered, e.g. using numbers, and the first partition may be the lowest numbered partition. The protection group, which includes W members, is created in the lowest numbered partition of the W lowest numbered drives.
- Step 406 is creating additional sequentially numbered protection groups. Additional protection groups are created, and their members symmetrically distributed in the remaining free partitions, such that the RAID member at drive X, partition index Y belongs to RAID group N where: if (X+Y)<(W+2), then N=(X+Y-2); and if (X+Y)>(W+2), then N=(X+Y-W-2).
- FIG. 20 illustrates steps associated with creation and distribution of spare capacity associated with scalable drive subsets on which protection groups are maintained, such as the drive subset created using the steps shown in FIG. 19.
- Step 408 is adding a new drive to the drive subset.
- Step 410 is rotating the first W protection group members in the lowest numbered unrotated partition. The single partition protection group initially created in the lowest numbered partition is excluded from consideration.
- The selected protection group members, including spares, are rotated from the first W drives in ascending order to the W partitions of the new drive in ascending order. For example, for the first new drive, the protection group member on the first drive is moved to the first partition of the first new drive, the protection group member on the second drive is moved to the second partition of the first new drive, etc.
- Step 412 is creating a new protection group using the partitions that were vacated due to rotation of protection group members.
- A new spare partition is created in place of a rotated spare in order to maintain the desired distribution of spare capacity, so a member of the new protection group is not located in place of the vacated spare. Rather, that extra new protection group member is located in the rotated spare, i.e. the spare location on the new drive. If there are enough drives in the drive subset for a split, as determined in step 414, then the drive subset is split into two drive subsets as indicated in step 416. Otherwise, steps 408 through 414 may be iterated, e.g. as additional new drives are added.
- The number of drives required for a split is in part a design decision. For example, an original drive subset with 2*W drives could be split into two drive subsets of W drives, or the original drive subset may be maintained until there are 2*W+1 drives or 2*(D+P+1) drives, depending on whether maintenance of spare capacity in one or both split-away drive subsets is desired.
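That design decision can be captured in a small helper (illustrative only; the thresholds are the ones named in the paragraph above, with W=(D+P)):

```python
def ready_to_split(n_drives: int, w: int, spares_in: str = "one") -> bool:
    """Split thresholds: 2*W drives (no reserved spares), 2*W+1 (spares kept
    in one split-away subset), or 2*(W+1) (spares kept in both)."""
    required = {"none": 2 * w, "one": 2 * w + 1, "both": 2 * (w + 1)}
    return n_drives >= required[spares_in]
```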
- The drive subset may be split such that one of the new drive subsets is preconfigured with a single-partition protection group, diagonally distributed spares, and symmetrically distributed protection group members, or it may be reconfigured after being split away.
- A split-away drive subset may be scaled and split by repeating steps 408 through 414.
- One or more new protection groups may be created using spare partitions, as indicated in step 418.
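Tying the sketches together under the stated assumptions (W=4; the functions are the hypothetical ones defined in the earlier examples):

```python
# Build the initial 5-drive subset of FIG. 4, then grow it one drive at a time.
grid = build_initial_subset(4)
rotated = set()
for _ in range(3):                       # drives 6, 7, 8 (FIGS. 7-11)
    add_drive_and_rotate(grid, 4, rotated)
print(ready_to_split(len(grid), 4, spares_in="one"))   # False: 8 < 2*4 + 1
```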
Abstract
- All examples, aspects and features mentioned in this document can be combined in any technically possible way.
- In accordance with some implementations a method of creating and distributing spare capacity on a scalable drive subset on which protection groups are maintained comprises: creating W=(D+P) partitions that are equal in size and number on W+1 drives, wherein the partitions are sequentially ordered, and the drives are sequentially ordered; creating a first vertical protection group that has D data members and P parity members in one partition of W of the drives; creating and distributing W spares at values of drive X, partition Y that satisfy (X+Y)=(W+2); and symmetrically distributing members of additional protection groups with W members on remaining partitions; whereby the spares are distributed such that all protection group members on a failed one of the drives can be relocated to ones of the spares such that no more than one member of any of the protection groups is located on a single one of the drives.
- In accordance with some implementations an apparatus comprises: a plurality of non-volatile drives; a plurality of interconnected compute nodes that manage access to the drives; and a drive manager configured to: create W=(D+P) partitions that are equal in size and number on (W+1) drives, wherein the partitions are sequentially ordered, and the drives are sequentially ordered; create a first vertical protection group that has D data members and P parity members in one partition of W of the drives; create and distribute W spares at values of drive X, partition Y that satisfy (X+Y)=(W+2); and symmetrically distribute members of additional protection groups with W members on remaining partitions; whereby the spares are distributed such that all protection group members on a failed one of the drives can be relocated to ones of the spares such that no more than one member of any of the protection groups is located on a single one of the drives.
- In accordance with some implementations a computer-readable storage medium stores instructions that when executed by a computer cause the computer to perform a method for using a computer system to create and distribute spare capacity on a scalable drive subset on which protection groups are maintained, the method comprising: creating W=(D+P) partitions that are equal in size and number on (W+1) drives, wherein the partitions are sequentially ordered, and the drives are sequentially ordered; creating a first vertical protection group that has D data members and P parity members in one partition of W of the drives; creating and distributing W spares at values of drive X, partition Y that satisfy (X+Y)=(W+2); and symmetrically distributing members of additional protection groups with W members on remaining partitions; whereby the spares are distributed such that all protection group members on a failed one of the drives can be relocated to ones of the spares such that no more than one member of any of the protection groups is located on a single one of the drives.
-
FIG. 1 illustrates a storage array with a drive manager configured to create and distribute spare capacity associated with scalable drive subsets on which protection groups are maintained. -
FIG. 2 illustrates layers of abstraction between the managed drives and the production volume of the storage array ofFIG. 1 . -
FIG. 3 illustrates creation of an initial drive subset of (W+1) drives on which RAID (D+P) protections groups are maintained. -
FIG. 4 is a matrix representation of the drive subset ofFIG. 3 . -
FIGS. 5 and 6 illustrate use of spare partitions in response to drive failure in the drive subset represented byFIG. 4 . -
FIGS. 7 and 8 illustrate selection and movement of protection group members to a first new drive added to the drive subset represented byFIG. 4 . -
FIG. 9 illustrates creation of a new protection group using the partitions freed by movement of existing protection group members to the first new drive. -
FIG. 10 illustrates selection and movement of protection group members to a second new drive added to the drive subset and creation of a new protection group using the partitions freed by movement of existing protection group members to the second new drive. -
FIG. 11 illustrates selection and movement of protection group members to a third new drive added to the drive subset and creation of a new protection group using the partitions freed by movement of existing protection group members to the third new drive. -
FIG. 12 illustrates selection and movement of protection group members to a fourth new drive added to the drive subset. -
FIG. 13 illustrates splitting the drive subset ofFIG. 12 into two independent drive subsets. -
FIG. 14 illustrates selection and movement of protection group members to distribute spares and configure one of the split-away drive subsets for scaling. -
FIG. 15 illustrates use of spares of one of the split-away drive subsets for creation of a new protection group. -
FIG. 16 illustrates selection and movement of protection group members to a fifth new drive added to the drive subset ofFIG. 12 . -
FIG. 17 illustrates splitting the drive subset ofFIG. 16 into two independent drive subsets with distributed spare capacity. -
FIG. 18 illustrates use of spare capacity from a first drive subset for rebuilding protection group members from a second drive subset in the event of drive failure. -
FIG. 19 illustrates steps associated with creation of an initial drive subset of (W+1) drives on which RAID (D+P) protections groups are maintained. -
FIG. 20 illustrates steps associated with creation and distribution of spare capacity associated with scalable drive subsets on which protection groups are maintained. - The terminology used in this disclosure is intended to be interpreted broadly within the limits of subject matter eligibility. The terms “disk” and “drive” are used interchangeably herein and are not intended to refer to any specific type of non-volatile storage media. The terms “logical” and “virtual” are used to refer to features that are abstractions of other features, e.g. and without limitation abstractions of tangible features. The term “physical” is used to refer to tangible features that possibly include, but are not limited to, electronic hardware. For example, multiple virtual computers could operate simultaneously on one physical computer. The term “logic,” if used herein, refers to special purpose physical circuit elements, firmware, software, computer instructions that are stored on a non-transitory computer-readable medium and implemented by multi-purpose tangible processors, alone or in any combination. Aspects of the inventive concepts are described as being implemented in a data storage system that includes host servers and a storage array. Such implementations should not be viewed as limiting. Those of ordinary skill in the art will recognize that there are a wide variety of implementations of the inventive concepts in view of the teachings of the present disclosure.
- Some aspects, features, and implementations described herein may include machines such as computers, electronic components, optical components, and processes such as computer-implemented procedures and steps. It will be apparent to those of ordinary skill in the art that the computer-implemented procedures and steps may be stored as computer-executable instructions on a non-transitory computer-readable medium. Furthermore, it will be understood by those of ordinary skill in the art that the computer-executable instructions may be executed on a variety of tangible processor devices, i.e. physical hardware. For practical reasons, not every step, device, and component that may be part of a computer or data storage system is described herein. Those of ordinary skill in the art will recognize such steps, devices, and components in view of the teachings of the present disclosure and the knowledge generally available to those of ordinary skill in the art. The corresponding machines and processes are therefore enabled and within the scope of the disclosure.
-
FIG. 1 illustrates a storage array with a drive manager configured to create and distribute spare capacity associated with scalable drive subsets on which protection groups are maintained. Each drive subset managed by thedrive manager 102 is scalable in single drive increments and can be split into multiple drive subsets when enough drives have been added. Thestorage array 100 is one example of a storage area network (SAN), which is one example of a data storage system in which the drive manager could be implemented. Thestorage array 100 is depicted in a simplified data center environment supporting twonetwork server hosts 103 that run host applications. Thehosts 103 include volatile memory, non-volatile storage, and one or more tangible processors. Thestorage array 100 includes one ormore bricks 104. Each brick includes anengine 106 and one or more drive array enclosures (DAEs) 108. Eachengine 106 includes a pair ofinterconnected compute nodes compute nodes multi-core processor 116 andlocal memory 118. The processor may include central processing units (CPUs), graphics processing units (GPUs), or both. Thelocal memory 118 may include volatile media such as dynamic random-access memory (DRAM), non-volatile memory (NVM) such as storage class memory (SCM), or both. Each compute node includes one or more host adapters (HAs) 120 for communicating with thehosts 103. Each host adapter has resources for servicing input-output commands (IOs) from the hosts. The host adapter resources may include processors, volatile memory, and ports via which the hosts may access the storage array. Each compute node also includes a remote adapter (RA) 121 for communicating with other storage systems. Each compute node also includes one or more drive adapters (DAs) 128 for communicating with manageddrives 101 in theDAEs 108. Each drive adapter has processors, volatile memory, and ports via which the compute node may access the DAEs for servicing IOs. Each compute node may also include one or more channel adapters (CAs) 122 for communicating with other compute nodes via an interconnectingfabric 124. The managed drives 101 include non-volatile storage media such as, without limitation, solid-state drives (SSDs) based on electrically erasable programmable read-only memory (EEPROM) technology such as NAND and NOR flash memory and hard disk drives (HDDs) with spinning disk magnetic storage media. Drive controllers may be associated with the managed drives as is known in the art. An interconnectingfabric 130 enables implementation of an N-way active-active backend. A backend connection group includes all drive adapters that can access the same drive or drives. In some implementations everydrive adapter 128 in the storage array can reach every DAE via thefabric 130. Further, in some implementations every drive adapter in the storage array can access every manageddrive 101. - Data associated with the hosted application instances running on the
hosts 103 is maintained on the managed drives 101. The managed drives 101 are not discoverable by the hosts but the storage array creates a logical storage device referred to herein as aproduction volume 140 that can be discovered and accessed by the hosts. Without limitation, the production volume may also be referred to as a storage object, source device, production device, or production LUN, where the logical unit number (LUN) is a number used to identify logical storage volumes in accordance with the small computer system interface (SCSI) protocol. From the perspective of thehosts 103, theproduction volume 140 is a single drive having a set of contiguous fixed-size logical block addresses (LBAs) on which data used by the instances of the host application resides. However, the host application data is stored at non-contiguous addresses on various managed drives 101. The compute nodes maintain metadata that maps between theproduction volume 140 and the managed drives 101 in order to process IOs from the hosts. -
FIG. 2 illustrates layers of abstraction between the managed drives 101 and theproduction volume 140. The smallest unit of storage capacity that can be processed by a manageddrive 101 is a sector. Different types of managed drives may be characterized by different sector sizes but for context and without limitation the sector size of all managed drives may be 2 KB. IOs between the compute nodes and the managed drives may be in larger allocation units such as 128 KB tracks that are a fixed size that may be an integer multiple of the sector size. For example, an IO may read or write the sectors of a track. The managed drives 101 are each organized intopartitions 201 of equal storage capacity, i.e. every partition has the same fixed size. Selection of partition storage capacity is a design implementation and, for context and without limitation, may be some fraction or percentage of the capacity of a managed drive equal to an integer multiple of sectors greater than 1. Each partition may include a contiguous range of logical addresses. Groups of partitions that include partitions from different managed drives are used to createRAID protection groups 207. The RAID protection groups are distributed on data devices (TDATs) 203. Astorage resource pool 205, also known as a “data pool” or “thin pool,” is a collection ofTDATs 203 of the same emulation and RAID protection group type, e.g. RAID-5. In some implementations all TDATs in a drive group are of a single RAID protection group type and all have the same size (storage capacity). Logical thin devices (TDEVs) 219 are created using TDATs. The TDATs and TDEVs are accessed using tracks as the allocation unit.Multiple TDEVs 219 are organized into astorage group 225. Theproduction volume 140 is created from asingle storage group 225. Host application data, which is stored in blocks on theproduction volume 140, is mapped to tracks of the TDEVs, which map to sectors of the managed drives. Regardless of the specific allocation unit capacities selected, a track is larger than both the sectors and the fixed size blocks used in communications between the storage array and the hosts to access the production volume. -
FIG. 3 illustrates an implementation of RAID (D+P) protections groups on asubset 300 of the managed drives 101 (FIG. 1 ). In the present disclosure W=(D+P) by definition. The storage array includes multiple drive subsets such asdrive subset 300, each of which is created with W or (W+1) drives and W partitions. Using (W+1) drives provides spare capacity for use in response to drive failure. RAID (3+1) is implemented in the illustrated example and the drive subset includes (W+1) drives so there are five drives D1-D5 and four partition indexes P1-P4. For purposes of explanation all of the drives have the same storage capacity and all of the partitions have the same fixed size in terms of storage capacity. In accordance with RAID requirements, protection group members are located in the drive partitions such that no more than one member of a protection group is located on the same drive. Protection groups 1-4 represent data and/or parity members. The protection group members and spare capacity partitions S are distributed within the drive subset in a manner that facilitates failover and scaling as will be described below. -
FIG. 4 is a matrix representation of the drive subset ofFIG. 3 . Rows 1-5 in the matrix represent drives D1-D5 and columns 1-4 in the matrix represent partitions P1-P4. Initially, asingle protection group 4 is created in the lowest numbered partition index of the W lowest numbered drives. In the illustrated example W=4 soprotection group 4 members are located inpartition 1 of drives 1-4. Spares at drive X partition index Y satisfy the equation X+Y=W+2. Additional protection groups are created, and their members symmetrically distributed in the remaining free partitions such that the RAID member at drive X partition index Y belongs to RAID group N where: -
- a. If (X+Y)<(W+2), then N=(X+Y−2); and
- b. If (X+Y)>(W+2), then N=(X+Y−W−2).
In the illustrated example W=4 and a spare partition S is located at partition index P1 of drive D1 because (X+Y)=(5+1)=(4+2). A member ofprotection group 2 is located atpartition index 3 of drive D1 because (X+Y)=(1+3)=(2+2)=(N+2). The resulting distribution of spare partitions S is along a diagonal of the matrix with adjacent spare partitions located on incrementally decreasing drive numbers and incrementally increasing partitions. Apart from the singlepartition protection group 4, the protection groups are symmetrically distributed. In contrast, it is typical in previous designs for protection group members to be located on single partitions and all spare capacity to be on a spare drive.
-
FIGS. 5 and 6 illustrate use of spare partitions in response to drive failure in the drive subset represented byFIG. 4 . In the illustratedexample drive 1 fails or is failing so theprotection group members drive 1 must be relocated or rebuilt. The protection group members are relocated or rebuilt in the spare partitions S such that no more than one member of a protection group is located on the same drive. The distributions of the protection group members and spare partitions assures that at least one solution is available such no more than one member of a protection group is located on the same drive. If multiple solutions are available, then any one of those solutions may be implemented. In the illustrated example the member ofprotection group 4 atpartition 1 ofdrive 1 is relocated or rebuilt inpartition 1 ofdrive 1, the member ofprotection group 1 atpartition 2 ofdrive 1 is relocated or rebuilt inpartition 4 ofdrive 2, the member ofprotection group 2 atpartition 3 ofdrive 1 is relocated or rebuilt inpartition 3 ofdrive 3, and the member ofprotection group 3 atpartition 4 ofdrive 1 is relocated or rebuilt inpartition 2 ofdrive 4. The failed drive is removed from service and may eventually be replaced. Following replacement of the failed drive the relocated protection group members may be returned to their original drive and partition locations, thereby restoring the diagonally distributed spare partitions. -
FIGS. 7 and 8 illustrate selection and movement of protection group members to a first new drive added to the drive subset represented byFIG. 4 . The first new drive is sequentially numbered relative to existing drives 1-5 and is thus drive 6. The new drive is formatted with the same number and size partitions as the existing drives and thus has W partitions numbered 1-4. A new drive is populated using a rotation technique in which a vertical column of partitions is “rotated” to a horizontal row of partitions on a new drive. In the illustrated example the W protection group members in the lowest numbered unrotated partition, excluding the single partition protection group initially created in the lowest numbered partition, are rotated from the first W drives in ascending order to the W partitions of the new drive in ascending order. For example, the protection group member on the first drive is moved to the first partition of the new drive, the protection group member on the second drive is moved to the second partition of the new drive, the protection group member on the third drive is moved to the third partition of the new drive, and so forth. Consequently, the drive number from which the member is moved becomes the partition number to which the member is moved. In the illustrated example W=4,protection group 4 is excluded, and none of the partitions have been rotated sopartition 2 of drives 1-4 is rotated ontonew drive 6. Specifically,protection group member 1 is moved fromdrive 1,partition 2 to drive 6,partition 1,protection group member 2 is moved fromdrive 2,partition 2 to drive 6,partition 2,protection group member 3 is moved fromdrive 3,partition 2 to drive 6,partition 3, and the spare protection group member S is moved fromdrive 4,partition 2 to drive 6,partition 4. In order to maintain the original distribution of spares a new spare partition S is created atpartition 2 ofdrive 4. -
FIG. 9 illustrates creation of a new protection group using the partitions freed by movement of existing protection group members to the first new drive. The new protection group is sequentially numbered relative to the existing protection groups and includes the same number of members, i.e. W. In the illustrated example the new protection group is assigned the number 5 because the existing protection groups are numbered 1-4. The members of protection group 5 are located in the partitions that were made available by the rotation of protection group members 1, 2, and 3 from partition 2 of drives 1-3. The rotated spare partition S at partition 4 of drive 6 is utilized for the fourth member of protection group 5 in order to maintain the original distribution of the spare partitions S.
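Continuing the sketch, the vacated column can be filled with the next group while keeping the spare on its original diagonal; placing the group's fourth member in the rotated spare's slot on the new drive mirrors the handling described above (the helper name is an assumption):

```python
# Sketch only: create the next group in the vacated column; the diagonal
# spare is recreated in place, and the displaced member instead goes to
# the slot on the new drive that received the rotated spare.
def fill_after_rotation(layout, w, partition, new_drive, new_group):
    for x in range(1, w + 1):
        if x + partition == w + 2:         # diagonal spare position
            layout[(x, partition)] = "S"
            layout[(new_drive, x)] = new_group
        else:
            layout[(x, partition)] = new_group

fill_after_rotation(layout, 4, partition=2, new_drive=6, new_group=5)
# group 5 occupies partition 2 of drives 1-3 and partition 4 of drive 6
```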
FIG. 10 illustrates selection and movement of protection group members to a second new drive added to the drive subset and creation of a new protection group using the partitions freed by movement of existing protection group members to the second new drive. The second new drive is sequentially numbered relative to existing drives 1-6 and is thus drive 7. The W protection group members in the lowest numbered unrotated partition, excluding the single partition protection group initially created in the lowest numbered partition, are in partition 3 of drives 1-4, so those members and the spare, i.e. 2, 3, S, 1, are rotated to new drive 7. The new protection group is assigned the number 6 because the existing protection groups are numbered 1-5. The members of protection group 6 are located in the partitions that were made available by the rotation of protection group members 2, 3, and 1 from partition 3 of drives 1, 2, and 4, and a new spare partition S is created at partition 3 of drive 3. The rotated spare partition S at partition 3 of drive 7 is utilized for the fourth member of protection group 6 in order to maintain the original distribution of the spare partitions S.
FIG. 11 illustrates selection and movement of protection group members to a third new drive added to the drive subset and creation of a new protection group using the partitions freed by movement of existing protection group members to the third new drive. The third new drive is sequentially numbered relative to existing drives 1-7 and is thus drive 8. The W protection group members in the lowest numbered unrotated partition, excluding the single partition protection group initially created in the lowest numbered partition, are in partition 4 of drives 1-4, so those members and the spare, i.e. 3, S, 1, 2, are rotated to new drive 8. The new protection group is assigned the number 7 because the existing protection groups are numbered 1-6. The members of protection group 7 are located in the partitions that were made available by the rotation of protection group members 3, 1, and 2 from partition 4 of drives 1, 3, and 4, and a new spare partition S is created at partition 4 of drive 2. The rotated spare partition S at partition 2 of drive 8 is utilized for the fourth member of protection group 7 in order to maintain the original distribution of the spare partitions S.
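The three single-drive additions of FIGS. 7-11 can then be expressed as one loop over the hypothetical helpers sketched above:

```python
# Sketch only: add drives 6-8, rotating partitions 2-4 and creating
# groups 5-7, reproducing the progression of FIGS. 7 through 11 while
# the diagonal spares remain at (5,1), (4,2), (3,3), and (2,4).
layout = initial_layout(4)
for i, new_drive in enumerate((6, 7, 8)):
    rotate_column(layout, 4, partition=2 + i, new_drive=new_drive)
    fill_after_rotation(layout, 4, partition=2 + i,
                        new_drive=new_drive, new_group=5 + i)
```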
FIG. 12 illustrates selection and movement of protection group members to a fourth new drive added to the drive subset. The fourth new drive is designated as drive 9. All of the partitions are excluded or already rotated, so protection group members located on a diagonal starting with partition 1 of drive 9, with adjacent partitions located on incrementally decreasing drive numbers and incrementally increasing partitions, are selected for rotation. In the illustrated example the members of protection groups 7, 6, and 5 at partition 2 of drive 8, partition 3 of drive 7, and partition 4 of drive 6 are moved to partitions 2, 3, and 4 of drive 9. The rotation and resulting vacated diagonal partitions help to prepare for a split.
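On the assumption (drawn from the description above) that the selected diagonal runs from partition 1 of drive 9 through partition 2 of drive 8, partition 3 of drive 7, and partition 4 of drive 6, the pre-split move can be sketched as:

```python
# Sketch only: move the diagonal members onto the new drive while
# preserving their partition numbers, vacating the diagonal for a split.
def rotate_diagonal(layout, w, new_drive):
    for y in range(2, w + 1):
        layout[(new_drive, y)] = layout.pop((new_drive - y + 1, y))

rotate_diagonal(layout, 4, new_drive=9)    # moves (8,2), (7,3), (6,4)
```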
FIG. 13 illustrates splitting the drive subset of FIG. 12 into two independent drive subsets 350, 352. The first drive subset 350 includes drives 1-4 and 9 and the second drive subset 352 includes drives 5-8.
FIG. 14 illustrates selection and movement of protection group members of drive subset 352 to distribute spares and prepare for scaling. A new drive 10 is added to the drive subset. The W members in the lowest numbered partition of the W lowest numbered drives are rotated onto the new drive 10. As a result of the rotation, the spare partition S is relocated from partition 1 of drive 5 to partition 1 of drive 10, thereby creating the distribution of spares S along the diagonal as was done in the original drive subset. A new protection group 8 is created in the partitions vacated due to the rotation, thereby recreating the single partition protection group, diagonally distributed spares, and symmetrically distributed protection group members of the original drive subset.
FIG. 15 illustrates use of spares of drive subset 352 for creation of a new protection group. Rather than adding a new drive and redistributing protection group members and spares as described above, the four spare partitions S are used to create a new protection group 8. This variation may be used when the split-away drive subset does not need spare capacity and will not be scaled.
FIG. 16 illustrates selection and movement of protection group members to a fifth new drive added to the drive subset of FIG. 12. The fifth new drive is designated as drive 10. W protection group members in the lowest numbered partition, excluding the members of the single partition protection group initially created in the lowest numbered partition, are rotated onto the new drive. In the illustrated example S, 1, 2, 3 in partition 1 of drives 5-8 are rotated to partitions 1-4 of new drive 10. A spare partition S is created at partition 1 of drive 5 to maintain the original diagonally distributed spares. A new protection group 8 may be created using the vacated locations at partition 1 of drives 6-8 together with the rotated spare location at partition 1 of drive 10.
FIG. 17 illustrates splitting the drive subset of FIG. 16 into two independent drive subsets 360, 362. Drive subset 360 is created using drives 5-8 and 10. Drive subset 362 is created using drives 1-4 and 9. Drive subset 362 is pre-configured for scaling and use of spare capacity because a single partition protection group is located in partition 1, spare partitions S are distributed along a diagonal, and the other protection group members are symmetrically distributed. In other words, drive subset 362 is organized in the same way as the original drive subset. Drive subset 360 may be maintained in the illustrated form, reconfigured in the same way as the original drive subset, or used to create an additional protection group by using the spare partitions.
FIG. 18 illustrates use of spare capacity from a first drive subset for rebuilding protection group members from a second drive subset in the event of drive failure. Drive subset 364 is organized in the same way as the original drive subset, including W spare partitions. Drive subset 366 has no spare partitions. This situation may occur when a drive subset is split and only one of the split-away drive subsets includes spares, e.g. because there were no spares at the time of the split or because spares were used to create a new protection group. When one of the drives of the subset 366 that lacks spare capacity fails, the protection group members are rebuilt using the spare partitions of the drive subset 364 that has spare capacity. This results in distribution of members of protection groups on multiple drive subsets, e.g. members of protection group 8 are on drive subset 364 and drive subset 366. However, when the failed drive 5 is replaced, the protection group members in the spare partitions S of drive subset 364 are relocated to the replaced drive 5 in drive subset 366.
FIG. 19 illustrates steps associated with creation of an initial drive subset of (W+1) drives on which RAID (D+P) protection groups are maintained. Step 400 is creating W partitions on each drive of a drive subset of (W+1) drives. Step 402 is creating one protection group in the first partition index of the drive subset. The drives and partitions in the drive subset are sequentially ordered, e.g. using numbers. The first partition may be the lowest numbered partition. The protection group, which includes W members, is created in the lowest numbered partition of the W lowest numbered drives. Step 404 is creating distributed spares. Spares at drive X, partition index Y satisfy the equation X+Y=W+2. The result is a diagonal distribution of spare partitions. Step 406 is creating additional sequentially numbered protection groups. Additional protection groups are created, and their members symmetrically distributed in the remaining free partitions, such that the RAID member at drive X, partition index Y belongs to RAID group N where:
- a. If (X+Y)<(W+2), then N=(X+Y-2); and
- b. If (X+Y)>(W+2), then N=(X+Y-W-2). The resulting drive subset is configured for scaling and use of spare capacity, as the consistency check following this list illustrates.
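The check below exercises the rules of steps 402-406 with the hypothetical initial_layout() helper sketched earlier, confirming that each protection group receives W members with no two members on the same drive:

```python
# Sketch only: verify W members per group, at most one member per drive.
from collections import defaultdict

groups = defaultdict(set)
for (x, y), g in initial_layout(4).items():
    if g != "S":
        assert x not in groups[g], "two members of a group share a drive"
        groups[g].add(x)
assert all(len(drives) == 4 for drives in groups.values())
print(sorted(groups))   # -> [1, 2, 3, 4]
```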
FIG. 20 illustrates steps associated with creation and distribution of spare capacity associated with scalable drive subsets on which protection groups are maintained, such as the drive subset created using the steps shown in FIG. 19. Step 408 is adding a new drive to the drive subset. Step 410 is rotating the first W protection group members in the lowest numbered unrotated partition. The single partition protection group initially created in the lowest numbered partition is excluded from consideration. The selected protection group members, including spares, are rotated from the first W drives in ascending order to the W partitions of the new drive in ascending order. For example, for the first new drive the protection group member on the first drive is moved to the first partition of the first new drive, the protection group member on the second drive is moved to the second partition of the first new drive, etc. Consequently, the drive number from which the member is moved becomes the partition number to which the member is moved. Step 412 is creating a new protection group using the partitions that were vacated due to rotation of protection group members. A new spare partition is created in place of a rotated spare in order to maintain the desired distribution of spare capacity, so a member of the new protection group is not located in place of the vacated spare. Rather, that extra new protection group member is located in the rotated spare, i.e. the spare location on the new drive. If there are enough drives in the drive subset for a split, as determined in step 414, then the drive subset is split into two drive subsets as indicated in step 416. Otherwise steps 408 through 414 may be iterated, e.g. scaling the drive subset in single drive increments until a split becomes possible. The number of drives required for a split is in part a design decision. For example, an original drive subset with 2*W drives could be split into two drive subsets of W drives, or the original drive subset may be maintained until there are 2*W+1 drives or 2*(D+P+1) drives, depending on whether maintenance of spare capacity in one or both split-away drive subsets is desired. Further, the drive subset may be split such that one of the new drive subsets is preconfigured with a single partition protection group, diagonally distributed spares, and symmetrically distributed protection group members, or reconfigured after being split away. When suitably configured, a split-away drive subset may be scaled and split by repeating steps 408 through 414. Optionally, one or more new protection groups may be created using spare partitions as indicated in step 418.

Specific examples have been presented to provide context and convey inventive concepts. The specific examples are not to be considered as limiting. A wide variety of modifications may be made without departing from the scope of the inventive concepts described herein. Moreover, the features, aspects, and implementations described herein may be combined in any technically possible way. Accordingly, modifications and combinations are within the scope of the following claims.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/061,922 US11314608B1 (en) | 2020-10-02 | 2020-10-02 | Creating and distributing spare capacity of a disk array |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/061,922 US11314608B1 (en) | 2020-10-02 | 2020-10-02 | Creating and distributing spare capacity of a disk array |
Publications (2)
Publication Number | Publication Date |
---|---|
US20220107871A1 (en) | 2022-04-07 |
US11314608B1 US11314608B1 (en) | 2022-04-26 |
Family
ID=80932281
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/061,922 US11314608B1 (en) | Creating and distributing spare capacity of a disk array | 2020-10-02 | 2020-10-02 |
Country Status (1)
Country | Link |
---|---|
US (1) | US11314608B1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11860746B2 (en) * | 2021-08-10 | 2024-01-02 | Dell Products L.P. | Resilient data storage system with efficient space management |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6993701B2 (en) * | 2001-12-28 | 2006-01-31 | Network Appliance, Inc. | Row-diagonal parity technique for enabling efficient recovery from double failures in a storage array |
JP4215606B2 (en) * | 2003-09-24 | 2009-01-28 | 日本電気株式会社 | Disk array device, storage capacity expansion method and program |
US7302522B2 (en) * | 2004-12-27 | 2007-11-27 | Lsi Corporation | Optimizing I/O performance in a RAID subsystem using an adaptive maximum request size for a logical drive |
US9009405B2 (en) * | 2012-04-30 | 2015-04-14 | Lsi Corporation | Methods and systems for instantaneous online capacity expansion |
US9471259B2 (en) * | 2014-01-28 | 2016-10-18 | Netapp, Inc. | Shared storage architecture |
US10540103B1 (en) * | 2017-07-31 | 2020-01-21 | EMC IP Holding Company LLC | Storage device group split technique for extent pool with hybrid capacity storage devices system and method |
CN112615917B (en) * | 2017-12-26 | 2024-04-12 | 华为技术有限公司 | Storage device management method in storage system and storage system |
CN112714910B (en) * | 2018-12-22 | 2022-12-27 | 华为云计算技术有限公司 | Distributed storage system and computer program product |
US10860210B2 (en) * | 2019-03-25 | 2020-12-08 | EMC IP Holding Company LLC | Division raid for disk array expansion |
CN112748858B (en) * | 2019-10-30 | 2024-04-19 | 伊姆西Ip控股有限责任公司 | Method, electronic device and computer program product for managing disk arrays |
US11650737B2 (en) * | 2019-11-26 | 2023-05-16 | International Business Machines Corporation | Disk offset-distance awareness data placement for storage system data protection |
US11144396B1 (en) * | 2021-01-27 | 2021-10-12 | Dell Products L.P. | Raid reliability with a provisional spare disk |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230027532A1 (en) * | 2021-07-22 | 2023-01-26 | EMC IP Holding Company LLC | Expanding raid systems |
US11775182B2 (en) * | 2021-07-22 | 2023-10-03 | EMC IP Holding Company LLC | Expanding raid systems |
Also Published As
Publication number | Publication date |
---|---|
US11314608B1 (en) | 2022-04-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2020-09-09 | AS | Assignment | Owner: EMC IP HOLDING COMPANY LLC, Massachusetts. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; assignors: HUA, KUOLIN; GAO, KUNXIU. Reel/frame: 053960/0695. |
n/a | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); entity status of patent owner: LARGE ENTITY. |
2020-11-12 | AS | Assignment | Owner: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, North Carolina. Free format text: SECURITY AGREEMENT; assignors: EMC IP HOLDING COMPANY LLC; DELL PRODUCTS L.P. Reel/frame: 054591/0471. |
2020-11-13 | AS | Assignment | Owner: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., as notes/collateral agent, Texas. Free format text: SECURITY INTEREST; assignors: EMC IP HOLDING COMPANY LLC; DELL PRODUCTS L.P. Reel/frames: 054475/0523; 054475/0609; 054475/0434. |
2021-11-01 | AS | Assignment | Owners: EMC IP HOLDING COMPANY LLC; DELL PRODUCTS L.P., Texas. Free format text: RELEASE OF SECURITY INTEREST AT REEL 054591 FRAME 0471; assignor: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH. Reel/frame: 058001/0463. |
n/a | STCF | Information on status: patent grant | Free format text: PATENTED CASE. |
2022-03-29 | AS | Assignment | Owners: DELL PRODUCTS L.P.; EMC IP HOLDING COMPANY LLC, Texas. Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME 054475/0609, 054475/0434, and 054475/0523; assignor: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT. Reel/frames: 062021/0570; 060332/0740; 060332/0664. |