US12008271B1 - Adaptive raid width and distribution for flexible storage
- Publication number: US12008271B1
- Application number: US18/302,837
- Authority
- US
- United States
- Prior art keywords
- distribution pattern
- protection groups
- storage
- distribution
- linear
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS; G06—COMPUTING OR CALCULATING; COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0604—Improving or facilitating administration, e.g. storage management
- G06F3/0619—Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
- G06F3/0631—Configuration or reconfiguration of storage systems by allocating resources to storage systems
- G06F3/064—Management of blocks
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Definitions
- FIG. 1 illustrates a software-defined, server-based storage area network (SAN) with single node granular scaling and adaptive RAID width capabilities.
- FIG. 2 illustrates matrices that represent organization of aggregate storage in RAID groups characterized by recursive fractal patterns.
- FIGS. 3A and 3B illustrate scaling by metamorphosing from recursive fractal distribution to linear distribution.
- FIGS. 4A and 4B illustrate scaling by metamorphosing from linear distribution to recursive fractal distribution.
- FIGS. 5A, 5B, 6A, 6B, 7A, 7B, and 8 illustrate RAID width adaptation.
- FIG. 9 illustrates a method for granular scaling and RAID width adaptation.
- As shown in FIG. 1, the software-defined SAN includes multiple homogeneous storage nodes 104-1, 104-2, 104-3, 104-4 that are interconnected via an Internet Protocol (IP) network 112. Each storage node includes a server 100 and local storage 106. Each server 100 includes a multi-core CPU 102, a storage controller 108, and a memory bank 110. The CPU 102 includes L1 onboard cache. Each memory bank 110 includes L2/L3 cache and main memory implemented with one or both of volatile memory components such as Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM) and non-volatile memory (NVM) such as storage class memory (SCM). The local storage 106 includes one or more non-volatile disk drives, e.g., solid-state drives (SSDs) based on electrically erasable programmable read-only memory (EEPROM) technology such as NAND and NOR flash memory, and hard disk drives (HDDs) with spinning disk magnetic storage media. Each storage node has the same type and quantity of local storage devices with matching storage capacity, so each storage node has the same storage capacity.
- FIGS. 3A and 3B illustrate scaling by metamorphosing from recursive fractal distribution to linear distribution in the context of RAID-5 (3+1) groups (a, b, c, d), starting with a 4×4 matrix. When storage node 5 is added, members of RAID groups (a, b, c, d) from four same-index cells in the first column are groupwise-rotationally relocated from the original 4×4 matrix into the cells of the new row corresponding to storage node 5. The relocation frees space for creation of a new RAID-5 (3+1) group (A) that is linearly distributed in the vacated same-index cells.
- FIGS. 4A and 4B illustrate scaling by metamorphosing from linear distribution to recursive fractal distribution. The 4×4 recursive fractal matrix described above is superimposed on one of the 4×4 linearly distributed subsets (nodes 1-4) that was split off as described above. New RAID groups are distributed according to the distribution pattern of the matching-size recursive fractal matrix. For example, when a new row 5 is added, the RAID group members on the diagonal cells superimposed with number 0 are relocated to the new row, and a new RAID group (a) is created using the vacated cells. The procedure is iterated with numbers 1, 2, and 3, in sequence.
- FIGS. 5A, 5B, 6A, 6B, 7A, 7B, and 8 illustrate RAID width adaptation. The addition of storage nodes enables implementation of larger RAID widths, and the RAID distribution can be automatically metamorphosed such that new RAID groups are formed with the larger RAID width. For example, a system of RAID-5 (3+1) may evolve into RAID-6 (6+2) as the storage system grows from 4 nodes to more than 8 nodes. The metamorphosis methods adapt to power-of-2 RAID widths with alternating patterns of recursive fractal distribution and linear distribution.
- FIG. 9 illustrates a method for granular scaling and RAID width adaptation. Step 900 is subdividing the storage capacity of a group of homogeneous storage nodes into indexed same-size cells. The storage nodes may be parts of a software-defined, server-based SAN, but that should not be viewed as a limitation because other types of storage systems and nodes could be used. Step 902 is creating and distributing RAID groups across the nodes in a recursive fractal distribution pattern. The RAID level may be selected based on the number of storage nodes, e.g., with the RAID width being selected as a function of the number of storage nodes. No more than one member of any RAID group is located on the same storage node. Step 904 is adding new storage nodes by metamorphosing from the recursive fractal distribution pattern to a linear distribution pattern. The nodes may be added individually; for example, same-cell-index members may be groupwise-rotated to cells on a new storage node, iteratively for each new storage node with sequentially indexed cells, until there are enough nodes to increase the RAID width. After there are enough nodes to increase the RAID width, as determined in step 906, step 908 determines whether the target RAID width has been reached. If the target RAID width has not been reached, then step 910 is adding new storage nodes and adding new larger-width RAID groups in the recursive fractal distribution pattern by metamorphosing from the linear distribution pattern. This step may be performed automatically, and the new RAID level may be selected based on the number of storage nodes in the system; parity is recomputed. This is iterated until there are enough nodes for a split, as determined in step 912. When there are enough storage nodes for a split, a split is implemented in step 914 and the flow returns to step 904. The split divides the matrix of cells into two subsets: a first subset that is distributed in a linear pattern and a second subset that is distributed in a recursive fractal pattern. If the target RAID width has been reached, as determined in step 908, then flow exits from the loop, and new RAID groups can still be added by metamorphosing between recursive fractal and linear distribution patterns in step 916.
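As an editorial sketch only (the patent provides no code, and the function name, return strings, and the `num_nodes < 2 * raid_width` threshold are assumptions), the FIG. 9 decision flow can be summarized as a small state function:

```python
def scale_step(num_nodes, raid_width, target_width):
    """Return the next action in the FIG. 9 flow for the current state."""
    if num_nodes < 2 * raid_width:
        # Step 906 branch: not yet enough nodes to double the RAID width,
        # so keep adding nodes by metamorphosing fractal -> linear (step 904).
        return "metamorphose fractal to linear"
    if raid_width >= target_width:
        # Steps 908/916: target width reached; continue scaling by
        # alternating between the two distribution patterns.
        return "metamorphose between patterns at target width"
    # Step 910: enough nodes and below target width; form larger-width
    # groups by metamorphosing linear -> fractal, then recompute parity.
    return "form larger-width groups"

# A width-4 system with 5 nodes keeps absorbing nodes; at 8 nodes it can
# widen; a width-8 system at its 8-wide target keeps cycling patterns.
assert scale_step(5, raid_width=4, target_width=8) == "metamorphose fractal to linear"
assert scale_step(8, raid_width=4, target_width=8) == "form larger-width groups"
assert scale_step(16, raid_width=8, target_width=8) == "metamorphose between patterns at target width"
```

The doubling threshold reflects the power-of-2 width progression described above; a real controller would also track the split bookkeeping of steps 912-914.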
Abstract
A software-defined, server-based storage system is configured to support single node granular scaling and adaptive RAID width capabilities. The storage system includes multiple homogeneous storage nodes, each including a server and local storage. Aggregate storage is organized into same-size cells. RAID group members are distributed in cells across storage nodes in a recursive fractal pattern. The storage system is scaled by metamorphosing between recursive fractal distribution of the RAID groups and linear distribution of the RAID groups and splitting matrices of cells. When a sufficient number of new storage nodes have been added, new larger width RAID groups will be formed.
Description
The subject matter of this disclosure is generally related to data storage systems.
Scalable data storage systems for organizations are designed to have low initial cost and be reconfigurable to grow to meet future needs. For example, homogeneous software-defined, server-based storage area network (SAN) storage nodes can be used as modular storage system building blocks. Storage system resiliency is based on redundant array of independent disks (RAID) protection groups having members distributed across different storage nodes. However, the width of the implemented RAID level can limit the granularity at which storage capacity can be scaled. Moreover, reconfiguring the storage system to a different RAID level may be difficult, which is problematic because modular storage systems may initially only be capable of being configured with a smaller RAID width but eventually include enough storage nodes to be configured for a more efficient larger RAID width.
The examples described herein are not intended to be limiting. All examples, aspects, and features mentioned in this document can be combined in any technically possible way.
In accordance with some aspects, an apparatus comprises a plurality of homogeneous storage nodes that are interconnected via a network, each storage node comprising a server and local storage, each server comprising a multi-core processor, memory, and a storage controller configured to: subdivide storage capacity of the local storage into indexed same-size cells; create and distribute members of protection groups across the storage nodes in the cells in a recursive fractal distribution pattern; and metamorphose distribution of sets of the members of the protection groups from the recursive fractal distribution pattern to a linear distribution pattern in response to addition of new storage nodes, thereby enabling scaling in increments of single storage nodes.
In accordance with some aspects, a method comprises subdividing storage capacity of local storage of a plurality of homogeneous storage nodes that are interconnected via a network into indexed same-size cells; creating and distributing members of protection groups across the storage nodes in the cells in a recursive fractal distribution pattern; and metamorphosing distribution of sets of the members of the protection groups from the recursive fractal distribution pattern to a linear distribution pattern in response to addition of new storage nodes, thereby enabling scaling in increments of single storage nodes.
In accordance with some aspects, a non-transitory computer-readable storage medium stores instructions that when executed by a computer cause the computer to perform a method comprising subdividing storage capacity of local storage of a plurality of homogeneous storage nodes that are interconnected via a network into indexed same-size cells; creating and distributing members of protection groups across the storage nodes in the cells in a recursive fractal distribution pattern; and metamorphosing distribution of sets of the members of the protection groups from the recursive fractal distribution pattern to a linear distribution pattern in response to addition of new storage nodes, thereby enabling scaling in increments of single storage nodes.
Some aspects, features and implementations described herein may include machines such as computers, electronic components, optical components, and processes such as computer-implemented steps. It will be apparent to those of ordinary skill in the art that the computer-implemented steps may be stored as computer-executable instructions on a non-transitory computer-readable medium. Furthermore, it will be understood by those of ordinary skill in the art that the computer-executable instructions may be executed on a variety of tangible processor devices. For ease of exposition, not every step, device or component that may be part of a computer or data storage system is described herein. Those of ordinary skill in the art will recognize such steps, devices, and components in view of the teachings of the present disclosure and the knowledge generally available to those of ordinary skill in the art. The corresponding machines and processes are therefore enabled and within the scope of the disclosure.
The terminology used in this disclosure is intended to be interpreted broadly within the limits of subject matter eligibility. The terms “logical” and “virtual” are used to refer to features that are abstractions of other features, e.g., and without limitation abstractions of tangible features. The term “physical” is used to refer to tangible features. For example, multiple virtual computing devices could operate simultaneously on one physical computing device. The term “logic” is used to refer to special purpose physical circuit elements and software instructions that are stored on a non-transitory computer-readable medium and implemented by multi-purpose tangible processors. The terms “disk” and “drive” are used interchangeably and are not intended to be limited to a particular type of non-volatile data storage media.
The storage controllers 108 manage organization of the directly-attached local storage 106. The local storage is subdivided into equal size units of storage capacity referred to as cells. The cells are indexed, e.g., 1, 2, 3, . . . , and the cell indices are used uniformly for all storage nodes. Aggregate storage capacity of the SAN is organized into RAID protection groups with members located in cells that are distributed across different storage nodes. The number of cells may be an integer multiple of the target RAID width. Storage system efficiency and data availability are determined by the number of data and parity members per RAID protection group, which is referred to as RAID width. A large RAID width corresponds to lower parity overhead cost and greater storage efficiency. However, the storage system may initially be implemented with too few storage nodes for a large RAID width because storage nodes in a modular storage system are purchased and scaled based on evolving organizational needs. As will be explained in greater detail below, the aggregate storage capacity of the disclosed storage system can advantageously be scaled in increments of single storage nodes regardless of the implemented RAID level and the RAID width and distribution can be automatically adapted to the number of storage nodes in the storage system such that new RAID groups with larger RAID widths are formed to improve storage efficiency or data availability when scaling provides a sufficient number of storage nodes in the storage system.
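As a minimal illustrative sketch (the function name, the `multiple` parameter, and the capacity figures are assumptions, not part of the disclosure), subdividing one node's capacity into indexed same-size cells, with the cell count an integer multiple of the target RAID width, might look like:

```python
def subdivide_capacity(node_capacity_bytes, target_raid_width, multiple=2):
    """Split one node's capacity into indexed, equal-size cells."""
    num_cells = target_raid_width * multiple
    cell_size = node_capacity_bytes // num_cells
    # Cell indices 1, 2, 3, ... are used uniformly on every storage node.
    return {index: cell_size for index in range(1, num_cells + 1)}

# Example: 8 equal cells per node for a width-4 target, as in FIG. 5A.
cells = subdivide_capacity(8 * 2**40, target_raid_width=4)
assert len(cells) == 8                # cell count is a multiple of the width
assert len(set(cells.values())) == 1  # all cells are the same size
```

Because every homogeneous node has matching capacity, the same cell map applies to each node, which is what lets protection group members be addressed by (node, cell index) pairs.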
As shown in FIG. 2, recursive fractal storage configurations supporting RAID widths 2, 4, and 8 for RAID-1 (1+1), RAID-5 (3+1), and RAID-6 (6+2), for example, may be viewed as 2×2, 4×4, and 8×8 matrices, respectively, where each row corresponds to a storage node and each column corresponds to a cell index. The reference number within each position of a matrix represents the RAID group of the member that occupies that position. Every RAID group comprises members distributed across different rows because the members are distributed across different storage nodes to protect against storage node failure. For example, RAID-6 (6+2) group 0 has members on a diagonal of the 8×8 matrix. RAID protection group distribution in the illustrated matrices, and thus aggregate storage, is characterized by recursive fractal patterns. For example, the 4×4 matrix is a 2×2 matrix of 4-cell patterns, where each pattern is a 2×2 submatrix. Similarly, the 8×8 matrix is patterned as four 4×4 and/or eight 2×2 submatrices. Larger (by powers of 2) matrices can be recursively patterned accordingly. As will be explained below, recursive fractal distribution of protection group members enables granular scaling and RAID width adaptation. For example, a storage system could initially be configured according to the 2×2 matrix and evolve into the distribution of the 4×4 matrix and then the 8×8 matrix.
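One compact construction consistent with the stated properties (group 0 on the diagonal, exactly one member per row and per column for each group, and a recursive block structure built from half-size submatrices) assigns the member at node row i and cell column j to group i XOR j. This is an editorial assumption for illustration, not necessarily the exact pattern of the figures:

```python
def fractal_matrix(width):
    """Build a width x width matrix where entry (i, j) is the group index.

    The XOR rule is recursively fractal: the W x W matrix decomposes into
    four W/2 x W/2 submatrices, each a shifted copy of the smaller pattern.
    """
    return [[i ^ j for j in range(width)] for i in range(width)]

m = fractal_matrix(8)
# Group 0 lies on the diagonal, matching the RAID-6 (6+2) example.
assert all(m[i][i] == 0 for i in range(8))
# Every group has exactly one member per row (node), so no two members of
# a protection group ever share a storage node.
for g in range(8):
    rows = [i for i in range(8) for j in range(8) if m[i][j] == g]
    assert sorted(rows) == list(range(8))
```

The same rule generates the 2×2 and 4×4 matrices, and any larger power-of-2 width, which is what makes the pattern scale recursively.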
Referring to FIG. 5A, if the storage system is initially configured with 4 storage nodes with 8 cells per node, RAID groups characterized by width 4 can be distributed by repeating the 4×4 recursive fractal matrix pattern in adjacent 4-cell groups. Each 4-cell group is separately metamorphosed as described above. New RAID groups (A, E) are formed with column-wise linear distribution as the system grows with the addition of storage node 5. All RAID groups will be linearly aligned (per column) as the system grows to 8 storage nodes by metamorphosing from recursive fractal to linear distribution as shown in FIG. 5B.
Referring to FIGS. 6A and 6B, as the storage system grows beyond 8 storage nodes, new RAID groups are formed by metamorphosing from linear distribution according to the superimposed 8×8 recursive fractal matrix. After relocating RAID group members from the diagonal cells (marked 0) to the new node 9, the vacated cells are allocated to a new RAID group (a). As more storage nodes are added to the system, original RAID group members from cells marked with subsequent sequential numbers of the 8×8 recursive fractal matrix are relocated to the new storage nodes. The vacated cells are allocated to new RAID groups (b, c, d, e, f, g, h).
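The reverse, linear-to-fractal step can be sketched by superimposing a fractal template on the linear subset and processing one template mark per new node. The XOR template and all names here are assumptions for illustration:

```python
def metamorphose_step(matrix, template, mark, new_group):
    """Move every member whose template cell equals `mark` to a new node's
    row (same cell index), and give the vacated cells to `new_group`."""
    width = len(template)
    new_row = [None] * width
    for i in range(width):
        for j in range(width):
            if template[i][j] == mark:
                new_row[j] = matrix[i][j]   # relocate, keeping cell index
                matrix[i][j] = new_group    # vacated cell joins new group
    matrix.append(new_row)

# Assumed 8x8 recursive fractal template (i XOR j); linear start where
# group g occupies all of column g across nodes 1-8.
template = [[i ^ j for j in range(8)] for i in range(8)]
matrix = [[g for g in range(8)] for _ in range(8)]
metamorphose_step(matrix, template, mark=0, new_group="a")
# The diagonal (template mark 0) was vacated for new group (a), and the
# relocated members keep their cell indices on the new node 9.
assert all(matrix[i][i] == "a" for i in range(8))
assert matrix[8] == list(range(8))
```

Iterating with marks 1 through 7 reproduces the progression that yields new groups (b) through (h) as nodes 10 through 16 arrive.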
Referring to FIG. 7A, the storage system is split into two subsets after the addition of eight new storage nodes. The first subset of nodes 9-16 has RAID width 4 groups in a linear distribution pattern. There are two RAID width 4 groups per column. Referring to FIG. 7B, same-cell-index pairs of smaller RAID width 4 groups are combined into single larger RAID width 8 groups. Parity is recomputed, e.g., from RAID-5 (3+1) to RAID-6 (6+2). The subset of storage nodes 1-8 includes RAID width 8 groups in a recursive fractal distribution pattern. This subset will support subsequent metamorphosis growth and split cycles. For example, RAID members in the first column will be relocated to a new storage node 17, making room for a new RAID group (i) to be formed. RAID members in subsequent columns may be relocated to successive new storage nodes for new RAID groups to be formed.
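The width-doubling merge can be sketched as membership combination plus parity recomputation. The record layout and names are assumptions, and the real RAID-6 second parity (Q, typically Reed-Solomon over GF(2^8)) is out of scope; only a simple XOR parity over the wider data stripe is shown:

```python
def combine_groups(group_a, group_b):
    """Merge two same-cell-index width-4 groups on disjoint 4-node subsets
    into a single width-8 group (sketched membership only)."""
    assert not set(group_a["nodes"]) & set(group_b["nodes"])
    assert group_a["cell"] == group_b["cell"]
    return {"nodes": group_a["nodes"] + group_b["nodes"],
            "cell": group_a["cell"]}

a = {"nodes": [9, 10, 11, 12], "cell": 3}
b = {"nodes": [13, 14, 15, 16], "cell": 3}
merged = combine_groups(a, b)
assert merged["nodes"] == [9, 10, 11, 12, 13, 14, 15, 16]

# Parity must be recomputed over the wider stripe, e.g. XOR across six
# data members of a RAID-6 (6+2) rather than three of a RAID-5 (3+1).
data = [0b1010, 0b0110, 0b1100, 0b0011, 0b0101, 0b1001]
p = 0
for d in data:
    p ^= d
assert p == 0b1111
```

The disjoint-node check mirrors the constraint that the two source groups occupy different 4-node subsets, so the merged width-8 group still has at most one member per node.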
Referring to FIG. 8 , after adding eight new storage nodes, the original subset will again metamorphose and split into two subsets, with a linear distribution of all RAID groups. The system may continue to grow in metamorphosis cycles, with alternating patterns of recursive fractal and linear distribution.
If the storage space of each storage node is subdivided into more cells, the system may support larger (power of 2) RAID widths, using the recursive fractals and metamorphosis as described above. However, storage efficiency is subject to diminishing returns: 75% with RAID 5 (3+1) or RAID 6 (6+2), 87.5% with RAID 5 (7+1) or RAID 6 (14+2), 93.75% with RAID 5 (15+1) or RAID 6 (30+2), etc. A larger RAID width may also increase the complexity of RAID recovery in case of a storage node failure. Therefore, a target maximum RAID width may be selected so that automated RAID width adaptation does not increase RAID width indefinitely.
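The diminishing-returns arithmetic above follows directly from the efficiency formula: each doubling of RAID width halves the remaining parity overhead, so each doubling buys less. A short sketch:

```python
# Efficiency = data members / total members. Doubling the width halves the
# remaining overhead (25% -> 12.5% -> 6.25% -> ...), so successive
# doublings yield ever-smaller gains.

def efficiency(data: int, parity: int) -> float:
    return data / (data + parity)

for data, parity in [(3, 1), (7, 1), (15, 1), (31, 1)]:
    print(f"RAID-5 ({data}+{parity}): {efficiency(data, parity):.2%}")
# 75.00%, 87.50%, 93.75%, 96.88% -- while rebuild complexity keeps growing,
# which is why a target maximum RAID width is worth selecting.
```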
A number of features, aspects, embodiments, and implementations have been described. Nevertheless, it will be understood that a wide variety of modifications and combinations may be made without departing from the scope of the inventive concepts described herein. Accordingly, those modifications and combinations are within the scope of the following claims.
Claims (20)
1. An apparatus comprising:
a plurality of homogeneous storage nodes that are interconnected via a network, each storage node comprising a server and local storage, each server comprising a multi-core processor, memory, and a storage controller configured to:
subdivide storage capacity of the local storage into indexed same-size cells;
create and distribute members of protection groups across the storage nodes in the cells in a recursive fractal distribution pattern; and
metamorphose distribution of sets of the members of the protection groups from the recursive fractal distribution pattern to a linear distribution pattern in response to addition of new storage nodes, thereby enabling scaling in increments of single storage nodes.
2. The apparatus of claim 1 further comprising the storage controller being configured to metamorphose distribution of the protection groups from the recursive fractal distribution pattern to the linear distribution pattern until all protection groups are linearly distributed.
3. The apparatus of claim 2 further comprising the storage controller being configured to metamorphose distribution of the protection groups from the recursive fractal distribution pattern to the linear distribution pattern via group-wise rotation of same-cell-index protection group members to cells on a new storage node, iteratively for each new storage node with sequentially indexed cells.
4. The apparatus of claim 3 further comprising the storage controller being configured to split the protection groups into two subsets and metamorphose distribution of the protection groups of one of the subsets from the linear distribution pattern to the recursive fractal distribution pattern in response to addition of new storage nodes.
5. The apparatus of claim 4 further comprising the storage controller being configured to metamorphose distribution of the protection groups of one of the subsets from the linear distribution pattern to the recursive fractal distribution pattern by selecting protection group members corresponding to superimposition of a recursive fractal matrix on the linear distribution pattern.
6. The apparatus of claim 5 further comprising the storage controller being configured to split the protection groups into two subsets: a first subset that is distributed in a linear pattern and a second subset that is distributed in a recursive fractal pattern.
7. The apparatus of claim 6 further comprising the storage controller being configured to combine pairs of protection groups with the same cell index into single wider protection groups responsive to addition of a sufficient number of storage nodes.
8. A method comprising:
subdividing storage capacity of local storage of a plurality of homogeneous storage nodes that are interconnected via a network into indexed same-size cells;
creating and distributing members of protection groups across the storage nodes in the cells in a recursive fractal distribution pattern; and
metamorphosing distribution of sets of the members of the protection groups from the recursive fractal distribution pattern to a linear distribution pattern in response to addition of new storage nodes, thereby enabling scaling in increments of single storage nodes.
9. The method of claim 8 further comprising metamorphosing distribution of the protection groups from the recursive fractal distribution pattern to the linear distribution pattern until all protection groups are linearly distributed.
10. The method of claim 9 further comprising metamorphosing distribution of the protection groups from the recursive fractal distribution pattern to the linear distribution pattern by group-wise rotating same-cell-index protection group members to cells on a new storage node, iteratively for each new storage node with sequentially indexed cells.
11. The method of claim 10 further comprising splitting the protection groups into two subsets and metamorphosing distribution of the protection groups of one of the subsets from the linear distribution pattern to the recursive fractal distribution pattern in response to addition of new storage nodes.
12. The method of claim 11 further comprising metamorphosing distribution of the protection groups of one of the subsets from the linear distribution pattern to the recursive fractal distribution pattern by selecting protection group members corresponding to superimposition of a recursive fractal matrix on the linear distribution pattern.
13. The method of claim 12 further comprising splitting the protection groups into two subsets: a first subset that is distributed in a linear pattern and a second subset that is distributed in a recursive fractal pattern.
14. The method of claim 13 further comprising combining pairs of protection groups with the same cell index into single wider protection groups responsive to addition of a sufficient number of storage nodes.
15. A non-transitory computer-readable storage medium storing instructions that are executed by a computer to perform a method comprising:
subdividing storage capacity of local storage of a plurality of homogeneous storage nodes that are interconnected via a network into indexed same-size cells;
creating and distributing members of protection groups across the storage nodes in the cells in a recursive fractal distribution pattern; and
metamorphosing distribution of sets of the members of the protection groups from the recursive fractal distribution pattern to a linear distribution pattern in response to addition of new storage nodes, thereby enabling scaling in increments of single storage nodes.
16. The non-transitory computer-readable storage medium of claim 15 in which the method further comprises metamorphosing distribution of the protection groups from the recursive fractal distribution pattern to the linear distribution pattern until all protection groups are linearly distributed.
17. The non-transitory computer-readable storage medium of claim 16 in which the method further comprises metamorphosing distribution of the protection groups from the recursive fractal distribution pattern to the linear distribution pattern by group-wise rotating same-cell-index protection group members to cells on a new storage node, iteratively for each new storage node with sequentially indexed cells.
18. The non-transitory computer-readable storage medium of claim 17 in which the method further comprises splitting the protection groups into two subsets and metamorphosing distribution of the protection groups of one of the subsets from the linear distribution pattern to the recursive fractal distribution pattern in response to addition of new storage nodes.
19. The non-transitory computer-readable storage medium of claim 18 in which the method further comprises metamorphosing distribution of the protection groups of one of the subsets from the linear distribution pattern to the recursive fractal distribution pattern by selecting protection group members corresponding to superimposition of a recursive fractal matrix on the linear distribution pattern.
20. The non-transitory computer-readable storage medium of claim 19 in which the method further comprises splitting the protection groups into a first subset that is distributed in a linear pattern and a second subset that is distributed in a recursive fractal pattern and combining pairs of protection groups with the same cell index into single wider protection groups responsive to addition of a sufficient number of storage nodes.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/302,837 US12008271B1 (en) | 2023-04-19 | 2023-04-19 | Adaptive raid width and distribution for flexible storage |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US12008271B1 true US12008271B1 (en) | 2024-06-11 |
Family ID: 91382629
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20020078276A1 (en) * | 2000-12-20 | 2002-06-20 | Ming-Li Hung | RAID controller with IDE interfaces |
| US20050063217A1 (en) * | 2003-09-24 | 2005-03-24 | Nec Corporation | Disk array device, method of extending storage capacity and computer program |
| US20080276057A1 (en) * | 2007-05-01 | 2008-11-06 | International Business Machines Corporation | Data storage array scaling method and system with minimal data movement |
| US20200401340A1 (en) * | 2017-06-19 | 2020-12-24 | Hitachi, Ltd. | Distributed storage system |
| US20220391359A1 (en) * | 2021-06-07 | 2022-12-08 | Netapp, Inc. | Distributed File System that Provides Scalability and Resiliency |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |