US11599283B2 - Power reduction in distributed storage systems - Google Patents
- Publication number
- US11599283B2 (granted from application US16/667,670)
- Authority
- US
- United States
- Prior art keywords
- storage system
- data
- node
- distributed storage
- hierarchy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0625—Power saving in storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0634—Configuration or reconfiguration of storage systems by changing the state or mode of one or more devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0689—Disk arrays, e.g. RAID, JBOD
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present disclosure relates to a distributed data storage system.
- the present disclosure relates to distributing data in the distributed data storage system for redundancy.
- Some data distribution algorithms used in the storage systems allow a user to define a protection level by describing failure scenarios that can be tolerated, such that data can still be recovered even after such a failure occurs.
- Data recovery in a distributed storage system often requires the creation of additional data which results in the need to provide and power additional storage resources in order to reconstruct data when malfunctions or failures occur.
- the additional storage resources also consume additional power, which creates additional costs, generates heat, and leads to earlier failures. While efforts could be applied to improve hardware solutions, such solutions typically evolve over product iterations and may only result in incremental improvements. Alternative improvements may utilize characteristics of the data coding and placement in the storage system.
- the present disclosure relates, in some embodiments, to reducing power consumption in a distributed data storage system using a hierarchy rule that is generated based on a spreading policy and a set of tolerable failures specified by a user.
- the subject matter described in this disclosure may be embodied in computer-implemented methods that include distributing erasure-encoded data of a first data object across first and second portions of a distributed storage system using a hierarchy rule corresponding to a spreading policy based on a set of tolerable failures from which the first data object can be recovered; and disabling the first portion of the distributed storage system that includes a first portion of the erasure-encoded data of the first data object.
- the first portion of the distributed storage system is determined according to the spreading policy and the hierarchy rule identifying the set of tolerable failures, and the second portion of the distributed storage system includes portions of the erasure-encoded data configured to recreate the first data object.
- the methods may include disabling a first portion of the distributed storage system by suspending power to the first portion of the distributed storage system.
- the distributed storage system may include a hierarchy of one or more of data centers, racks, nodes, and devices, and wherein the first portion that is disabled may be at least one of the nodes.
- the distributed storage system may include at least a first node and a second node, and disabling the first portion alternates between the first node and the second node.
- the hierarchy may further include one or more sub-nodes located between the nodes and the devices in the hierarchy.
- the first portion includes at least one of the sub-nodes.
- the distributed storage system may include at least a first sub-node and a second sub-node, and the disabling the first portion alternates between the first sub-node and the second sub-node.
- the spreading policy and the hierarchy rule are selected to allow the first portion of the distributed storage system to be disabled without preventing the distributed storage system from reading or writing the first data object.
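The alternating disabling of nodes or sub-nodes described above can be sketched as a simple round-robin schedule. This is a minimal illustration only; the function name, the string portion identifiers, and the notion of discrete maintenance periods are assumptions, not details from the patent:

```python
from itertools import cycle

def alternate_disabled(portions, periods):
    """Return which portion (node or sub-node) is powered down in each
    of `periods` consecutive maintenance periods, alternating through
    `portions` round-robin. Illustrative names, not from the patent."""
    rotation = cycle(portions)
    return [next(rotation) for _ in range(periods)]

# with two nodes, the disabled portion alternates between them
schedule = alternate_disabled(["node-1", "node-2"], 4)
# → ["node-1", "node-2", "node-1", "node-2"]
```

Because only one portion is disabled at a time, the remaining portions continue to hold enough erasure-encoded chunks to serve reads and writes.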
- a distributed storage system includes a set of non-volatile data storage devices; and a controller node having a memory and one or more processors configured to execute instructions stored in the memory.
- the controller node is configured to perform operations comprising: distributing erasure-encoded data of a first data object across first and second portions of a distributed storage system using a hierarchy rule corresponding to a spreading policy based on a set of tolerable failures from which the first data object can be recovered; and disabling the first portion of the distributed storage system that includes a first portion of the erasure-encoded data of the first data object.
- the first portion of the distributed storage system is determined according to the spreading policy and the hierarchy rule identifies the set of tolerable failures.
- the second portion of the distributed storage system includes portions of the erasure-encoded data configured to recreate the first data object.
- the distributed storage system may include disabling a first portion of the distributed storage system by suspending power to the first portion of the distributed storage system.
- the distributed storage system may include a hierarchy of one or more of data centers, racks, nodes, and devices, and wherein the first portion that is disabled may be at least one of the nodes.
- the distributed storage system may include at least a first node and a second node, and disabling the first portion alternates between the first node and the second node.
- the hierarchy may further include one or more sub-nodes located between the nodes and the devices in the hierarchy.
- the first portion includes at least one of the sub-nodes.
- the distributed storage system may include at least a first sub-node and a second sub-node, and disabling the first portion alternates between the first sub-node and the second sub-node.
- the spreading policy and the hierarchy rule are selected to allow the first portion of the distributed storage system to be disabled without preventing the distributed storage system from reading or writing the first data object.
- a distributed storage system includes a means for distributing erasure-encoded data of a first data object across first and second portions of a distributed storage system using a hierarchy rule corresponding to a spreading policy based on a set of tolerable failures from which the first data object can be recovered; and a means for disabling the first portion of the distributed storage system that includes a first portion of the erasure-encoded data of the first data object.
- the first portion of the distributed storage system is determined according to the spreading policy and the hierarchy rule identifying the set of tolerable failures, and the second portion of the distributed storage system includes portions of the erasure-encoded data configured to recreate the first data object.
- the disabling a first portion of the distributed storage system includes suspending power to the first portion of the distributed storage system.
- the spreading policy and the hierarchy rule are selected to allow the first portion of the distributed storage system to be disabled without preventing the distributed storage system from reading or writing the first data object.
- FIG. 1 is a high-level block diagram illustrating an example distributed storage system.
- FIG. 2 is a block diagram illustrating an example controller node of the distributed storage system configured to implement the techniques introduced herein.
- FIG. 3 is a block diagram illustrating an example hierarchic tree structure of the distributed storage system.
- FIG. 4 is a block diagram illustrating an example spreading of data in the distributed storage system based on a hierarchic tree structure.
- FIG. 5 is a flowchart of an example method for reducing power consumption in a distributed storage system implementing a hierarchical rule and spreading policy, according to the techniques described herein.
- Fault-tolerant systems usually require a form of redundancy or encoding that results in the duplication or increase in the hardware elements necessary to store the increased data. Such increased storage results in an overall increase in the power consumed by the resulting distributed storage system. However, with the increased data, the system can tolerate a certain level of hardware failures.
- the present embodiments contemplate reducing the power consumed in the system by selectively disabling or powering-down one or more selected portions of the distributed storage system that will not affect the reading or writing of the data based on the hierarchy rule that is generated based on a spreading policy and a set of tolerable failures.
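Under an MDS code with a W/F spreading policy, any W − F of the W chunks suffice to reconstruct an object, so a portion holding at most F of the chunks can in principle be powered down without making the object unreadable. A minimal sketch of that feasibility check follows; the function and parameter names are illustrative assumptions, and a real deployment would likely reserve part of F for actual failures rather than spending all of it on powered-down hardware:

```python
def can_disable(portion_chunks, spreading_width, fault_tolerance):
    """True if a portion of the storage system holding `portion_chunks`
    of an object's chunks may be powered down while the object stays
    readable under a W/F (spreading width / fault tolerance) policy."""
    min_chunks_needed = spreading_width - fault_tolerance
    remaining = spreading_width - portion_chunks
    return remaining >= min_chunks_needed

can_disable(5, 18, 5)   # → True: 13 chunks remain, and 13 are needed
can_disable(6, 18, 5)   # → False: only 12 chunks would remain
```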
- a distributed storage system may employ a methodology based on a spreading width (W), which is the number of erasure-encoded chunks of a single object/file that is written to the storage system. An object is erasure-encoded into a plurality of portions of data (e.g., “chunks”), and each chunk is generally written to a different data storage device.
- An algorithm determines the spreading of the different erasure-encoded data chunks into different hardware elements distributed between different data centers, different racks, different storage servers, and different storage devices (e.g., hard disk drives (HDDs) or solid-state drives (SSDs)).
- the spreading algorithm takes a spreading policy into account, which includes two main items.
- a spreading width (W) which is the number of erasure-encoded chunks of a single object/file that is written to the storage system. Each chunk is generally written to a different data storage device.
- a maximum concurrent fault tolerance (F) which is the maximum number of data storage devices that are allowed to fail concurrently.
- the spreading policy is sometimes written as W/F.
- the erasure-encoding overhead of the spreading policy is also a very important aspect to take into account. In a type of erasure coding known as a maximum distance separable (MDS) code, for a spreading policy W/F, the overhead can be calculated as W/(W − F).
- for example, for a spreading policy of 18/5, the erasure-encoding overhead is 18/13 ≈ 1.38. This means that for every byte of incoming data that needs to be stored in the storage system, about 1.38 bytes are stored on the HDDs of the storage system. Other variations are also possible and contemplated. For other erasure coding algorithms, the overhead will be larger, and the calculation will be different.
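The overhead formula above can be computed directly. This small helper simply restates W/(W − F) for an MDS code (the function name is illustrative):

```python
def mds_overhead(spreading_width, fault_tolerance):
    """Bytes stored per byte of user data for an MDS erasure code
    under a W/F spreading policy: W / (W - F)."""
    return spreading_width / (spreading_width - fault_tolerance)

# the 18/13 example from the text corresponds to an 18/5 policy
round(mds_overhead(18, 5), 2)  # → 1.38
```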
- FIG. 1 is a high-level block diagram illustrating an example distributed storage system 100 that is accessed by an application 102 .
- the application 102 is a software application running on a computing device that interacts with the system 100 .
- the computing device may be, for example, a laptop computer, a desktop computer, a tablet computer, a mobile telephone, a personal digital assistant, a mobile email device, a portable game player, a portable music player, a television with one or more processors embedded therein or coupled thereto, or any other electronic device capable of making requests to the system 100 and receiving responses from the system 100 .
- the application 102 comprises a file system that enables a general-purpose software application to interface with the system 100 or an Application Programming Interface library.
- the application 102 provides the data for storage in the system 100 .
- the application 102 also requests the data stored in the system 100 .
- the application 102 may be a file transfer application that requests to store a first set of data in the system 100 and to read or write a second set of data from the system 100 .
- the data is in the form of a data object.
- the data object comprises the data (e.g., 128-megabyte binary data) and a data object identifier.
- the data object identifier is a universally unique identifier used for identifying and addressing the data object.
- Storing data in the form of a data object, also referred to as object storage, offers advantages over conventional file- or block-based storage in scalability and flexibility, which are of particular importance to large scale redundant storage in a distributed storage system as shown in FIG. 1 .
- the distributed storage system 100 as depicted in FIG. 1 includes a controller node 104 , and storage nodes 106 a - 106 n , 108 a - 108 n , and 110 a - 110 n .
- the controller node 104 may be a computing device configured to make some or all of the storage space for storage of the data provided by the application 102 .
- the controller node 104 generates rules for distributing data of a data object based on user input and determines where to store the data of the data object based on the rules.
- the controller node 104 is physically located at a data center, where the controller node 104 along with a plurality of storage nodes 106 a - 106 n , 108 a - 108 n , and 110 a - 110 n are arranged in modular racks as described below.
- the storage nodes 106 a - 106 n , 108 a - 108 n , and 110 a - 110 n are computing devices configured to store the data.
- the storage nodes 106 a - 106 n , 108 a - 108 n , and 110 a - 110 n comprise a plurality of storage elements (e.g., data storage devices or block stores) for storing the data.
- the storage nodes 106 a - 106 n , 108 a - 108 n , and 110 a - 110 n are divided into groups based on, for example, whether the storage nodes are housed in a single rack. In the example of FIG. 1 , the storage nodes 106 a - 106 n are grouped into rack 112 , the storage nodes 108 a - 108 n are grouped into rack 114 , and the storage nodes 110 a - 110 n are grouped into rack 116 .
- the controller node 104 is also located in rack 114 as indicated by the dash-lined box of rack 114 .
- the racks can be geographically dispersed across different data centers, for example, racks 112 and 114 can be located at a data center in Europe, while rack 116 can be located at a data center in the United States.
- although a single controller node 104 and storage nodes of three racks are shown in FIG. 1 , it should be understood that there may be any number of controller nodes 104 , storage nodes, or racks.
- the storage nodes 106 a - 106 n may be collectively referred to as storage nodes 106 .
- the storage nodes 108 a - 108 n and 110 a - 110 n may be respectively referred to as storage nodes 108 and 110 .
- the application 102 , the controller node 104 , and the storage nodes 106 , 108 , 110 are interconnected in a data communication network for distributing data of a data object.
- the data communication network can be a conventional type, wired or wireless, and may have numerous different configurations including a star configuration, token ring configuration, or other configurations.
- the data communication network may include a local area network (LAN), a wide area network (WAN) (e.g., the internet), and/or other interconnected data paths across which multiple devices (e.g., a computing device comprising the application 102 , the controller node 104 , the storage nodes, etc.) may communicate.
- the data communication network may be a peer-to-peer network.
- the data communication network may also be coupled with or include portions of a telecommunications network for sending data using a variety of different communication protocols.
- the data communication network may include Bluetooth (or Bluetooth low energy) communication networks or a cellular communications network for sending and receiving data, including direct socket communication (e.g., Transmission Control Protocol/Internet Protocol (TCP/IP) sockets) among software modules, remote procedure calls, User Datagram Protocol (UDP) broadcasts and receipts, Hypertext Transfer Protocol (HTTP) connections, function or procedure calls, a direct data connection, etc.
- any or all of the communication could be secure (Secure Shell (SSH), HTTP Secure (HTTPS), etc.).
- FIG. 2 is a block diagram illustrating an example controller node 104 of the distributed storage system 100 in FIG. 1 .
- the controller node 104 includes a processor 202 , a memory 204 , a network interface (I/F) module 206 , and an optional storage element interface 208 .
- the components of the controller node 104 are communicatively coupled to a bus or software communication mechanism 222 for communication with each other.
- the processor 202 may include an arithmetic logic unit, a microprocessor, a general-purpose controller, or some other processor array to perform computations and provide electronic display signals to a display device.
- the processor 202 is a hardware processor having one or more processing cores.
- the processor 202 is coupled to the bus 222 for communication with the other components.
- Processor 202 processes data signals and may include various computing architectures including a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets.
- although a single processor 202 is shown in FIG. 2 , multiple processors and/or processing cores may be included. It should be understood that other processor configurations are possible.
- the memory 204 stores instructions and/or data that may be executed by the processor 202 .
- the memory 204 includes an encoding module 212 , a rules engine 214 , a spreading module 216 , a user interface engine 218 , and a power reduction module 220 .
- the memory 204 is coupled to the bus 222 for communication with the other components of the controller node 104 .
- the instructions and/or data stored in the memory 204 may include code for performing the techniques described herein.
- the memory 204 may be, for example, non-transitory memory such as a dynamic random-access memory (DRAM) device, a static random-access memory (SRAM) device, flash memory, or some other memory device.
- the memory 204 also includes a non-volatile memory or similar permanent storage device and media, for example, a hard disk drive, a floppy disk drive, a compact disc read-only memory (CD-ROM) device, a digital versatile disc read-only memory (DVD-ROM) device, a digital versatile disc random-access memories (DVD-RAM) device, a digital versatile disc rewritable (DVD-RW) device, a flash memory device, or some other non-volatile storage device.
- the network interface module 206 is configured to connect the controller node 104 to a data communication network.
- the network interface module 206 may enable communication through one or more of the Internet, cable networks, and wired networks.
- the network interface module 206 links the processor 202 to the data communication network that may, in turn, be coupled to other processing systems.
- the network interface module 206 also provides other conventional connections to the data communication network for distribution and/or retrieval of data objects (e.g., files and/or media objects) using standard network protocols such as Transmission Control Protocol/Internet Protocol (TCP/IP), Hypertext Transfer Protocol (HTTP), Secure Hypertext Transfer Protocol (HTTPS), and Simple Mail Transfer Protocol (SMTP) as will be understood.
- the network interface module 206 includes a transceiver for sending and receiving signals using Bluetooth®, or cellular communications for wireless communication.
- the controller node 104 may include or be included in one of the storage nodes 106 , 108 , or 110 that performs both the function of a controller node and a storage node.
- the controller node 104 includes a storage element interface 208 and one or more storage elements 210 a - 210 n connected via the storage element interface 208 to perform the functions of a storage node.
- the storage element interface 208 may comprise a Serial Advanced Technology Attachment (SATA) interface or a Small Computer System Interface (SCSI) for connecting the storage elements 210 a - 210 n (e.g., ten 2 terabyte (TB) SATA-II disk drives) to other components of the controller node 104 .
- the storage element interface 208 is configured to control the reading and writing of data to/from the storage elements 210 a - 210 n .
- the controller node 104 can use the storage element interface 208 to retrieve the data requested by the application 102 from the storage elements 210 a - 210 n that store the data.
- the distributed storage system 100 in FIG. 1 includes redundant and independently operated storage elements 210 such that, if one particular storage element fails, the function of the failed storage element can easily be taken on by another storage element.
- the types, capacity, manufacturers, hardware technology, storage interfaces, etc. of the storage elements can be different based on the storage elements being redundant and independently operated, which benefits the scalability and flexibility of the distributed storage system 100 .
- a storage element can be easily added or removed without correlating to other storage elements already in use in the distributed storage system 100 .
- a protection level applies to the data that was already stored on the storage elements of the system 100 .
- the protection level includes a set of failures that can be tolerated (“tolerable failures”), such that a data object can still be recovered even after such a failure occurs.
- a protection level can provide that a data object stored on storage elements of storage nodes 106 , 108 , and 110 can be recovered from two concurrent data storage device failures.
- Software communication mechanism 222 may be an object bus, direct socket communication among software modules, remote procedure calls, UDP broadcasts and receipts, HTTP connections, function or procedure calls, etc.
- the software communication mechanism 222 can be implemented on any underlying hardware, for example, a network, the Internet, a bus, a combination thereof, etc.
- the controller node 104 comprises an encoding module 212 , a rule engine 214 , a spreading module 216 , a user interface engine 218 , and a power reduction module 220 .
- the controller node 104 , encoding module 212 , rule engine 214 , spreading module 216 , user interface engine 218 , and/or power reduction module 220 may comprise software and/or hardware.
- one or more hardware logic modules, such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other suitable hardware logic, may be employed in place of, or as a supplement to, the software and/or firmware in the memory 204 to perform one or more of the functions or acts of the controller node 104 , the encoding module 212 , the rule engine 214 , the spreading module 216 , the user interface engine 218 , and the power reduction module 220 .
- Other configurations are also possible and contemplated.
- the encoding module 212 which may be stored in the memory 204 and configured to be executed by the processor 202 in some embodiments, disassembles a data object received from the application 102 into a predetermined number of redundant sub-blocks or pieces to be stored across storage elements of the distributed storage system.
- a distributed storage system not only stores a data object on a plurality of storage elements, but also guarantees that the data object can be correctly retrieved when a certain number of the plurality of storage elements are unavailable (e.g., inaccessible, damaged).
- the encoding module 212 uses erasure-encoding techniques to disassemble a data object to achieve acceptable reliability with considerably less overhead than a standard replication scheme.
- the encoding module 212 disassembles the data object into data pieces based on a spreading policy included in a storage request.
- the spreading policy may be defined as a spreading width (W) over a maximum concurrent failure tolerance (F).
- the spreading width indicates the number of data storage devices that store the pieces of the data object, where each data storage device stores a piece of the data object.
- the maximum concurrent failure tolerance (F) indicates a number of data storage devices that store the pieces of the data object that are allowed to fail concurrently.
- the encoding module 212 using a W/F encoding scheme greatly reduces the overhead as compared to standard replication schemes.
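The overhead reduction relative to replication can be illustrated numerically: tolerating F concurrent device failures with full copies requires F + 1 replicas, while an MDS W/F code needs only W/(W − F) times the data. The comparison below is a sketch under that MDS assumption; the function names are illustrative:

```python
def replication_overhead(fault_tolerance):
    # tolerating F concurrent device failures with full copies
    # requires F + 1 complete replicas of the data
    return fault_tolerance + 1

def erasure_overhead(spreading_width, fault_tolerance):
    # MDS erasure code under a W/F spreading policy
    return spreading_width / (spreading_width - fault_tolerance)

# for F = 5: replication stores 6 bytes per byte of incoming data,
# while an 18/5 MDS code stores only about 1.38
```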
- the encoding module 212 communicates with the user interface engine 218 to receive a spreading policy from a user.
- the user specifies a spreading policy for an individual object or a group of objects.
- a user may specify a spreading policy for a group of objects for simplicity.
- a decoding module (not shown) assembles the data pieces of a data object based on a unique object identifier associated with each piece of the data object to recover the data object.
- the rules engine 214, which may be stored in the memory 204 and configured to be executed by the processor 202 in some embodiments, generates a hierarchy rule corresponding to a spreading policy.
- the hierarchy rule identifies a maximum number of data storage devices on each hierarchy level of a hierarchic tree structure of the distributed storage system 100 for spreading the data of the data object.
- the user interface engine 218, which may be stored in the memory 204 and configured to be executed by the processor 202 in some embodiments, generates graphical data for displaying a user interface.
- the user interface engine 218 communicates with the rules engine 214 and the spreading module 216 to generate graphical data for displaying predefined spreading policies and protection levels including a set of tolerable failure scenarios to a user.
- the user interface engine 218 generates a user interface for receiving a selection of a spreading policy and a protection level from a user.
- the user interface engine 218 receives instructions from the rules engine 214 to generate a user interface to notify the user to modify a spreading policy, a protection policy, and/or a set of tolerable failure scenarios.
- the user interface engine 218 may also communicate with the spreading module 216 to generate a user interface to notify the user of incompatibility between a hierarchy rule and an actual hierarchical deployment configuration of a distributed storage system, and instruct the user to modify a spreading policy and/or a protection level such that a hierarchy rule that is generated based on the modified spreading policy and/or the protection level is compatible with the hierarchical deployment configuration.
- In FIG. 3, an example hierarchic tree structure 300 of a distributed storage system is shown.
- the four levels of a hierarchical configuration of the distributed storage system form a tree structure.
- the virtual root in 302 is not part of the four levels as it represents an interface to access the data that is stored or retrieved to/from the distributed storage system.
- the storage elements (e.g., data storage devices or block stores (BS)) occupy the bottom level of the hierarchy 300.
- fourteen data storage devices are grouped into seven storage nodes at the node level.
- the storage nodes are grouped according to their respective racks. In the example of FIG. 3 , seven storage nodes are grouped into three racks at the rack level. At the top level of the hierarchy 300 (i.e., the data center level), the racks are grouped according to their respective data centers. In the example of FIG. 3 , three racks are grouped into two data centers at the data center level.
- Each entity in the hierarchy 300 has a unique name and a unique identifier.
- An entity can be a data center, a rack, a storage node, a data storage device, etc.
- a data center 304 at the top level has a name “Data Center at Location 1” and an identifier “0”
- a rack 306 at the middle level has a name “Rack 1” and an identifier “1”
- a storage node 308 at the bottom level has a name “Node 4” and an identifier “2.”
- a data storage device has a hierarchy identifier comprising an array of integers.
- Each integer of a hierarchy identifier corresponds to an identifier of a data center, a rack, and a node at a respective level of the hierarchy 300 .
- These entities form a branch of the tree that ends up at the data storage device.
- the rightmost data storage device at the bottom level has a name “BS14” and a hierarchy identifier “[1,1,7].”
- the numbers “1,” “1,” and “7” from left to right respectively correspond to identifiers of the “Data Center at Location 2,” “Rack 2” and “Node 7” from top to bottom of a branch that ends at the data storage device BS14.
- the name and the identifier of an entity are separated by a colon.
- a name of an entity is unique so that no two entities have the same name.
- An identifier of a data center at the top level is unique so that no data centers have the same identifier.
- data centers in FIG. 3 are given unique identifiers 0 and 1.
- An identifier of a rack or a node is unique within the next higher level.
- racks are given unique identifiers within a data center.
- the racks in the first data center (i.e., Rack 0 and Rack 1) and the rack in the second data center (i.e., Rack 2) are each given an identifier that is unique within their respective data center; that is, each rack has a unique number within a specific data center.
- the nodes are given unique identifiers within a rack (i.e., every node has a unique number within a specific rack).
- the unique names and identifiers associated with entities provide layout information of a distributed storage system, which is useful for distributing a data object in the system.
- the hierarchy rule is a list of integer numbers where each number corresponds to the maximum number of storage devices that can be selected to store data for each element of that hierarchy level.
- a hierarchy rule is in the form of [n1, n2, n3, n4], where the numbers n1, n2, n3, and n4 respectively indicate a maximum number of data storage devices on each hierarchy level of the hierarchic tree structure (e.g., the data center level, the rack level, the node level, and the device level), for spreading data of a data object to.
- if the hierarchy rule is [6, 100, 2, 1], it means that a maximum number of the W pieces (e.g., "chunks") of the encoded file may be stored at each level:
- a maximum of 6 chunks may be stored in each data center
- a maximum of 100 chunks may be stored in each rack
- a maximum of 2 chunks may be stored in each node (e.g., just a bunch of disks (JBOD))
- a maximum of 1 chunk may be stored in each device (e.g., hard-disk drive (HDD) or solid-state drive (SSD)).
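The per-level caps of a hierarchy rule can be checked mechanically; the following `respects_rule` helper is a sketch under the assumption that each chunk's location is given as a hierarchy identifier `[data center, rack, node, device]`:

```python
# Check that a proposed chunk placement respects a hierarchy rule
# [n1, n2, n3, n4]: count chunks under every entity at each level of the
# tree and compare the count against that level's cap.

from collections import Counter

def respects_rule(placements, rule):
    """placements: one hierarchy id [dc, rack, node, device] per chunk."""
    for level, cap in enumerate(rule):
        # Entities at this level are identified by the id prefix of length level+1.
        counts = Counter(tuple(p[:level + 1]) for p in placements)
        if any(c > cap for c in counts.values()):
            return False
    return True

rule = [6, 100, 2, 1]
ok = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 2]]    # at most 2 chunks per node
bad = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 2]]   # 3 chunks in node (0, 0, 0)
print(respects_rule(ok, rule), respects_rule(bad, rule))  # True False
```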
- the rule engine 214 generates the hierarchy rule based on a spreading policy and a set of tolerable failures specified by a user.
- the spreading policy W/F determines that W pieces of the data object need to be stored on W data storage devices with a tolerance for F concurrent data storage device failures.
- the protection level includes a set of tolerable failure scenarios specified by the user.
- the 18 data pieces of a data object can be put on a single data center and a single rack because each data center and each rack can store at most 100 data pieces.
- the rule engine 214 determines that not all data can be spread on a single entity of this hierarchy level.
- the number 4 at the node level indicates that a maximum of 4 data storage devices per node can be used to store the data, and therefore the rule engine 214 determines that the data object should be distributed to 18 nodes, with each node storing zero, one, or more than one piece of data.
- the number 1 at the node level indicates that a maximum of 1 data storage device per node can be used to store the data, and therefore the rule engine 214 determines that the data object should be distributed to 18 devices, with each device storing one piece of data.
- the rule engine 214 determines that, in addition to a failure of a single entity at this hierarchy level, at least one further concurrent data storage device failure can be tolerated.
- the number 1 in the hierarchy rule [100, 100, 4, 1] indicates that each device stores only one piece of data (e.g., on a single data storage device of the node). Therefore, in this case, in addition to a single device failure, up to four other devices can fail before a total of five data pieces are lost. Assume now that the hierarchy rule is changed to [100, 100, 2, 1].
- the number 2 indicates that at most two data pieces can be stored on a single node (e.g., one piece each on two data storage devices of the node). In this case, in addition to a single node failure (i.e., two data storage device failures), up to three other data storage devices or one other node can fail before a total of five data pieces are lost.
- the rule engine 214 receives a set of tolerable failure scenarios specified by a user.
- the rule engine 214 determines that there is no restriction.
- the 18 data pieces of the data object can be stored on a single rack.
- the rule engine 214 determines that the hierarchy rules [100, 100, 1, 1] and [100, 100, 3, 1] do not fulfill the user's requirement.
- the rule engine 214 determines that the hierarchy rule [100, 2, 100, 1] is sufficient when the user requires that up to two rack failures be tolerated.
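The tolerability reasoning above reduces to simple arithmetic: a hierarchy rule caps the number of pieces per entity at a level, so k concurrent failures at that level lose at most k times that cap, which must not exceed F. A minimal sketch for single-level scenarios (the `survives` helper is illustrative, not from the patent, and does not model the mixed node-plus-device scenarios discussed earlier):

```python
# Worst-case pieces lost when k entities fail concurrently at one hierarchy
# level is k * (that level's cap); the object survives iff this stays
# within the failure tolerance F of the spreading policy.

def survives(rule, level, k_failures, f):
    """True if k concurrent entity failures at `level` lose at most F pieces."""
    return k_failures * rule[level] <= f

F = 5
# [100, 2, 100, 1]: at most 2 chunks per rack, so two rack failures lose
# at most 4 pieces -- within F = 5, as the text states.
print(survives([100, 2, 100, 1], level=1, k_failures=2, f=F))   # True
# [100, 100, 3, 1]: a node failure loses up to 3 pieces, so two node
# failures could lose 6 > 5 pieces -- that scenario is not tolerable.
print(survives([100, 100, 3, 1], level=2, k_failures=2, f=F))   # False
```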
- determining a spreading policy, a failure tolerance, and/or a hierarchy rule does not require information about the layout or deployment of a distributed storage system. For example, a user may specify what kind of failures from which a data object can survive without knowledge of the hierarchical tree structure of the storage system and save the time associated with retrieving the extensive deployment information.
- the user may be provided a set of predefined policies that cover most common use cases, along with a description of what kinds of failures a data object may survive, which further simplifies the user's task of specifying failures and minimizes the user's need for knowledge about how the storage system works.
- the spreading module 216, which may be stored in the memory 204 and configured to be executed by the processor 202 in some embodiments, selects data storage devices in the distributed storage system 100 to store a data object using a hierarchy rule and a spreading policy.
- the rule engine 214 determines a hierarchy rule [n1, n2, n3, n4] based on a spreading policy W/F and a protection level included in the request, and transfers the hierarchy rule to the spreading module 216 .
- the spreading module 216 identifies a hierarchical deployment configuration of the system 100 , and determines whether the hierarchy rule is compatible with the hierarchical deployment configuration. Responsive to the hierarchy rule being compatible with the hierarchical deployment configuration, the spreading module 216 identifies which data storage devices in the system 100 should be used for storing the data and transfers the data to the identified data storage devices for storing.
- given a number of data pieces W for a data object, a number of data pieces F that can be lost, and a set of tolerable failure scenarios that the data object is able to survive, the rule engine 214 generates the maximum number of data centers, racks, and nodes used to store a data object (i.e., the hierarchy rule).
- the generation of the hierarchy rule does not relate to the hierarchical deployment configuration of a distributed storage system. Therefore, it is possible that the hierarchy rule may not be compatible with the hierarchical deployment configuration.
- the spreading module 216 receives this hierarchy rule and identifies that the actual hierarchical deployment configuration of the distributed storage system includes only two data centers to fulfill this rule. Because of the maximum number of 5 on the data center level of the hierarchy rule, at most 10 data pieces can be stored between the two data centers, which is less than 18 data pieces selected in the spreading policy. As a result, the spreading module 216 determines that the hierarchy rule is incompatible with the hierarchical deployment of the system because the entire data object cannot be stored according to the user-selected protection level.
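The compatibility check described above can be sketched as a capacity computation; the nested-dictionary layout encoding and the `max_storable` helper are assumptions for illustration, using the "maximum of 5 per data center" figure from the text:

```python
# A hierarchy rule is compatible with a deployment only if the deployment,
# with every level's cap applied, can still hold all W pieces.

def max_storable(rule, layout):
    """layout: nested counts, e.g. {'dc0': {'rack0': {'n0': 10, ...}}} where
    the innermost value is the number of devices in a node. Returns the most
    pieces the deployment can hold under the rule's per-level caps."""
    dc_cap, rack_cap, node_cap, dev_cap = rule
    total = 0
    for racks in layout.values():
        dc_total = 0
        for nodes in racks.values():
            rack_total = sum(min(node_cap, devices * dev_cap)
                             for devices in nodes.values())
            dc_total += min(rack_cap, rack_total)
        total += min(dc_cap, dc_total)
    return total

# Two data centers, but the rule allows at most 5 pieces per data center:
layout = {"dc0": {"rack0": {"n0": 10, "n1": 10}},
          "dc1": {"rack1": {"n2": 10, "n3": 10}}}
rule = [5, 100, 100, 1]
W = 18
print(max_storable(rule, layout) >= W)   # False: only 10 of the 18 pieces fit
```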
- the spreading module 216 determines that the hierarchy rule corresponding to a spreading policy cannot be fulfilled by the hierarchical layout or deployment of the distributed storage system, the spreading module 216 communicates with the user interface engine 218 to notify the user of the incompatibility and instruct the user to modify at least one of the spreading policy and the protection level such that the hierarchy rule is compatible with the hierarchical deployment configuration.
- the user also specifies a protection level requiring that the first data object survive a single node failure.
- the rule engine 214 determines that the hierarchy rule is [100, 100, 2, 1].
- the number 2 associated with the node level in [100, 100, 2, 1] indicates that each node can use up to two data storage devices to store two pieces of data.
- the spreading module 216 determines that multiple ways to store the pieces of the first data object are possible. For example, the spreading module 216 may determine, to use both data centers at the data center level, the three racks at the rack level, and any six of the seven nodes at the node level to store the six pieces of data with each node storing one piece of data.
- the spreading module 216 may determine to use the two data centers at the data center level, two racks out of the three racks (e.g., Rack 0 and Rack 2) at the rack level, and any six data storage devices of four nodes at the node level to store the six pieces of data (e.g., BS1 and BS2 of Node 0, BS3 of Node 1, BS8 and BS9 of Node 4, and BS14 of Node 7).
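One naive way to pick concrete devices under a hierarchy rule is a greedy scan; the sketch below is an illustrative assumption, not necessarily the selection algorithm of the spreading module 216, but it yields one of the "multiple ways" described above:

```python
# Greedily pick W devices while honoring the per-level caps of a
# hierarchy rule (here [100, 100, 2, 1]: at most two devices per node).

from collections import Counter

def select_devices(devices, rule, w):
    """devices: candidate hierarchy ids [dc, rack, node, device]."""
    used = Counter()
    chosen = []
    for dev in devices:
        prefixes = [tuple(dev[:i + 1]) for i in range(len(rule))]
        # Accept the device only if no ancestor entity is at its cap.
        if all(used[p] < cap for p, cap in zip(prefixes, rule)):
            chosen.append(dev)
            for p in prefixes:
                used[p] += 1
            if len(chosen) == w:
                break
    return chosen

# Six pieces spread over four nodes of three devices each:
devices = [[0, 0, n, d] for n in range(4) for d in range(3)]
picked = select_devices(devices, [100, 100, 2, 1], 6)
print(len(picked))   # 6
```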
- This example illustrates one advantage of the distributing algorithm described herein, that is, a user can require the data to be stored under a certain protection level (e.g., failure scenarios) regardless of the actual layout of the storage system and where the data is stored.
- the spreading module 216 cooperates with other modules/engines to fulfill the user's requirement by actually distributing the data in the system and providing the required protection to the data.
- Another advantage of the distributing algorithm described herein is that expansion of the system 100 will not invalidate the protection level applied to the data already stored in the system 100 .
- the spreading module 216 may be configured as a means for distributing erasure-encoded data of a first data object across first and second portions of a distributed storage system using a hierarchy rule corresponding to a spreading policy based on a set of tolerable failures from which the first data object can be recovered.
- the tolerable failures may be induced by intentionally disabling or powering off one or more portions of the distributed storage system as further described below.
- FIG. 4 illustrates an example hierarchic tree structure 400 of a distributed storage system.
- the example of FIG. 4 includes an architecture of storage elements similar to structured storage systems having a specific quantity of nodes per rack and a specific quantity of storage devices per node.
- One such example may include an ActiveScaleTM X100 by Western Digital Technologies, Inc., which may be configured as a single rack including six nodes, with each node including ninety-eight storage devices.
- each group of fourteen storage devices may be configured in a group known as a ‘sled.’
- the four levels of a hierarchical configuration of the distributed storage system form a tree structure, as described above.
- an alternative embodiment described herein further includes a grouping or “sled” of, for example, fourteen devices, BS1-BS14, BS15-BS28, BS29-BS42, BS43-BS56, BS57-BS70, BS71-BS84, and BS85-BS98.
- the storage elements (e.g., data storage devices or block stores) occupy the bottom level of the hierarchy 400.
- groups of fourteen data storage devices form the sub-nodes 420.1-420.42, with seven sub-nodes per node.
- the next upper-middle level includes six nodes 418.1-418.6, each connected to seven sub-nodes 420 (e.g., sleds).
- the storage nodes 418 are grouped according to their respective racks 406 .
- six storage nodes are grouped into one rack 406 . 1 at the rack level.
- the racks 406 are not grouped according to their respective data centers 404 .
- Each entity in the hierarchy 400 has a unique name and a unique identifier.
- An entity can be a data center, a rack, a storage node, sub-node, and a data storage device, etc.
- the distributed storage system 400 includes a single rack 406 . 1 , which includes six nodes (e.g., JBODs) 418 .
- Each of the nodes 418 includes ninety-eight devices (BS) spread over seven sub-nodes (e.g., sleds) 420 , with each sub-node including fourteen devices BS.
- the total power consumption is the sum of the power consumption of all the individual units at the various levels (e.g., racks, nodes, sub-nodes, devices, storage servers, storage devices, networking switches, etc.).
- a spreading policy (spreading width (W) and maximum concurrent fault tolerance (F)) of W/F or 18/5, and a hierarchy rule are selected in such a way that a first part (e.g., one or more of the racks, nodes, sub-nodes, or devices) of the distributed storage system 400 may be disabled or powered-off, while not affecting the durability and availability of the stored data. Powering-off a portion of the distributed storage system 400 using the node disable signal 430 or the sub-node disable signal 440 can result in a reduction of the total power consumption of the storage system.
- the ‘4’ at storage node level in the hierarchy rule means that each storage node (e.g., JBOD) 418 can have a maximum of 4 chunks of the encoded object data stored therein. Given that the maximum concurrent fault tolerance (F) is equal to 5, this means that we are still able to read or write all of the data when one storage node (e.g., JBOD) 418 is powered-off by the node disable signal 430 .
- the node disable signal 430 and the sub-node disable signal 440 may be controlled by the power reduction module 220 of FIG. 2 .
- the power reduction module 220, which may include instructions stored in memory 204 and executed by processor 202 in some embodiments, may cause control signals to be generated over bus or software communication mechanism 222.
- the power reduction module 220 may disable the first portion of the distributed storage system that includes a first portion of the erasure-encoded data of the first data object, wherein the first portion of the distributed storage system is determined according to the spreading policy and the hierarchy rule identifying the set of tolerable failures, the second portion of the distributed storage system including portions of the erasure-encoded data configured to recreate the first data object.
- one storage node 418 such as storage node 418 . 1
- the storage node 418 . 1 may be powered-on with a different one of the storage nodes 418 . 2 - 418 . 6 being powered-off.
- This sequencing of powering-off a different one of the storage nodes 418 may occur according to an advantageous time period based on various factors including traffic load on the distributed storage system 400 , age or calculated reliability of specific ones of the storage nodes 418 , and other factors that contribute to the overall reliability of the distributed storage system 400 .
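One simple policy for the sequencing described above is a round-robin rotation over the storage nodes; the schedule below is a hypothetical sketch, since the description leaves the exact sequencing policy (traffic load, node age, reliability) open:

```python
# Hypothetical round-robin schedule for which storage node to power off in
# each successive time period; any policy that rotates the disabled node
# would satisfy the description, this cycle is just the simplest one.

from itertools import cycle

def disable_schedule(node_ids):
    """Yields the node to disable in each successive time period."""
    return cycle(node_ids)

sched = disable_schedule(["418.1", "418.2", "418.3", "418.4", "418.5", "418.6"])
first_seven = [next(sched) for _ in range(7)]
print(first_seven[0], first_seven[6])   # 418.1 418.1  (wraps after six periods)
```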
- choosing a spreading policy and a hierarchy rule to allow a first portion of the distributed storage system to be disabled or powered-off is closely related to the erasure-encoding overhead.
- disabling a portion of the distributed storage system while maintaining data durability and availability may require a modification to the spreading policy, which, in turn, may further increase the encoding overhead, resulting in an increase in power consumption. Accordingly, a practical selection of a spreading policy and hierarchy rule is one that results in an overall power reduction greater than the increase in power consumption caused by the increased overhead.
- the distributed storage system 400 may disable (e.g., power-down) one of the storage nodes 418 without modifying storage overhead.
- the net power reduction in the distributed storage system 400 results in an approximate power savings, for example, of about 10% or more.
- an exemplary spreading policy and an exemplary hierarchy rule may be determined to accommodate the concurrent disabling (e.g., powering-off) of two storage nodes 418 in the distributed storage system 400 .
- Such an arrangement in the above-identified ActiveScaleTM X100 system may result in a reduction in power consumption of approximately 24%-26% when compared to a ‘fully powered on’ exemplary configuration.
- a system would need to use a 16/8 spreading policy and a hierarchy rule of [100, 100, 4, 1]. This allows writing 4 chunks to each of the remaining storage nodes 418 when two of the storage nodes 418 are powered off.
- Such a configuration also ensures that a maximum of eight chunks are unavailable when two storage nodes 418 are powered off, as required to ensure data availability.
- the storage overhead of such an erasure-encoding policy is 2.00. This means that the storage overhead increases by approximately 45% (1.38 to 2.00) to accomplish a power reduction of 24%-26%.
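The overhead comparison in this passage follows directly from the W/(W - F) formula; note that the exact increase is about 44.4%, which the text rounds to roughly 45%:

```python
# Verify the trade-off arithmetic: moving from an 18/5 to a 16/8 spreading
# policy raises the storage overhead from ~1.38 to 2.00.

def overhead(w, f):
    """Storage overhead of a W/F spreading policy."""
    return w / (w - f)

base, alt = overhead(18, 5), overhead(16, 8)
increase_pct = (alt - base) / base * 100
print(round(base, 2), alt)        # 1.38 2.0
print(round(increase_pct, 1))     # 44.4  (the ~45% cited in the text)
```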
- Such a configuration, in fact, increases power consumption overall and would not be an acceptable solution.
- improved net power reduction can typically be achieved by constructing a spreading policy and hierarchy rule that allows multiple orthogonal/independent parts of a distributed storage system to be concurrently powered off.
- power consumption may be reduced by 12%-13% without changing the storage overhead by disabling (powering-off) one storage node 418 at a time, as described above.
- Further power reduction may be achieved by disabling one sub-node 420 (e.g., one sled) at a time in the remaining powered storage nodes 418.
- a different spreading policy may be used, for example, a spreading policy of 14/4, which has a storage overhead of 1.4.
- Such a configuration also requires a different hierarchy rule, and may even use different hierarchy levels. In the previous examples, there were four hierarchy levels: data center, rack, storage nodes, and storage devices. With the inclusion of sub-nodes 420 (e.g., sleds), the encoded data may be evenly spread across the sub-nodes. Therefore, the hierarchy then includes five levels: data center, rack, storage nodes, sub-nodes, and devices.
- a hierarchy rule may become [100, 100, 3, 1, 1].
- Such a configuration increases the storage overhead by 1.4%, but results in a 23% reduction of power consumption when compared to the ‘always on’ case. Accordingly, the latter example of a power consumption reduction of 23% using a spreading policy of 14/4, and a five-level hierarchy rule, results in an additional 12% of power reduction. The reduction of power consumption is clearly higher than the increase in storage overhead resulting in an appreciable net gain to the system.
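The net-benefit comparison across the configurations discussed in this section can be sketched as follows; the power-saving percentages are the document's own figures, and the simple "saving exceeds overhead increase" criterion is an illustrative simplification (note the exact 14/4 overhead increase is ~1.1% against the unrounded 18/13 baseline, versus the ~1.4% the text derives from rounded figures):

```python
# Compare candidate spreading policies by overhead increase vs. the power
# saving each enables, relative to the 18/5 'always on' baseline.

def overhead(w, f):
    """Storage overhead of a W/F spreading policy."""
    return w / (w - f)

base = overhead(18, 5)   # ~1.38 baseline
configs = {
    # name: (W, F, power saving in % taken from the text)
    "18/5, one node off at a time": (18, 5, 12.5),
    "16/8, two nodes off":          (16, 8, 25.0),
    "14/4, node + sled rotation":   (14, 4, 23.0),
}
for name, (w, f, saving) in configs.items():
    incr = (overhead(w, f) / base - 1) * 100   # overhead increase in %
    # Net gain when the power saving exceeds the extra overhead.
    print(f"{name}: overhead +{incr:.1f}%, saving {saving}%, "
          f"net gain: {saving > incr}")
```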
- FIG. 5 is a flowchart of an example method 500 for reducing power consumption in a distributed data storage system using a hierarchy rule that is generated based on a spreading policy and a set of tolerable failures which may be induced for the purposes of reducing power consumption.
- the system and method distribute erasure-encoded data of a first data object in a distributed storage system using a hierarchy rule corresponding to a spreading policy based on a set of tolerable failures from which the first data object can be recovered.
- the hierarchy rule is based on a hierarchy of one or more of data centers, racks, nodes, and devices in the distributed storage system.
- the hierarchy further includes one or more sub-nodes located between the nodes and the devices in the hierarchy, and the first portion includes at least one of the sub-nodes.
- the system and method disable a first portion of the distributed storage system that includes a first portion of the erasure-encoded data, wherein the first portion is determined according to the spreading policy and the hierarchy rule identifying the set of tolerable failures.
- disabling a first portion of the distributed storage system includes suspending power to the first portion of the distributed storage system.
- the first portion includes at least one of the nodes.
- the distributed storage system includes at least a first node and a second node, and the disabling the first portion alternates between the first node and the second node.
- the distributed storage system includes at least a first sub-node and a second sub-node, and the disabling the first portion alternates between the first sub-node and the second sub-node.
- a system and/or method may operate to distribute erasure-encoded data of a first data object in a distributed storage system using a hierarchy rule corresponding to a spreading policy based on a set of tolerable failures induced by powering-down one or more portions of the distributed storage system. The remaining portions of the distributed storage system may be used to recover the first data object.
- a first portion of the distributed storage system may include a first portion of the erasure-encoded data. The first portion may be determined according to the spreading policy and the hierarchy rule identifying the set of tolerable failures. The first data may be read from or written to a second portion of the distributed storage system that remains enabled, wherein the second portion includes portions of the erasure-encoded data configured to recreate the first data.
- a process can generally be considered a self-consistent sequence of steps leading to a result.
- the steps may involve physical manipulations of physical quantities. These quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. These signals may be referred to as being in the form of bits, values, elements, symbols, characters, terms, numbers or the like.
- the disclosed technologies may also relate to an apparatus for performing the operations herein.
- This apparatus may be specially constructed for the required purposes, or it may include a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
- a computer program may be stored in a computer-readable storage medium, such as, but not limited to, any type of data storage device including floppy disks, optical disks, compact disc read-only memories (CD-ROMs), magnetic disks, read-only memories (ROMs), random-access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memories including universal serial bus (USB) keys with non-volatile memory, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
- the disclosed technologies can take the form of a hardware implementation, a software implementation or an implementation containing both hardware and software elements.
- the technology is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- a computing system or data processing system suitable for storing and/or executing program code will include at least one processor (e.g., a hardware processor) coupled directly or indirectly to memory elements through a system bus.
- the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
- I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
- Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
- modules, routines, features, attributes, methodologies, and other aspects of the present technology can be implemented as software, hardware, firmware, or any combination of the three.
- a component (an example of which is a module) can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future in computer programming.
- the present techniques and technologies are in no way limited to implementation in any specific programming language, or for any specific operating system or environment. Accordingly, the disclosure of the present techniques and technologies is intended to be illustrative, but not limiting.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/667,670 US11599283B2 (en) | 2019-10-29 | 2019-10-29 | Power reduction in distributed storage systems |
PCT/US2020/024792 WO2021086434A1 (en) | 2019-10-29 | 2020-03-25 | Power reduction in distributed storage systems |
Publications (2)
Publication Number | Publication Date |
---|---|
US20210124513A1 US20210124513A1 (en) | 2021-04-29 |
US11599283B2 true US11599283B2 (en) | 2023-03-07 |
Family
ID=75585988
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/667,670 Active 2040-09-27 US11599283B2 (en) | 2019-10-29 | 2019-10-29 | Power reduction in distributed storage systems |
Country Status (2)
Country | Link |
---|---|
US (1) | US11599283B2 (en) |
WO (1) | WO2021086434A1 (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7146524B2 (en) | 2001-08-03 | 2006-12-05 | Isilon Systems, Inc. | Systems and methods for providing a distributed file system incorporating a virtual hot spare |
US7685126B2 (en) | 2001-08-03 | 2010-03-23 | Isilon Systems, Inc. | System and methods for providing a distributed file system utilizing metadata to track information about data stored throughout the system |
US7747736B2 (en) | 2006-06-05 | 2010-06-29 | International Business Machines Corporation | Rule and policy promotion within a policy hierarchy |
US20110213994A1 (en) | 2010-02-26 | 2011-09-01 | Microsoft Corporation | Reducing Power Consumption of Distributed Storage Systems |
US8156368B2 (en) | 2010-02-22 | 2012-04-10 | International Business Machines Corporation | Rebuilding lost data in a distributed redundancy data storage system |
US20120166726A1 (en) * | 2010-12-27 | 2012-06-28 | Frederik De Schrijver | Hierarchical, distributed object storage system |
US20130060999A1 (en) | 2011-09-01 | 2013-03-07 | Waremax Electronics Corp. | System and method for increasing read and write speeds of hybrid storage unit |
US8572330B2 (en) | 2005-12-19 | 2013-10-29 | Commvault Systems, Inc. | Systems and methods for granular resource management in a storage network |
US20130286579A1 (en) * | 2010-12-27 | 2013-10-31 | Amplidata Nv | Distributed object storage system comprising low power storage nodes |
US20160070617A1 (en) | 2014-09-08 | 2016-03-10 | Cleversafe, Inc. | Maintaining a desired number of storage units |
US20160246676A1 (en) | 2015-02-20 | 2016-08-25 | Netapp, Inc. | Methods for policy-based hierarchical data protection and devices thereof |
US9690660B1 (en) | 2015-06-03 | 2017-06-27 | EMC IP Holding Company LLC | Spare selection in a declustered RAID system |
US9716617B1 (en) | 2016-06-14 | 2017-07-25 | ShieldX Networks, Inc. | Dynamic, load-based, auto-scaling network security microservices architecture |
US10078552B2 (en) | 2016-12-29 | 2018-09-18 | Western Digital Technologies, Inc. | Hierarchic storage policy for distributed object storage systems |
US9690660B1 (en) | 2015-06-03 | 2017-06-27 | EMC IP Holding Company LLC | Spare selection in a declustered RAID system |
US9716617B1 (en) | 2016-06-14 | 2017-07-25 | ShieldX Networks, Inc. | Dynamic, load-based, auto-scaling network security microservices architecture |
US10078552B2 (en) | 2016-12-29 | 2018-09-18 | Western Digital Technologies, Inc. | Hierarchic storage policy for distributed object storage systems |
Non-Patent Citations (6)
Title |
---|
A Distributed Spin-Down Algorithm for an Object-Based Storage Device with Write Redirection, Timothy Bisson, Published Jan. 2006, available at https://users.soe.ucsc.edu/~sbrandt/papers/WDAS06b.pdf. |
B2 Resiliency, Durability and Availability, Christopher, Published Jul. 17, 2018, available at https://help.backblaze.com/hc/en-us/articles/218485257-B2-Resiliency-Durability-and-Availability. |
Erasure Coding for Cold Storage, Packt Hub, Nick Fisk, Published Jun. 7, 2017, available at https://hub.packtpub.com/erasure-coding-cold-storage/. |
International Search Report and Written Opinion, PCT/US2017/049507, dated Nov. 8, 2017 (14 pages). |
International Search Report and Written Opinion, PCT/US2020/024792, dated Jun. 23, 2020 (9 pages). |
Mastering Ceph: Redefine your storage system, Nick Frisk, Published May 2017 by Packt Publishing Ltd. |
Also Published As
Publication number | Publication date |
---|---|
WO2021086434A1 (en) | 2021-05-06 |
US20210124513A1 (en) | 2021-04-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10379951B2 (en) | Hierarchic storage policy for distributed object storage systems | |
US11954002B1 (en) | Automatically provisioning mediation services for a storage system | |
US10318189B2 (en) | Determining respective mappings for logically defined dispersed storage units | |
US20170075741A1 (en) | Prioritizing Data Reconstruction in Distributed Storage Systems | |
US10176045B2 (en) | Internet based shared memory in a distributed computing system | |
CN105335168A (en) | System, method and apparatus for remotely configuring operating system | |
US20230254127A1 (en) | Sharing Encryption Information Amongst Storage Devices In A Storage System | |
US20230195444A1 (en) | Software Application Deployment Across Clusters | |
US20230108184A1 (en) | Storage Modification Process for a Set of Encoded Data Slices | |
CN110419029B (en) | Method for partially updating data content in distributed storage network | |
US10169392B2 (en) | Persistent data structures on a dispersed storage network memory | |
US10469406B2 (en) | Partial task execution in a dispersed storage network | |
US10474395B2 (en) | Abstracting namespace mapping in a dispersed storage network through multiple hierarchies | |
US10289326B2 (en) | Optimized data layout for object store system | |
US11599283B2 (en) | Power reduction in distributed storage systems | |
US20230205591A1 (en) | System Having Dynamic Power Management | |
US10540120B2 (en) | Contention avoidance on associative commutative updates | |
US10057351B2 (en) | Modifying information dispersal algorithm configurations in a dispersed storage network | |
CN115202589A (en) | Placement group member selection method, device, equipment and readable storage medium | |
US11061834B2 (en) | Method and system for facilitating an improved storage system by decoupling the controller from the storage medium | |
Datta et al. | Storage codes: Managing big data with small overheads | |
US10585715B2 (en) | Partial task allocation in a dispersed storage network | |
US20230195535A1 (en) | Containerized Application Deployment to Use Multi-Cluster Computing Resources | |
US20230244569A1 (en) | Recover Corrupted Data Through Speculative Bitflip And Cross-Validation | |
Yin et al. | P-Schedule: Erasure Coding Schedule Strategy in Big Data Storage System |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: WESTERN DIGITAL TECHNOLOGIES, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BLYWEERT, STIJN;REEL/FRAME:051040/0779 Effective date: 20191029 |
|
AS | Assignment |
Owner name: JPMORGAN CHASE BANK, N.A., AS AGENT, ILLINOIS Free format text: SECURITY INTEREST;ASSIGNOR:WESTERN DIGITAL TECHNOLOGIES, INC.;REEL/FRAME:051717/0716 Effective date: 20191114 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PRE-INTERVIEW COMMUNICATION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
AS | Assignment |
Owner name: WESTERN DIGITAL TECHNOLOGIES, INC., CALIFORNIA Free format text: RELEASE OF SECURITY INTEREST AT REEL 051717 FRAME 0716;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:058965/0385 Effective date: 20220203 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: ADVISORY ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: JPMORGAN CHASE BANK, N.A., ILLINOIS Free format text: PATENT COLLATERAL AGREEMENT - A&R LOAN AGREEMENT;ASSIGNOR:WESTERN DIGITAL TECHNOLOGIES, INC.;REEL/FRAME:064715/0001 Effective date: 20230818 |