WO2017107095A1 - Technologies for adaptive erasure code


Info

Publication number
WO2017107095A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage nodes
storage
node
subset
computing node
Application number
PCT/CN2015/098419
Other languages
French (fr)
Inventor
Hongzhou Zhang
Qihua DAI
Xiaodong Liu
Original Assignee
Intel Corporation
Application filed by Intel Corporation filed Critical Intel Corporation
Priority to PCT/CN2015/098419
Publication of WO2017107095A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076Parity data used in redundant arrays of independent storages, e.g. in RAID systems

Definitions

  • a system 100 for implementing adaptive erasure code includes an endpoint computing node 102 communicatively coupled to a proxy computing node 106 of a storage node cluster 108 via a network 104. As shown, the proxy computing node 106 is further communicatively coupled to a plurality of distributed storage nodes 110. It should be appreciated that the proxy computing node 106 and the storage nodes 110 of the storage node cluster 108 may be architected as any type of distributed storage cluster, such as a distributed block storage cluster, a distributed file storage cluster, a distributed object storage cluster, etc.
  • the storage nodes 110 include a plurality of storage nodes (see, e.g., storage node (1) 112, storage node (2) 114, and storage node (n) 116) capable of storing data objects (e.g., replicas of the data objects for providing redundant backup) amongst two or more of the storage nodes 110.
  • the endpoint computing node 102 transmits a network packet (i.e., via the network 104) to the proxy computing node 106.
  • the network packet includes a data object (i.e., in a payload of the network packet) to be stored at a plurality of the storage nodes 110.
  • the data object may include any type of data, such as a file, a block of data, an object, etc.
  • the proxy computing node 106 determines a subset of the storage nodes 110 at which to store at least a portion of the received data object.
  • the proxy computing node 106 is configured to determine the subset of storage nodes based on the erasure code being implemented to encode the data object and one or more criteria, including a latency associated with each of the storage nodes 110 and/or a priority associated with each of the storage nodes 110. It should be appreciated that the total number of storage nodes 110 of the subset is determined based on the erasure code.
  • each of the storage nodes 110 may determine a latency value (e.g., a round-trip-time) between themselves and the other storage nodes 110 of the cluster 108, which is usable to determine the subset of the storage nodes 110.
  • the latencies determined by each of the storage nodes 110 may be stored in a latency table kept locally at each of the storage nodes 110 and/or aggregated at the proxy computing node 106 into a master latency table that includes the latency values for all of the storage nodes 110 of the cluster 108 communicatively coupled via the proxy computing node 106.
  • the proxy computing node 106 checks the latencies in the latency table to identify which of the storage nodes 110 (i.e., the subset) should be selected to store at least a portion of the data object.
  • a priority table including a priority value assigned to each of the storage nodes 110 may be referenced to identify which of the storage nodes 110 (i.e., one of the storage nodes 110 having a priority higher than or equal to the other storage nodes 110) should be selected to store at least a portion of the data object.
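To make the selection concrete, the following is a minimal sketch of how a proxy might aggregate per-node reports into a master latency table and pick the lowest-latency subset. The node names and millisecond values are illustrative assumptions, not taken from the patent figures.

```python
# Minimal sketch of a master latency table aggregated at the proxy computing
# node and a latency-based subset selection. All values are illustrative.
master_latency = {
    "A": {"B": 5, "C": 10, "D": 10, "E": 50, "F": 50},
    "B": {"A": 5, "C": 8, "D": 12, "E": 48, "F": 52},
    # ... one row per storage node, as reported by that node
}

def lowest_latency_peers(source, count):
    """Pick the `count` peers of `source` with the smallest latency values."""
    row = master_latency[source]
    return sorted(row, key=row.get)[:count]

print(lowest_latency_peers("A", 3))   # ['B', 'C', 'D']
```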
  • the network 104 may be embodied as any type of wired and/or wireless communication network, including cellular networks (e.g., Global System for Mobile Communications (GSM) , Long-Term Evolution (LTE) , etc. ) , telephony networks, digital subscriber line (DSL) networks, cable networks, local area networks (LANs) or wide area networks (WANs) , global networks (e.g., the Internet) , or any combination thereof. It should be appreciated that the network 104 may serve as a centralized network and, in some embodiments, may be communicatively coupled to another network (e.g., the Internet) .
  • the network 104 may include a variety of other network devices (not shown) , virtual and physical, such as routers, switches, network hubs, servers, storage devices, compute devices, etc., as needed to facilitate communication between the endpoint computing node 102 and the proxy computing node 106 via the network 104.
  • the illustrative proxy computing node 106 includes a processor 202, an input/output (I/O) subsystem 204, a memory 206, a data storage device 208, and communication circuitry 210.
  • the proxy computing node 106 may include other or additional components, such as those commonly found in a computing device (e.g., one or more input/output peripheral devices) , in other embodiments.
  • one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component.
  • the memory 206, or portions thereof, may be incorporated in the processor 202 in some embodiments.
  • one or more of the illustrative components may be omitted from the proxy computing node 106.
  • the processor 202 may be embodied as any type of processor capable of performing the functions described herein.
  • the processor 202 may be embodied as a single or multi-core processor, digital signal processor (DSP) , microcontroller, or other processor or processing/controlling circuit.
  • the memory 206 may be embodied as any type of volatile or non-volatile memory or data storage capable of performing the functions described herein. In operation, the memory 206 may store various data and software used during operation of the proxy computing node 106, such as operating systems, applications, programs, libraries, and drivers.
  • the memory 206 is communicatively coupled to the processor 202 via the I/O subsystem 204, which may be embodied as circuitry and/or components to facilitate input/output operations with the processor 202, the memory 206, and other components of the proxy computing node 106.
  • the I/O subsystem 204 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, firmware devices, communication links (i.e., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc. ) and/or other components and subsystems to facilitate the input/output operations.
  • the I/O subsystem 204 may form a portion of a system-on-a-chip (SoC) and be incorporated, along with the processor 202, the memory 206, and other components of the proxy computing node 106, on a single integrated circuit chip.
  • the data storage device 208 may be embodied as any type of device or devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid-state drives, or other data storage devices. It should be appreciated that the data storage device 208 and/or the memory 206 (e.g., the computer-readable storage media) may store various data as described herein, including operating systems, applications, programs, libraries, drivers, instructions, etc., capable of being executed by a processor (e.g., the processor 202) of the proxy computing node 106.
  • the illustrative communication circuitry 210 includes a network interface controller (NIC) 212.
  • the NIC 212 may be embodied as one or more add-in-boards, daughtercards, network interface cards, controller chips, chipsets, or other devices that may be used by the proxy computing node 106.
  • the NIC 212 may be integrated with the processor 202, embodied as an expansion card coupled to the I/O subsystem 204 over an expansion bus (e.g., PCI Express) , part of an SoC that includes one or more processors, or included on a multichip package that also contains one or more processors.
  • the NIC 212 may include a local processor (not shown) and/or a local memory (not shown) that are both local to the NIC 212.
  • the local processor of the NIC 212 may be capable of performing one or more of the functions (e.g., replication, network packet processing, etc. ) as described herein.
  • the local memory of the NIC 212 may be capable of storing data local to the NIC 212. It should be appreciated that the functionality of the NIC 212 may be integrated into one or more components of the proxy computing node 106 at the board level, socket level, chip level, and/or other levels.
  • the endpoint computing node 102 may be embodied as any type of computation or computing device capable of performing the functions described herein, including, without limitation, a computer, a mobile computing device (e.g., smartphone, tablet, laptop, notebook, wearable, etc.), a server (e.g., stand-alone, rack-mounted, blade, etc.), a network appliance (e.g., physical or virtual), a web appliance, a distributed computing system, a processor-based system, and/or a multiprocessor system. Similar to the illustrative proxy computing node 106 of FIG. 2, the endpoint computing node 102 may include a processor, an I/O subsystem, a memory, a data storage device, and/or communication circuitry, which are not shown for clarity of the description. As such, further descriptions of the like components are not repeated herein with the understanding that the description of the corresponding components provided above in regard to the proxy computing node 106 applies equally to the corresponding components of the endpoint computing node 102.
  • the illustrative storage nodes 110 include a first storage node, which is designated as storage node (1) 112, a second storage node, which is designated as storage node (2) 114, and a third storage node, which is designated as storage node (N) 116 (i.e., the “Nth” storage node of the storage nodes 110, wherein “N” is a positive integer and designates one or more additional storage nodes).
  • Each of the storage nodes 110 may be embodied as any type of storage device capable of performing the functions described herein, including, without limitation, a server (e.g., stand-alone, rack-mounted, blade, etc. ) , a network appliance (e.g., physical or virtual) , a high-performance computing device, a web appliance, a distributed computing system, a computer, a processor-based system, and/or a multiprocessor system.
  • the illustrative one of the storage nodes 110, similar to the illustrative proxy computing node 106 of FIG. 2, includes a processor 302, an I/O subsystem 304, a memory 306, a data storage device 308, and communication circuitry 310 that includes a NIC 312. Accordingly, further descriptions of the like components are not repeated herein with the understanding that the description of the corresponding components provided above in regard to the proxy computing node 106 applies equally to the corresponding components of each of the storage nodes 110.
  • the proxy computing node 106 and/or one or more of the storage nodes 110 may perform the functions described herein.
  • the proxy computing node 106 may function as a monitor server, saving the master latency table that includes latency values for each of the storage nodes 110 of the cluster 108.
  • the proxy computing node 106 may be configured to determine the subset of storage nodes 110 at which to store at least a portion of the data object (i.e., an encoded portion of the data object) , as described below in FIG. 4.
  • each storage node has its own latency table that only includes a latency value between that storage node and the other storage nodes 110 of the cluster 108.
  • the proxy computing node 106 may be configured to transmit the received data object to be stored to one of the storage nodes 110, which may then determine the subset of storage nodes 110 at which to store at least a portion of the data object (i.e., an encoded portion of the data object), as described below in FIG. 5.
  • the proxy computing node 106 establishes an environment 400 during operation.
  • the illustrative environment 400 includes a network communication module 410, a storage node identification module 420, an erasure code implementation module 430, and a storage node determination module 440.
  • Each of the modules, logic, and other components of the environment 400 may be embodied as hardware, software, firmware, or a combination thereof.
  • each of the modules, logic, and other components of the environment 400 may form a portion of, or otherwise be established by, the processor 202, the communication circuitry 210 (e.g., the NIC 212) , and/or other hardware components of the proxy computing node 106.
  • one or more of the modules of the environment 400 may be embodied as circuitry or a collection of electrical devices (e.g., network communication circuitry 410, storage node identification circuitry 420, erasure code implementation circuitry 430, storage node determination circuitry 440, etc. ) .
  • the proxy computing node 106 includes storage node data 402, erasure code data 404, latency data 406, and priority data 408, each of which may be accessed by the various modules and/or sub-modules of the proxy computing node 106. It should be appreciated that the proxy computing node 106 may include other components, sub-components, modules, sub-modules, and/or devices commonly found in a computing node, which are not illustrated in FIG. 4 for clarity of the description.
  • the network communication module 410 is configured to facilitate inbound and outbound network communications (e.g., network traffic, network packets, network flows, etc. ) to and from the proxy computing node 106. To do so, the network communication module 410 is configured to receive and process network packets from other computing devices (e.g., the endpoint computing node 102, the storage nodes 110, and/or other computing device (s) communicatively coupled via the network 104) . Additionally, the network communication module 410 is configured to prepare and transmit network packets to another computing device (e.g., the endpoint computing node 102, the storage nodes 110, and/or other computing device (s) communicatively coupled via the network 104) . Accordingly, in some embodiments, at least a portion of the functionality of the network communication module 410 may be performed by the communication circuitry 210, and more specifically by the NIC 212.
  • the storage node identification module 420 is configured to identify each of the storage nodes 110 communicatively coupled via the proxy computing node 106. In other words, the storage node identification module 420 is configured to identify a topology of the cluster 108 (see, e.g., the storage node clusters 700 and 900 of FIGS. 7 and 9, respectively) . It should be appreciated that any known technology for identifying the storage nodes 110 of the cluster 108 may be implemented by the storage node identification module 420. In some embodiments, the identified storage nodes (e.g., the topology) may be stored in the storage node data 402.
  • the erasure code implementation module 430 is configured to apply an erasure code to the data object received for storage.
  • the erasure code implementation module 430 is configured to transform or otherwise encode the received data object based on the erasure code, as well as rebuild the data object from the encoded data objects.
  • an erasure code 10 of 15 configuration, or EC 10/15, includes fifteen erasure code elements, or symbols, comprised of ten data elements, or base symbols, and five parity elements, or extra symbols.
  • the erasure code implementation module 430 is configured to encode the erasure code elements, each of which is to be stored at a different one of the subset of storage nodes 110.
  • each of the fifteen erasure code elements would be stored across fifteen different storage nodes of the subset of storage nodes 110.
  • the erasure code implementation module 430 is configured to rebuild the data object (e.g., as a result of a detected corruption) .
  • the original data of the data object could be reconstructed from ten of the verified erasure code elements, or fragments.
  • the erasure code may be stored in the erasure code data 404.
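As an illustration of the encode/rebuild behavior described above, here is a minimal, self-contained sketch of an erasure code with a single XOR parity element (an EC k/k+1 configuration). It is a simplified stand-in: a production configuration such as EC 10/15 would use a Reed-Solomon-style code so that any ten of the fifteen elements suffice, whereas this sketch tolerates the loss of any one element.

```python
# Minimal sketch of an erasure code with one XOR parity element (EC k/k+1).
# The encode/rebuild bookkeeping mirrors the description above, but the code
# itself is deliberately simplified (single-element loss tolerance only).
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int):
    """Split `data` into k equal data fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)                     # ceiling division
    padded = data.ljust(k * size, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]

def rebuild(elements, lost_index):
    """Recover the element at `lost_index` by XOR-ing the survivors."""
    survivors = [e for i, e in enumerate(elements) if i != lost_index]
    return reduce(xor_bytes, survivors)

elements = encode(b"adaptive erasure code demo!!", 4)   # 4 data + 1 parity
assert rebuild(elements, 2) == elements[2]              # lose one, rebuild it
```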
  • the latency analysis module 442 may be configured to select the storage nodes 110 having the smallest latency (i.e., indicating a more proximate location as compared to the other storage nodes) to keep the erasure code elements as close as possible to a particular one of the storage nodes 110.
  • the latency table may be stored in the latency data 406.
  • the storage node determination module 440 may first perform a latency analysis (e.g., via the latency analysis module 442) to identify candidate storage nodes from the storage nodes 110 of the cluster 108, and further refine the subset of storage nodes 110 from the identified candidate storage nodes based on the priority value (e.g., via the priority analysis module 444) , such as by serving as a tie-breaker when more than one of the storage nodes 110 has an equal latency value.
  • the storage node includes storage node data 502, erasure code data 504, latency data 506, and priority data 508, each of which may be accessed by the various modules and/or sub-modules of the storage node. It should be appreciated that the storage node may include other components, sub-components, modules, sub-modules, and/or devices commonly found in a computing node, which are not illustrated in FIG. 5 for clarity of the description.
  • the erasure code implementation module 530 is configured to apply an erasure code to the data object received for storage.
  • the erasure code implementation module 530 is configured to encode the received data object based on the erasure code, as well as rebuild the data object from the encoded data objects.
  • an erasure code 4 of 6 configuration (a.k.a. EC 4/6) includes six erasure code elements comprised of four data elements and two parity elements.
  • two storage nodes could be lost or unavailable, and the original data (e.g., a file) could be recovered from the other four storage nodes.
  • the erasure code implementation module 530 is configured to encode the erasure code elements, each of which is to be stored at a different one of six of the storage nodes 110 (i.e., the subset of storage nodes 110). Additionally, the erasure code implementation module 530 is configured to rebuild the data object (e.g., as a result of a detected corruption) from four of the erasure code elements. In some embodiments, the erasure code may be stored in the erasure code data 504.
  • the storage node determination module 540 is configured to determine the storage nodes 110 at which to store each of the erasure code elements. In other words, the storage node determination module 540 is configured to determine a subset from the storage nodes 110, each of which is to receive and store a different erasure code element.
  • the storage node determination module 540 is configured to determine the subset of storage nodes 110 having a total number of storage nodes equal to the number of erasure code elements (i.e., six, in the above example) minus one (i.e., five, in the above example). To determine which of the storage nodes 110 is to store each of the erasure code elements, the illustrative storage node determination module 540 includes a latency analysis module 542 and/or a priority analysis module 544.
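A small sketch of that subset-size bookkeeping, assuming (per the passage above) that the encoding storage node keeps one erasure code element locally and sends each remaining element to a different peer:

```python
# Sketch of the subset-size bookkeeping for an EC k/n configuration, assuming
# the encoding storage node stores one element itself and distributes the rest.
def peer_subset_size(k: int, n: int) -> int:
    assert n > k > 0, "need n total elements > k data elements"
    return n - 1   # one target node per element, minus the local element

print(peer_subset_size(4, 6))    # EC 4/6  -> 5 peers; tolerates 2 lost elements
print(peer_subset_size(10, 15))  # EC 10/15 -> 14 peers; tolerates 5 lost elements
```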
  • the latency analysis module 542 may be configured to select the storage nodes 110 having the smallest latency (i.e., indicating a more proximate location as compared to the other storage nodes) to keep the erasure code elements as close as possible to a particular one of the storage nodes 110. It should be appreciated that, in some embodiments, a copy of the latency table may be transmitted to and stored at the proxy computing node 106, such as in a master latency table maintained by the proxy computing node 106. In some embodiments, the latency table may be stored in the latency data 506.
  • the priority analysis module 544 is configured to analyze a priority table (e.g., retrieved from a policy) and determine the subset of storage nodes 110 at which to store the erasure code elements based on the priority values. For example, the priority analysis module 544 may be configured to determine the subset of the storage nodes 110 based on a priority (e.g., ranked between one and ten) ranked between high priority (e.g., a low value, the lowest being one) and a low priority (e.g., a high value, the highest being ten) .
  • the storage node determination module 540 may first perform a latency analysis (e.g., via the latency analysis module 542) to identify candidate storage nodes from the storage nodes 110 of the cluster, and further refine the subset of storage nodes 110 from the identified candidate storage nodes based on the priority value (e.g., via the priority analysis module 544) , such as by serving as a tie-breaker when more than one of the storage nodes 110 has an equal latency value.
  • the latencies between each of the storage nodes 712, 714, 716, 718 at the first rack 710 may be less than the latency between one of the storage nodes 712, 714, 716, 718 at the first rack 710 and one or more of the storage nodes 722, 724, 726, 728 at the second rack 720, due to the proximate location (i.e., shorter network packet path distance) of the first rack 710 relative to the second rack 720.
  • FIG. 10 Another such embodiment of a cluster of storage nodes 900 is shown in FIG. 10.
  • the cluster of storage nodes 900 includes four additional storage nodes (i.e., storage node (I) 902, storage node (J) 904, storage node (K) 906, and storage node (L) 908) in the first rack 710 of storage nodes, each of which is also communicatively coupled to the proxy computing node 106.
  • the storage node determines the latency for each of the other storage nodes 110 of the cluster 108, such as may be embodied in an architecture like that of the clusters of storage nodes 700, 900.
  • the storage node may, in block 608, generate a message for each of the other storage nodes 110 determined in block 604.
  • the storage node may transmit each of the messages generated in block 608 to a corresponding one of the other storage nodes.
  • the storage node may determine the latency value for each of the storage nodes based on a measured duration of time between transmitting the message and receiving a response to the message from that other storage node.
  • the storage node may use a clock of the storage node to log timestamps corresponding to when the message was transmitted from the storage node and the response to the message was received by the storage node from one of the other storage nodes that received the message. Accordingly, a round-trip-time may be determined as a function of the comparison between the logged timestamps. It should be appreciated that, in some embodiments, the latency may be determined for a single storage node of the cluster 108, such as when a new storage node is added to the storage node cluster 108, rather than for each of the other storage nodes of the cluster 108.
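A hedged sketch of that round-trip-time measurement: the patent describes generated messages and responses, so the TCP connection handshake below is only a stand-in probe, and the peer addresses are hypothetical.

```python
# Sketch of round-trip-time measurement using monotonic timestamps. A real
# cluster might exchange dedicated probe/acknowledgment messages instead of
# the TCP connect used here as a probe; addresses below are hypothetical.
import socket
import time

PEERS = {"node-b": ("10.0.0.2", 6000), "node-c": ("10.0.0.3", 6000)}

def measure_latencies(peers, timeout=1.0):
    """Return a latency table: peer name -> round-trip time in milliseconds."""
    table = {}
    for name, addr in peers.items():
        start = time.monotonic()                 # timestamp before transmit
        try:
            with socket.create_connection(addr, timeout=timeout):
                pass                             # completed handshake = response
            table[name] = (time.monotonic() - start) * 1000.0
        except OSError:
            table[name] = float("inf")           # unreachable peers sort last
    return table

# The resulting table can be kept locally (block 616) and/or reported to the
# proxy computing node for aggregation into the master latency table (block 618).
latency_table = measure_latencies(PEERS)
```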
  • the storage node stores the determined latencies.
  • the proxy computing node 106 and/or one or more of the storage nodes 110 may perform certain functions described herein (see, e.g., the method 1100 of FIG. 11) .
  • the storage node may, in block 616, store the latencies local to the storage node, and/or the storage node may, in block 618, store the latencies at a location remote from the storage node (e.g., the proxy computing node 106) before the method 600 returns to block 602 to determine whether another latency check was initiated.
  • the proxy computing node 106 or one of the storage nodes in a storage node cluster may execute a method 1100 for determining a subset of storage nodes of a cluster at which to store an erasure code element. It should be appreciated that, as described previously, depending on the distributed storage setup of the storage nodes (e.g., symmetric versus non-symmetric distribution) , the proxy computing node 106 and/or one or more of the storage nodes of the cluster may perform the functions of the method 1100.
  • the functions of the method 1100 will be described from the perspective of one of the storage nodes (i.e., storage node (A) 712) of a storage node cluster (see, e.g., the storage node cluster 700 of FIG. 7 and the storage node cluster 900 of FIG. 9).
  • the storage node (A) 712 may only have a list of the latency values determined by the storage node (A) 712, as indicated by the highlighted first column/row of FIG. 10, as opposed to a master latency table that includes the latency values for each of the storage nodes of the cluster.
  • the method 1100 may be embodied as various instructions stored on a computer-readable media, which may be executed by the processor 302, the NIC 312, and/or other components of the storage node (A) 712 to cause the storage node (A) 712 to perform the method 1100.
  • the computer-readable media may be embodied as any type of media capable of being read by the storage node (A) 712 including, but not limited to, the memory 306, the data storage device 308, a local memory of the NIC 312, other memory or data storage devices of the storage node (A) 712, portable media readable by a peripheral device of the storage node (A) 712, and/or other media.
  • the method 1100 begins with block 1102, in which the storage node (A) 712 determines whether a data object to be stored was received, such as via the proxy computing node 106. If so, the method 1100 continues to block 1104, wherein the storage node (A) 712 determines an erasure code to implement. It should be appreciated that, in some embodiments, the particular erasure code implemented may depend on one or more characteristics of the data associated with the data object, such as a workload type, a flow, a tuple of identifying elements, etc.
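The mapping from data characteristics to a particular erasure code is left open above; the following sketch shows one illustrative policy keyed on a workload type. The workload classes and (k, n) pairs are assumptions for illustration, not values from the patent.

```python
# Illustrative sketch of block 1104: choosing an erasure code configuration
# (k data elements, n total elements) from characteristics of the data object.
# Workload classes and (k, n) pairs below are assumed, for illustration only.
EC_BY_WORKLOAD = {
    "hot":     (4, 6),    # small subset, faster encode/rebuild
    "warm":    (6, 9),
    "archive": (10, 15),  # higher durability, more storage nodes
}

def choose_erasure_code(workload_type: str):
    return EC_BY_WORKLOAD.get(workload_type, (4, 6))   # default to EC 4/6

k, n = choose_erasure_code("archive")                   # -> (10, 15)
```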
  • the storage node (A) 712 determines the subset of storage nodes based on a latency value determined between the storage node (A) 712 and each of the other storage nodes of the cluster (see, e.g., the method 600 of FIG. 6) .
  • the storage node (A) 712 is configured to select five other storage nodes of the storage nodes. In such an embodiment, based on the latency values of the master latency table 800 shown in FIG. 8 for the storage node cluster 700 of FIG. 7, the storage node (A) 712 would select storage node (B) 714, storage node (C) 716, storage node (D) 718, storage node (G) 726, and storage node (H) 728, since each of storage node (E) 722 and storage node (F) 724 has a latency value (i.e., 50 ms) that exceeds those of the other available storage nodes of the cluster.
  • similarly, based on the latency values of the master latency table 1000 shown in FIG. 10 for the storage node cluster 900 of FIG. 9, the storage node (A) 712 would select storage node (B) 714, storage node (C) 716, and storage node (D) 718, as well as two storage nodes from storage node (I) 902, storage node (J) 904, storage node (K) 906, and storage node (L) 908, since each of storage node (E) 722, storage node (F) 724, storage node (G) 726, and storage node (H) 728 has a latency value (i.e., 45 to 50 ms) that exceeds those of the other available storage nodes of the cluster.
  • the storage node (A) 712 may only have those latency values determined by the storage node (A) 712, as indicated by the highlighted first column/row of FIGS. 8 and 10, rather than the entirety of latency values as shown in the master latency tables 800 and 1000.
  • a storage node that has 2 years or less remaining in operation may be given a priority value of 10, a storage node that has 1 year or less remaining in operation may be assigned a value of 5, and a storage node that is not anticipated to be taken out of the storage node cluster may be assigned a value of 1.
  • a lower value priority indicates a higher preference of use. Accordingly, in an embodiment, determining the subset of storage nodes based on the priority values will select those storage nodes with a lower priority value (i.e., a higher preference) over those storage nodes with a higher priority value (i.e., a lower preference).
  • the storage node (A) 712 would select storage node (B) 714, storage node (C) 716, storage node (E) 722, and storage node (F) 724, each of which has a priority value equal to one, as well as one storage node from storage node (G) 726 and storage node (H) 728, since each has the same priority value (i.e., 5), as the five storage nodes of the subset to transfer an erasure code element to.
  • Storage node (D) 718, having a priority value (i.e., 10) higher than the other available storage nodes of the cluster, would not be in consideration for the subset.
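A sketch of that priority policy and the resulting selection (lower value = higher preference). The thresholds follow the example above verbatim; the per-node lifetime assignments are illustrative.

```python
# Sketch of the lifetime-based priority policy described above.
def priority_from_lifetime(years_remaining):
    """Map expected remaining years of operation to a priority value
    (None = not anticipated to be taken out of the cluster)."""
    if years_remaining is None:
        return 1
    if years_remaining <= 1:
        return 5
    if years_remaining <= 2:
        return 10
    return 1

# Illustrative assignments matching the selection example above.
priorities = {"B": 1, "C": 1, "D": 10, "E": 1, "F": 1, "G": 5, "H": 5}

# Five nodes by ascending priority: B, C, E, F (value 1) plus one of G or H
# (value 5); D (value 10) is never considered.
subset = sorted(priorities, key=priorities.get)[:5]
print(subset)   # ['B', 'C', 'E', 'F', 'G']
```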
  • the latency determination in block 1110 and the priority determination in block 1112 may both be used.
  • in an embodiment in which the storage node (A) 712 is using an EC 3/5 configuration (i.e., the storage node (A) 712 needs to identify four other storage nodes at which to store an erasure code element), the storage node (A) 712 would first select storage node (B) 714, storage node (C) 716, and storage node (D) 718, since those storage nodes have the three lowest latency values (i.e., 5 ms, 10 ms, and 10 ms, respectively).
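Combining both criteria as described, the following sketch ranks peers by latency first and uses the priority value only to break latency ties. Values follow the examples above where stated; the remainder are assumed.

```python
# Sketch of the combined determination: latency dominates, priority breaks ties.
latency  = {"B": 5, "C": 10, "D": 10, "E": 50, "F": 50, "G": 20, "H": 20}
priority = {"B": 1, "C": 1, "D": 10, "E": 1, "F": 1, "G": 5, "H": 5}

peers = sorted(latency, key=lambda n: (latency[n], priority[n]))[:4]
print(peers)   # ['B', 'C', 'D', 'G'] -- C sorts before D at the 10 ms tie
```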
  • Example 1 includes a proxy computing node for implementing adaptive erasure code, the proxy computing node comprising one or more processors; and one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the proxy computing node to receive a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is further communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node; determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes; transform the data object based on the erasure code; and transmit a different portion of the transformed data object to each of the subset of storage nodes.
  • Example 2 includes the subject matter of Example 1, and wherein the plurality of instructions further cause the proxy computing node to receive the latency value for each of the storage nodes from each of the storage nodes.
  • Example 3 includes the subject matter of any of Examples 1 and 2, and wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.
  • Example 4 includes the subject matter of any of Examples 1-3, and wherein to select each storage node of the subset of the plurality of storage nodes comprises to select each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
  • Example 6 includes the subject matter of any of Examples 1-5, and wherein to determine the subset of the plurality of storage nodes further comprises to determine the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
  • Example 7 includes the subject matter of any of Examples 1-6, and wherein to determine the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
  • Example 8 includes the subject matter of any of Examples 1-7, and wherein the plurality of instructions further cause the proxy computing node to determine the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
  • Example 9 includes the subject matter of any of Examples 1-8, and wherein the plurality of instructions further cause the proxy computing node to determine the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
  • Example 10 includes the subject matter of any of Examples 1-9, and wherein to determine the subset of the plurality of storage nodes based on priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
  • Example 11 includes a method for implementing adaptive erasure code, the method comprising receiving, by a proxy computing node, a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node; determining, by the proxy computing node, a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes; transforming, by the proxy computing node, the data object based on the erasure code; and transmitting, by the proxy computing node, a different portion of the transformed data object to each of the subset of storage nodes.
  • Example 12 includes the subject matter of Example 11, and further including receiving, by the proxy computing node, the latency value for each of the storage nodes from each of the storage nodes.
  • Example 13 includes the subject matter of any of Examples 11 and 12, and wherein determining the subset of the plurality of storage nodes further comprises selecting each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.
  • Example 16 includes the subject matter of any of Examples 11-15, and wherein determining the subset of the plurality of storage nodes further comprises determining the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
  • Example 17 includes the subject matter of any of Examples 11-16, and wherein determining the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
  • Example 18 includes the subject matter of any of Examples 11-17, and further including determining, by the proxy computing node, the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
  • Example 19 includes the subject matter of any of Examples 11-18, and further including determining, by the proxy computing node, the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
  • Example 20 includes the subject matter of any of Examples 11-19, and wherein determining the subset of the plurality of storage nodes based on priority value comprises comparing the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
  • Example 21 includes a proxy computing node comprising a processor; and a memory having stored therein a plurality of instructions that when executed by the processor cause the proxy computing node to perform the method of any of Examples 11-20.
  • Example 23 includes a proxy computing node for implementing adaptive erasure code, the proxy computing node comprising network communication circuitry to receive a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is further communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node; storage node determination circuitry to determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes; and erasure code implementation circuitry to transform the data object based on the erasure code, wherein the network communication circuitry is further to transmit a different portion of the transformed data object to each of the subset of storage nodes.
  • Example 25 includes the subject matter of any of Examples 23 and 24, and wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.
  • Example 29 includes the subject matter of any of Examples 23-28, and wherein to determine the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
  • Example 30 includes the subject matter of any of Examples 23-29, and wherein the storage node determination circuitry is further to determine the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
  • Example 32 includes the subject matter of any of Examples 23-31, and wherein to determine the subset of the plurality of storage nodes based on priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
  • Example 35 includes the subject matter of any of Examples 33 and 34, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for selecting each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.
  • Example 36 includes the subject matter of any of Examples 33-35, and wherein the means for selecting each storage node of the subset of the plurality of storage nodes comprises means for selecting each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
  • Example 40 includes the subject matter of any of Examples 33-39, and further including means for determining the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
  • Example 43 includes a storage node for implementing adaptive erasure code, the storage node comprising one or more processors; and one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the storage node to receive a data object from a proxy computing node communicatively coupled to the storage node in a storage node cluster, wherein the storage node is communicatively coupled to a plurality of other storage nodes of the storage node cluster; determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between the storage node and a corresponding other storage node of the plurality of storage nodes; transform the data object into a plurality of erasure code elements based on the erasure code; and transmit each of the erasure code elements to a corresponding one of the subset of the plurality of storage nodes.
  • Example 44 includes the subject matter of Example 43, and wherein the plurality of instructions further cause the storage node to (i) determine the latency value for each of the plurality of storage nodes and (ii) store, local to the storage node, the determined latency value for each of the plurality of storage nodes.
  • Example 53 includes the subject matter of any of Examples 43-52, and wherein to determine the subset of the plurality of storage nodes based on priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
  • Example 61 includes the subject matter of any of Examples 54-60, and wherein determining the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
  • Example 69 includes the subject matter of any of Examples 67 and 68, and wherein to determine the latency value comprises to (i) generate a message for one of the plurality of storage nodes, (ii) broadcast the generated messages to the one of the plurality of storage nodes, (iii) receive an acknowledgment from the one of the plurality of storage nodes, and (iv) determine the latency value for the one of the plurality of storage nodes based at least in part on a round-trip-time associated with an elapsed duration of time between having broadcasted the generated message and having received the acknowledgment.
  • Example 70 includes the subject matter of any of Examples 67-69, and wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node.
  • Example 72 includes the subject matter of any of Examples 67-71, and wherein to determine the subset of the plurality of storage nodes further comprises to determine a total number of the subset based on the erasure code.
  • Example 77 includes the subject matter of any of Examples 67-76, and wherein to determine the subset of the plurality of storage nodes based on the priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
  • Example 79 includes the subject matter of Example 78, and further including means for (i) determining the latency value for each of the plurality of storage nodes and (ii) storing, local to the storage node, the determined latency value for each of the plurality of storage nodes.
  • Example 80 includes the subject matter of any of Examples 78 and 79, and wherein the means for determining the latency value comprises means for (i) generating a message for one of the plurality of storage nodes, (ii) broadcasting the generated messages to the one of the plurality of storage nodes, (iii) receiving an acknowledgment from the one of the plurality of storage nodes, and (iv) determining the latency value for the one of the plurality of storage nodes based at least in part on a round-trip-time associated with an elapsed duration of time between having broadcasted the generated message and having received the acknowledgment.
  • Example 83 includes the subject matter of any of Examples 78-82, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for determining a total number of the subset based on the erasure code.
  • Example 84 includes the subject matter of any of Examples 78-83, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for determining the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
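Tying the examples together, here is a minimal end-to-end sketch of the storage-node flow recited in Example 43: receive a data object, select a subset of peers by latency, encode per the erasure code, and transmit one element per peer. The ec_encode and send_to helpers are hypothetical stubs standing in for a real codec and transport.

```python
# End-to-end sketch of the Example 43 flow; helpers below are hypothetical stubs.
def ec_encode(data: bytes, k: int, n: int):
    """Stub standing in for a real codec: k data fragments plus n - k
    placeholder parity elements (a real system would compute Reed-Solomon
    parities here)."""
    size = -(-len(data) // k)
    padded = data.ljust(k * size, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [bytes(size)] * (n - k)

def send_to(peer: str, element: bytes) -> None:
    """Stub transport: a real node would transmit the element over the NIC."""
    print(f"store {len(element)}-byte element at node {peer}")

def handle_data_object(data: bytes, latency_row: dict, k: int = 4, n: int = 6):
    elements = ec_encode(data, k, n)                       # EC 4/6 by default
    peers = sorted(latency_row, key=latency_row.get)[:n - 1]
    local, remote = elements[0], elements[1:]              # keep one locally
    for peer, element in zip(peers, remote):
        send_to(peer, element)
    return local

row = {"B": 5, "C": 10, "D": 10, "E": 50, "F": 50, "G": 20, "H": 20}
handle_data_object(b"example data object payload", row)
```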


Abstract

Technologies for adaptive erasure code include a plurality of storage nodes of a storage node cluster communicatively coupled to a proxy computing node. Each of the storage nodes is configured to receive a data object from the proxy computing node and determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes. The latency value indicates a communication latency between the storage node and another storage node of the plurality of storage nodes. The storage node having received the data object is configured to transform the data object into a plurality of erasure code elements based on the erasure code and transmit each of the erasure code elements to a corresponding one of the subset of the plurality of storage nodes. Other embodiments are described and claimed.

Description

TECHNOLOGIES FOR ADAPTIVE ERASURE CODE

BACKGROUND
Distributed data object storage clusters typically utilize a plurality of storage nodes (i.e., computing devices capable of storing a plurality of data objects) to provide enhanced performance and availability. Such storage clusters can be used for data object replication (i.e., data object redundancy/backup), for example. Generally, the storage clusters are not visible to a client computing device that is either transmitting data objects to be stored on the storage nodes or receiving stored data objects from the storage nodes. Accordingly, in some distributed storage cluster embodiments, incoming requests (e.g., network packets including data objects or data object requests) are queued at an entry point of the storage cluster, commonly referred to as a proxy (e.g., a proxy server). As such, the proxy may be a computing device that is configured to act as an intermediary for the incoming client requests. Additionally, the proxy computing node can be configured to select which storage node to retrieve the requested data object from.
Erasure code (a.k.a. forward error correction (FEC) code) is a data protection method usable to guarantee data availability/durability for cloud storage by transforming a message (i.e., data) into a longer message such that the original message can be recovered from a subset of the symbols into which the longer message is broken. In other words, the data object to be replicated is broken into several data fragments that are encoded into a number of parity pieces based on the implemented erasure code. The data fragments and a number of redundant parity pieces are then stored across a plurality of different locations (e.g., storage nodes, storage disks, etc.). Accordingly, in the event of data corruption, the corrupted data can be rebuilt by using information from the data fragments and parity pieces (i.e., erasure code elements).
However, conventional proxy computing nodes treat all of the storage nodes the same and attempt to distribute the erasure code elements evenly across the storage nodes, despite each of the storage nodes generally having different capabilities (e.g., processor capabilities, memory capacity, disk type, configurations, bandwidth, etc.) and geographic locations. As a result, for example, storage nodes with a higher capacity can end up receiving more requests when traditional request distribution techniques are used (e.g., random selection of one of the storage nodes, round-robin across chosen storage nodes, etc.). Such request distribution may lead to a performance bottleneck and/or leave other storage nodes underutilized or in an idle state.
BRIEF DESCRIPTION OF THE DRAWINGS
The concepts described herein are illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. Where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.
FIG. 1 is a simplified block diagram of at least one embodiment of a system for implementing adaptive erasure code that includes a proxy computing node communicatively coupled to a storage node cluster;
FIG. 2 is a simplified block diagram of at least one embodiment of the proxy computing node of the system of FIG. 1;
FIG. 3 is a simplified block diagram of at least one embodiment of the storage node of the system of FIG. 1;
FIG. 4 is a simplified block diagram of at least one embodiment of an environment that may be established by the proxy computing node of FIGS. 1 and 2;
FIG. 5 is a simplified block diagram of at least one embodiment of an environment that may be established by one of the storage nodes of FIGS. 1 and 3;
FIG. 6 is a simplified flow diagram of at least one embodiment of a method for determining latency in a storage node cluster that may be executed by one of the storage nodes of FIGS. 1 and 3;
FIG. 7 is a simplified block diagram of at least one embodiment of a storage node cluster;
FIG. 8 is a simplified illustration of at least one embodiment of a latency table usable to select a subset of storage nodes from the storage node cluster of FIG. 7;
FIG. 9 is a simplified block diagram of another embodiment of a storage node cluster;
FIG. 10 is a simplified illustration of at least one embodiment of a latency table usable to select a subset of storage nodes from the storage node cluster of FIG. 9;
FIG. 11 is a simplified flow diagram of at least one embodiment of a method for determining a subset of storage nodes of a storage node cluster at which to store an erasure code  element that may be executed by the proxy computing node of FIGS. 1 and 2, or by one of the storage nodes of FIGS. 1 and 3; and
FIG. 12 is a simplified illustration of at least one embodiment of a priority table usable to select a subset of storage nodes from the storage node cluster of FIG. 7.
DETAILED DESCRIPTION
While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.
References in the specification to “one embodiment,” “an embodiment,” “an illustrative embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described. Additionally, it should be appreciated that items included in a list in the form of “at least one of A, B, and C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C). Similarly, items listed in the form of “at least one of A, B, or C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C).
The disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media (e.g., memory, data storage, etc. ) , which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device) .
In the drawings, some structural or method features may be shown in specific arrangements and/or orderings. However, it should be appreciated that such specific arrangements and/or orderings may not be required. Rather, in some embodiments, such features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of a structural or method feature in a particular figure is not meant to imply that such feature is required in all embodiments and, in some embodiments, may not be included or may be combined with other features.
Referring now to FIG. 1, in an illustrative embodiment, a system 100 for implementing adaptive erasure code includes an endpoint computing node 102 communicatively coupled to a proxy computing node 106 of a storage node cluster 108 via a network 104. As shown, the proxy computing node 106 is further communicatively coupled to a plurality of distributed storage nodes 110. It should be appreciated that the proxy computing node 106 and the storage nodes 110 of the storage node cluster 108 may be architected as any type of distributed storage cluster, such as a distributed block storage cluster, a distributed file storage cluster, a distributed object storage cluster, etc. As will be described in further detail below, the storage nodes 110 include a plurality of storage nodes (see, e.g., storage node (1) 112, storage node (2) 114, and storage node (N) 116) capable of storing data objects (e.g., replicas of the data objects for providing redundant backup) amongst two or more of the storage nodes 110.
In use, the endpoint computing node 102 transmits a network packet (i.e., via the network 104) to the proxy computing node 106. The network packet includes a data object (i.e., in a payload of the network packet) to be stored at a plurality of the storage nodes 110. It should be appreciated that the data object may include any type of data, such as a file, a block of data, an object, etc. Upon having received the network packet (i.e., the data object), the proxy computing node 106 determines a subset of the storage nodes 110 at which to store at least a portion of the received data object. The proxy computing node 106 is configured to determine the subset of storage nodes based on the erasure code being implemented to encode the data object and one or more criteria, including a latency associated with each of the storage nodes 110 and/or a priority associated with each of the storage nodes 110. It should be appreciated that the total number of storage nodes 110 of the subset is determined based on the erasure code.
For example, each of the storage nodes 110 may determine a latency value (e.g., a round-trip time) between itself and each of the other storage nodes 110 of the cluster 108, which is usable to determine the subset of the storage nodes 110. In some embodiments, the latencies determined by each of the storage nodes 110 may be stored in a latency table kept locally at that storage node 110 and/or aggregated at the proxy computing node 106 into a master latency table that includes the latency values for all of the storage nodes 110 of the cluster 108 communicatively coupled via the proxy computing node 106. Accordingly, it should be appreciated that, in some embodiments, the determination of the subset of storage nodes 110 may be made by one of the storage nodes 110 (e.g., using the latency table stored at that storage node 110) and/or by the proxy computing node 106 (e.g., using the master latency table).
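By way of a non-limiting illustration, the latency tables described above may be represented as simple key-value mappings. The following minimal Python sketch (the node names and millisecond values mirror the example of FIG. 8; the aggregate function is purely illustrative) shows a per-node latency row and its aggregation into a master latency table:

    # Each storage node keeps only its own latency row (peer name -> milliseconds).
    local_table_a = {"B": 5, "C": 10, "D": 10, "E": 50, "F": 50, "G": 45, "H": 45}

    # The proxy computing node may merge the rows it receives into a master
    # latency table keyed by the reporting (source) storage node.
    master_table = {}

    def aggregate(source, row):
        """Merge one storage node's latency row into the master latency table."""
        master_table.setdefault(source, {}).update(row)

    aggregate("A", local_table_a)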
To determine the subset of the storage nodes 110, the proxy computing node 106, or the one of the storage nodes 110, depending on the embodiment, checks against the latencies of the latency table to identify which of the storage nodes 110 (i.e., the subset) should be selected to store at least a portion of the data object. Similarly, in embodiments wherein the priority is used in addition or alternative to the latency, a priority table including a priority value assigned to each of the storage nodes 110 may be referenced to identify which of the storage nodes 110 (i.e., one of the storage nodes 110 having a priority higher than or equal to the other storage nodes 110) should be selected to store at least a portion of the data object.
The network 104 may be embodied as any type of wired and/or wireless communication network, including cellular networks (e.g., Global System for Mobile Communications (GSM) , Long-Term Evolution (LTE) , etc. ) , telephony networks, digital subscriber line (DSL) networks, cable networks, local area networks (LANs) or wide area networks (WANs) , global networks (e.g., the Internet) , or any combination thereof. It should be appreciated that the network 104 may serve as a centralized network and, in some embodiments, may be communicatively coupled to another network (e.g., the Internet) . Accordingly, the network 104 may include a variety of other network devices (not shown) , virtual and physical, such as routers, switches, network hubs, servers, storage devices, compute devices, etc., as needed to facilitate communication between the endpoint computing node 102 and the proxy computing node 106 via the network 104.
The proxy computing node 106 may be embodied as any type of computing device that is capable of performing the functions described herein, such as, without limitation, a server (e.g., stand-alone, rack-mounted, blade, etc. ) , a switch (e.g., rack-mounted, standalone, fully/partially managed, full-duplex/half-duplex communication mode enabled, etc. ) , a network appliance (e.g., physical or virtual) , a web appliance, a distributed computing system, a processor-based system,  and/or a multiprocessor system. For example, depending on the distribution architecture of the storage nodes 110 (e.g., symmetric or non-symmetric) , the proxy computing node 106 may perform as a function server (e.g., a web server, a database server, an email server, a file server, etc. ) and/or as a monitor server. It should be appreciated that, in some embodiments, at least a portion of the functions of the proxy computing node 106 described herein may be performed in a software layer of one or more of the storage nodes 110.
As shown in FIG. 2, the illustrative proxy computing node 106 includes a processor 202, an input/output (I/O) subsystem 204, a memory 206, a data storage device 208, and communication circuitry 210. Of course, the proxy computing node 106 may include other or additional components, such as those commonly found in a computing device (e.g., one or more input/output peripheral devices) , in other embodiments. Additionally, in some embodiments, one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component. For example, the memory 206, or portions thereof, may be incorporated in the processor 202 in some embodiments. Further, in some embodiments, one or more of the illustrative components may be omitted from the proxy computing node 106.
The processor 202 may be embodied as any type of processor capable of performing the functions described herein. For example, the processor 202 may be embodied as a single or multi-core processor, digital signal processor (DSP) , microcontroller, or other processor or processing/controlling circuit. Similarly, the memory 206 may be embodied as any type of volatile or non-volatile memory or data storage capable of performing the functions described herein. In operation, the memory 206 may store various data and software used during operation of the proxy computing node 106, such as operating systems, applications, programs, libraries, and drivers.
The memory 206 is communicatively coupled to the processor 202 via the I/O subsystem 204, which may be embodied as circuitry and/or components to facilitate input/output operations with the processor 202, the memory 206, and other components of the proxy computing node 106. For example, the I/O subsystem 204 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, firmware devices, communication links (i.e., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc. ) and/or other components and subsystems to facilitate the input/output operations. In some embodiments, the I/O subsystem 204 may form a portion of a system-on-a-chip (SoC) and be  incorporated, along with the processor 202, the memory 206, and other components of the proxy computing node 106, on a single integrated circuit chip.
The data storage device 208 may be embodied as any type of device or devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid-state drives, or other data storage devices. It should be appreciated that the data storage device 208 and/or the memory 206 (e.g., the computer-readable storage media) may store various data as described herein, including operating systems, applications, programs, libraries, drivers, instructions, etc., capable of being executed by a processor (e.g., the processor 202) of the proxy computing node 106.
The communication circuitry 210 may be configured to use any one or more wireless and/or wired communication technologies and associated protocols (e.g., Ethernet, WiMAX, LTE, 5G, etc.) to effect such communication.
The illustrative communication circuitry 210 includes a network interface controller (NIC) 212. The NIC 212 may be embodied as one or more add-in-boards, daughtercards, network interface cards, controller chips, chipsets, or other devices that may be used by the proxy computing node 106. For example, in some embodiments, the NIC 212 may be integrated with the processor 202, embodied as an expansion card coupled to the I/O subsystem 204 over an expansion bus (e.g., PCI Express) , part of an SoC that includes one or more processors, or included on a multichip package that also contains one or more processors.
Alternatively, in some embodiments, the NIC 212 may include a local processor (not shown) and/or a local memory (not shown) that are both local to the NIC 212. In such embodiments, the local processor of the NIC 212 may be capable of performing one or more of the functions (e.g., replication, network packet processing, etc. ) as described herein. In some embodiments, the local memory of the NIC 212 may be capable of storing data local to the NIC 212. It should be appreciated that the functionality of the NIC 212 may be integrated into one or more components of the proxy computing node 106 at the board level, socket level, chip level, and/or other levels.
Referring again to FIG. 1, the endpoint computing node 102 may be embodied as any type of computation or computing device capable of performing the functions described herein, including, without limitation, a computer, a mobile computing device (e.g., smartphone, tablet, laptop, notebook, wearable, etc. ) , a server (e.g., stand-alone, rack-mounted, blade, etc. ) , a network appliance (e.g., physical or virtual) , a web appliance, a distributed computing system, a processor-based system, and/or a multiprocessor system. Similar to the illustrative proxy computing node 106 of FIG. 2, the endpoint computing node 102 may include a processor, an I/O subsystem, a memory, a data storage device, and/or communication circuitry, which are not shown for clarity of the description. As such, further descriptions of the like components are not repeated herein with the understanding that the description of the corresponding components provided above in regard to the proxy computing node 106 applies equally to the corresponding components of the endpoint computing node 102.
The illustrative storage nodes 110 include a first storage node, which is designated as storage node (1) 112, a second storage node, which is designated as storage node (2) 114, and a third storage node, which is designated as storage node (N) 116 (i.e., the “Nth” storage node of the storage nodes 110, wherein “N” is a positive integer and designates one or more additional storage nodes). Each of the storage nodes 110 may be embodied as any type of storage device capable of performing the functions described herein, including, without limitation, a server (e.g., stand-alone, rack-mounted, blade, etc.), a network appliance (e.g., physical or virtual), a high-performance computing device, a web appliance, a distributed computing system, a computer, a processor-based system, and/or a multiprocessor system.
Referring now to FIG. 3, the illustrative one of the storage nodes 110, similar to the illustrative proxy computing node 106 of FIG. 2, includes a processor 302, an I/O subsystem 304, a memory 306, a data storage device 308, and communication circuitry 310 that includes a NIC 312. Accordingly, further descriptions of the like components are not repeated herein with the understanding that the description of the corresponding components provided above in regard to the proxy computing node 106 applies equally to the corresponding components of each of the storage nodes 110.
As described previously, depending on the distributed storage setup of the storage nodes 110, the proxy computing node 106 and/or one or more of the storage nodes 110 may perform the functions described herein. For example, in a non-symmetric distributed storage platform, such as Ceph, the proxy computing node 106 may function as a monitor server, saving the master latency table that includes latency values for each of the storage nodes 110 of the cluster 108. In such embodiments, the proxy computing node 106 may be configured to determine the subset of storage nodes 110 at which to store at least a portion of the data object (i.e., an encoded portion of the data object), as described below in FIG. 4. In another example, in a symmetric distributed storage platform, such as Sheepdog, each storage node has its own latency table that only includes a latency value between that storage node and the other storage nodes 110 of the cluster 108. In such embodiments, the proxy computing node 106 may be configured to transmit the received data object to be stored to one of the storage nodes 110, which may then determine the subset of storage nodes 110 at which to store at least a portion of the data object (i.e., an encoded portion of the data object), as described below in FIG. 5.
Referring now to FIG. 4, in an illustrative embodiment, the proxy computing node 106 establishes an environment 400 during operation. The illustrative environment 400 includes a network communication module 410, a storage node identification module 420, an erasure code implementation module 430, and a storage node determination module 440. Each of the modules, logic, and other components of the environment 400 may be embodied as hardware, software, firmware, or a combination thereof. For example, each of the modules, logic, and other components of the environment 400 may form a portion of, or otherwise be established by, the processor 202, the communication circuitry 210 (e.g., the NIC 212) , and/or other hardware components of the proxy computing node 106. As such, in some embodiments, one or more of the modules of the environment 400 may be embodied as circuitry or a collection of electrical devices (e.g., network communication circuitry 410, storage node identification circuitry 420, erasure code implementation circuitry 430, storage node determination circuitry 440, etc. ) .
In the illustrative environment 400, the proxy computing node 106 includes storage node data 402, erasure code data 404, latency data 406, and priority data 408, each of which may be accessed by the various modules and/or sub-modules of the proxy computing node 106. It should be appreciated that the proxy computing node 106 may include other components, sub-components, modules, sub-modules, and/or devices commonly found in a computing node, which are not illustrated in FIG. 4 for clarity of the description.
The network communication module 410 is configured to facilitate inbound and outbound network communications (e.g., network traffic, network packets, network flows, etc. ) to and from the proxy computing node 106. To do so, the network communication module 410 is configured to receive and process network packets from other computing devices (e.g., the  endpoint computing node 102, the storage nodes 110, and/or other computing device (s) communicatively coupled via the network 104) . Additionally, the network communication module 410 is configured to prepare and transmit network packets to another computing device (e.g., the endpoint computing node 102, the storage nodes 110, and/or other computing device (s) communicatively coupled via the network 104) . Accordingly, in some embodiments, at least a portion of the functionality of the network communication module 410 may be performed by the communication circuitry 210, and more specifically by the NIC 212.
The storage node identification module 420 is configured to identify each of the storage nodes 110 communicatively coupled via the proxy computing node 106. In other words, the storage node identification module 420 is configured to identify a topology of the cluster 108 (see, e.g., the  storage node clusters  700 and 900 of FIGS. 7 and 9, respectively) . It should be appreciated that any known technology for identifying the storage nodes 110 of the cluster 108 may be implemented by the storage node identification module 420. In some embodiments, the identified storage nodes (e.g., the topology) may be stored in the storage node data 402.
The erasure code implementation module 430 is configured to apply an erasure code to the data object received for storage. In other words, the erasure code implementation module 430 is configured to transform or otherwise encode the received data object based on the erasure code, as well as rebuild the data object from the encoded data objects. For example, an erasure code 10 of 15 configuration, or EC 10/15, includes fifteen erasure code elements, or symbols, comprised of ten data elements, or base symbols, and five parity elements, or extra symbols. Accordingly, the erasure code implementation module 430 is configured to encode the erasure code elements, each of which is to be stored at a different one of the subset of storage nodes 110. In such an embodiment, each of the fifteen erasure code elements would be stored across fifteen different storage nodes of the subset of storage nodes 110. Additionally, the erasure code implementation module 430 is configured to rebuild the data object (e.g., as a result of a detected corruption). In furtherance of the previous example, the original data of the data object could be reconstructed from ten of the verified erasure code elements, or fragments. In some embodiments, the erasure code may be stored in the erasure code data 404.
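For illustration only, the following Python sketch shows the systematic layout of such an encoding. The data object is padded and sliced into k data fragments; purely for brevity, a single byte-wise XOR fragment stands in for the parity elements, whereas a production EC 10/15 configuration would derive its five extra symbols with a Reed-Solomon or comparable code:

    def fragment(data, k):
        """Pad the data object and slice it into k equally sized data fragments."""
        size = -(-len(data) // k)                # ceiling division
        padded = data.ljust(k * size, b"\0")
        return [padded[i * size:(i + 1) * size] for i in range(k)]

    def xor_parity(fragments):
        """Derive one parity fragment as the byte-wise XOR of all data fragments."""
        parity = bytearray(len(fragments[0]))
        for frag in fragments:
            for i, value in enumerate(frag):
                parity[i] ^= value
        return bytes(parity)

    data_fragments = fragment(b"example data object", 10)
    erasure_code_elements = data_fragments + [xor_parity(data_fragments)]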
The storage node determination module 440 is configured to determine at which of the storage nodes 110 to store each of the erasure code elements. In other words, the storage node determination module 440 is configured to determine a subset from the storage nodes 110, each of which is to receive and store a different erasure code element. To do so, the illustrative storage node determination module 440 includes a latency analysis module 442 and/or a priority analysis module 444.
The latency analysis module 442 is configured to analyze a latency table that includes a latency value between each of the storage nodes 110 and determine the subset of storage nodes 110 at which to store the erasure code elements based on the latency values. To do so, the latency analysis module 442 may be configured to query or otherwise receive the latency values determined at each of the storage nodes 110. For example, the latency analysis module 442 may be configured to determine the subset of the storage nodes 110 based on a distance-aware policy. In such an embodiment, the latency analysis module 442 may be configured to select the storage nodes 110 having the smallest latency (i.e., indicating a more proximate location as compared to the other storage nodes) to keep the erasure code elements as close as possible to a particular one of the storage nodes 110. In some embodiments, the latency table may be stored in the latency data 406.
The priority analysis module 444 is configured to analyze a priority table (e.g., retrieved from a policy) and determine the subset of storage nodes 110 at which to store the erasure code elements based on the priority values. For example, the priority analysis module 444 may be configured to determine the subset of the storage nodes 110 based on a priority ranked between high priority (e.g., a low value) and a low priority (e.g., a high value) . In some embodiments, an administrator of the storage nodes 110 may assign the priority to indicate a preference to use one storage node over another storage node, for example when one storage node has a shorter end-of-life (i.e., scheduled to be replaced in a shorter period of time) than another storage node. Accordingly, the priority analysis module 444 may determine to store the erasure code elements at the higher priority storage nodes, rather than the lower priority storage nodes. In some embodiments, the priority table may be stored in the priority data 408.
It should be appreciated that, in some embodiments, the storage node determination module 440 may first perform a latency analysis (e.g., via the latency analysis module 442) to identify candidate storage nodes from the storage nodes 110 of the cluster 108, and further refine the subset of storage nodes 110 from the identified candidate storage nodes based on the priority value (e.g., via the priority analysis module 444) , such as by serving as a tie-breaker when more than one of the storage nodes 110 has an equal latency value. Alternatively, in some embodiments, the storage node determination module 440 may first perform a priority analysis (e.g., via the priority analysis module 444) to identify the candidate storage nodes from the  storage nodes 110 of the cluster 108, and further refine the subset of storage nodes 110 from the identified candidate storage nodes based on the latency value (e.g., via the latency analysis module 442) , such as by serving as a tie-breaker when more than one of the storage nodes 110 has an equal priority value.
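One minimal sketch of this two-stage refinement, assuming each candidate storage node is represented as a (name, latency, priority) tuple, uses a compound sort key in which the secondary criterion serves as the tie-breaker:

    def select_subset(candidates, count, latency_first=True):
        """Order candidates by the primary criterion; the secondary breaks ties."""
        if latency_first:
            key = lambda c: (c[1], c[2])    # latency first, priority as tie-breaker
        else:
            key = lambda c: (c[2], c[1])    # priority first, latency as tie-breaker
        return [name for name, _, _ in sorted(candidates, key=key)[:count]]

    # e.g., select_subset([("B", 5, 1), ("C", 10, 1), ("D", 10, 10)], 2)
    # returns ['B', 'C']

Here, count would be the total number of the subset dictated by the erasure code.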
Referring now to FIG. 5, in an illustrative embodiment, one of the storage nodes 110 establishes an environment 500 during operation. The illustrative environment 500 includes a network communication module 510, a latency determination module 520, an erasure code implementation module 530, and a storage node determination module 540. Each of the modules, logic, and other components of the environment 500 may be embodied as hardware, software, firmware, or a combination thereof. For example, each of the modules, logic, and other components of the environment 500 may form a portion of, or otherwise be established by, the processor 302, the communication circuitry 310 (e.g., the NIC 312) , and/or other hardware components of the storage node. As such, in some embodiments, one or more of the modules of the environment 500 may be embodied as circuitry or a collection of electrical devices (e.g., network communication circuitry 510, latency determination circuitry 520, erasure code implementation circuitry 530, storage node determination circuitry 540, etc. ) .
In the illustrative environment 500, similar to the illustrative environment 400 of the proxy computing node 106, the storage node includes storage node data 502, erasure code data 504, latency data 506, and priority data 508, each of which may be accessed by the various modules and/or sub-modules of the storage node. It should be appreciated that the storage node may include other components, sub-components, modules, sub-modules, and/or devices commonly found in a computing node, which are not illustrated in FIG. 5 for clarity of the description.
The network communication module 510, similar to the network communication module 410 of the proxy computing node 106 of FIG. 4, is configured to facilitate inbound and outbound network communications (e.g., network traffic, network packets, network flows, etc. ) to and from the storage node. To do so, the network communication module 510 is configured to receive and process network packets from other computing devices (e.g., the proxy computing node 106, other storage nodes 110, and/or other computing device (s) communicatively coupled via the network 104) . Additionally, the network communication module 510 is configured to prepare and transmit network packets to another computing device (e.g., the proxy computing node 106, other storage nodes 110, and/or other computing device (s) communicatively coupled  via the network 104) . Accordingly, in some embodiments, at least a portion of the functionality of the network communication module 510 may be performed by the communication circuitry 310, and more specifically by the NIC 312.
The latency determination module 520 is configured to determine a latency value between the storage node and each of the other storage nodes 110 of the cluster 108. To do so, the latency determination module 520 is configured to transmit a network packet to each of the other storage nodes 110 and receive a response network packet (i.e., an acknowledgement) from each of the other storage nodes 110. Accordingly, the latency value can be determined relative to an amount of time measured to have passed between transmitting and receiving the latency determination network packets.
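A minimal sketch of such a measurement follows; it assumes, purely for illustration, that each storage node exposes an echo service on UDP port 7000 (a hypothetical port), and it derives the latency value from monotonic timestamps taken at transmission and at receipt of the response:

    import socket
    import time

    def measure_rtt_ms(host, port=7000, timeout=1.0):
        """Time the round trip of a probe message to a peer's echo service;
        raises socket.timeout if no response arrives within the timeout."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            start = time.monotonic()             # timestamp at transmission
            sock.sendto(b"latency-probe", (host, port))
            sock.recvfrom(1024)                  # blocks until the response arrives
            return (time.monotonic() - start) * 1000.0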
The erasure code implementation module 530, similar to the erasure code implementation module 430 of the proxy computing node 106 of FIG. 4, is configured to apply an erasure code to the data object received for storage. In other words, the erasure code implementation module 530 is configured to encode the received data object based on the erasure code, as well as rebuild the data object from the encoded data objects. For example, an erasure code 4 of 6 configuration (a.k.a. EC 4/6) includes six erasure code elements comprised of four data elements and two parity elements. In other words, two storage nodes could be lost or unavailable, and the original data (e.g., a file) could be recovered from the erasure code elements stored at the other four storage nodes. Accordingly, in such a configuration, the erasure code implementation module 530 is configured to encode the erasure code elements, each of which is to be stored at a different one of six of the storage nodes 110 (i.e., the subset of storage nodes 110). Additionally, the erasure code implementation module 530 is configured to rebuild the data object from four of the erasure code elements. In some embodiments, the erasure code may be stored in the erasure code data 504.
The storage node determination module 540, similar to the storage node determination module 440 of the proxy computing node 106 of FIG. 4, is configured to determine at which of the storage nodes 110 to store each of the erasure code elements. In other words, the storage node determination module 540 is configured to determine a subset from the storage nodes 110, each of which is to receive and store a different erasure code element. It should be appreciated that, since one of the erasure code elements is to be stored local to the storage node, the storage node determination module 540 is configured to determine the subset of storage nodes 110 having a total number of storage nodes equal to the number of erasure code elements (i.e., six, in the above example) minus one (i.e., five, in the above example). To determine at which of the storage nodes 110 to store each of the erasure code elements, the illustrative storage node determination module 540 includes a latency analysis module 542 and/or a priority analysis module 544.
The latency analysis module 542, similar to the latency analysis module 442 of the proxy computing node 106 of FIG. 4, is configured to analyze a latency table that includes a latency value between the storage node and each of the other storage nodes 110 of the cluster 108, as well as determine the subset of storage nodes 110 at which to store the erasure code elements based on the latency values. For example, the latency analysis module 542 may be configured to determine the subset of the storage nodes 110 based on a distance-aware policy. In such an embodiment, the latency analysis module 542 may be configured to select the storage nodes 110 having the smallest latency (i.e., indicating a more proximate location as compared to the other storage nodes) to keep the erasure code elements as close as possible to a particular one of the storage nodes 110. It should be appreciated that, in some embodiments, a copy of the latency table may be transmitted to and stored at the proxy computing node 106, such as in a master latency table maintained by the proxy computing node 106. In some embodiments, the latency table may be stored in the latency data 506.
The priority analysis module 544, similar to the priority analysis module 444 of the proxy computing node 106 of FIG. 4, is configured to analyze a priority table (e.g., retrieved from a policy) and determine the subset of storage nodes 110 at which to store the erasure code elements based on the priority values. For example, the priority analysis module 544 may be configured to determine the subset of the storage nodes 110 based on a priority (e.g., ranked between one and ten) ranked between high priority (e.g., a low value, the lowest being one) and a low priority (e.g., a high value, the highest being ten) . In some embodiments, an administrator of the storage nodes 110 may assign the priority to indicate a preference to use one storage node over another storage node, for example when one storage node has a shorter end-of-life (i.e., scheduled to be replaced in a shorter period of time) than another storage node. Accordingly, the priority analysis module 544 may determine to store the erasure code elements at the higher priority storage nodes, rather than the lower priority storage nodes. In some embodiments, the priority table may be stored in the priority data 508.
It should be appreciated that, in some embodiments, the storage node determination module 540 may first perform a latency analysis (e.g., via the latency analysis module 542) to  identify candidate storage nodes from the storage nodes 110 of the cluster, and further refine the subset of storage nodes 110 from the identified candidate storage nodes based on the priority value (e.g., via the priority analysis module 544) , such as by serving as a tie-breaker when more than one of the storage nodes 110 has an equal latency value. Alternatively, in some embodiments, the storage node determination module 540 may first perform a priority analysis (e.g., via the priority analysis module 544) to identify the candidate storage nodes from the storage nodes 110 of the cluster 108, and further refine the subset of storage nodes 110 from the identified candidate storage nodes based on the latency value (e.g., via the latency analysis module 542) , such as by serving as a tie-breaker when more than one of the storage nodes 110 has an equal priority value.
Referring now to FIG. 6, in use, one of the storage nodes 110 may execute a method 600 for determining latency in a storage node cluster (e.g., the storage node cluster 108 of FIG. 1) . It should be appreciated that the method 600 may be executed upon initiation by an administrator, detection of a new storage node 110 to the storage node cluster 108, and/or any other actionable event configured to trigger the latency determination, which may be triggered automatically. It should be further appreciated that, in some embodiments, the method 600 may be embodied as various instructions stored on a computer-readable media, which may be executed by the processor 302, the NIC 312, and/or other components of the storage node to cause the storage node to perform the method 600. The computer-readable media may be embodied as any type of media capable of being read by the storage node including, but not limited to, the memory 306, the data storage device 308, a local memory of the NIC 312, other memory or data storage devices of the storage node, portable media readable by a peripheral device of the storage node, and/or other media.
The method 600 begins with block 602, in which the storage node determines whether a latency check was initiated. If so, the method 600 proceeds to block 604, wherein the storage node determines the other storage nodes of the storage node cluster 108 in which the storage node performing the latency check resides. One such embodiment of a cluster of storage nodes 700 is shown in FIG. 7, which includes a first rack 710 of storage nodes (i.e., storage node (A) 712, storage node (B) 714, storage node (C) 716, and storage node (D) 718) and a second rack 720 of storage nodes (i.e., storage node (E) 722, storage node (F) 724, storage node (G) 726, and storage node (H) 728) , each of which are communicatively coupled to the proxy computing node 106.
It should be appreciated that the first and second racks 710, 720, respectively, may be located in different rooms, facilities, geographical locations, etc. In some embodiments, the other storage nodes may be determined by the storage node (e.g., via a message broadcast and responses received from the other storage nodes) or by an external computing device, such as the proxy computing node 106 or other monitoring/administration computing node (e.g., a controller/orchestrator computing node) communicatively coupled to the storage node.
Accordingly, the latencies between each of the  storage nodes  712, 714, 716, 718 at the first rack 710 may be less than the latency between one of the  storage nodes  712, 714, 716, 718 at the first rack 710 and one or more of the  storage nodes  722, 724, 726, 728 at the second rack 720, due to the proximate location (i.e., shorter network packet path distance) of the first rack 710 relative to the second rack 720. For example, as shown in an illustrative master latency table 800 of FIG. 8, the latency value (in milliseconds for the present example) between storage node (A) 712 and storage node (B) 714 is 5ms, while the latency value between storage node (A) 712 and storage node (C) 716 is 10ms and the latency value between storage node (A) 712 and storage node (D) 718 is also 10ms. As also shown in the master latency table 800, the latency value between storage node (A) 712 and storage node (E) 722 and storage node (F) 724 is 50ms, while the latency value between storage node (A) 712 and storage node (G) 726 and storage node (H) 728 is 45ms.
Another such embodiment of a cluster of storage nodes 900 is shown in FIG. 9. The cluster of storage nodes 900 includes four additional storage nodes (i.e., storage node (I) 902, storage node (J) 904, storage node (K) 906, and storage node (L) 908) in the first rack 710 of storage nodes, each of which is also communicatively coupled to the proxy computing node 106. In another example of latency values, as shown in FIG. 10, an illustrative master latency table 1000 indicates the latency value (in milliseconds for the present example) between storage node (A) 712 and each of storage node (I) 902, storage node (J) 904, storage node (K) 906, and storage node (L) 908 is 10ms, likely indicating the additional storage nodes (i.e., storage node (I) 902, storage node (J) 904, storage node (K) 906, and storage node (L) 908) are more proximately located to the original storage nodes (i.e., storage node (A) 712, storage node (B) 714, storage node (C) 716, and storage node (D) 718) of the first rack 710.
Referring again to FIG. 6, in block 606, the storage node determines the latency for each of the other storage nodes 110 of the cluster 108, such as may be embodied in a like architecture of the cluster of  storage nodes  700, 900. For example, the storage node may, in  block 608, generate a message for each of the other storage nodes 110 determined in block 604. Additionally, in block 610, the storage node may transmit each of the messages generated in block 608 to a corresponding one of the other storage nodes. Further, in block 612, the storage node may determine the latency value for each of the storage nodes based on a measured duration of time between the storage node having transmitted the message and received a response to the message from that other storage node.
For example, the storage node may use a clock of the storage node to log timestamps corresponding to when the message was transmitted from the storage node and the response to the message was received by the storage node from one of the other storage nodes that received the message. Accordingly, a round-trip-time may be determined as a function of the comparison between the logged timestamps. It should be appreciated that, in some embodiments, the latency may be determined for a single storage node of the cluster 108, such as when a new storage node is added to the storage node cluster 108, rather than for each of the other storage nodes of the cluster 108.
In block 614, the storage node stores the determined latencies. As described previously, depending on the distributed storage setup of the storage nodes 110 (e.g., symmetric versus non-symmetric distribution) , the proxy computing node 106 and/or one or more of the storage nodes 110 may perform certain functions described herein (see, e.g., the method 1100 of FIG. 11) . Accordingly, the storage node may, in block 616, store the latencies local to the storage node, and/or the storage node may, in block 618, store the latencies in a location remote of the storage node (e.g., the proxy computing node 106) before the method 600 returns to block 602 to determine whether another latency check was initiated.
Referring now to FIG. 11, in use, the proxy computing node 106 or one of the storage nodes in a storage node cluster (e.g., one of the storage nodes 110 of the storage node cluster 108), depending on the embodiment, may execute a method 1100 for determining a subset of storage nodes of a cluster at which to store an erasure code element. It should be appreciated that, as described previously, depending on the distributed storage setup of the storage nodes (e.g., symmetric versus non-symmetric distribution), the proxy computing node 106 and/or one or more of the storage nodes of the cluster may perform the functions of the method 1100. However, to preserve clarity of the description, the functions of the method 1100 will be described from the perspective of one of the storage nodes (i.e., storage node (A) 712) of a storage node cluster (see, e.g., the storage node cluster 700 of FIG. 7 and the storage node cluster 900 of FIG. 9). Accordingly, the storage node (A) 712 may only have a list of the latency values determined by the storage node (A) 712, as indicated by the highlighted first column/row of FIGS. 8 and 10, as opposed to a master latency table that includes the latency values for each of the storage nodes of the cluster.
It should be further appreciated that, in some embodiments, the method 1100 may be embodied as various instructions stored on a computer-readable media, which may be executed by the processor 302, the NIC 312, and/or other components of the storage node (A) 712 to cause the storage node (A) 712 to perform the method 1100. The computer-readable media may be embodied as any type of media capable of being read by the storage node (A) 712 including, but not limited to, the memory 306, the data storage device 308, a local memory of the NIC 312, other memory or data storage devices of the storage node (A) 712, portable media readable by a peripheral device of the storage node (A) 712, and/or other media.
The method 1100 begins with block 1102, in which the storage node (A) 712 determines whether a data object to be stored was received, such as via the proxy computing node 106. If so, the method 1100 continues to block 1104, wherein the storage node (A) 712 determines an erasure code to implement. It should be appreciated that, in some embodiments, the particular erasure code implemented may depend on one or more characteristics of the data associated with the data object, such as a workload type, a flow, a tuple of identifying elements, etc.
In block 1106, the storage node (A) 712 determines a subset of storage nodes from the storage nodes of the cluster at which to store the received data. To do so, in block 1108, the storage node (A) 712 determines a total number of the subset of storage nodes based on the erasure code determined in block 1104. For example, the storage node (A) 712 may have determined to implement an erasure code 4 of 6, or EC 4/6, configuration. Accordingly, in such an embodiment, the total number of the subset of storage nodes is five (i.e., six minus one, since one of the erasure code elements is to be stored at the storage node (A) 712) .
Additionally, in block 1110, the storage node (A) 712 determines the subset of storage nodes based on a latency value determined between the storage node (A) 712 and each of the other storage nodes of the cluster (see, e.g., the method 600 of FIG. 6). In furtherance of the above example in which the storage node (A) 712 is implementing an EC 4/6 configuration, the storage node (A) 712 is configured to select five other storage nodes of the storage nodes. In such an embodiment, based on the latency values of the master latency table 800 shown in FIG. 8 for the storage node cluster 700 of FIG. 7, the storage node (A) 712 would select storage node (B) 714, storage node (C) 716, storage node (D) 718, storage node (G) 726, and storage node (H) 728, since each of storage node (E) 722 and storage node (F) 724 has a latency value (i.e., 50ms) that exceeds those of the other available storage nodes of the cluster.
Alternatively, in another such embodiment, based on the latency values of the master latency table 1000 of FIG. 10 for the storage node cluster 900 of FIG. 9, the storage node (A) 712 would select storage node (B) 714, storage node (C) 716, and storage node (D) 718, as well as two storage nodes from storage node (I) 902, storage node (J) 904, storage node (K) 906, and storage node (L) 908, since each of storage node (E) 722, storage node (F) 724, storage node (G) 726, and storage node (H) 728 has a latency value (i.e., 45-50ms) that exceeds those of the other available storage nodes of the cluster. As noted previously, it should be appreciated that although the master latency tables 800 and 1000 are referenced above, the storage node (A) 712 may only have those latency values determined by the storage node (A) 712, as indicated by the highlighted first column/row of FIGS. 8 and 10, rather than the entirety of latency values as shown in the master latency tables 800 and 1000.
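For the EC 4/6 example against the latency values of FIG. 8, the selection of block 1110 reduces to taking the five lowest-latency peers from the latency row measured by the storage node (A) 712, as in this illustrative Python sketch:

    # Latency row measured by storage node (A), in milliseconds (per FIG. 8).
    row_a = {"B": 5, "C": 10, "D": 10, "E": 50, "F": 50, "G": 45, "H": 45}

    def lowest_latency_peers(row, count):
        """Return the peer names with the smallest latency values."""
        return sorted(row, key=row.get)[:count]

    subset = lowest_latency_peers(row_a, 6 - 1)   # one element remains local
    # subset == ['B', 'C', 'D', 'G', 'H']; E and F (50ms) are excluded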
In some embodiments, in addition or alternative to the latency determination, in block 1112, the storage node (A) 712 may determine the subset of storage nodes based on a priority value assigned to each of the other storage nodes of the cluster. For example, as shown in FIG. 12, an illustrative master priority table 1200 includes a priority value for each of the storage nodes of the storage node cluster 700 of FIG. 7. As described previously, each of the priority values indicates a preference of use for that particular storage node. For example, a storage node that has one year or less remaining in operation (e.g., before it is removed from the storage node cluster) may be given a priority value of 10, while a storage node that has two years or less remaining in operation may be assigned a value of 5, and a storage node that is not anticipated to be taken out of the storage node cluster may be assigned a value of 1. In other words, a lower priority value indicates a higher preference of use. Accordingly, an embodiment determining the subset of storage nodes based on the priority values will select those storage nodes with a lower priority value (i.e., a higher preference) over those storage nodes with a higher priority value (i.e., a lower preference).
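A minimal sketch of such an assignment policy is shown below; the thresholds simply mirror the example above and are assumptions of this illustration rather than requirements of the scheme:

    def priority_for(years_remaining=None):
        """Map the expected remaining service life of a storage node to a
        priority value (a lower value indicates a higher preference of use)."""
        if years_remaining is None:    # no planned removal from the cluster
            return 1
        if years_remaining <= 1:
            return 10
        if years_remaining <= 2:
            return 5
        return 1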
For example, in an embodiment implementing the EC 4/6 configuration described above, the storage node (A) 712 would select, as the five storage nodes of the subset to which to transfer an erasure code element, storage node (B) 714, storage node (C) 716, storage node (E) 722, storage node (F) 724, and storage node (G) 726, each of which has a priority value equal to one. Storage node (H) 728, having a priority value of 5, and storage node (D) 718, having a priority value (i.e., 10) higher than the other available storage nodes of the cluster, would not be in consideration for the subset.
It should be appreciated that, in some embodiments, the latency determination in block 1110 and the priority determination in block 1112 may both be used. For example, referring again to the master latency table 800 of FIG. 8, wherein the storage node (A) 712 is using an EC 3/5 configuration (i.e., the storage node (A) 712 needs to identify four other storage nodes at which to store an erasure code element), the storage node (A) 712 would select storage node (B) 714, storage node (C) 716, and storage node (D) 718, since those storage nodes have the three lowest latency values (i.e., 5ms, 10ms, and 10ms, respectively). Additionally, storage node (A) 712 would select one storage node from storage node (G) 726 and storage node (H) 728, since both of those storage nodes have the next lowest latency value (i.e., 45ms). Accordingly, in such an embodiment, the storage node (A) 712 may determine the last storage node from between the storage node (G) 726 and the storage node (H) 728 based on the priority value assigned to each storage node. As such, as shown in FIG. 12, the storage node (A) 712 would select storage node (G) 726, as it has the lower priority value (i.e., the higher preference of use) between the storage node (G) 726 (i.e., a priority value of 1) and the storage node (H) 728 (i.e., a priority value of 5).
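Combining both criteria for this EC 3/5 example can again be expressed as a compound sort; the latency and priority values below mirror those described above for FIGS. 8 and 12:

    # peer name -> (latency in milliseconds, priority value)
    peers = {
        "B": (5, 1), "C": (10, 1), "D": (10, 10), "E": (50, 1),
        "F": (50, 1), "G": (45, 1), "H": (45, 5),
    }

    subset = sorted(peers, key=lambda name: peers[name])[:4]
    # subset == ['B', 'C', 'D', 'G']; the priority value breaks the 45ms
    # latency tie between G (priority 1) and H (priority 5) in favor of G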
Referring back to FIG. 11, in block 1114 the storage node (A) 712 encodes the received data object based on the erasure code configuration determined in block 1104. In other words, the storage node (A) 712 transforms the data of the data object into a number of erasure code elements, as described above, according to the implemented erasure code configuration. In block 1116, the storage node (A) 712 stores one of the erasure code elements local to the storage node (A) 712 (e.g., in memory 306) . In block 1118, the storage node (A) 712 transmits each of the remaining erasure code elements to corresponding storage nodes of the subset of storage nodes determined in block 1106 before the method returns to block 1102 to determine whether another data object was received.
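Pulling blocks 1114 through 1118 together, the store path may be sketched as follows; the erasure code elements are assumed to have been produced by an encoding routine such as the fragmentation sketch given earlier, and the mapping of elements to peers is returned rather than transmitted, since the transport is implementation specific:

    def store_object(elements, latency_row):
        """Keep the first erasure code element locally (block 1116) and map the
        remaining elements to the lowest-latency peers (block 1118)."""
        subset = sorted(latency_row, key=latency_row.get)[:len(elements) - 1]
        local_element = elements[0]                  # stored at this storage node
        outgoing = dict(zip(subset, elements[1:]))   # peer name -> element
        return local_element, outgoing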
EXAMPLES
Illustrative examples of the technologies disclosed herein are provided below. An embodiment of the technologies may include any one or more, and any combination of, the examples described below.
Example 1 includes a proxy computing node for implementing adaptive erasure code, the proxy computing node comprising one or more processors; and one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the proxy computing node to receive a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is further communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node; determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes; transform the data object based on the erasure code; and transmit a different portion of the transformed data object to each of the subset of storage nodes.
Example 2 includes the subject matter of Example 1, and wherein the plurality of instructions further cause the proxy computing node to receive the latency value for each of the storage nodes from each of the storage nodes.
Example 3 includes the subject matter of any of Examples 1 and 2, and wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.
Example 4 includes the subject matter of any of Examples 1-3, and wherein to select each storage node of the subset of the plurality of storage nodes comprises to select each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
Example 5 includes the subject matter of any of Examples 1-4, and wherein to determine the subset of the plurality of storage nodes further comprises to determine a total number of the subset based on the erasure code.
Example 6 includes the subject matter of any of Examples 1-5, and wherein to determine the subset of the plurality of storage nodes further comprises to determine the subset  of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
Example 7 includes the subject matter of any of Examples 1-6, and wherein the determination of the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
Example 8 includes the subject matter of any of Examples 1-7, and wherein the plurality of instructions further cause the proxy computing node to determine the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
Example 9 includes the subject matter of any of Examples 1-8, and wherein the plurality of instructions further cause the proxy computing node to determine the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
Example 10 includes the subject matter of any of Examples 1-9, and wherein to determine the subset of the plurality of storage nodes based on priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
Example 11 includes a method for implementing adaptive erasure code, the method comprising receiving, by a proxy computing node, a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node; determining, by the proxy computing node, a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes; transforming, by the proxy computing node, the data object based on the erasure code; and transmitting, by the proxy computing node, a different portion of the transformed data object to each of the subset of storage nodes.
Example 12 includes the subject matter of Example 11, and further including receiving, by the proxy computing node, the latency value for each of the storage nodes from each of the storage nodes.
Example 13 includes the subject matter of any of Examples 11 and 12, and wherein determining the subset of the plurality of storage nodes further comprises selecting each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.
Example 14 includes the subject matter of any of Examples 11-13, and wherein selecting each storage node of the subset of the plurality of storage nodes comprises selecting each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
Example 15 includes the subject matter of any of Examples 11-14, and wherein determining the subset of the plurality of storage nodes further comprises determining a total number of the subset based on the erasure code.
Example 16 includes the subject matter of any of Examples 11-15, and wherein determining the subset of the plurality of storage nodes further comprises determining the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
Example 17 includes the subject matter of any of Examples 11-16, and wherein determining the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
Example 18 includes the subject matter of any of Examples 11-17, and further including determining, by the proxy computing node, the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
Example 19 includes the subject matter of any of Examples 11-18, and further including determining, by the proxy computing node, the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
Example 20 includes the subject matter of any of Examples 11-19, and wherein determining the subset of the plurality of storage nodes based on the priority value comprises comparing the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
Example 21 includes a proxy computing node comprising a processor; and a memory having stored therein a plurality of instructions that when executed by the processor cause the proxy computing node to perform the method of any of Examples 11-20.
Example 22 includes one or more machine readable storage media comprising a plurality of instructions stored thereon that in response to being executed result in a proxy computing node performing the method of any of Examples 11-20.
Example 23 includes a proxy computing node for implementing adaptive erasure code, the proxy computing node comprising network communication circuitry to receive a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is further communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node; storage node determination circuitry to determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes; and erasure code implementation circuitry to transform the data object based on the erasure code, wherein the network communication circuitry is further to transmit a different portion of the transformed data object to each of the subset of storage nodes.
Example 24 includes the subject matter of Example 23, and wherein the storage node determination circuitry is further to receive the latency value for each of the storage nodes from each of the storage nodes.
Example 25 includes the subject matter of any of Examples 23 and 24, and wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.
Example 26 includes the subject matter of any of Examples 23-25, and wherein to select each storage node of the subset of the plurality of storage nodes comprises to select each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
Example 27 includes the subject matter of any of Examples 23-26, and wherein to determine the subset of the plurality of storage nodes further comprises to determine a total number of the subset based on the erasure code.
Example 28 includes the subject matter of any of Examples 23-27, and wherein to determine the subset of the plurality of storage nodes further comprises to determine the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
Example 29 includes the subject matter of any of Examples 23-28, and wherein to determine the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
Example 30 includes the subject matter of any of Examples 23-29, and wherein the storage node determination circuitry is further to determine the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
Example 31 includes the subject matter of any of Examples 23-30, and wherein the storage node determination circuitry is further to determine the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
Example 32 includes the subject matter of any of Examples 23-31, and wherein to determine the subset of the plurality of storage nodes based on the priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
Example 33 includes a proxy computing node for implementing adaptive erasure code, the proxy computing node comprising network communication circuitry to receive a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is further communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node; means for determining a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes; and means for transforming the data object based on the erasure code, wherein the network communication circuitry is further to transmit a different portion of the transformed data object to each of the subset of storage nodes.
Example 34 includes the subject matter of Example 33, and wherein the network communication circuitry is further to receive the latency value for each of the storage nodes from each of the storage nodes.
Example 35 includes the subject matter of any of Examples 33 and 34, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for selecting each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.
Example 36 includes the subject matter of any of Examples 33-35, and wherein the means for selecting each storage node of the subset of the plurality of storage nodes comprises means for selecting each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
Example 37 includes the subject matter of any of Examples 33-36, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for determining a total number of the subset based on the erasure code.
Example 38 includes the subject matter of any of Examples 33-37, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for determining the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
Example 39 includes the subject matter of any of Examples 33-38, and wherein the means for determining the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
Example 40 includes the subject matter of any of Examples 33-39, and further including means for determining the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
Example 41 includes the subject matter of any of Examples 33-40, and further including means for determining the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
Example 42 includes the subject matter of any of Examples 33-41, and wherein the means for determining the subset of the plurality of storage nodes based on the priority value comprises means for comparing the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
Example 43 includes a storage node for implementing adaptive erasure code, the storage node comprising one or more processors; and one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the storage node to receive a data object from a proxy computing node communicatively coupled to the storage node in a storage node cluster, wherein the storage node is communicatively coupled to a plurality of other storage nodes of the storage node cluster; determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between the storage node and a corresponding other storage node of the plurality of storage nodes; transform the data object into a plurality of erasure code elements based on the erasure code; and transmit each of the erasure code elements to a corresponding one of the subset of the plurality of storage nodes.
Example 44 includes the subject matter of Example 43, and wherein the plurality of instructions further cause the storage node to (i) determine the latency value for each of the plurality of storage nodes and (ii) store, local to the storage node, the determined latency value for each of the plurality of storage nodes.
Example 45 includes the subject matter of any of Examples 43 and 44, and wherein to determine the latency value comprises to (i) generate a message for one of the plurality of storage nodes, (ii) broadcast the generated message to the one of the plurality of storage nodes, (iii) receive an acknowledgment from the one of the plurality of storage nodes, and (iv) determine the latency value for the one of the plurality of storage nodes based at least in part on a round-trip-time associated with an elapsed duration of time between having broadcasted the generated message and having received the acknowledgment.
Example 46 includes the subject matter of any of Examples 43-45, and wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node.
Example 47 includes the subject matter of any of Examples 43-46, and wherein to select each storage node of the subset of the plurality of storage nodes comprises to select each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
Example 48 includes the subject matter of any of Examples 43-47, and wherein to determine the subset of the plurality of storage nodes further comprises to determine a total number of the subset based on the erasure code.
Example 49 includes the subject matter of any of Examples 43-48, and wherein to determine the subset of the plurality of storage nodes further comprises to determine the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
Example 50 includes the subject matter of any of Examples 43-49, and wherein to determine the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
Example 51 includes the subject matter of any of Examples 43-50, and wherein the plurality of instructions further cause the storage node to determine the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
Example 52 includes the subject matter of any of Examples 43-51, and wherein the plurality of instructions further cause the storage node to determine the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
Example 53 includes the subject matter of any of Examples 43-52, and wherein to determine the subset of the plurality of storage nodes based on the priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
Example 54 includes a method for adaptive erasure code, the method comprising receiving, by a storage node of a storage node cluster that includes a plurality of storage nodes and a proxy computing node, a data object from the proxy computing node; determining, by the storage node, a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between the storage node and a corresponding other storage node of the plurality of storage nodes; transforming, by the storage node, the data object based on the erasure code; and transmitting, by the storage node, a different portion of the transformed data object to each corresponding storage node of the subset of the plurality of storage nodes.
Example 55 includes the subject matter of Example 54, and further including determining, by the storage node, the latency value for each of the plurality of storage nodes and storing, local to the storage node, the determined latency value for each of the plurality of storage nodes.
Example 56 includes the subject matter of any of Examples 54 and 55, and wherein determining the latency value comprises (i) generating a message for one of the plurality of storage nodes, (ii) broadcasting the generated message to the one of the plurality of storage nodes, (iii) receiving an acknowledgment from the one of the plurality of storage nodes, and (iv) determining the latency value for the one of the plurality of storage nodes based at least in part on a round-trip-time associated with an elapsed duration of time between broadcasting the generated message and receiving the acknowledgment.
Example 57 includes the subject matter of any of Examples 54-56, and wherein determining the subset of the plurality of storage nodes further comprises selecting each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node.
Example 58 includes the subject matter of any of Examples 54-57, and wherein selecting each storage node of the subset of the plurality of storage nodes comprises selecting each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
Example 59 includes the subject matter of any of Examples 54-58, and wherein determining the subset of the plurality of storage nodes further comprises determining a total number of the subset based on the erasure code.
Example 60 includes the subject matter of any of Examples 54-59, and wherein determining the subset of the plurality of storage nodes further comprises determining the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
Example 61 includes the subject matter of any of Examples 54-60, and wherein determining the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
Example 62 includes the subject matter of any of Examples 54-61, and further including determining, by the storage node, the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
Example 63 includes the subject matter of any of Examples 54-62, and further including determining, by the storage node, the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
Example 64 includes the subject matter of any of Examples 54-63, and wherein determining the subset of the plurality of storage nodes based on the priority value comprises comparing the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
Example 65 includes a computing device comprising a processor; and a memory having stored therein a plurality of instructions that when executed by the processor cause the computing device to perform the method of any of Examples 54-64.
Example 66 includes one or more machine readable storage media comprising a plurality of instructions stored thereon that in response to being executed result in a computing device performing the method of any of Examples 54-64.
Example 67 includes a storage node for implementing adaptive erasure code, the storage node comprising network communication circuitry to receive a data object from a proxy computing node communicatively coupled to the storage node in a storage node cluster, wherein the storage node is communicatively coupled to a plurality of other storage nodes of the storage node cluster; storage node determination circuitry to determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between the storage node and a corresponding other storage node of the plurality of storage nodes; and erasure code implementation circuitry to transform the data object into a plurality of erasure code elements based on the erasure code, wherein the network communication circuitry is further to transmit each of the erasure code elements to a corresponding one of the subset of the plurality of storage nodes.
Example 68 includes the subject matter of Example 67, and further including latency determination circuitry to (i) determine the latency value for each of the plurality of storage nodes and (ii) store, local to the storage node, the determined latency value for each of the plurality of storage nodes.
Example 69 includes the subject matter of any of Examples 67 and 68, and wherein to determine the latency value comprises to (i) generate a message for one of the plurality of storage nodes, (ii) broadcast the generated message to the one of the plurality of storage nodes, (iii) receive an acknowledgment from the one of the plurality of storage nodes, and (iv) determine the latency value for the one of the plurality of storage nodes based at least in part on a round-trip-time associated with an elapsed duration of time between having broadcasted the generated message and having received the acknowledgment.
Example 70 includes the subject matter of any of Examples 67-69, and wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node.
Example 71 includes the subject matter of any of Examples 67-70, and wherein to select each storage node of the subset of the plurality of storage nodes comprises to select each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
Example 72 includes the subject matter of any of Examples 67-71, and wherein to determine the subset of the plurality of storage nodes further comprises to determine a total number of the subset based on the erasure code.
Example 73 includes the subject matter of any of Examples 67-72, and wherein to determine the subset of the plurality of storage nodes further comprises to determine the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
Example 74 includes the subject matter of any of Examples 67-73, and wherein to determine the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
Example 75 includes the subject matter of any of Examples 67-74, and wherein the storage node determination circuitry is further to determine the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
Example 76 includes the subject matter of any of Examples 67-75, and wherein the storage node determination circuitry is further to determine the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
Example 77 includes the subject matter of any of Examples 67-76, and wherein to determine the subset of the plurality of storage nodes based on the priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
Example 78 includes a storage node for implementing adaptive erasure code, the storage node comprising network communication circuitry to receive a data object from a proxy computing node communicatively coupled to the storage node in a storage node cluster, wherein the storage node is communicatively coupled to a plurality of other storage nodes of the storage node cluster; means for determining a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between the storage node and a corresponding other storage node of the plurality of storage nodes; and means for transforming the data object into a plurality of erasure code elements based on the erasure code, wherein the network communication circuitry is further to transmit each of the erasure code elements to a corresponding one of the subset of the plurality of storage nodes.
Example 79 includes the subject matter of Example 78, and further including means for (i) determining the latency value for each of the plurality of storage nodes and (ii) storing, local to the storage node, the determined latency value for each of the plurality of storage nodes.
Example 80 includes the subject matter of any of Examples 78 and 79, and wherein the means for determining the latency value comprises means for (i) generating a message for one of the plurality of storage nodes, (ii) broadcasting the generated message to the one of the plurality of storage nodes, (iii) receiving an acknowledgment from the one of the plurality of storage nodes, and (iv) determining the latency value for the one of the plurality of storage nodes based at least in part on a round-trip-time associated with an elapsed duration of time between having broadcasted the generated message and having received the acknowledgment.
Example 81 includes the subject matter of any of Examples 78-80, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for selecting each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node.
Example 82 includes the subject matter of any of Examples 78-81, and wherein the means for selecting each storage node of the subset of the plurality of storage nodes comprises means for selecting each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
Example 83 includes the subject matter of any of Examples 78-82, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for determining a total number of the subset based on the erasure code.
Example 84 includes the subject matter of any of Examples 78-83, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for determining the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
Example 85 includes the subject matter of any of Examples 78-84, and wherein the means for determining the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
Example 86 includes the subject matter of any of Examples 78-85, and further including means for determining the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
Example 87 includes the subject matter of any of Examples 78-86, and further including means for determining the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
Example 88 includes the subject matter of any of Examples 78-87, and wherein the means for determining the subset of the plurality of storage nodes based on the priority value comprises means for comparing the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.

Claims (23)

  1. A proxy computing node for implementing adaptive erasure code, the proxy computing node comprising:
    one or more processors; and
    one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the proxy computing node to:
    receive a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is further communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node;
    determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes;
    transform the data object based on the erasure code; and
    transmit a different portion of the transformed data object to each storage node of the subset of storage nodes.
  2. The proxy computing node of claim 1, wherein the plurality of instructions further cause the proxy computing node to receive the latency value for each of the storage nodes from each of the storage nodes.
  3. The proxy computing node of claim 1, wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.
  4. The proxy computing node of claim 3, wherein to select each storage node of the subset of the plurality of storage nodes comprises to select each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
  5. The proxy computing node of claim 1, wherein to determine the subset of the plurality of storage nodes further comprises to determine a total number of the subset based on the erasure code.
  6. The proxy computing node of claim 1, wherein to determine the subset of the plurality of storage nodes further comprises to determine the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
  7. The proxy computing node of claim 6, wherein to determine the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
  8. The proxy computing node of claim 6, wherein the plurality of instructions further cause the proxy computing node to determine the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
  9. The proxy computing node of claim 6, wherein the plurality of instructions further cause the proxy computing node to determine the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
  10. The proxy computing node of claim 6, wherein to determine the subset of the plurality of storage nodes based on the priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
  11. A method for implementing adaptive erasure code, the method comprising:
    receiving, by a proxy computing node, a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node;
    determining, by the proxy computing node, a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes;
    transforming, by the proxy computing node, the data object based on the erasure code; and
    transmitting, by the proxy computing node, a different portion of the transformed data object to each storage node of the subset of storage nodes.
  12. The method of claim 11, further comprising receiving, by the proxy computing node, the latency value for each of the storage nodes from each of the storage nodes.
  13. The method of claim 11, wherein determining the subset of the plurality of storage nodes further comprises selecting each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.
  14. The method of claim 13, wherein selecting each storage node of the subset of the plurality of storage nodes comprises selecting each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
  15. The method of claim 11, wherein determining the subset of the plurality of storage nodes further comprises determining a total number of the subset based on the erasure code.
  16. The method of claim 11, wherein determining the subset of the plurality of storage nodes further comprises determining the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
  17. The method of claim 16, wherein determining the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
  18. The method of claim 16, further comprising determining, by the proxy computing node, the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
  19. The method of claim 16, further comprising determining, by the proxy computing node, the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
  20. The method of claim 16, wherein determining the subset of the plurality of storage nodes based on the priority value comprises comparing the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
  21. A proxy computing node comprising:
    a processor; and
    a memory having stored therein a plurality of instructions that when executed by the processor cause the proxy computing node to perform the method of any of claims 11-20.
  22. One or more machine readable storage media comprising a plurality of instructions stored thereon that in response to being executed result in a proxy computing node performing the method of any of claims 11-20.
  23. A proxy computing node comprising means for performing the method of any of claims 11-20.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/098419 WO2017107095A1 (en) 2015-12-23 2015-12-23 Technologies for adaptive erasure code

Publications (1)

Publication Number Publication Date
WO2017107095A1 2017-06-29

Family

ID=59088855

Country Status (1)

Country Link
WO (1) WO2017107095A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101040475A (en) * 2004-10-06 2007-09-19 诺基亚公司 Assembling forward error correction frames
CN102624866A (en) * 2012-01-13 2012-08-01 北京大学深圳研究生院 Data storage method, data storage device and distributed network storage system
CN103152395A (en) * 2013-02-05 2013-06-12 北京奇虎科技有限公司 Storage method and device of distributed file system
CN103984607A (en) * 2013-02-08 2014-08-13 华为技术有限公司 Distributed storage method, device and system
US20150169716A1 (en) * 2013-12-18 2015-06-18 Amazon Technologies, Inc. Volume cohorts in object-redundant storage systems

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11461015B2 (en) 2018-10-15 2022-10-04 Netapp, Inc. Available storage space in a system with varying data redundancy schemes
CN111949628A (en) * 2019-05-16 2020-11-17 北京京东尚科信息技术有限公司 Data operation method and device and distributed storage system
CN111949628B (en) * 2019-05-16 2024-05-17 北京京东尚科信息技术有限公司 Data operation method, device and distributed storage system

Legal Events

Code 121: EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 15911094; Country of ref document: EP; Kind code of ref document: A1)

Code NENP: Non-entry into the national phase (Ref country code: DE)

Code 122: EP: PCT application non-entry in European phase (Ref document number: 15911094; Country of ref document: EP; Kind code of ref document: A1)