WO2017107095A1

WO2017107095A1 - Technologies for adaptive erasure code

Info

Publication number: WO2017107095A1
Application number: PCT/CN2015/098419
Authority: WO
Inventors: Hongzhou Zhang; Qihua DAI; Xiaodong Liu
Original assignee: Intel Corporation
Priority date: 2015-12-23
Filing date: 2015-12-23
Publication date: 2017-06-29

Abstract

Technologies for adaptive erasure code include a plurality of storage nodes of a storage node cluster communicatively coupled to a proxy computing node. Each of the storage nodes are configured to receive a data object from the proxy computing node and determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes. The latency value indicates a communication latency between the storage node and another storage node of the plurality of storage nodes. The storage node having received the data object is configured to transform the data object into a plurality of erasure code elements based on the erasure code and transmit each of the erasure code elements to a corresponding one of the subset of the plurality of storage nodes. Other embodiments are described and claimed.

Description

TECHNOLOGIES FOR ADAPTIVE ERASURE CODE

BACKGROUND

Distributed data object storage clusters typically utilize a plurality of storage nodes (i.e., computing devices capable of storing a plurality of data objects) to provide enhanced performance and availability. Such storage clusters can be used for data object replication (i.e., data object redundancy/backup) , for example. Generally, the storage clusters are not visible to a client computing device that is either transmitting data objects to be stored on the storage nodes or receiving stored data objects from the storage nodes. Accordingly, in some distributed storage cluster embodiments, incoming requests (e.g., network packets including data objects or data object requests) are queued at an entry point of the storage cluster, commonly referred to as a proxy (e.g., a proxy server) . As such, the proxy may be a computing device that is configured to act as an intermediary for the incoming client requests. Additionally, the proxy computing node can be configured to select which storage node to retrieve the requested data object from.

Erasure code (a. k. a. forward error correction (FEC) code) is a data protection method usable to guarantee data availability/durability for cloud storage by transforming a message (i.e., data) into a longer message such that the original message can be recovered from a subset of symbols in which the longer message is broken into. In other words, the data object to be replicated is broken into several data fragments that are encoded into a number of parity pieces based on the implemented erasure code. The data fragments and a number of redundant parity pieces are then stored across a plurality of different locations (e.g., storage nodes, storage disks, etc. ) . Accordingly, in the event of data corruption, the corrupted data can be rebuilt by using information form the data fragments and parity pieces (i.e., erasure code elements) .

However, conventional proxy computing nodes treat all of the storage nodes the same and attempt to distribute the erasure code elements evenly across the storage nodes, despite each of the storage nodes generally having different capabilities (e.g., processor capabilities, memory capacity, disk type, configurations, bandwidth, etc. ) and geographic locations. As a result, for example, storage nodes with a higher capacity can end up receiving more requests when traditional request distribution techniques are used (e.g., random selection of one of the storage nodes, round-robin across chosen storage nodes, etc. ) . Such request distribution may lead to a performance bottleneck and/or leave other storage nodes underutilized or in an idle state.

BRIEF DESCRIPTION OF THE DRAWINGS

The concepts described herein are illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. Where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.

FIG. 1 is a simplified block diagram of at least one embodiment of a system for implementing adaptive erasure code that includes a proxy computing node communicatively coupled to a storage node cluster；

FIG. 2 is a simplified block diagram of at least one embodiment of the proxy computing node of the system of FIG. 1；

FIG. 3 is a simplified block diagram of at least one embodiment of the storage node of the system of FIG. 1；

FIG. 4 is a simplified block diagram of at least one embodiment of an environment that may be established by the proxy computing node of FIGS. 1 and 2；

FIG. 5 is a simplified block diagram of at least one embodiment of an environment that may be established by one of the storage nodes of FIGS. 1 and 3；

FIG. 6 is a simplified flow diagram of at least one embodiment of a method for determining latency in a storage node cluster that may be executed by one of the storage nodes of FIGS. 1 and 3；

FIG. 7 is a simplified block diagram of at least one embodiment of a storage node cluster；

FIG. 8 is a simplified illustration of at least one embodiment of a latency table usable to select a subset of storage nodes from the storage node cluster of FIG. 7；

FIG. 9 is a simplified block diagram of another embodiment of a storage node cluster；

FIG. 10 is a simplified illustration of at least one embodiment of a latency table usable to select a subset of storage nodes from the storage node cluster of FIG. 9；

FIG. 11 is a simplified flow diagram of at least one embodiment of a method for determining a subset of storage nodes of a storage node cluster at which to store an erasure code element that may be executed by the proxy computing node of FIGS. 1 and 2, or by one of the storage nodes of FIGS. 1 and 3； and

FIG. 12 is a simplified illustration of at least one embodiment of a priority table usable to select a subset of storage nodes from the storage node cluster of FIG. 7.

DETAILED DESCRIPTION

While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.

References in the specification to “one embodiment, ” “an embodiment, ” “an illustrative embodiment, ” etc.， indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described. Additionally, it should be appreciated that items included in a list in the form of “at least one of A, B, and C” can mean (A) ； (B) ； (C) : (A and B) ； (Aand C) ； (B and C) ； or (A, B, and C) . Similarly, items listed in the form of “at least one of A, B, or C” can mean (A) ； (B) ； (C) : (Aand B) ； (Aand C) ； (B and C) ； or (A, B, and C) .

The disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media (e.g., memory, data storage, etc. ) , which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device) .

In the drawings, some structural or method features may be shown in specific arrangements and/or orderings. However, it should be appreciated that such specific arrangements and/or orderings may not be required. Rather, in some embodiments, such features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of a structural or method feature in a particular figure is not meant to imply that such feature is required in all embodiments and, in some embodiments, may not be included or may be combined with other features.

Referring now to FIG. 1, in an illustrative embodiment, a system 100 for implementing adaptive erasure code includes an endpoint computing node 102 communicatively coupled to a proxy computing node 106 of a storage node cluster 108 via a network 104. As shown, the proxy computing node 106 is further communicatively coupled to a plurality of distributed storage nodes 110. It should be appreciated that the proxy computing node 106 and the storage nodes 110 of the storage node cluster 108 may be architected as any type of distributed storage cluster, such as a distributed block storage cluster, a distributed file storage cluster, a distributed object storage cluster, etc. As will be described in further detail below, the storage nodes 110 includes a plurality of storage nodes (see, e.g., storage node (1) 112, storage node (2) 114, and storage node (n) 116) capable of storing data objects (e.g., replicas of the data objects for providing redundant backup) amongst two or more of the storage nodes 110.

In use, the endpoint computing node 102 transmits a network packet (i.e., via the network 104) to the proxy computing node 106. The network packet includes a data object (i.e., in a payload of the network packet) to be stored at a plurality of the storage nodes 110. It should be appreciated that the data object may include any type of data, such as a file, a block of data, an object, etc. Upon having received the network packet (i.e., the data object) , the proxy computing node 106 determines a subset of the storage nodes 110 at which to store at least a portion the received data object. The proxy computing node 106 is configured to determine the subset of storage nodes based on the erasure code being implemented to encode the data object and one or more criteria, including a latency associated with each of the storage nodes 110 and/or a priority associated with each of the storage nodes 110. It should be appreciated that the total number of storage nodes 110 of the subset is determined based on the erasure code.

For example, each of the storage nodes 110 may determine a latency value (e.g., a round-trip-time) between themselves and the other storage nodes 110 of the cluster 108, which is usable to determine the subset of the storage nodes 110. In some embodiments, the latencies determined by each of the storage nodes 110 may be stored in a latency table stored locally at each of the storage nodes 110 and/or aggregated (i.e., the latencies stored) at the proxy computing node 106 into a master latency table that includes the latency values for all of the storage nodes 110 of the cluster 108 communicatively coupled via the proxy computing node 106. Accordingly, it should be appreciated that, in some embodiments, the determination of the subset of storage nodes 110 may be made by one of the storage nodes 110 (e.g., using the latency table stored at that one of the storage nodes 110) and/or the proxy computing node 106 (e.g., using the master latency table) .

To determine the subset of the storage nodes 110, the proxy computing node 106, or the one of the storage nodes 110, depending on the embodiment, checks against the latencies of the latency table to identify which of the storage nodes 110 (i.e., the subset) should be selected to store at least a portion of the data object. Similarly, in embodiments wherein the priority is used in addition or alternative to the latency, a priority table including a priority value assigned to each of the storage nodes 110 may be referenced to identify which of the storage nodes 110 (i.e., one of the storage nodes 110 having a priority higher than or equal to the other storage nodes 110) should be selected to store at least a portion of the data object.

The network 104 may be embodied as any type of wired and/or wireless communication network, including cellular networks (e.g., Global System for Mobile Communications (GSM) , Long-Term Evolution (LTE) , etc. ) , telephony networks, digital subscriber line (DSL) networks, cable networks, local area networks (LANs) or wide area networks (WANs) , global networks (e.g., the Internet) , or any combination thereof. It should be appreciated that the network 104 may serve as a centralized network and, in some embodiments, may be communicatively coupled to another network (e.g., the Internet) . Accordingly, the network 104 may include a variety of other network devices (not shown) , virtual and physical, such as routers, switches, network hubs, servers, storage devices, compute devices, etc., as needed to facilitate communication between the endpoint computing node 102 and the proxy computing node 106 via the network 104.

The proxy computing node 106 may be embodied as any type of computing device that is capable of performing the functions described herein, such as, without limitation, a server (e.g., stand-alone, rack-mounted, blade, etc. ) , a switch (e.g., rack-mounted, standalone, fully/partially managed, full-duplex/half-duplex communication mode enabled, etc. ) , a network appliance (e.g., physical or virtual) , a web appliance, a distributed computing system, a processor-based system, and/or a multiprocessor system. For example, depending on the distribution architecture of the storage nodes 110 (e.g., symmetric or non-symmetric) , the proxy computing node 106 may perform as a function server (e.g., a web server, a database server, an email server, a file server, etc. ) and/or as a monitor server. It should be appreciated that, in some embodiments, at least a portion of the functions of the proxy computing node 106 described herein may be performed in a software layer of one or more of the storage nodes 110.

As shown in FIG. 2, the illustrative proxy computing node 106 includes a processor 202, an input/output (I/O) subsystem 204, a memory 206, a data storage device 208, and communication circuitry 210. Of course, the proxy computing node 106 may include other or additional components, such as those commonly found in a computing device (e.g., one or more input/output peripheral devices) , in other embodiments. Additionally, in some embodiments, one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component. For example, the memory 206, or portions thereof, may be incorporated in the processor 202 in some embodiments. Further, in some embodiments, one or more of the illustrative components may be omitted from the proxy computing node 106.

The processor 202 may be embodied as any type of processor capable of performing the functions described herein. For example, the processor 202 may be embodied as a single or multi-core processor, digital signal processor (DSP) , microcontroller, or other processor or processing/controlling circuit. Similarly, the memory 206 may be embodied as any type of volatile or non-volatile memory or data storage capable of performing the functions described herein. In operation, the memory 206 may store various data and software used during operation of the proxy computing node 106, such as operating systems, applications, programs, libraries, and drivers.

The memory 206 is communicatively coupled to the processor 202 via the I/O subsystem 204, which may be embodied as circuitry and/or components to facilitate input/output operations with the processor 202, the memory 206, and other components of the proxy computing node 106. For example, the I/O subsystem 204 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, firmware devices, communication links (i.e., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc. ) and/or other components and subsystems to facilitate the input/output operations. In some embodiments, the I/O subsystem 204 may form a portion of a system-on-a-chip (SoC) and be incorporated, along with the processor 202, the memory 206, and other components of the proxy computing node 106, on a single integrated circuit chip.

The data storage device 208 may be embodied as any type of device or devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid-state drives, or other data storage devices. It should be appreciated that the data storage device 208 and/or the memory 206 (e.g., the computer-readable storage media) may store various data as described herein, including operating systems, applications, programs, libraries, drivers, instructions, etc., capable of being executed by a processor (e.g., the processor 202) of the proxy computing node 106.

The communication circuitry 210 may be embodied as any communication circuit, device, or collection thereof, capable of enabling communications between the proxy computing node 106 and other computing devices (e.g., the endpoint computing node 102, the storage nodes 110, etc. ) over a network (e.g., the network 104) . The communication circuitry 210 may be configured to use any one or more wireless and/or wired communication technologies and associated protocols (e.g., Ethernet,

WiMAX, LTE, 5G, etc. ) to effect such communication.

The illustrative communication circuitry 210 includes a network interface controller (NIC) 212. The NIC 212 may be embodied as one or more add-in-boards, daughtercards, network interface cards, controller chips, chipsets, or other devices that may be used by the proxy computing node 106. For example, in some embodiments, the NIC 212 may be integrated with the processor 202, embodied as an expansion card coupled to the I/O subsystem 204 over an expansion bus (e.g., PCI Express) , part of an SoC that includes one or more processors, or included on a multichip package that also contains one or more processors.

Alternatively, in some embodiments, the NIC 212 may include a local processor (not shown) and/or a local memory (not shown) that are both local to the NIC 212. In such embodiments, the local processor of the NIC 212 may be capable of performing one or more of the functions (e.g., replication, network packet processing, etc. ) as described herein. In some embodiments, the local memory of the NIC 212 may be capable of storing data local to the NIC 212. It should be appreciated that the functionality of the NIC 212 may be integrated into one or more components of the proxy computing node 106 at the board level, socket level, chip level, and/or other levels.

Referring again to FIG. 1, the endpoint computing node 102 may be embodied as any type of computation or computing device capable of performing the functions described herein, including, without limitation, a computer, a mobile computing device (e.g., smartphone, tablet, laptop, notebook, wearable, etc. ) , a server (e.g., stand-alone, rack-mounted, blade, etc. ) , a network appliance (e.g., physical or virtual) , a web appliance, a distributed computing system, a processor-based system, and/or a multiprocessor system. Similar to the illustrative proxy computing node 106 of FIG. 2, the endpoint computing node 102 may include a processor, an I/O subsystem, a memory, a data storage device, and/or communication circuitry, which are not shown for clarity of the description. As such, further descriptions of the like components are not repeated herein with the understanding that the description of the corresponding components provided above in regard to the proxy computing node 106 applies equally to the corresponding components of the endpoint computing node 102.

The illustrative storage nodes 110 includes a first storage node, which is designated as storage node (1) 112, a second storage node, which is designated as storage node (2) 114, and a third storage node, which is designated as storage node (N) 116 (i.e., the “Nth” storage node of the storage nodes 110, wherein “N” is a positive integer and designates one or more additional storage nodes) . Each of the storage nodes 110 may be embodied as any type of storage device capable of performing the functions described herein, including, without limitation, a server (e.g., stand-alone, rack-mounted, blade, etc. ) , a network appliance (e.g., physical or virtual) , a high-performance computing device, a web appliance, a distributed computing system, a computer, a processor-based system, and/or a multiprocessor system.

Referring now to FIG. 3, the illustrative one of the storage nodes 110, similar to the illustrative proxy computing node 106 of FIG. 2, includes a processor 302, an I/O subsystem 304, a memory 306, a data storage device 308, and communication circuitry 310 that includes a NIC 312. Accordingly, further descriptions of the like components are not repeated herein with the understanding that the description of the corresponding components provided above in regard to the proxy computing node 106 applies equally to the corresponding components of each of the storage nodes 110.

As described previously, depending on the distributed storage setup of the storage nodes 110, the proxy computing node 106 and/or one or more of the storage nodes 110 may perform the functions described herein. For example, in a non-symmetric distributed storage platform, such as Ceph, the proxy computing node 106 may function as a monitor server, saving the master latency table that includes latency values for each of the storage nodes 110 of the cluster 108. In such embodiments, the proxy computing node 106 may be configured to determine the subset of storage nodes 110 at which to store at least a portion of the data object (i.e., an encoded portion of the data object) , as described below in FIG. 4. In another example, in a symmetric distributed storage platform, such as Sheepdog, each storage node has its own latency table that only includes a latency value between that storage node and the other storage nodes 110 of the cluster 108. In such embodiments, the proxy computing node 106 may be configured transmit the received data object to be stored to one of the storage nodes 110, which may then determine the subset of storage nodes 110 at which to store at least a portion of the data object (i.e., an encoded portion of the data object) , as described below in FIG. 5.

Referring now to FIG. 4, in an illustrative embodiment, the proxy computing node 106 establishes an environment 400 during operation. The illustrative environment 400 includes a network communication module 410, a storage node identification module 420, an erasure code implementation module 430, and a storage node determination module 440. Each of the modules, logic, and other components of the environment 400 may be embodied as hardware, software, firmware, or a combination thereof. For example, each of the modules, logic, and other components of the environment 400 may form a portion of, or otherwise be established by, the processor 202, the communication circuitry 210 (e.g., the NIC 212) , and/or other hardware components of the proxy computing node 106. As such, in some embodiments, one or more of the modules of the environment 400 may be embodied as circuitry or a collection of electrical devices (e.g., network communication circuitry 410, storage node identification circuitry 420, erasure code implementation circuitry 430, storage node determination circuitry 440, etc. ) .

In the illustrative environment 400, the proxy computing node 106 includes storage node data 402, erasure code data 404, latency data 406, and priority data 408, each of which may be accessed by the various modules and/or sub-modules of the proxy computing node 106. It should be appreciated that the proxy computing node 106 may include other components, sub-components, modules, sub-modules, and/or devices commonly found in a computing node, which are not illustrated in FIG. 4 for clarity of the description.

The network communication module 410 is configured to facilitate inbound and outbound network communications (e.g., network traffic, network packets, network flows, etc. ) to and from the proxy computing node 106. To do so, the network communication module 410 is configured to receive and process network packets from other computing devices (e.g., the endpoint computing node 102, the storage nodes 110, and/or other computing device (s) communicatively coupled via the network 104) . Additionally, the network communication module 410 is configured to prepare and transmit network packets to another computing device (e.g., the endpoint computing node 102, the storage nodes 110, and/or other computing device (s) communicatively coupled via the network 104) . Accordingly, in some embodiments, at least a portion of the functionality of the network communication module 410 may be performed by the communication circuitry 210, and more specifically by the NIC 212.

The storage node identification module 420 is configured to identify each of the storage nodes 110 communicatively coupled via the proxy computing node 106. In other words, the storage node identification module 420 is configured to identify a topology of the cluster 108 (see, e.g., the

storage node clusters

700 and 900 of FIGS. 7 and 9, respectively) . It should be appreciated that any known technology for identifying the storage nodes 110 of the cluster 108 may be implemented by the storage node identification module 420. In some embodiments, the identified storage nodes (e.g., the topology) may be stored in the storage node data 402.

The erasure code implementation module 430 is configured to implement an erasure code to the data object received for storage. In other words, the erasure code implementation module 430 is configured to transform or otherwise encode the received data object based on the erasure code, as well as rebuild the data object from the encoded data objects. For example, an erasure code 10 of 15 configuration, or EC 10/15, includes fifteen erasure code elements, or symbols, comprised of ten data elements, or base symbols, and five parity elements, or extra symbols. Accordingly, the erasure code implementation module 430 is configured to encode the erasure code elements, each of which is to be stored at a different one of the subset of storage nodes 110. In such an embodiment, each of the fifteen erasure code elements would be stored across fifteen different storage nodes of subset of storage nodes 110. Additionally, the erasure code implementation module 430 is configured to rebuild the data object (e.g., as a result of a detected corruption) . In furtherance of the previous example, the original data of the data object could be reconstructed from ten of the verified erasure code elements, or fragments. In some embodiments, the erasure code may be stored in the erasure code data 404.

The storage node determination module 440 is configured to determine which of the storage nodes 110 at which to store each of the erasure code elements. In other words, the storage node determination module 440 is configured to determine a subset from the storage nodes 110, each of which is to receive and store a different erasure code element. To do so, the illustrative storage node determination module 440 includes a latency analysis module 442 and/or a priority analysis module 444.

The latency analysis module 442 is configured to analyze a latency table that includes a latency value between each of the storage nodes 110 and determine the subset of storage nodes 110 at which to store the erasure code elements based on the latency values. To do so, the latency analysis module 442 may be configured to query or otherwise receive the latency values determined at each of the storage nodes 110. For example, the latency analysis module 442 may be configured to determine the subset of the storage nodes 110 based on a distance-aware policy. In such an embodiment, the latency analysis module 442 may be configured to select the storage nodes 110 having the smallest latency (i.e., indicating a more proximate location as compared to the other storage nodes) to keep the erasure code elements as close as possible to a particular one of the storage nodes 110. In some embodiments, the latency table may be stored in the latency data 406.

The priority analysis module 444 is configured to analyze a priority table (e.g., retrieved from a policy) and determine the subset of storage nodes 110 at which to store the erasure code elements based on the priority values. For example, the priority analysis module 444 may be configured to determine the subset of the storage nodes 110 based on a priority ranked between high priority (e.g., a low value) and a low priority (e.g., a high value) . In some embodiments, an administrator of the storage nodes 110 may assign the priority to indicate a preference to use one storage node over another storage node, for example when one storage node has a shorter end-of-life (i.e., scheduled to be replaced in a shorter period of time) than another storage node. Accordingly, the priority analysis module 444 may determine to store the erasure code elements at the higher priority storage nodes, rather than the lower priority storage nodes. In some embodiments, the priority table may be stored in the priority data 408.

It should be appreciated that, in some embodiments, the storage node determination module 440 may first perform a latency analysis (e.g., via the latency analysis module 442) to identify candidate storage nodes from the storage nodes 110 of the cluster 108, and further refine the subset of storage nodes 110 from the identified candidate storage nodes based on the priority value (e.g., via the priority analysis module 444) , such as by serving as a tie-breaker when more than one of the storage nodes 110 has an equal latency value. Alternatively, in some embodiments, the storage node determination module 440 may first perform a priority analysis (e.g., via the priority analysis module 444) to identify the candidate storage nodes from the storage nodes 110 of the cluster 108, and further refine the subset of storage nodes 110 from the identified candidate storage nodes based on the latency value (e.g., via the latency analysis module 442) , such as by serving as a tie-breaker when more than one of the storage nodes 110 has an equal priority value.

Referring now to FIG. 5, in an illustrative embodiment, one of the storage nodes 110 establishes an environment 500 during operation. The illustrative environment 500 includes a network communication module 510, a latency determination module 520, an erasure code implementation module 530, and a storage node determination module 540. Each of the modules, logic, and other components of the environment 500 may be embodied as hardware, software, firmware, or a combination thereof. For example, each of the modules, logic, and other components of the environment 500 may form a portion of, or otherwise be established by, the processor 302, the communication circuitry 310 (e.g., the NIC 312) , and/or other hardware components of the storage node. As such, in some embodiments, one or more of the modules of the environment 500 may be embodied as circuitry or a collection of electrical devices (e.g., network communication circuitry 510, latency determination circuitry 520, erasure code implementation circuitry 530, storage node determination circuitry 540, etc. ) .

In the illustrative environment 500, similar to the illustrative environment 400 of the proxy computing node 106, the storage node includes storage node data 502, erasure code data 504, latency data 506, and priority data 508, each of which may be accessed by the various modules and/or sub-modules of the storage node. It should be appreciated that the storage node may include other components, sub-components, modules, sub-modules, and/or devices commonly found in a computing node, which are not illustrated in FIG. 5 for clarity of the description.

The network communication module 510, similar to the network communication module 410 of the proxy computing node 106 of FIG. 4, is configured to facilitate inbound and outbound network communications (e.g., network traffic, network packets, network flows, etc. ) to and from the storage node. To do so, the network communication module 510 is configured to receive and process network packets from other computing devices (e.g., the proxy computing node 106, other storage nodes 110, and/or other computing device (s) communicatively coupled via the network 104) . Additionally, the network communication module 510 is configured to prepare and transmit network packets to another computing device (e.g., the proxy computing node 106, other storage nodes 110, and/or other computing device (s) communicatively coupled via the network 104) . Accordingly, in some embodiments, at least a portion of the functionality of the network communication module 510 may be performed by the communication circuitry 310, and more specifically by the NIC 312.

The latency determination module 520 is configured to determine a latency value between the storage node and each of the other storage nodes 110 of the cluster 108. To do so, the latency determination module 520 is configured to transmit a network packet to each of the other storage nodes 110 and receive a response network packet (i.e., an acknowledgement) from each of the other storage nodes 110. Accordingly, the latency value can be determined relative to an amount of time measured to have passed between transmitting and receiving the latency determination network packets.

The erasure code implementation module 530, similar to the erasure code implementation module 430 of the proxy computing node 106 of FIG. 4, is configured to implement an erasure code to the data object received for storage. In other words, the erasure code implementation module 530 is configured to encode the received data object based on the erasure code, as well as rebuild the data object from the encoded data objects. For example, an erasure code 4 of 6 configuration (a. k. a. EC 4/6) includes six erasure code elements comprised of four data elements and two parity elements. In other words, two storage nodes could be lost or unavailable, and the original data (e.g., a file) could be recoverable for the other four storage nodes. Accordingly, in such a configuration, the erasure code implementation module 430 is configured to encode the erasure code elements, each of which is to be stored at a different one of six of the storage nodes 110 (i.e., the subset of storage nodes 110) . Additionally, the erasure code implementation module 430 is configured to rebuild the data object (e.g., as a result of a detected corruption) from four of the erasure code elements. In some embodiments, the erasure code may be stored in the erasure code data 504.

The storage node determination module 540, similar to the storage node determination module 440 of the proxy computing node 106 of FIG. 4, is configured to determine which of the storage nodes 110 at which to store each of the erasure code elements. In other words, the storage node determination module 540 is configured to determine a subset from the storage nodes 110, each of which is to receive and store a different erasure code element. It should be appreciated that, since one of the erasure components is to be stored local to the storage node, the storage node determination module 540 is configured to determine the subset of storage nodes 110 having a total number of storage nodes equal to the number of erasure components (i.e., six, in the above example) minus one (i.e., five, in the above example) . To determine which of the storage nodes 110 to store each of the erasure code elements, the illustrative storage node determination module 540 includes a latency analysis module 542 and/or a priority analysis module 544.

The latency analysis module 542, similar to the latency analysis module 442 of the proxy computing node 106 of FIG. 4, is configured to analyze a latency table that includes a latency value between the storage node and each of the other storage nodes 110 of the cluster 108, as well as determine the subset of storage nodes 110 at which to store the erasure code elements based on the latency values. For example, the latency analysis module 542 may be configured to determine the subset of the storage nodes 110 based on a distance-aware policy. In such an embodiment, the latency analysis module 542 may be configured to select the storage nodes 110 having the smallest latency (i.e., indicating a more proximate location as compared to the other storage nodes) to keep the erasure code elements as close as possible to a particular one of the storage nodes 110. It should be appreciated that, in some embodiments, a copy of the latency table may be transmitted to and stored at the proxy computing node 106, such as in a master latency table maintained by the proxy computing node 106. In some embodiments, the latency table may be stored in the latency data 506.

The priority analysis module 544, similar to the priority analysis module 444 of the proxy computing node 106 of FIG. 4, is configured to analyze a priority table (e.g., retrieved from a policy) and determine the subset of storage nodes 110 at which to store the erasure code elements based on the priority values. For example, the priority analysis module 544 may be configured to determine the subset of the storage nodes 110 based on a priority (e.g., ranked between one and ten) ranked between high priority (e.g., a low value, the lowest being one) and a low priority (e.g., a high value, the highest being ten) . In some embodiments, an administrator of the storage nodes 110 may assign the priority to indicate a preference to use one storage node over another storage node, for example when one storage node has a shorter end-of-life (i.e., scheduled to be replaced in a shorter period of time) than another storage node. Accordingly, the priority analysis module 544 may determine to store the erasure code elements at the higher priority storage nodes, rather than the lower priority storage nodes. In some embodiments, the priority table may be stored in the priority data 508.

It should be appreciated that, in some embodiments, the storage node determination module 540 may first perform a latency analysis (e.g., via the latency analysis module 542) to identify candidate storage nodes from the storage nodes 110 of the cluster, and further refine the subset of storage nodes 110 from the identified candidate storage nodes based on the priority value (e.g., via the priority analysis module 544) , such as by serving as a tie-breaker when more than one of the storage nodes 110 has an equal latency value. Alternatively, in some embodiments, the storage node determination module 540 may first perform a priority analysis (e.g., via the priority analysis module 544) to identify the candidate storage nodes from the storage nodes 110 of the cluster 108, and further refine the subset of storage nodes 110 from the identified candidate storage nodes based on the latency value (e.g., via the latency analysis module 542) , such as by serving as a tie-breaker when more than one of the storage nodes 110 has an equal priority value.

Referring now to FIG. 6, in use, one of the storage nodes 110 may execute a method 600 for determining latency in a storage node cluster (e.g., the storage node cluster 108 of FIG. 1) . It should be appreciated that the method 600 may be executed upon initiation by an administrator, detection of a new storage node 110 to the storage node cluster 108, and/or any other actionable event configured to trigger the latency determination, which may be triggered automatically. It should be further appreciated that, in some embodiments, the method 600 may be embodied as various instructions stored on a computer-readable media, which may be executed by the processor 302, the NIC 312, and/or other components of the storage node to cause the storage node to perform the method 600. The computer-readable media may be embodied as any type of media capable of being read by the storage node including, but not limited to, the memory 306, the data storage device 308, a local memory of the NIC 312, other memory or data storage devices of the storage node, portable media readable by a peripheral device of the storage node, and/or other media.

The method 600 begins with block 602, in which the storage node determines whether a latency check was initiated. If so, the method 600 proceeds to block 604, wherein the storage node determines the other storage nodes of the storage node cluster 108 in which the storage node performing the latency check resides. One such embodiment of a cluster of storage nodes 700 is shown in FIG. 7, which includes a first rack 710 of storage nodes (i.e., storage node (A) 712, storage node (B) 714, storage node (C) 716, and storage node (D) 718) and a second rack 720 of storage nodes (i.e., storage node (E) 722, storage node (F) 724, storage node (G) 726, and storage node (H) 728) , each of which are communicatively coupled to the proxy computing node 106.

It should be appreciated that the first and

second racks

710, 720, respectively, may be located in different nodes, racks, geographical locations, etc. In some embodiments, the other storage nodes may be determined by the storage node (e.g., via a message broadcast and responses received from the other storage nodes) or by an external computing device, such as the proxy computing node 106 or other monitoring/administration computing node (e.g., a controller/orchestrator computing node) communicatively coupled to the storage node.

Accordingly, the latencies between each of the

storage nodes

712, 714, 716, 718 at the first rack 710 may be less than the latency between one of the

storage nodes

712, 714, 716, 718 at the first rack 710 and one or more of the

storage nodes

722, 724, 726, 728 at the second rack 720, due to the proximate location (i.e., shorter network packet path distance) of the first rack 710 relative to the second rack 720. For example, as shown in an illustrative master latency table 800 of FIG. 8, the latency value (in milliseconds for the present example) between storage node (A) 712 and storage node (B) 714 is 5ms, while the latency value between storage node (A) 712 and storage node (C) 716 is 10ms and the latency value between storage node (A) 712 and storage node (D) 718 is also 10ms. As also shown in the master latency table 800, the latency value between storage node (A) 712 and storage node (E) 722 and storage node (F) 724 is 50ms, while the latency value between storage node (A) 712 and storage node (G) 726 and storage node (H) 728 is 45ms.

Another such embodiment of a cluster of storage nodes 900 is shown in FIG. 10. The cluster of storage nodes 900 includes four additional storage nodes (i.e., storage node (H) 902, storage node (I) 904, storage node (J) 906, and storage node (K) 908) in the first rack 710 of storage nodes, each of which are also communicatively coupled to the proxy computing node 106. In another example of latency values, as shown in FIG. 10, an illustrative master latency table 1000 indicates the latency value (in milliseconds for the present example) between storage node (A) 712 and each of storage node (H) 902, storage node (I) 904, storage node (J) 906, and storage node (K) 908 is 10ms, likely indicating the additional storage nodes (i.e., storage node (H) 902, storage node (I) 904, storage node (J) 906, and storage node (K) 908) are more proximately located to the original storage nodes (i.e., storage node (A) 712, storage node (B) 714, storage node (C) 716, and storage node (D) 718) of the first rack 710.

Referring again to FIG. 6, in block 606, the storage node determines the latency for each of the other storage nodes 110 of the cluster 108, such as may be embodied in a like architecture of the cluster of

storage nodes

700, 900. For example, the storage node may, in block 608, generate a message for each of the other storage nodes 110 determined in block 604. Additionally, in block 610, the storage node may transmit each of the messages generated in block 608 to a corresponding one of the other storage nodes. Further, in block 612, the storage node may determine the latency value for each of the storage nodes based on a measured duration of time between the storage node having transmitted the message and received a response to the message from that other storage node.

For example, the storage node may use a clock of the storage node to log timestamps corresponding to when the message was transmitted from the storage node and the response to the message was received by the storage node from one of the other storage nodes that received the message. Accordingly, a round-trip-time may be determined as a function of the comparison between the logged timestamps. It should be appreciated that, in some embodiments, the latency may be determined for a single storage node of the cluster 108, such as when a new storage node is added to the storage node cluster 108, rather than for each of the other storage nodes of the cluster 108.

In block 614, the storage node stores the determined latencies. As described previously, depending on the distributed storage setup of the storage nodes 110 (e.g., symmetric versus non-symmetric distribution) , the proxy computing node 106 and/or one or more of the storage nodes 110 may perform certain functions described herein (see, e.g., the method 1100 of FIG. 11) . Accordingly, the storage node may, in block 616, store the latencies local to the storage node, and/or the storage node may, in block 618, store the latencies in a location remote of the storage node (e.g., the proxy computing node 106) before the method 600 returns to block 602 to determine whether another latency check was initiated.

Referring now to FIG. 11, in use, the proxy computing node 106 or one of the storage nodes in a storage node cluster (e.g., one of the storage nodes 110 of the storage node cluster 108) , depending on the embodiment, may execute a method 1100 for determining a subset of storage nodes of a cluster at which to store an erasure code element. It should be appreciated that, as described previously, depending on the distributed storage setup of the storage nodes (e.g., symmetric versus non-symmetric distribution) , the proxy computing node 106 and/or one or more of the storage nodes of the cluster may perform the functions of the method 1100. However, to preserve clarity of the description, the functions of the method 1100 as described herein will be described from the perspective of one of the storage nodes (i.e., storage node (A) 712 as used herein) of the storage nodes (see, e.g., the storage node cluster 700 of FIG. 7 and the storage node cluster 900 of FIG. 9) performing the functions of the method 1100 as described herein. Accordingly, the storage node (A) 712 may only have a list of the latency values determined by the storage node (A) 712, as indicated by the highlighted first column/row of FIG. 10, as opposed to a master latency table that includes the latency values for each of the storage nodes of the cluster.

It should be further appreciated that, in some embodiments, the method 1100 may be embodied as various instructions stored on a computer-readable media, which may be executed by the processor 302, the NIC 312, and/or other components of the storage node (A) 712 to cause the storage node (A) 712 to perform the method 1100. The computer-readable media may be embodied as any type of media capable of being read by the storage node (A) 712 including, but not limited to, the memory 306, the data storage device 308, a local memory of the NIC 312, other memory or data storage devices of the storage node (A) 712, portable media readable by a peripheral device of the storage node (A) 712, and/or other media.

The method 1100 begins with block 1102, in which the storage node (A) 712 determines whether a data object to be stored was received, such as via the proxy computing node 106. If so, the method 1100 continues to block 1104, wherein the storage node (A) 712 determines an erasure code to implement. It should be appreciated that, in some embodiments, the particular erasure code implemented may depend on one or more characteristics of the data associated with the data object, such as a workload type, a flow, a tuple of identifying elements, etc.

In block 1106, the storage node (A) 712 determines a subset of storage nodes from the storage nodes of the cluster at which to store the received data. To do so, in block 1108, the storage node (A) 712 determines a total number of the subset of storage nodes based on the erasure code determined in block 1104. For example, the storage node (A) 712 may have determined to implement an erasure code 4 of 6, or EC 4/6, configuration. Accordingly, in such an embodiment, the total number of the subset of storage nodes is five (i.e., six minus one, since one of the erasure code elements is to be stored at the storage node (A) 712) .

Additionally, in block 1110, the storage node (A) 712 determines the subset of storage nodes based on a latency value determined between the storage node (A) 712 and each of the other storage nodes of the cluster (see, e.g., the method 600 of FIG. 6) . In furtherance of the above example in which the storage node (A) 712 is implementing an EC 4/6 configuration, the storage node (A) 712 is configured to select five other storage nodes of the storage nodes. In such an embodiment, based on the latency values of the master latency table 800 shown in FIG. 8 for the storage node cluster 700 of FIG. 7, the storage node (A) 712 would select storage node (B) 714, storage node (C) 716, storage node (D) 718, storage node (G) 726, and storage node (H) 728, since each of storage node (E) 722 and storage node (F) 724 have latency values (i.e., 50ms) that exceed the other available storage nodes of the cluster.

Alternatively, in another such embodiment, based on the latency values of the master latency table 1000 of FIG. 10 for the storage node cluster 900 of FIG. 9, the storage node (A) 712 would select storage node (B) 714, storage node (C) 716, and storage node (D) 718, as well as two storage nodes from storage node (I) 902, storage node (J) 904, storage node (K) 906, and storage node (L) 908, since each of storage node (E) 722, storage node (F) 724, storage node (G) 726, and storage node (H) 728 have latency values (i.e., 45-50ms) that exceed the other available storage nodes of the cluster. As noted previously, it should be appreciated that although the master latency tables 800 and 1000 are referenced above, the storage node (A) 712 may only have those latency values determined by the storage node (A) 712, as indicated by the highlighted first column/row of FIGS. 8 and 10, rather than the entirety of latency values as shown in the master latency tables 800 and 1000.

In some embodiments, in addition or alternative to the latency determination, in block 1112, the storage node (A) 712 may determine the subset of storage nodes based on a priority value assigned to each of the other storage nodes of the cluster. For example, as shown in FIG. 12, an illustrative master priority table 1200 includes a priority value for each of the storage nodes of the storage node cluster 700 of FIG. 7. As described previously, each of the priority values indicate a preference of use for that particular storage node. For example, a storage node that has 2 years or less remaining in operation (e.g., before it is removed from the storage node cluster) may be given a priority value of 10, while a storage node that has 1 year or less remaining in operation may be assigned a value of 5 and a storage node that is not anticipated to be taken out of the storage node cluster may be assigned a value of 1. In other words, a lower value priority indicates a higher preference of use. Accordingly, in an embodiment determining the subset of storage nodes based on the priority values will select those storage nodes with a lower priority value (i.e., a higher preference) over those storage nodes with a higher priority value (i.e., a lower preference) .

For example, in an embodiment implementing the EC 4/6 configuration described above, the storage node (A) 712 would select storage node (B) 714, storage node (C) 716, storage node (E) 722, and storage node (F) 724, each of which having a priority value equal to one, as well as one storage nodes from storage node (G) 726 and storage node (H) 728, since each have the same priority value (i.e., 5) , as the five storage nodes of the subset to transfer an erasure code element to. Storage node (D) 718, having a priority value (i.e., 10) higher than the other available storage nodes of the cluster would not be in consideration for the subset.

It should be appreciated that, in some embodiments, the latency determination in block 1110 and the priority determination in block 1112 may both be used. For example, referring again to the master latency table 800 of FIG. 8, wherein the storage node (A) 712 is using an EC 3/5 configuration (i.e., the storage node (A) 712 needs to identify four other storage nodes at which to store an erasure code element) , the storage node (A) 712 would select storage node (B) 714, storage node (C) 716, and storage node (D) 718, since each of those storage nodes have the three lowest latency values (i.e., 5ms, 10ms, and 10ms, respectively) . Additionally, storage node (A) 712 would select one storage node from storage node (G) 726 and storage node (H) 728, since both of those storage nodes have the next lowest latency value (i.e., 45ms) . Accordingly, in such an embodiment, the storage node (A) 712 may determine the last storage node from between the storage node (G) 726 and the storage node (H) 728 based on the priority value assigned to each storage node. As such, as shown in FIG. 12, the storage node (A) 712 would select storage node (G) 726, as it has the lowest priority value (i.e., the higher preference of use) between the storage node (G) 726 (i.e., a priority value of 1) and the storage node (H) 728 (i.e., a priority value of 5) .

Referring back to FIG. 11, in block 1114 the storage node (A) 712 encodes the received data object based on the erasure code configuration determined in block 1104. In other words, the storage node (A) 712 transforms the data of the data object into a number of erasure code elements, as described above, according to the implemented erasure code configuration. In block 1116, the storage node (A) 712 stores one of the erasure code elements local to the storage node (A) 712 (e.g., in memory 306) . In block 1118, the storage node (A) 712 transmits each of the remaining erasure code elements to corresponding storage nodes of the subset of storage nodes determined in block 1106 before the method returns to block 1102 to determine whether another data object was received.

EXAMPLES

Illustrative examples of the technologies disclosed herein are provided below. An embodiment of the technologies may include any one or more, and any combination of, the examples described below.

Example 1 includes a proxy computing node for implementing adaptive erasure code, the proxy computing node comprising one or more processors； and one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the proxy computing node to receive a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is further communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node； determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes； transform the data object based on the erasure code； and transmit a different portion of the transformed data object to each of the subset of storage nodes.

Example 2 includes the subject matter of Example 1, and wherein the plurality of instructions further cause the proxy computing node to receive the latency value for each of the storage nodes from each of the storage nodes.

Example 3 includes the subject matter of any of Examples 1 and 2, and wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.

Example 4 includes the subject matter of any of Examples 1-3, and wherein to select each storage node of the subset of the plurality of storage nodes comprises to select each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.

Example 5 includes the subject matter of any of Examples 1-4, and wherein to determine the subset of the plurality of storage nodes further comprises to determine a total number of the subset based on the erasure code.

Example 6 includes the subject matter of any of Examples 1-5, and wherein to determine the subset of the plurality of storage nodes further comprises to determine the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.

Example 7 includes the subject matter of any of Examples 1-6, and wherein to determine the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.

Example 8 includes the subject matter of any of Examples 1-7, and wherein the plurality of instructions further cause the proxy computing node to determine the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.

Example 9 includes the subject matter of any of Examples 1-8, and wherein the plurality of instructions further cause the proxy computing node to determine the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.

Example 10 includes the subject matter of any of Examples 1-9, and wherein to determine the subset of the plurality of storage nodes based on priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.

Example 11 includes a method for implementing adaptive erasure code, the method comprising receiving, by a proxy computing node, a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node； determining, by the proxy computing node, a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes； transforming, by the proxy computing node, the data object based on the erasure code； and transmitting, by the proxy computing node, a different portion of the transformed data object to each of the subset of storage nodes.

Example 12 includes the subject matter of Example 11, and further including receiving, by the proxy computing node, the latency value for each of the storage nodes from each of the storage nodes.

Example 13 includes the subject matter of any of Examples 11 and 12, and wherein determining the subset of the plurality of storage nodes further comprises selecting each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.

Example 14 includes the subject matter of any of Examples 11-13, and wherein selecting each storage node of the subset of the plurality of storage nodes comprises selecting each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.

Example 15 includes the subject matter of any of Examples 11-14, and wherein determining the subset of the plurality of storage nodes further comprises determining a total number of the subset based on the erasure code.

Example 16 includes the subject matter of any of Examples 11-15, and wherein determining the subset of the plurality of storage nodes further comprises determining the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.

Example 17 includes the subject matter of any of Examples 11-16, and wherein determining the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes

Example 18 includes the subject matter of any of Examples 11-17, and further including determining, by the proxy computing node, the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.

Example 19 includes the subject matter of any of Examples 11-18, and further including determining, by the proxy computing node, the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.

Example 20 includes the subject matter of any of Examples 11-19, and wherein determining the subset of the plurality of storage nodes based on priority value comprises comparing the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.

Example 21 includes a proxy computing node comprising a processor； and a memory having stored therein a plurality of instructions that when executed by the processor cause the proxy computing node to perform the method of any of Examples 11-20.

Example 22 includes one or more machine readable storage media comprising a plurality of instructions stored thereon that in response to being executed result in a proxy computing node performing the method of any of Examples 11-20.

Example 23 includes a proxy computing node for implementing adaptive erasure code, the proxy computing node comprising network communication circuitry to receive a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is further communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node； storage node determination circuitry to determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes； and erasure code implementation circuitry to transform the data object based on the erasure code, wherein the network communication circuitry is further to transmit a different portion of the transformed data object to each of the subset of storage nodes.

Example 24 includes the subject matter of Example 23, and wherein the storage node determination circuitry is further to receive the latency value for each of the storage nodes from each of the storage nodes.

Example 25 includes the subject matter of any of Examples 23 and 24, and wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.

Example 26 includes the subject matter of any of Examples 23-25, and wherein to select each storage node of the subset of the plurality of storage nodes comprises to select each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.

Example 27 includes the subject matter of any of Examples 23-26, and wherein to determine the subset of the plurality of storage nodes further comprises to determine a total number of the subset based on the erasure code.

Example 28 includes the subject matter of any of Examples 23-27, and wherein to determine the subset of the plurality of storage nodes further comprises to determine the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.

Example 29 includes the subject matter of any of Examples 23-28, and wherein to determine the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.

Example 30 includes the subject matter of any of Examples 23-29, and, wherein the storage node determination circuitry is further to determine the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.

Example 31 includes the subject matter of any of Examples 23-30, and wherein the storage node determination circuitry is further to determine the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.

Example 32 includes the subject matter of any of Examples 23-31, and wherein to determine the subset of the plurality of storage nodes based on priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.

Example 33 includes a proxy computing node for implementing adaptive erasure code, the proxy computing node comprising network communication circuitry to receive a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is further communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node； means for determining a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes； and means for transforming the data object based on the erasure code, wherein the network communication circuitry is further to transmit a different portion of the transformed data object to each of the subset of storage nodes.

Example 34 includes the subject matter of Example 33, and wherein the network communication circuitry is further to receive the latency value for each of the storage nodes from each of the storage nodes.

Example 35 includes the subject matter of any of Examples 33 and 34, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for selecting each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.

Example 36 includes the subject matter of any of Examples 33-35, and wherein the means for selecting each storage node of the subset of the plurality of storage nodes comprises means for selecting each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.

Example 37 includes the subject matter of any of Examples 33-36, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for determining a total number of the subset based on the erasure code.

Example 38 includes the subject matter of any of Examples 33-37, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for determining the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.

Example 39 includes the subject matter of any of Examples 33-38, and wherein the means for determining the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.

Example 40 includes the subject matter of any of Examples 33-39, and further including means for determining the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.

Example 41 includes the subject matter of any of Examples 33-40, and further including means for determining the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.

Example 42 includes the subject matter of any of Examples 33-41, and wherein the means for determining the subset of the plurality of storage nodes based on priority value comprises means for comparing the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.

Example 43 includes a storage node for implementing adaptive erasure code, the storage node comprising one or more processors； and one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the storage node to receive a data object from a proxy computing node communicatively coupled to the storage node in a storage node cluster, wherein the storage node is communicatively coupled to a plurality of other storage nodes of the storage node cluster； determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between the storage node and a corresponding other storage node of the plurality of storage nodes； transform the data object into a plurality of erasure code elements based on the erasure code； and transmit each of the erasure code elements to a corresponding one of the subset of the plurality of storage nodes.

Example 44 includes the subject matter of Example 43, and wherein the plurality of instructions further cause the storage node to (i) determine the latency value for each of the plurality of storage nodes and (ii) store, local to the storage node, the determined latency value for each of the plurality of storage nodes.

Example 45 includes the subject matter of any of Examples 43 and 44, and wherein to determine the latency value comprises to (i) generate a message for one of the plurality of storage nodes, (ii) broadcast the generated messages to the one of the plurality of storage nodes, (iii) receive an acknowledgment from the one of the plurality of storage nodes, and (iv) determine the latency value for the one of the plurality of storage nodes based at least in part on a round-trip-time associated with an elapsed duration of time between having broadcasted the generated message and having received the acknowledgment.

Example 46 includes the subject matter of any of Examples 43-45, and wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node.

Example 47 includes the subject matter of any of Examples 43-46, and wherein to select each storage node of the subset of the plurality of storage nodes comprises each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.

Example 48 includes the subject matter of any of Examples 43-47, and wherein to determine the subset of the plurality of storage nodes further comprises to determine a total number of the subset based on the erasure code.

Example 49 includes the subject matter of any of Examples 43-48, and wherein to determine the subset of the plurality of storage nodes further comprises to determine the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.

Example 50 includes the subject matter of any of Examples 43-49, and wherein to determine the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes

Example 51 includes the subject matter of any of Examples 43-50, and wherein the plurality of instructions further cause the storage node to determine the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.

Example 52 includes the subject matter of any of Examples 43-51, and wherein the plurality of instructions further cause the storage node to determine the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.

Example 53 includes the subject matter of any of Examples 43-52, and wherein to determine the subset of the plurality of storage nodes based on priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.

Example 54 includes a method for adaptive erasure code, the method comprising receiving, by a storage node of a storage node cluster that includes a plurality of storage nodes and a proxy computing node, a data object from the proxy computing node； determining, by the storage node, a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between the storage node and a corresponding other storage node of the plurality of storage nodes； transforming, by the storage node, the data object based on the erasure code； and transmitting, by the storage node, a portion of the transformed data object to corresponding storage nodes of the subset of the plurality of storage nodes.

Example 55 includes the subject matter of Example 54, and further including determining, by the storage node, the latency value for each of the plurality of storage nodes and storing, local to the storage node, the determined latency value for each of the plurality of storage nodes.

Example 56 includes the subject matter of any of Examples 54 and 55, and wherein determining the latency value comprises (i) generating a message for one of the plurality of storage nodes, (ii) broadcasting the generated messages to the one of the plurality of storage nodes, (iii) receiving an acknowledgment from the one of the plurality of storage nodes, and (iv) determining the latency value for the one of the plurality of storage nodes based at least in part on a round-trip-time associated with an elapsed duration of time between broadcasting the generated message and receiving the acknowledgment.

Example 57 includes the subject matter of any of Examples 54-56, and wherein determining the subset of the plurality of storage nodes further comprises selecting each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node.

Example 58 includes the subject matter of any of Examples 54-57, and wherein selecting each storage node of the subset of the plurality of storage nodes comprises each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.

Example 59 includes the subject matter of any of Examples 54-58, and wherein determining the subset of the plurality of storage nodes further comprises determining a total number of the subset based on the erasure code.

Example 60 includes the subject matter of any of Examples 54-59, and wherein determining the subset of the plurality of storage nodes further comprises determining the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.

Example 61 includes the subject matter of any of Examples 54-60, and wherein determining the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.

Example 62 includes the subject matter of any of Examples 54-61, and further including determining, by the storage node, the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.

Example 63 includes the subject matter of any of Examples 54-62, and further including determining, by the storage node, the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.

Example 64 includes the subject matter of any of Examples 54-63, and wherein determining the subset of the plurality of storage nodes based on the priority value comprises comparing the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.

Example 65 includes a computing device comprising a processor； and a memory having stored therein a plurality of instructions that when executed by the processor cause the computing device to perform the method of any of Examples 54-64.

Example 66 includes one or more machine readable storage media comprising a plurality of instructions stored thereon that in response to being executed result in a computing device performing the method of any of Examples 54-64.

Example 67 includes a storage node for implementing adaptive erasure code, the storage node comprising network communication circuitry to receive a data object from a proxy computing node communicatively coupled to the storage node in a storage node cluster, wherein the storage node is communicatively coupled to a plurality of other storage nodes of the storage node cluster； storage node determination circuitry to determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between the storage node and a corresponding other storage node of the plurality of storage nodes； and erasure code implementation circuitry to transform the data object into a plurality of erasure code elements based on the erasure code, wherein the network communication circuitry is further to transmit each of the erasure code elements to a corresponding one of the subset of the plurality of storage nodes.

Example 68 includes the subject matter of Example 67, and further including latency determination circuitry to (i) determine the latency value for each of the plurality of storage nodes and (ii) store, local to the storage node, the determined latency value for each of the plurality of storage nodes.

Example 69 includes the subject matter of any of Examples 67 and 68, and wherein to determine the latency value comprises to (i) generate a message for one of the plurality of storage nodes, (ii) broadcast the generated messages to the one of the plurality of storage nodes, (iii) receive an acknowledgment from the one of the plurality of storage nodes, and (iv) determine the latency value for the one of the plurality of storage nodes based at least in part on a round-trip-time associated with an elapsed duration of time between having broadcasted the generated message and having received the acknowledgment.

Example 70 includes the subject matter of any of Examples 67-69, and wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node.

Example 71 includes the subject matter of any of Examples 67-70, and wherein to select each storage node of the subset of the plurality of storage nodes comprises each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.

Example 72 includes the subject matter of any of Examples 67-71, and wherein to determine the subset of the plurality of storage nodes further comprises to determine a total number of the subset based on the erasure code.

Example 73 includes the subject matter of any of Examples 67-72, and wherein to determine the subset of the plurality of storage nodes further comprises to determine the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.

Example 74 includes the subject matter of any of Examples 67-73, and wherein to determine the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.

Example 75 includes the subject matter of any of Examples 67-74, and wherein the storage node determination module is further to determine the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.

Example 76 includes the subject matter of any of Examples 67-75, and wherein the storage node determination module is further to determine the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.

Example 77 includes the subject matter of any of Examples 67-76, and wherein to determine the subset of the plurality of storage nodes based on the priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.

Example 78 includes a storage node for implementing adaptive erasure code, the storage node comprising network communication circuitry to receive a data object from a proxy computing node communicatively coupled to the storage node in a storage node cluster, wherein the storage node is communicatively coupled to a plurality of other storage nodes of the storage node cluster； means for determining a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between the storage node and a corresponding other storage node of the plurality of storage nodes； and means for transforming the data object into a plurality of erasure code elements based on the erasure code, wherein the network communication circuitry is further to transmit each of the erasure code elements to a corresponding one of the subset of the plurality of storage nodes.

Example 79 includes the subject matter of Example 78, and further including means for (i) determining the latency value for each of the plurality of storage nodes and (ii) storing, local to the storage node, the determined latency value for each of the plurality of storage nodes.

Example 80 includes the subject matter of any of Examples 78 and 79, and wherein the means for determining the latency value comprises means for (i) generating a message for one of the plurality of storage nodes, (ii) broadcasting the generated messages to the one of the plurality of storage nodes, (iii) receiving an acknowledgment from the one of the plurality of storage nodes, and (iv) determining the latency value for the one of the plurality of storage nodes based at least in part on a round-trip-time associated with an elapsed duration of time between having broadcasted the generated message and having received the acknowledgment.

Example 81 includes the subject matter of any of Examples 78-80, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for selecting each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node.

Example 82 includes the subject matter of any of Examples 78-81, and wherein the means for selecting each storage node of the subset of the plurality of storage nodes comprises each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.

Example 83 includes the subject matter of any of Examples 78-82, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for determining a total number of the subset based on the erasure code.

Example 84 includes the subject matter of any of Examples 78-83, and wherein the means for determining the subset of the plurality of storage nodes further comprises means for determining the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.

Example 85 includes the subject matter of any of Examples 78-84, and wherein the means for determining the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.

Example 86 includes the subject matter of any of Examples 78-85, and further including means for determining the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.

Example 87 includes the subject matter of any of Examples 78-86, and further including means for determining the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.

Example 88 includes the subject matter of any of Examples 78-87, and wherein the means for determining the subset of the plurality of storage nodes based on the priority value comprises means for comparing the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.

Claims

A proxy computing node for implementing adaptive erasure code, the proxy computing node comprising:

one or more processors； and

one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the proxy computing node to:

receive a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is further communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node；

determine a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes；

transform the data object based on the erasure code； and

transmit a different portion of the transformed data object to each storage node of the subset of storage nodes.
The proxy computing node of claim 1, wherein the plurality of instructions further cause the proxy computing node to receive the latency value for each of the storage nodes from each of the storage nodes.
The proxy computing node of claim 1, wherein to determine the subset of the plurality of storage nodes further comprises to select each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.
The proxy computing node of claim 3, wherein to select each storage node of the subset of the plurality of storage nodes comprises to select each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
The proxy computing node of claim 1, wherein to determine the subset of the plurality of storage nodes further comprises to determine a total number of the subset based on the erasure code.
The proxy computing node of claim 1, wherein to determine the subset of the plurality of storage nodes further comprises to determine the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
The proxy computing node of claim 6, wherein to determine the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes.
The proxy computing node of claim 6, wherein the plurality of instructions further cause the proxy computing node to determine the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
The proxy computing node of claim 6, wherein the plurality of instructions further cause the proxy computing node to determine the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
The proxy computing node of claim 6, wherein to determine the subset of the plurality of storage nodes based on the priority value comprises to compare the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
A method for implementing adaptive erasure code, the method comprising:

receiving, by a proxy computing node, a data object from a client computing node communicatively coupled to the proxy computing node, wherein the proxy computing node is communicatively coupled to a plurality of storage nodes of a storage node cluster that includes the proxy computing node；

determining, by the proxy computing node, a subset of the plurality of storage nodes based on an erasure code and a latency value associated with each of the plurality of storage nodes, wherein the latency value is indicative of a communication latency between each of the storage nodes and the other storage nodes of the plurality of storage nodes；

transforming, by the proxy computing node, the data object based on the erasure code； and

transmitting, by the proxy computing node, a different portion of the transformed data object to each storage node of the subset of storage nodes.
The method of claim 11, further comprising receiving, by the proxy computing node, the latency value for each of the storage nodes from each of the storage nodes.
The method of claim 11, wherein determining the subset of the plurality of storage nodes further comprises selecting each storage node of the subset of the plurality of storage nodes based on the latency value associated with each selected storage node as compared to the other storage nodes.
The method of claim 13, wherein selecting each storage node of the subset of the plurality of storage nodes comprises selecting each storage node of the plurality of storage nodes having a latency value less than or equal to each of the latency values of the other storage nodes.
The method of claim 11, wherein determining the subset of the plurality of storage nodes further comprises determining a total number of the subset based on the erasure code.
The method of claim 11, wherein determining the subset of the plurality of storage nodes further comprises determining the subset of the plurality of storage nodes further based on a priority value assigned to each of the storage nodes, wherein the priority value is indicative of a preference of use relative to the other storage nodes.
The method of claim 16, wherein determining the subset of the plurality of storage nodes is further based on the priority value assigned to each of the storage nodes
The method of claim 16, further comprising determining, by the proxy computing node, the priority value assigned to each of the storage nodes based on a policy of the storage node cluster.
The method of claim 16, further comprising determining, by the proxy computing node, the priority value to assign to each of the plurality of storage nodes based on an expected end of use of each of the plurality of storage nodes.
The method of claim 16, wherein determining the subset of the plurality of storage nodes based on the priority value comprises comparing the priority value of two or more nodes of the subset, wherein the two or more nodes of the subset have the same latency value.
A proxy computing node comprising:

a processor； and

a memory having stored therein a plurality of instructions that when executed by the processor cause the proxy computing node to perform the method of any of claims 11-20.
One or more machine readable storage media comprising a plurality of instructions stored thereon that in response to being executed result in a proxy computing node performing the method of any of claims 11-20.
A proxy computing node comprising means for performing the method of any of claims 11-20.