CN116917883A - Distributed naming scheme for network attached storage devices - Google Patents

Distributed naming scheme for network attached storage devices

Info

Publication number
CN116917883A
Authority
CN
China
Prior art keywords
data object
storage devices
storage device
data
subset
Prior art date
Legal status
Pending
Application number
CN202180093306.4A
Other languages
Chinese (zh)
Inventor
刘淳
胡潮红
廖鑫
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN116917883A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278: Data partitioning, e.g. horizontal or vertical partitioning

Abstract

In some implementations, a computer-implemented method for processing data includes detecting a request for data processing operations originating from a network node. The request includes a data object identifier for a data object, a shard instance identifier, and availability information identifying multiple versions of the data object. A shard instance is selected by the shard instance identifier, the shard instance identifying a hash function and a subset of storage devices of a set of available storage devices within a distributed storage network. A modification request for the data processing operation is generated for each storage device in the subset. The modification request includes the data object identifier and a corresponding version of the plurality of versions. The modification request is sent to each storage device in the subset of storage devices for execution.

Description

Distributed naming scheme for network attached storage devices
Technical Field
The present disclosure relates to processing data access flows in computing devices, including distributed naming schemes for network attached storage devices.
Background
Storage systems are often associated with inefficiencies in handling streaming, partial, or unaligned updates. For example, performing a partial update to a storage device, e.g., an update smaller than the device mapping unit, involves reading data that is larger than the partial update, modifying that data, and storing the modified data back. This read-modify-write approach suffers from data volatility (e.g., the data is volatile while it is being read from the storage device) and acknowledgement latency (e.g., data update acknowledgements are sent only after the updated data is written back to the storage device). Furthermore, access to a shared storage device by multiple computing devices is typically implemented through the use of reservations, which is inefficient for accessing network storage such as network attached storage devices. In addition, conventional techniques for accessing storage devices are based on using a namespace for each device, which does not provide a scalable, globally unique, or easily network-routable naming service.
Disclosure of Invention
Various examples are now described, some of which are introduced in simplified form, and which are further described in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
According to a first aspect of the present invention, a computer-implemented method for processing data in a distributed storage network is provided. The method comprises the following steps: a request for a data processing operation is detected that originates from a network node, the request including a data object identifier for a data object, a shard instance identifier, and availability information identifying multiple versions of the data object. A shard instance is selected from a plurality of available shard instances by the shard instance identifier, the shard instance identifying a hash function and a subset of storage devices of a set of available storage devices within the distributed storage network. A modification request for the data processing operation is generated for each storage device in the subset of storage devices. The modification request includes the data object identifier and a corresponding version of the plurality of versions of the data object. The modification request for the data processing operation is sent to each storage device of the subset of storage devices for execution.
In a first implementation of the method according to the first aspect, the data processing operation is a create operation, and the sending of the modification request to each of the subset of storage devices causes the create operation to be performed at each of the subset of storage devices and generates the data object.
In a second implementation form of the method according to the first aspect as such or any implementation form of the first aspect, sending the modification request to each storage device of the subset of storage devices further causes a data object naming sequence to be generated at each storage device of the subset of storage devices that identifies the data object.
In a third implementation form of the method according to the first aspect as such or any implementation form of the first aspect, the data object naming sequence comprises the data object identifier, sharding information based on the sharding instance identifier and the corresponding version of the plurality of versions of the data object based on the availability information.
In a fourth implementation form of the method according to the first aspect as such or any implementation form of the first aspect, an acknowledgement is received from each storage device of the subset of storage devices indicating a successful execution of the data processing operation associated with the data object. The acknowledgement includes the data object naming sequence identifying the data object at the storage device.
In a fifth implementation form of the method according to the first aspect as such or any implementation form of the first aspect, the notification of the successful execution is sent to the network node. The notification includes the data object naming sequence identifying the data object at each storage device in the subset of storage devices.
In a sixth implementation form of the method according to the first aspect as such or any implementation form of the first aspect, the plurality of versions of the data object identified by the availability information comprises a plurality of erasure code versions associated with data information or parity information of the data object.
In a seventh implementation form of the method according to the first aspect as such or any implementation form of the first aspect, a confirmation is received from a first storage device of the subset of storage devices indicating that the data processing operation associated with the data object stored at the first storage device was not successfully performed. A storage device is selected from a second subset of storage devices of the set of available storage devices. A data object reconstruction operation is performed to generate the data object at the selected storage device, using the data object stored by at least a second storage device in the subset and based on the plurality of erasure code versions associated with the data information or the parity information of the data object.
In an eighth implementation form of the method according to the first aspect as such or any implementation form of the first aspect, a second request for a second data processing operation originating from the network node is detected. The second request includes the data object identifier of the data object, a shard information tuple (shard information tuple, SIT) with the shard instance identifier and hash input information, and the availability information. The hash function of the shard instance corresponding to the shard instance identifier is applied to the hash input information of the SIT to identify a storage device in the subset of storage devices. The second request for the second data processing operation is sent to the identified storage device in the subset of storage devices for execution. In response to an acknowledgement from the storage device indicating successful execution of the second data processing operation on the data object, a notification of the successful execution is sent to the network node.
In a ninth implementation form of the method according to the first aspect as such or any implementation form of the first aspect, the second data processing operation is an append operation and the second request further comprises a data segment for appending to the data object during execution of the append operation.
In a tenth implementation form of the method according to the first aspect as such or any implementation form of the first aspect, the second data processing operation is a read operation, and the second request further comprises an offset segment indicating a starting address for performing the read operation on the data object at the storage device.
In an eleventh implementation form of the method according to the first aspect as such or any implementation form of the first aspect, the plurality of versions of the data object identified by the availability information comprises a plurality of redundancy versions associated with a corresponding plurality of copies of the data object.
In a twelfth implementation form of the method according to the first aspect as such or any implementation form of the first aspect, an acknowledgement is received from the storage device of the subset of storage devices indicating that the second data processing operation associated with a first redundancy version of the plurality of redundancy versions of a first copy of the data object stored at the storage device was not successfully performed. Another storage device is selected from the subset of storage devices. The other storage device stores a second redundancy version of the plurality of redundancy versions of a second copy of the data object. The second request for the second data processing operation is sent to the other storage device for execution using the second copy of the data object associated with the second redundancy version.
According to a second aspect of the present invention, there is provided a system for processing data in a distributed storage network. The system includes a plurality of storage devices associated with the distributed storage network. The system also includes a processing circuit communicatively coupled to the plurality of storage devices. The processing circuitry is to perform operations comprising: a request for a data processing operation originating from a network node is detected. The request includes a data object identifier for a data object, a shard instance identifier, and availability information identifying multiple versions of the data object. A shard instance is selected from a plurality of available shard instances by the shard instance identifier. The sharded instance identifies a hash function and a subset of storage devices of the plurality of storage devices within the distributed storage network. A modification request for the data processing operation is generated for each storage device in the subset of storage devices. The modification request includes the data object identifier and a corresponding version of the plurality of versions of the data object. The modification request for the data processing operation is sent to each storage device of the subset of storage devices for execution.
In a first implementation of the system according to the second aspect, the plurality of versions of the data object identified by the availability information comprises a plurality of erasure code versions associated with data information or parity information of the data object.
In a second implementation form of the system according to the second aspect as such or any implementation form of the second aspect, the processing circuit is configured to perform operations comprising: an acknowledgement is received from a first storage device in the subset of storage devices. The acknowledgement indicates that the data processing operation associated with the data object stored at the first storage device was not successfully performed. A storage device is selected from a second subset of storage devices of the plurality of storage devices. A data object reconstruction operation is performed to generate the data object at the selected storage device, using the data object stored by at least a second storage device in the subset and based on the plurality of erasure code versions associated with the data information or the parity information of the data object.
In a third implementation form of the system according to the second aspect as such or any implementation form of the second aspect, the processing circuit is configured to perform operations comprising: a second request for a second data processing operation originating from the network node is detected. The second request includes the data object identifier of the data object, a shard information tuple (shard information tuple, SIT) with the shard instance identifier and hash input information, and the availability information. The hash function of the shard instance corresponding to the shard instance identifier is applied to the hash input information of the SIT to identify a storage device in the subset of storage devices. The second request for the second data processing operation is sent to the identified storage device in the subset of storage devices for execution. In response to an acknowledgement from the storage device indicating successful execution of the second data processing operation on the data object, a notification of the successful execution is sent to the network node.
According to a third aspect of the present invention, there is provided a non-transitory computer readable medium storing instructions for processing data in a distributed storage network. The instructions, when executed by one or more processors of at least one computing device, cause the one or more processors to perform operations comprising: a request for a data processing operation originating from a network node is detected. The request includes a data object identifier for a data object, a shard instance identifier, and availability information identifying multiple versions of the data object. A shard instance is selected from a plurality of available shard instances by the shard instance identifier. The sharded instance identifies a hash function and a storage device subset of a plurality of storage devices within the distributed storage network. A modification request for the data processing operation is generated for each storage device in the subset of storage devices. The modification request includes the data object identifier and a corresponding version of the plurality of versions of the data object. The modification request for the data processing operation is sent to each storage device of the subset of storage devices for execution.
In a first implementation of the non-transitory computer-readable medium, according to the third aspect, sending the modification request to each storage device of the subset of storage devices further causes a data object naming sequence to be generated at each storage device of the subset of storage devices that identifies the data object. The data object naming sequence includes the data object identifier, sharding information based on the sharding instance identifier, and the corresponding version of the plurality of versions of the data object based on the availability information.
In a second implementation form of the non-transitory computer-readable medium according to the third aspect or any implementation form of the third aspect, the instructions further cause the one or more processors to perform operations comprising: a confirmation is received from each storage device of the subset of storage devices indicating successful execution of the data processing operation associated with the data object, the confirmation including the data object naming sequence identifying the data object at the storage device. A notification of the successful execution is sent to the network node. The notification includes the data object naming sequence identifying the data object at each storage device in the subset of storage devices.
Any of the above examples may be combined with any one or more of the other examples described above to create new embodiments within the scope of the present invention.
Drawings
In the drawings, like numerals may describe similar components throughout the different views. The drawings illustrate generally, by way of example and not by way of limitation, the various embodiments discussed herein.
FIG. 1 is a block diagram of a network architecture including a naming server that generates a data object naming sequence and a plurality of computing devices for accessing a plurality of network attached storage devices according to the data object naming sequence, in accordance with some embodiments;
FIG. 2 is a block diagram of an exemplary data object naming sequence, in accordance with some embodiments;
FIG. 3 is a communication flow diagram of exemplary communications associated with performing create data processing operations in a distributed storage network using a sequence of data object names and a sharding map to a plurality of storage devices, according to some embodiments;
FIG. 4A is a communication flow diagram of exemplary communications associated with performing append data processing operations in a distributed storage network using a sequence of data object names and a sharded mapping to a single storage device, in accordance with some embodiments;
FIG. 4B is a communication flow diagram of exemplary communications associated with performing append data processing operations in a distributed storage network using a sequence of data object names and a shard map to a single redundancy group of storage devices, in accordance with some embodiments;
FIG. 5 is a communication flow diagram of exemplary communications associated with performing read data processing operations in a distributed storage network using a sequence of data object names and a sharded mapping to a single data storage device, in accordance with some embodiments;
FIG. 6 is a communication flow diagram of exemplary communications associated with performing delete data processing operations in a distributed storage network using a sequence of data object names and a sharding map to a plurality of storage devices, in accordance with some embodiments;
FIG. 7 is a block diagram of performing an update to a hash value conversion table at a network attached storage device using a data object naming sequence in accordance with some embodiments;
FIG. 8A is a flowchart of a create data processing operation performed on a network attached storage device, according to some embodiments;
FIG. 8B is a flowchart of an append data processing operation performed on a network attached storage device, according to some embodiments;
FIG. 9 is a flowchart of a method for processing data in a distributed storage network in accordance with an exemplary embodiment;
FIG. 10 is a block diagram of a representative software architecture that may be used in connection with the various device hardware described herein in accordance with an exemplary embodiment;
FIG. 11 is a block diagram of circuitry of an apparatus for implementing an algorithm and performing a method according to an exemplary embodiment.
Detailed Description
It should be understood at the outset that although illustrative implementations of one or more embodiments are provided below, the disclosed systems and methods described in connection with fig. 1-11 may be implemented using any number of techniques, whether currently known or not. The invention is in no way limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
The following detailed description, taken in conjunction with the accompanying drawings, is a part of the description and shows, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the subject matter of the invention, and it is to be understood that other embodiments may be utilized and that structural, logical, and electrical changes may be made without departing from the scope of the present invention. The exemplary embodiments described below should therefore not be construed in a limiting sense, and the scope of the present invention is defined by the appended claims.
The terms "host," "host computing device," "client computing device," "computing device," and "computer device" are used interchangeably herein and refer to a network device for accessing network attached storage devices within a data processing network. The terms "network attached storage" and "storage" are used interchangeably herein.
Storage systems are often accompanied by processing inefficiencies associated with accessing network attached storage devices. These processing inefficiencies are due to the use of a traditional naming scheme in which each storage device namespace is identified by a namespace identifier (namespace identifier, NSID) selected from the set {0, 1, 2, …}, with NSIDs assigned sequentially for accessing namespaces. However, this naming scheme is not scalable, nor does it provide unique and usable naming of data objects that can be routed by network devices within a distributed storage network. Most distributed systems use naming services to map system-wide unique names to names local to the storage device. Typically, this approach means that each access requires an extra hop to the naming service. Some systems use client caches to keep track of recently used mapping information in order to avoid extra hops to the naming service, but maintaining consistency of such information between the naming service and clients can result in additional communication and protocol overhead.
The techniques disclosed herein may be used for distributed naming schemes to access network attached storage devices and to perform data processing operations associated with data objects stored at such storage devices. In some embodiments, a naming server is used to provide flexible naming services using data object naming sequences to address data objects stored at different locations in a distributed storage network. More specifically, a data object naming sequence is long enough to provide a globally unique reference within the distributed storage network, allowing the naming sequence to be used directly within individual storage devices. In addition, the data object naming sequence includes sharding information (e.g., the hash algorithm used, and the logical partitions/shards mapped to physical devices) about how the data object is partitioned within the distributed system. In addition, the data object naming sequence includes availability information that identifies multiple versions of the data object (e.g., the availability information may indicate that copies exist, that erasure code versions exist, etc.). In one exemplary embodiment, the data object naming sequence includes a data object name, shard information (e.g., information associated with a shard map used to select a shard instance of one or more storage devices storing the data object), and availability information. A distributed storage network using the disclosed naming scheme is outlined in connection with fig. 1. FIG. 2 shows a more detailed view of a data object naming sequence. Processing data processing operations using the data object naming sequence is described in more detail in connection with fig. 3-6 and 9-11. Managing data object naming sequences in individual network attached storage devices is discussed in connection with fig. 7-8B.
FIG. 1 is a block diagram of a network architecture 100 including a naming server that generates a data object naming sequence and a plurality of computing devices for accessing a plurality of network attached storage devices according to the data object naming sequence, in accordance with some embodiments. Referring to fig. 1, the network architecture 100 (which may also be referred to as a distributed storage network 100) includes a plurality of computer devices, such as computer devices (104, 106, …, 108), for accessing storage devices (128, 130, 132, …, 134) via network routers (112, 114), network switches (116, 118), and a network interconnect 160 associated with a communication network 110. In some aspects, the storage devices (128, 130, 132, …, 134) are part of corresponding computer devices (120, 122, 124, …, 126) communicatively coupled to the computer devices (104, 106, …, 108) via the network routers (112, 114) and the network interconnect 160 associated with the communication network 110. The network interconnect 160 is part of the communication network 110, which includes the network routers (112, 114) and the network switches (116, 118). The network interconnect 160 includes a remote direct memory access (Remote Direct Memory Access, RDMA) over converged Ethernet (RDMA over Converged Ethernet, RoCE) interface, a peripheral component interconnect express (Peripheral Component Interconnect express, PCIe) interface, or another type of interface.
The network architecture 100 also includes a naming server 102, the naming server 102 being communicatively coupled to the computer devices (104, …, 108), the storage devices (128, …, 134), and the network switches (116, 118). Naming server 102 includes suitable circuitry, logic, interfaces, and/or code and is operable to perform the data object naming functions described herein. More specifically, naming server 102 is configured to generate a data object naming sequence 148 upon creation of a data object (e.g., using a create data processing operation 150, as described in connection with FIG. 3). In addition, naming server 102 is also configured to update the data object naming sequence 148 when performing other data processing operations (e.g., a read data processing operation 154, an append data processing operation 156, or a delete data processing operation 152). In one exemplary embodiment, the naming server 102 uses the shard map 144 to determine sharding information associated with the data object naming sequence of a data object.
The term "sharded mapping" as used herein indicates the mapping of sharded instance identifiers to a set of storage devices and hash functions (collectively referred to as sharded instances). In this regard, sharding information, such as a sharding instance identifier, may be used as input to a sharding map to obtain a set of storage devices that may be used to perform one or more data processing operations. In other aspects, the shard information is a tuple (also referred to as a shard information tuple (shard information tuple, SIT)) comprising the shard instance identifier and an input value such as hash input information, the tuple being used as an input to a shard map to obtain a set of storage devices (e.g., a shard instance is selected using the shard instance identifier) and to select a particular storage device from the set (e.g., by applying a hash function to the hash input information). An exemplary communication flow associated with a data processing operation using a data object naming sequence and a sharding map is shown in connection with fig. 3-6.
In other embodiments, the network switch 116 is configured with a sharding map 146, and the sharding map 146 may be used to determine sharding information when performing additional data processing operations 156 or read data processing operations 154. In aspects where the network switch 116 is configured with a sharding map 146, the create data processing operations and delete data processing operations may be performed by the naming server 102 using the sharding map 144. Because the shard map 144 and the shard map 146 may be updated during execution of the data processing operation, the shard map 144 and the shard map 146 are synchronized with each other via the communication link 158.
Although fig. 1 illustrates naming server 102 and network switch 116 using a sharded mapping and data object naming sequence, the present invention is not limited in this respect and such functionality may be incorporated within a single computer device using a data object naming module that performs the disclosed functionality of naming server 102 and network switch 116. Fig. 10 and 11 illustrate an exemplary software architecture and computer device using such a data object naming module. Alternatively, other network switches besides network switch 116 (e.g., network switch 118) may be used to perform the disclosed functions associated with the distributed naming scheme using a data object naming sequence.
The computer devices 104-108 and 120-126 and the naming server 102 may be mobile devices or other types of computing devices, such as computing nodes or storage nodes in the network architecture 100. An exemplary software architecture and computer device are shown in connection with fig. 10 and 11, which may represent any of the naming server 102 and the computer devices 104-108 and 120-126 for performing one or more aspects of the disclosed technology. The storage devices (128, 130, 132, …, 134) include solid-state drives (SSDs), hard disk drives (HDDs), or other types of storage devices, and are used to store data objects accessed by the naming server 102 or the computer devices (104, …, 108) via the network switch 116 and the network interconnect 160. For example, different copies of a data object are stored at different storage devices for redundancy or error coding purposes, as indicated by availability information in a corresponding data object naming sequence associated with the data object.
As shown in fig. 1, three Erasure Coding (EC) versions of data object D1 (e.g., data portions D1.E1/2 and D1.E2/2, and parity portion D1.EP) are stored in corresponding storage devices 128, 130, and 132 and accessed via network switch 116. In this regard, the availability information for data object D1 includes 2+1 EC information reflected by the suffix E1/2 (e.g., data portion 1 of 2), the suffix E2/2 (e.g., data portion 2 of 2), and the suffix EP (e.g., the parity portion).
Two replica versions of data object D2 (e.g., D2.R1 and D2.R2) are stored in corresponding storage devices 130 and 132 and accessed via network switch 116. In this regard, the availability information for data object D2 includes replica (or copy) information reflected by the suffix R1 (e.g., a first copy of object D2) and the suffix R2 (e.g., a second copy of object D2).
Further, two replica versions of data object D3 (e.g., D3.R1 and D3.R2) are stored in corresponding storage devices 132 and 134 and accessed via network switch 116. In this regard, the availability information for data object D3 includes replica (or copy) information reflected by the suffix R1 (e.g., a first copy of object D3) and the suffix R2 (e.g., a second copy of object D3).
In one exemplary embodiment, the storage devices (128, 130, 132, …, 134) include corresponding storage controllers (136, 138, 140, …, 142). Each of the storage controllers 136-142 comprises suitable circuitry, logic, interfaces, and/or code and is operable to manage access to the storage devices 128-134, including configuring and managing operations for handling data entries related to ordered accesses to data objects, and managing operations associated with the distributed naming scheme described herein. For example, each storage controller may include a translation table (e.g., as shown in FIG. 7) for managing data object naming sequences associated with data objects stored at the corresponding storage device associated with the storage controller.
FIG. 2 is a block diagram of an exemplary data object naming sequence 148 in accordance with some embodiments. Referring to FIG. 2, data object naming sequence 148 includes data object naming sequences (data object naming sequence, DONS) 200, …, 201. DONS 200 includes data object identifier 202 (e.g., a data object name), sharding information 204, and availability information 206. Availability information 206 may include one of the following: erasure code information 208, replica information 210, reliability information 212, and service level agreement (service level agreement, SLA) information 214. Similarly, DONS 201 includes data object name 216, sharding information 218, and availability information 220. Availability information 220 may include one of the following: erasure code information 222, replica information 224, reliability information 226, and SLA information 228.
As described above in connection with fig. 1, the shard information 204 may include a shard instance identifier that is used as an input to a shard map to obtain a shard instance having a set of storage devices that may be used to perform one or more data processing operations. In other aspects, the shard information 204 is a shard information tuple (shard information tuple, SIT) that includes a shard instance identifier and an input value, such as hash input information. The input value is used as an input to a shard map to obtain a set of storage devices (e.g., a shard instance is selected using a shard instance identifier) and to select a particular storage device from the set (e.g., by applying a hash function to the hash input information).
Availability information 206, used as part of the DONS, describes the relationship between the different versions of a data object residing on multiple storage devices that belong to the same logical data object identified by data object identifier 202. In some embodiments, the availability information portion of the DONS may be used to provide availability in the event of a device failure and to facilitate data recovery. As described in connection with fig. 1, erasure code information 208 may include different suffixes (e.g., EC1/2, EC2/2, ECP/2, ECQ/2) that indicate whether a particular data object version is a data portion or a parity portion, depending on the EC type used when creating the data object. As described in connection with fig. 1, replica information 210 may include suffixes (e.g., R1/3, R2/3, R3/3) that indicate the total number of copies and the position of a particular copy within that total (e.g., the suffix R1/3 indicates that the data object associated with the DONS having that availability information suffix is the first of a total of three copies).
The reliability information 212 indicates the reliability of the storage device used to access the particular data object. For example, a hard disk drive (HDD) may be associated with a reliability of 0.99999 (or 5_9s), and a data object stored at such an HDD may have availability information denoted by the suffix 5_9s. A solid-state drive (SSD) may be associated with a reliability of 0.9999999 (or 7_9s), and a data object stored at such an SSD may have availability information denoted by the suffix 7_9s.
SLA information 214 indicates delays associated with accessing particular data objects associated with the DONS using the SLA information. For example, the SLA information may include the following SLA suffixes: 10us, 100us, 1ms, and 10ms, which indicate access delays associated with particular data objects.
The following is an example illustrating the use of a DONS in the disclosed distributed naming scheme. An exemplary DONS may be D1.Shard1_X.R1/3, which may be used to determine the following information: (a) the data object name is D1; (b) the sharding information comprises a SIT consisting of a shard instance identifier (Shard1) and hash input information X; (c) the availability information is R1/3, indicating that the particular data object associated with the DONS is the first replica (or copy) of a total of three copies of the data object.
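A minimal sketch of how the exemplary DONS above could be decomposed into its three parts is shown below; the dot/underscore delimiters and the field names are assumptions made for illustration only and are not prescribed by the disclosure.

```python
from typing import NamedTuple

class ParsedDons(NamedTuple):
    data_object_name: str   # e.g., "D1"
    shard_instance_id: str  # e.g., "Shard1"
    hash_input: str         # e.g., "X" (present when the sharding field is a SIT)
    availability: str       # e.g., "R1/3" (first replica of three)

def parse_dons(dons: str) -> ParsedDons:
    """Split a dot-delimited DONS such as 'D1.Shard1_X.R1/3' into its data object name,
    sharding information (shard instance identifier plus optional hash input), and
    availability information."""
    name, shard_field, availability = dons.split(".")
    shard_instance_id, _, hash_input = shard_field.partition("_")
    return ParsedDons(name, shard_instance_id, hash_input, availability)

print(parse_dons("D1.Shard1_X.R1/3"))
# ParsedDons(data_object_name='D1', shard_instance_id='Shard1', hash_input='X', availability='R1/3')
```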
Although FIG. 2 shows four different types of availability information 206, the invention is not limited in this respect and other types of availability information may be used.
FIG. 3 is a communication flow diagram 300 of exemplary communications associated with performing create data processing operations in a distributed storage network using a sequence of data object names and a sharding map to a plurality of storage devices, according to some embodiments. Referring to FIG. 3, communication flows occur between computer device 104, naming server 102, and storage devices 128, 130, and 132.
Initially, the computer device 104 sends a request 302 for a data processing operation, such as creating a data processing operation, that includes a data object identifier (e.g., a data object name) for the data object, a shard instance identifier, and availability information identifying multiple versions of the data object. For example, the availability information may indicate that a total of three copies should be used when creating data objects at multiple storage devices.
Naming server 102 includes a shard map 144 having shard instances 316. Each of the shard instances 316 corresponds to a hash function and a subset of storage devices from a set of available storage devices (e.g., the set of available storage devices 128-134). For example, the first shard instance of shard instances 316 is "Shard_1: A, B, C, SHA256," which indicates that the shard instance identifier Shard_1 corresponds to a subset of storage devices A (e.g., storage device 128), B (e.g., storage device 130), and C (e.g., storage device 132), and the hash function SHA256. Exemplary uses of the hash function for selecting a single storage device from a subset of storage devices are discussed in connection with fig. 4A-5.
Naming server 102 generates a modification (or update) request 304 for a data processing operation (e.g., a modified create data processing operation) for each storage device in the subset of storage devices based on the shard instance identified by the shard instance identifier. More specifically, the original request 302 to create a data processing operation is modified to include the DONS of the corresponding version of the data object to be created at each storage device in the subset of storage devices.
For example, the original request 302 for the create data processing operation includes a data object identifier (e.g., data object name) D1, the shard instance identifier Shard_1, and availability information (e.g., R3) identifying multiple versions of the data object (e.g., a total of three copies of the data object) to be created at different storage devices. The naming server 102 uses the shard map 144 and selects the shard instance "Shard_1: A, B, C, SHA256" based on the shard instance identifier Shard_1. From the shard instance, naming server 102 determines that storage devices 128, 130, and 132 are to be used to perform the requested data processing operation (e.g., the create data processing operation) associated with the data object identifier. Naming server 102 generates a modification request for transmission to each of the storage devices 128-132. For example, the modification request 304 for the create data processing operation sent to the storage device 128 may include a DONS with the data object identifier D1 and the shard instance identifier. Similar modification requests are generated for storage devices 130 and 132.
A modification request for the create data processing operation is sent to storage devices 128, 130, and 132 via corresponding communication links 306, 308, and 310, respectively. The modification request 304 is executed at each storage device, and a consensus operation 312 is performed among the storage devices 128-132. A notification 314 of the result of the consensus operation 312 is sent back to the naming server 102. In some aspects, notification 314 is an acknowledgement sent separately by each of storage devices 128-132 indicating that the modification request was successfully performed (e.g., execution is confirmed as successful) or was not successfully performed (e.g., execution is confirmed as unsuccessful). In other aspects, the storage devices 128-132 communicate with each other during the consensus operation 312, and a single notification 314 indicating successful or unsuccessful execution of the modification request at each storage device is sent back to the naming server 102 by one of the storage devices in the subset. The naming server 102 forwards (e.g., sends) the received notification 314 to the computer device 104 as a notification 318, the computer device 104 being the device from which the original request 302 for the create data processing operation originated.
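The fan-out performed by the naming server for a create data processing operation can be sketched as follows; the helper name, request encoding, and R1/N-style suffix construction are assumptions for illustration, consistent with the availability suffixes described with FIG. 2.

```python
def build_create_requests(data_object_name, shard_instance_id, replica_count, shard_map):
    """Expand one create request into per-device modification requests, each carrying the
    DONS of the corresponding replica (availability suffixes R1/N .. RN/N)."""
    _, devices = shard_map[shard_instance_id]
    if replica_count > len(devices):
        raise ValueError("shard instance does not contain enough devices for the requested replicas")
    requests = []
    for i, device in enumerate(devices[:replica_count], start=1):
        dons = f"{data_object_name}.{shard_instance_id}.R{i}/{replica_count}"
        requests.append({"device": device, "operation": "create", "dons": dons})
    return requests

shard_map = {"Shard_1": ("sha256", ["A", "B", "C"])}
for request in build_create_requests("D1", "Shard_1", 3, shard_map):
    print(request)
# {'device': 'A', 'operation': 'create', 'dons': 'D1.Shard_1.R1/3'}
# {'device': 'B', 'operation': 'create', 'dons': 'D1.Shard_1.R2/3'}
# {'device': 'C', 'operation': 'create', 'dons': 'D1.Shard_1.R3/3'}
```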
In some cases, the modification request for the create data processing operation cannot be performed at an individual storage device because a duplicate data object with the same DONS already exists at that storage device (e.g., as described in connection with fig. 7).
FIG. 4A is a communication flow diagram 400A of exemplary communications associated with performing append data processing operations in a distributed storage network using a sequence of data object names and a sharded mapping to a single storage device, according to some embodiments. Referring to fig. 4A, communication flows occur between computer device 104, network switch 116, and storage devices 128, 130, and 132.
Initially, the computer device 104 sends a request 402 for a data processing operation, such as an append data processing operation, that includes a data object identifier (e.g., a data object name) of the data object, a SIT (including a shard instance identifier and hash input information), availability information (identifying a version of the data object from among multiple versions of the data object), and data to be appended to the data object.
The network switch 116 includes a shard map 146 having shard instances, similar to the shard map 144 shown in fig. 3. Each of the shard instances of shard map 146 corresponds to a hash function and a subset of storage devices from a set of available storage devices (e.g., the set of available storage devices 128-134). For example, the first shard instance of shard map 146 is "Shard_1: A, B, C, SHA256," which indicates that the shard instance identifier Shard_1 corresponds to a subset of storage devices A (e.g., storage device 128), B (e.g., storage device 130), and C (e.g., storage device 132), and the hash function SHA256. In operation 404, the network switch 116 performs a shard mapping based on the shard instance identifier and the hash input information received with the request 402. More specifically, the shard instance identifier is mapped to the first shard instance, and the hash function (e.g., SHA256) of the first shard instance is applied to the hash input information to obtain storage device A (e.g., storage device 128) as the receiving device for the append data processing operation of request 402.
The network switch 116 generates a modification (or update) request 406 for the data processing operation (e.g., a modified append data processing operation) that is sent to the storage device 128. More specifically, the original request 402 for the append data processing operation is modified to include the DONS of the data object to which data is to be appended at storage device 128. For example, the original request 402 for the append data processing operation includes a data object identifier (e.g., data object name) D1, a SIT including the shard instance identifier Shard_1 and hash input information X, and data to be appended (e.g., Data1). The network switch 116 uses the shard map 146 and selects the shard instance "Shard_1: A, B, C, SHA256" according to the shard instance identifier Shard_1. According to this shard instance, by applying the hash function to the hash input information X, the network switch 116 determines that the storage device 128 is to be used to perform the requested data processing operation (e.g., the append data processing operation) associated with the data object identifier. The network switch 116 generates a modification request 406 for transmission to the storage device 128. The modification request 406 for the append data processing operation for the storage device 128 includes a DONS with the data object identifier D1 and the SIT (e.g., {Shard_1, X}).
The modification request 406 for the append data processing operation is sent to the storage device 128 for execution. The modification request 406 is executed at the storage device 128, and a notification 408 of the execution result is sent back to the network switch 116. The network switch 116 forwards the received notification 408 to the computer device 104 as a notification 410, the computer device 104 being the device from which the original request 402 for the append data processing operation originated.
FIG. 4B is a communication flow diagram 400B of exemplary communications associated with performing append data processing operations in a distributed storage network using a sequence of data object names and a shard map to individual redundancy groups of storage devices, according to some embodiments. Referring to fig. 4B, communication flows occur between computer device 104, network switch 116, and redundancy groups (RGs) 412, 414, and 416 of storage devices. RG1 412 includes storage devices A, B, C, and D. RG2 414 includes storage devices E, F, G, and H. RG3 416 includes storage devices I, J, K, and L. The storage devices A-L may be any of the storage devices of the network architecture 100 shown in fig. 1.
Initially, the computer device 104 sends a request 418 for a data processing operation, such as an append data processing operation, that includes a data object identifier (e.g., a data object name) for the data object, a SIT (including a shard instance identifier and hash input information), availability information (identifying a version of the data object from among multiple versions of the data object), and data to be appended to the data object. For example, the availability information may include the suffix EC1/2, indicating that the append data processing operation should be applied to the first of two available erasure code versions of the data object.
The network switch 116 includes a shard map 147 that is associated with a mapping to an RG (rather than to a single storage device). Each of the shard instances of shard map 147 corresponds to a subset of the available storage devices that form an RG (e.g., from the set of available storage devices 128-134), a hash-to-RG mapping, and a hash function.
For example, the shard instances of shard map 147 include a hash function, a hash-to-RG mapping, and a set of RGs. Exemplary shard instances of shard map 147 are shown in Table 1 below.
TABLE 1
Instance | Hash function | Hash-to-RG mapping | {RG}
1 | CLHASH | Key = X mod 3 | {RG1, RG2, RG3}
2 | Murmur | Key = Y mod 4 | {RG4, RG5, RG6, RG7}
Each RG in a shard instance may indicate a redundancy algorithm and a list of storage devices. Exemplary RGs are shown in Table 2 below.
TABLE 2
RG | Redundancy algorithm | Storage devices
RG1 | 2+2 EC | {A, B, C, D}
RG2 | 2+2 EC | {E, F, G, H}
RG3 | 2+2 EC | {I, J, K, L}
RG4 | 3 copies | {A, B, C}
RG5 | 3 copies | {D, E, F}
RG6 | 3 copies | {G, H, I}
RG7 | 3 copies | {J, K, L}
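The two-level lookup implied by Tables 1 and 2 (shard instance to redundancy group via the hash-to-RG mapping, then redundancy group to redundancy algorithm and device list) might be sketched as below. The dictionary encodings are assumptions for illustration; the hash step is taken as already applied, so only the modulo mapping from the tables is shown.

```python
# Hypothetical encodings of Table 1 (shard instances) and Table 2 (redundancy groups).
SHARD_INSTANCES = {
    1: {"rgs": ["RG1", "RG2", "RG3"], "modulus": 3},          # CLHASH, Key = X mod 3
    2: {"rgs": ["RG4", "RG5", "RG6", "RG7"], "modulus": 4},   # Murmur, Key = Y mod 4
}

REDUNDANCY_GROUPS = {
    "RG1": {"algorithm": "2+2 EC", "devices": ["A", "B", "C", "D"]},
    "RG2": {"algorithm": "2+2 EC", "devices": ["E", "F", "G", "H"]},
    "RG3": {"algorithm": "2+2 EC", "devices": ["I", "J", "K", "L"]},
    "RG4": {"algorithm": "3 copies", "devices": ["A", "B", "C"]},
    "RG5": {"algorithm": "3 copies", "devices": ["D", "E", "F"]},
    "RG6": {"algorithm": "3 copies", "devices": ["G", "H", "I"]},
    "RG7": {"algorithm": "3 copies", "devices": ["J", "K", "L"]},
}

def select_redundancy_group(shard_instance_id, hashed_key):
    """Map an already-hashed key to a redundancy group using the instance's modulo rule,
    then return the RG name together with its redundancy algorithm and member devices."""
    instance = SHARD_INSTANCES[shard_instance_id]
    rg_name = instance["rgs"][hashed_key % instance["modulus"]]
    return rg_name, REDUNDANCY_GROUPS[rg_name]

# A key hashing to 0 under instance 1 lands in RG1 (2+2 EC over devices A-D),
# so an append would be fanned out to A, B, C, and D as in FIG. 4B.
print(select_redundancy_group(1, 0))
```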
At operation 420, the network switch 116 performs a shard mapping based on the shard instance identifier and the hash input information received with the request 418. More specifically, the shard instance identifier is mapped to the first shard instance, and the hash function of the first shard instance is applied to the hash input information to obtain RG1 as the receiving RG for the append data processing operation of request 418.
The network switch 116 generates a modification (or update) request 422 for the data processing operation (e.g., a modified append data processing operation) that is sent to all storage devices in RG1 412. More specifically, the original request 418 for the append data processing operation is modified to include the DONS of the data object to which data is to be appended at the storage devices A, B, C, and D in RG1 412. The modification request 422 for the append data processing operation is sent to the storage devices A, B, C, and D in RG1 412 for execution. The modification request 422 is executed at the storage devices in RG1 412, and a notification 424 of the execution result at each storage device is sent back to the network switch 116. The network switch 116 forwards the received notification 424 to the computer device 104 as a notification 426, the computer device 104 being the device from which the original request 418 for the append data processing operation originated.
FIG. 5 is a communication flow diagram 500 of exemplary communications associated with performing read data processing operations in a distributed storage network using a sequence of data object names and a sharded mapping to a single data storage device, according to some embodiments. Referring to fig. 5, communication flows occur between computer device 104, network switch 116, and storage devices 128, 130, and 132.
Initially, the computer device 104 sends a request 502 for a data processing operation, such as a read data processing operation, that includes a data object identifier (e.g., a data object name) of the data object, a SIT (including a shard instance identifier and hash input information), availability information (identifying a version of the data object from among multiple versions), and an offset indicating a starting memory location for the read data processing operation.
The network switch 116 includes a shard map 146 having shard instances, similar to the shard map 144 shown in fig. 3. Each of the shard instances of shard map 146 corresponds to a hash function and a subset of storage devices from a set of available storage devices (e.g., the set of available storage devices 128-134). For example, the first shard instance of shard map 146 is "Shard_1: A, B, C, SHA256," which indicates that the shard instance identifier Shard_1 corresponds to a subset of storage devices A (e.g., storage device 128), B (e.g., storage device 130), and C (e.g., storage device 132), and the hash function SHA256. In operation 504, the network switch 116 performs a shard mapping based on the shard instance identifier and the hash input information received with the request 502. More specifically, the shard instance identifier is mapped to the first shard instance, and the hash function (e.g., SHA256) of the first shard instance is applied to the hash input information to obtain storage device B (e.g., storage device 130) as the receiving device for the read data processing operation of request 502.
The network switch 116 generates a modification (or update) request 506 for the data processing operation (e.g., a modified read data processing operation) that is sent to the storage device 130. More specifically, the original request 502 for the read data processing operation is modified to include the DONS of the data object to be read at storage device 130. For example, the original request 502 for the read data processing operation includes a data object identifier (e.g., data object name) D1, a SIT including the shard instance identifier Shard_1 and hash input information X, availability information (e.g., R1/2), and an offset (e.g., Offset1) for performing the read. The network switch 116 uses the shard map 146 and selects the shard instance "Shard_1: A, B, C, SHA256" according to the shard instance identifier Shard_1. According to this shard instance, by applying the hash function to the hash input information X, the network switch 116 determines that the storage device 130 is to be used to perform the requested read data processing operation associated with the data object identifier. The network switch 116 generates a modification request 506 for transmission to the storage device 130. The modification request 506 for the read data processing operation for the storage device 130 includes a DONS with the data object identifier D1 and the SIT (e.g., {Shard_1, X}).
The modification request 506 for the read data processing operation is sent to the storage device 130 for execution. The modification request 506 is executed at the storage device 130, and a notification 508 of the execution result is sent back to the network switch 116. The network switch 116 forwards the received notification 508 to the computer device 104 as a notification 510, the computer device 104 being the device from which the original request 502 for the read data processing operation originated.
FIG. 6 is a communication flow diagram 600 of exemplary communications associated with performing delete data processing operations in a distributed storage network using a sequence of data object names and a sharding map to a plurality of storage devices, according to some embodiments. Referring to fig. 6, communication flows occur between computer device 104, naming server 102, and storage devices 128, 130, and 132.
Initially, the computer device 104 sends a request 602 for a data processing operation, such as a delete data processing operation, that includes a data object identifier (e.g., a data object name) of the data object, a shard instance identifier, and availability information (identifying a version of the plurality of versions of the data object to be deleted). For example, the availability information may include a suffix R1/2 indicating that a delete data processing operation should be applied to a first copy version of a data object in a total of two copies. In some aspects, if availability information is omitted from request 602, data processing operations are applied to all storage devices obtained after performing the shard mapping according to the shard instance identifier.
Naming server 102 includes a shard map 144. Each of the shard instances of shard map 144 corresponds to a hash function and a subset of storage devices from a set of available storage devices (e.g., the set of available storage devices 128-134). For example, the first shard instance of shard map 144 is "Shard_1: A, B, C, SHA256," which indicates that the shard instance identifier Shard_1 corresponds to a subset of storage devices A (e.g., storage device 128), B (e.g., storage device 130), and C (e.g., storage device 132), and the hash function SHA256. Naming server 102 performs a shard mapping based on the shard instance identifier received with request 602. More specifically, the shard instance identifier is mapped to the first shard instance to obtain storage devices A, B, and C (e.g., storage devices 128-132) as the receiving devices for the delete data processing operation of request 602.
At operation 604, the naming server 102 generates a modification (or update) request for the delete data processing operation for each of the storage devices 128-132. The modification request for the delete data processing operation is sent to storage devices 128, 130, and 132 for execution via corresponding communication links 606, 608, and 610, respectively. After the delete data processing operation is performed, a consensus operation 612 is performed among the storage devices 128-132 to determine the result of the execution at each storage device, which is received at the naming server 102 as a notification 614. Naming server 102 forwards the received notification 614 to the computer device 104 as a notification 616, the computer device 104 being the device from which the original request 602 for the delete data processing operation originated.
FIG. 7 is a block diagram 700 of performing an update to a hash value conversion table at a network attached storage device using a data object naming sequence in accordance with some embodiments. Referring to fig. 7, a storage controller of a network attached storage device (which may be any of the storage devices of network architecture 100 in fig. 1) generates and maintains a hash value conversion table 708.
In operation, a request for a data processing operation (e.g., one of the data processing operations described in connection with FIGS. 3-6) arrives at a storage device, and the DONS 702 is detected as part of the request. In some embodiments, the DONS 702 includes a data object identifier 704 (e.g., a data object name), sharding information (which may include a sharding instance identifier or a SIT), and availability information. The storage controller applies a hash function to the data object identifier 704 to obtain a hash value 706 and uses a comparator 710 to determine whether the obtained hash value 706 (and any remaining portions of the DONS 702, such as the sharding information or the availability information) is present in the hash value conversion table 708.
As shown in FIG. 7, the hash value conversion table 708 includes a plurality of rows arranged according to hash values (HVs), such as HV1 712, HV2 714, HV3 716, ..., HV_N 718. For a given HV, a row in the hash value conversion table 708 includes a DONS and physical address information for the storage device storing the version of the data object associated with that DONS. Since the same data object identifier may be used in conjunction with different sharding information or availability information, a given HV may include multiple rows in the hash value conversion table 708 corresponding to different DONS. For example, HV1 712 is associated with DONS 720 (for the corresponding data object stored at physical location 724) and DONS 722 (for the corresponding data object stored at physical location 726). Similarly, HV2 714 is associated with DONS 728 (for the corresponding data object stored at physical location 730), HV3 716 is associated with DONS 732 (for the corresponding data object stored at physical location 734), and HV_N 718 is associated with DONS 736 (for the corresponding data object stored at physical location 738). Example create and append data processing operations performed at a storage device are shown in FIGS. 8A and 8B.
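The hash value conversion table 708 can be pictured as a mapping from a hash of the data object name to one or more rows, each holding a DONS and a physical address. The following Python sketch is an illustrative model under that assumption (the helper names name_hash, add_row, and lookup are hypothetical); it shows why a single hash value can carry several rows when the same name is combined with different sharding or availability information.

```python
import hashlib
from collections import defaultdict

def name_hash(data_object_id):
    # Hash of the data object identifier only (not of the full DONS).
    return hashlib.sha256(data_object_id.encode()).hexdigest()

# hash value -> list of (DONS, physical address) rows
conversion_table = defaultdict(list)

def add_row(dons, physical_address):
    conversion_table[name_hash(dons["name"])].append((dons, physical_address))

# The same identifier "D1" combined with different availability information
# yields two rows under one hash value (compare DONS 720 and DONS 722 under HV1).
add_row({"name": "D1", "sit": {"shard_id": "SHARD_1"}, "availability": "R1/2"}, 0x1000)
add_row({"name": "D1", "sit": {"shard_id": "SHARD_1"}, "availability": "R2/2"}, 0x2000)

def lookup(dons):
    """Return the physical address whose stored DONS matches the incoming DONS."""
    for stored_dons, address in conversion_table.get(name_hash(dons["name"]), []):
        if stored_dons == dons:
            return address
    return None

print(hex(lookup({"name": "D1", "sit": {"shard_id": "SHARD_1"}, "availability": "R2/2"})))  # 0x2000
```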
FIG. 8A is a flowchart 800A of a create data processing operation performed at a network-attached storage device, according to some embodiments. Referring to FIG. 8A, flowchart 800A includes operations 802, 804, 806, 808, 810, and 812, which may be performed by a storage controller of any one of the storage devices 128, 130, and 132 receiving a request for a create data processing operation (DPO). In operation 802, a request for a create DPO is received that includes a DONS sent by the naming server. In operation 804, a hash value is determined using the data object identifier (e.g., the data object name) within the received DONS (e.g., as described in connection with FIG. 7). In operation 806, a determination is made as to whether the determined hash value is associated with a conflict in the hash value conversion table at the storage device. For example, a conflict may exist if the hash value conversion table already contains an entry with the same hash value. If the determined hash value is associated with a conflict, then in operation 808 the hash value is indicated as invalid for the consensus operation result, and processing continues in operation 812. If the determined hash value is not associated with a conflict, then in operation 810 the hash value is indicated as valid for the consensus operation result. In operation 812, a consensus operation is performed, using the determined consensus operation results, between the storage devices receiving the create DPO. A notification of the consensus operation result is sent back to the naming server that sent the create DPO to the storage devices 128, 130, and 132.
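A minimal sketch of the per-device create handling of FIG. 8A is shown below, reusing the illustrative table model from the previous sketch; the "valid"/"invalid" vote values and the handle_create name are assumptions made for this sketch only.

```python
import hashlib
from collections import defaultdict

def handle_create(table, dons, physical_address):
    """Storage-controller side of a create DPO (operations 802-812, sketched)."""
    hv = hashlib.sha256(dons["name"].encode()).hexdigest()   # operation 804
    if table[hv]:
        vote = "invalid"   # operation 808: an entry with the same hash value already exists
    else:
        table[hv].append((dons, physical_address))
        vote = "valid"     # operation 810: no conflict for this hash value
    # Operation 812: this local vote feeds the consensus performed among the
    # subset of devices; the combined result is reported back to the naming server.
    return vote

table = defaultdict(list)
print(handle_create(table, {"name": "D1", "availability": "R1/3"}, 0x1000))  # "valid"
print(handle_create(table, {"name": "D1", "availability": "R1/3"}, 0x2000))  # "invalid" (same hash value already present)
```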
FIG. 8B is a flowchart 800B of an append data processing operation performed at a network-attached storage device, according to some embodiments. Referring to FIG. 8B, flowchart 800B includes operations 814, 816, 818, 820, and 822, which may be performed by a storage controller of the storage device 128 receiving a request for an append data processing operation (DPO), as described in connection with FIG. 4A.
At operation 814, a request for an append DPO is received at the storage device 128, wherein the request includes the corresponding DONS and the data to be processed. In operation 816, a hash value is determined using the data object identifier (e.g., the data object name) within the received DONS (e.g., as described in connection with FIG. 7), and a determination is made as to whether the hash value is present in the hash value conversion table at the storage device. If the hash value does not exist, then in operation 822 an indicator of an invalid data object identifier (e.g., an Invalid_Name indicator) is added to a completion queue that includes status information for commands executed by the storage controller. If the hash value exists in the hash value conversion table at the storage device, then in operation 818 a determination is made as to whether there is a corresponding matching entry in the hash value conversion table with matching SIT and availability information (as present in the received DONS). If there is no matching entry in the hash value conversion table, processing continues in operation 822. If there is a matching entry in the hash value conversion table, processing continues in operation 820, where the append DPO (for appending data to the corresponding data object associated with the DONS) is added to a list of pending data entries with DPOs to be executed at the storage device.
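The append handling of FIG. 8B can be sketched in the same illustrative model, with the completion queue and the pending list represented as plain Python lists. The handle_append name and the layout of the status entry are assumptions of this sketch; only the Invalid_Name indicator itself comes from the description above.

```python
import hashlib
from collections import defaultdict

def handle_append(table, completion_queue, pending, dons, data):
    """Storage-controller side of an append DPO (operations 814-822, sketched)."""
    hv = hashlib.sha256(dons["name"].encode()).hexdigest()    # operation 816
    rows = table.get(hv, [])
    if not rows:
        # Operation 822: hash value not present -> report an invalid data object name.
        completion_queue.append({"status": "Invalid_Name", "dons": dons})
        return
    # Operation 818: look for a row whose stored SIT and availability match the request.
    for stored_dons, _address in rows:
        if (stored_dons.get("sit") == dons.get("sit")
                and stored_dons.get("availability") == dons.get("availability")):
            # Operation 820: queue the append for execution at the storage device.
            pending.append({"op": "append", "dons": dons, "data": data})
            return
    completion_queue.append({"status": "Invalid_Name", "dons": dons})

table = defaultdict(list)
hv = hashlib.sha256(b"D1").hexdigest()
table[hv].append(({"name": "D1", "sit": {"shard_id": "SHARD_1"}, "availability": "R1/2"}, 0x1000))
completion_queue, pending = [], []
handle_append(table, completion_queue, pending,
              {"name": "D1", "sit": {"shard_id": "SHARD_1"}, "availability": "R1/2"}, b"new bytes")
print(pending, completion_queue)
```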
FIG. 9 is a flowchart 900 of a method for processing data in a distributed storage network, according to an exemplary embodiment. Referring to FIG. 9, method 900 includes operations 902, 904, 906, and 908, which may be performed by a network node of the distributed storage network, such as the naming server 102 or the network switch 116 discussed in connection with FIGS. 3-6. In operation 902, a request for a data processing operation originating from a network node in the distributed storage network is detected. For example, with respect to FIG. 3, the naming server 102 receives a request 302 for a create data processing operation from the computer device 104. The request includes a data object identifier of a data object, a shard instance identifier, and availability information identifying multiple versions of the data object. For example, the request 302 includes a data object identifier (e.g., D1), a shard instance identifier (e.g., SHARD_1), and availability information (e.g., indicating a total of three copies to be generated for the data object during the create data processing operation).
At operation 904, a shard instance is selected from a plurality of available shard instances by the shard instance identifier. The shard instance identifies a hash function and a subset of storage devices of the set of available storage devices within the distributed storage network. For example, the naming server 102 uses the shard map 144 and the shard instance identifier to select a shard instance identifying a hash function (e.g., SHA256) and a subset of storage devices (e.g., storage devices 128, 130, and 132).
At operation 906, a modification request for the data processing operation is generated for each storage device in the subset of storage devices. The modification request includes the data object identifier and a corresponding version of the plurality of versions of the data object. For example, the naming server 102 generates a modification request 304 for the create data processing operation.
At operation 908, the modification request for the data processing operation is sent to each storage device in the subset of storage devices for execution. For example, the modification request 304 for the create data processing operation is sent to the storage devices 128, 130, and 132 for execution via the communication links 306, 308, and 310, respectively.
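For illustration, the following sketch ties operations 904-908 together for a create request with three copies, assigning the corresponding version suffix (R1/3, R2/3, R3/3) to each per-device modification request. The function name generate_modification_requests and the dictionary layouts of the shard map and the request are assumptions of this sketch, not part of the described embodiments.

```python
def generate_modification_requests(shard_map, request):
    """Operations 904-908 sketched: select the shard instance, then build one
    modification request per storage device in its subset, each carrying the
    corresponding copy version (R1/N ... RN/N) derived from the availability info."""
    instance = shard_map[request["shard_instance_id"]]        # operation 904
    total_copies = request["copies"]                          # e.g., three copies requested
    modification_requests = []
    for i, device in enumerate(instance["devices"][:total_copies], start=1):
        dons = {
            "name": request["data_object_id"],
            "sit": {"shard_id": request["shard_instance_id"]},
            "availability": f"R{i}/{total_copies}",
        }
        modification_requests.append((device, {"op": request["op"], "dons": dons}))
    return modification_requests                              # operation 908: send each request

shard_map = {"SHARD_1": {"devices": ["A", "B", "C"], "hash": "sha256"}}
for device, req in generate_modification_requests(
        shard_map,
        {"op": "create", "data_object_id": "D1", "shard_instance_id": "SHARD_1", "copies": 3}):
    print(device, req["dons"]["availability"])   # A R1/3, B R2/3, C R3/3
```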
FIG. 10 is a block diagram of a representative software architecture that can be used in connection with the various device hardware described herein, in accordance with an exemplary embodiment. FIG. 10 is merely a non-limiting example of a software architecture 1002, and it will be understood that many other architectures may be implemented to facilitate the functionality described herein. The software architecture 1002 executes on hardware such as any of the computer devices 104-108 and 120-126 and the naming server 102 shown in FIG. 1. The naming server 102 may be the same as the device 1100 shown in FIG. 11, which includes a processor 1105, a memory 1110, storage devices 1115 and/or 1120, and I/O interfaces 1125 and 1130, among other components.
A representative hardware layer 1004 is shown, which may represent, for example, the device 1100 shown in FIG. 11. The representative hardware layer 1004 includes one or more processing units 1006 having associated executable instructions 1008. The executable instructions 1008 represent the executable instructions of the software architecture 1002, including implementations of the methods, modules, and so forth shown in FIGS. 1-9. The hardware layer 1004 also includes a memory or storage module 1010, which also has the executable instructions 1008. The hardware layer 1004 may also include other hardware 1012, which represents any other hardware of the hardware layer 1004, such as the other hardware shown as part of the device 1100 in FIG. 11. The memory or storage module 1010 may include a storage device (e.g., at least one of the storage devices 128-134 with the corresponding storage controllers 136-142).
In the exemplary architecture shown in FIG. 10, the software architecture 1002 may be conceptualized as a stack of layers, each layer providing specific functionality. For example, the software architecture 1002 may include layers such as an operating system 1014, libraries 1016, framework/middleware 1018, applications 1020, and a presentation layer 1044. In operation, the applications 1020 or other components within the layers may invoke application programming interface (API) calls 1024 through the software stack and receive responses, returned values, and the like (illustrated as messages 1026) in response to the API calls 1024. The layers shown in FIG. 10 are representative in nature, and not all software architectures 1002 have all of the layers. For example, some mobile or dedicated operating systems may not provide the framework/middleware 1018, while others may provide such a layer. Other software architectures may include additional or different layers.
The operating system 1014 may manage hardware resources and provide common services. For example, the operating system 1014 may include a kernel 1028, services 1030, and drivers 1032. The kernel 1028 may act as an abstraction layer between the hardware and the other software layers. For example, the kernel 1028 may be responsible for memory management, processor management (e.g., scheduling), component management, networking, security settings, and so forth. The services 1030 may provide other common services for the other software layers. The drivers 1032 may be responsible for controlling or interacting with the underlying hardware. For example, depending on the hardware configuration, the drivers 1032 may include display drivers, camera drivers, Bluetooth drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi drivers, audio drivers, power management drivers, and so forth.
The libraries 1016 may provide a common infrastructure that may be used by the applications 1020 or other components or layers. The libraries 1016 generally allow other software modules to perform tasks in an easier manner than by interfacing directly with the underlying operating system 1014 functionality (e.g., kernel 1028, services 1030, and/or drivers 1032). The libraries 1016 may include system libraries 1034 (e.g., the C standard library) that may provide functions such as memory allocation functions, string manipulation functions, mathematical functions, and the like. In addition, the libraries 1016 may include API libraries 1036 such as media libraries (e.g., libraries supporting presentation and manipulation of various media formats such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG), graphics libraries (e.g., an OpenGL framework that may be used to render 2D and 3D graphics content on a display), database libraries (e.g., SQLite, which may provide various relational database functions), web libraries (e.g., WebKit, which may provide web browsing functionality), and the like. The libraries 1016 may also include a wide variety of other libraries 1038 to provide many other APIs to the applications 1020 and other software components/modules.
The framework/middleware 1018 (also sometimes referred to as middleware) may provide a higher level of public infrastructure that may be used by applications 1020 or other software components/modules. For example, the framework/middleware 1018 may provide various graphical user interface (graphic user interface, GUI) functions, advanced resource management, advanced location services, and the like. The framework/middleware 1018 may provide a wide range of other APIs that may be used by applications 1020 or other software components/modules, some of which may be specific to a particular operating system 1014 or platform.
Applications 1020 include built-in applications 1040 and third party applications 1042. Examples of representative built-in applications 1040 may include, but are not limited to, a contacts application, a browser application, a book reader application, a location application, a media application, a messaging application, or a gaming application. The third party applications 1042 may include any of the built-in applications 1040 as well as a broad assortment of other applications. In a specific example, a third party application 1042 (e.g., an application developed by an entity other than the vendor of the particular platform using an Android™ or iOS™ software development kit (SDK)) may be mobile software running on a mobile operating system such as iOS™, Android™, Windows® Phone, or another mobile operating system. In this example, the third party application 1042 may invoke the API calls 1024 provided by a mobile operating system, such as the operating system 1014, to implement the functionality described herein.
In some embodiments, the application 1020 includes a data object naming module 1060 that may be used to perform the functions of the naming server 102 and network switch 116 discussed herein in connection with fig. 1-9. In other embodiments, the data object naming module 1060 may be used as part of the operating system 1014, or may be firmware as part of the memory/storage module 1010 in the hardware layer 1004.
The application 1020 may utilize built-in operating system functions (e.g., kernel 1028, services 1030, and drivers 1032), libraries (e.g., system libraries 1034, API libraries 1036, and other libraries 1038), and framework/middleware 1018 to create a user interface to interact with a user of the system. Alternatively or additionally, in some systems, interaction with the user may be achieved through a presentation layer, such as presentation layer 1044. In these systems, the application/module "logic" may be separate from aspects of the application/module that the user interacts with.
Some software architectures may use virtual machines. In the example of FIG. 10, a virtual machine 1048 is illustrated. The virtual machine creates a software environment in which applications/modules may execute as if they were executing on a hardware machine (e.g., the device 1100 shown in FIG. 11). The virtual machine 1048 is hosted by a host operating system (e.g., operating system 1014) and typically (although not always) has a virtual machine monitor 1046, which manages the operation of the virtual machine 1048 and the interface with the host operating system (i.e., operating system 1014). A software architecture, including an operating system 1050, libraries 1052, framework/middleware 1054, applications 1056, or a presentation layer 1058, runs within the virtual machine 1048. These layers of the software architecture running within the virtual machine 1048 may be the same as the corresponding layers previously described, or may be different.
Fig. 11 is a block diagram of circuitry of an apparatus for implementing an algorithm and performing a method according to an example embodiment. All components need not be used in the various embodiments. For example, the client, server, and cloud-based network devices may each use a different set of components, or in the case of a server, a larger storage device.
One exemplary computing device in the form of a computer 1100 (also referred to as computing device 1100 or computer system 1100) may include a processor 1105, a memory 1110, a removable storage 1115, a non-removable storage 1120, an input interface 1125, an output interface 1130, and a communication interface 1135, all connected by a bus 1140. While the exemplary computing device is illustrated and described as the computer 1100, the computing device may take different forms in different embodiments.
The memory 1110 may include a volatile memory 1145 and a non-volatile memory 1150, and may store programs 1155. The computing device 1100 may include, or have access to, a computing environment that includes a variety of computer-readable media, such as the volatile memory 1145, the non-volatile memory 1150, the removable storage 1115, and the non-removable storage 1120. Computer storage devices include random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.
Computer-readable instructions stored on a computer-readable medium (e.g., the program 1155 stored in the memory 1110) may be executed by the processor 1105 of the computing device 1100. Hard disk drives, CD-ROMs, and RAM are some examples of components comprising non-transitory computer-readable media, such as storage devices. The terms "computer-readable medium" and "storage device" do not include carrier waves, to the extent that carrier waves are deemed too transitory. "Non-transitory computer-readable media" include all types of computer-readable media, including magnetic storage media, optical storage media, flash memory media, and solid-state storage media. It should be understood that the software can be installed in a computer and sold with it. Alternatively, the software can be obtained and loaded into the computer, including obtaining the software through a physical medium or a distribution system, including, for example, from a server owned by the software creator or from a server not owned but used by the software creator. The software can be stored on a server for distribution over the Internet, for example. The terms "computer-readable medium" and "machine-readable medium" are used interchangeably herein.
In some embodiments, program 1155 may utilize a data object naming module 1160, which may be used to perform the functions of naming server 102 and network switch 116 discussed herein in connection with fig. 1-9.
Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine, an application-specific integrated circuit (ASIC), a Field Programmable Gate Array (FPGA), or any suitable combination thereof). Furthermore, any two or more of these modules may be combined into a single module, and the functionality of a single module described herein may be subdivided among multiple modules. Furthermore, according to various exemplary embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.
Although a few embodiments have been described in detail above, other modifications are possible. For example, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Other steps may be provided to, or steps in, the described flows, and other components may be added to, or components of the described systems. Other embodiments may be within the scope of the following claims.
It should also be appreciated that software comprising one or more computer-executable instructions that facilitate the processing and operation of any or all of the steps described above with respect to the present invention may be installed on and sold with one or more computing devices consistent with the present invention. Alternatively, the software may be obtained and loaded into one or more computing devices, including obtaining the software through a physical medium or distribution system, including for example, obtaining the software from a server owned by the software creator or from a server not owned but used by the software creator. For example, the software may be stored in a server for distribution over the internet.
Furthermore, it will be appreciated by persons skilled in the art that the invention is not limited in its application to the details of construction and the arrangement of components set forth in the description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of "including," "comprising," or "having" and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. Unless limited otherwise, the terms "connected," "coupled," and "mounted," and variations thereof, are used broadly and encompass both direct and indirect connections, couplings, and mountings. In addition, the terms "connected" and "coupled," and variations thereof, are not restricted to physical or mechanical connections or couplings. Further, the terms "upward," "downward," "bottom," and "top" are relative, are used for purposes of illustration only, and are not limiting.
The components of the illustrative devices, systems, and methods employed by the illustrated embodiments may be implemented at least in part in digital electronic circuitry, analog electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. For example, these components may be implemented as a computer program product, such as a computer program, program code, or computer instructions tangibly embodied in an information carrier or in a machine-readable storage device for execution by, or to control the operation of, data processing apparatus, such as a programmable processor, a computer, or multiple computers.
A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be run on one computer or on multiple computers at one site, or distributed across multiple sites and interconnected by a communication network. Furthermore, functional programs, code, and code segments for accomplishing the techniques described herein can be easily construed as being within the scope of the claims by those skilled in the art to which the techniques described herein pertain. Method steps associated with the illustrative embodiments may be performed by one or more programmable processors executing a computer program, code, or instructions to perform functions (e.g., by operating on input data and generating output). Method steps may also be performed by, and means for performing the methods may be implemented as, special purpose logic circuitry, e.g., a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the general purpose processor may be any processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. Elements of a computer system include a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from and/or transfer data to, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., electrically programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory devices, or data storage disks (e.g., magnetic disks, internal hard disks, removable disks, magneto-optical disks, or CD-ROM and DVD-ROM disks). The processor and the memory may be supplemented by, or incorporated in, special purpose logic circuitry.
Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
As used herein, a "machine-readable medium" (or "computer-readable medium") includes a device capable of temporarily or permanently storing instructions and data, which may include, but is not limited to, random-access Memory (RAM), read-Only Memory (ROM), cache Memory, flash Memory, optical media, magnetic media, cache Memory, other types of storage devices (e.g., erasable programmable read-Only Memory (EEPROM)), or any suitable combination thereof. The term "machine-readable medium" shall be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) that are capable of storing the processor instructions. The term "machine-readable medium" shall also be taken to include any medium or combination of media that is capable of storing instructions for execution by one or more processors such that the instructions, when executed by the one or more processors, cause the one or more processors to perform any one or more of the methodologies described herein. Thus, a "machine-readable medium" refers to a single storage device or apparatus, and a "cloud-based" storage system or storage network that includes multiple storage devices or apparatus. The term "machine-readable medium" as used herein does not include signals per se.
In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present invention. Other items shown or described as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component, whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the scope of the present invention.
While the invention has been described with reference to specific features and embodiments thereof, it will be apparent that various modifications and combinations of the invention can be made without departing from the scope of the invention. For example, other components may be added to or removed from the system. Accordingly, the specification and drawings are to be regarded only as illustrative of the invention as defined in the appended claims, and are intended to cover any modifications, variations, combinations, or equivalents that fall within the scope of the invention. Other aspects may be within the scope of the following claims. Finally, the conjunctive "or" as used herein refers to a non-exclusive "or" unless specifically stated otherwise.

Claims (20)

1. A computer-implemented method for processing data in a distributed storage network, the method comprising:
detecting a request for a data processing operation originating from a network node, the request comprising a data object identifier of a data object, a shard instance identifier, and availability information identifying multiple versions of the data object;
selecting a shard instance from a plurality of available shard instances by the shard instance identifier, the shard instance identifying a hash function and a subset of storage devices of a set of available storage devices within the distributed storage network;
generating, for each storage device in the subset of storage devices, a modification request for the data processing operation, the modification request including the data object identifier and a corresponding version of the plurality of versions of the data object; and
the modification request for the data processing operation is sent to each storage device of the subset of storage devices for execution.
2. The computer-implemented method of claim 1, wherein the data processing operation is a create operation, and wherein sending the modification request to each storage device in the subset of storage devices causes the create operation to be performed at each storage device in the subset of storage devices and generates the data object.
3. The computer-implemented method of any of claims 1-2, wherein sending the modification request to each of the subset of storage devices further causes a data object naming sequence to be generated at each of the subset of storage devices that identifies the data object.
4. The computer-implemented method of claim 3, wherein the data object naming sequence includes the data object identifier, sharding information based on the sharding instance identifier, and the corresponding version of the plurality of versions of the data object based on the availability information.
5. The computer-implemented method of claim 3, further comprising:
a confirmation of successful execution of the data processing operation associated with the data object is received from each storage device of the subset of storage devices, the confirmation including the data object naming sequence identifying the data object at the storage device.
6. The computer-implemented method of claim 5, further comprising:
sending a notification to the network node of the successful execution, the notification comprising the data object naming sequence identifying the data object at each storage device in the subset of storage devices.
7. The computer-implemented method of any of claims 1-6, wherein the plurality of versions of the data object identified by the availability information comprises a plurality of erasure code versions associated with data information or parity information of the data object.
8. The computer-implemented method of claim 7, further comprising:
receiving an acknowledgement from a first storage device of the subset of storage devices indicating that the data processing operation associated with the data object stored at the first storage device was not successfully performed;
selecting a storage device from a second subset of storage devices of the set of available storage devices; and
a data object reconstruction operation is performed to generate the data object at the selected storage device using the data object stored by at least a second storage device in the subset and based on the plurality of erasure code versions associated with the data information or the parity information of the data object.
9. The computer-implemented method of any one of claims 1 to 8, further comprising:
detecting a second request for a second data processing operation originating from the network node, the second request comprising the data object identifier of the data object, a Shard Information Tuple (SIT) with the shard instance identifier and hash input information, and the availability information;
applying the hash function of the shard instance corresponding to the shard instance identifier to the hash input information of the SIT to identify a storage device in the subset of storage devices;
sending the second request for the second data processing operation to the storage devices in the subset of storage devices for execution; and
in response to an acknowledgement from the storage device of successful execution of the second data processing operation of the data object, a notification of the successful execution is sent to the network node.
10. The computer-implemented method of claim 9, wherein the second data processing operation is an append operation, and the second request further comprises a data segment for appending to the data object during execution of the append operation.
11. The computer-implemented method of any of claims 9 to 10, wherein the second data processing operation is a read operation, the second request further comprising an offset segment indicating a starting address for performing the read operation on the data object at the storage device.
12. The computer-implemented method of any of claims 9 to 11, wherein the plurality of versions of the data object identified by the availability information includes a plurality of redundancy versions associated with a corresponding plurality of copies of the data object.
13. The computer-implemented method of claim 12, further comprising:
receiving an acknowledgement from the storage devices of the subset of storage devices indicating that the second data processing operation associated with a first redundancy version of the plurality of redundancy versions of the first copy of the data object stored at the storage device was not successfully performed;
selecting another storage device from the subset of storage devices, the other storage device storing a second redundancy version of the plurality of redundancy versions of the second copy of the data object; and
the second request for the second data processing operation is sent to the other storage device for execution using the second copy of the data object associated with the second redundancy version.
14. A system for processing data in a distributed storage network, the system comprising:
A plurality of storage devices associated with the distributed storage network;
processing circuitry communicatively coupled to the plurality of storage devices, the processing circuitry to perform operations comprising:
detecting a request for a data processing operation originating from a network node, the request comprising a data object identifier of a data object, a shard instance identifier, and availability information identifying multiple versions of the data object;
selecting a shard instance from a plurality of available shard instances by the shard instance identifier, the shard instance identifying a hash function and a subset of storage devices of the plurality of storage devices within the distributed storage network;
generating, for each storage device in the subset of storage devices, a modification request for the data processing operation, the modification request including the data object identifier and a corresponding version of the plurality of versions of the data object; and
the modification request for the data processing operation is sent to each storage device of the subset of storage devices for execution.
15. The system of claim 14, wherein the plurality of versions of the data object identified by the availability information comprises a plurality of erasure code versions associated with data information or parity information of the data object.
16. The system of claim 15, wherein the processing circuitry is configured to perform operations comprising:
receiving an acknowledgement from a first storage device of the subset of storage devices indicating that the data processing operation associated with the data object stored at the first storage device was not successfully performed;
selecting a storage device from a second subset of storage devices of the plurality of storage devices; and
a data object reconstruction operation is performed to generate the data object at the selected storage device using the data object stored by at least a second storage device in the subset and based on the plurality of erasure code versions associated with the data information or the parity information of the data object.
17. The system of any one of claims 14 to 16, wherein the processing circuitry is configured to perform operations comprising:
detecting a second request for a second data processing operation originating from the network node, the second request comprising the data object identifier of the data object, a Shard Information Tuple (SIT) with the shard instance identifier and hash input information, and the availability information;
applying the hash function of the shard instance corresponding to the shard instance identifier to the hash input information of the SIT to identify a storage device in the subset of storage devices;
sending the second request for the second data processing operation to the storage devices in the subset of storage devices for execution; and
in response to an acknowledgement from the storage device of successful execution of the second data processing operation of the data object, a notification of the successful execution is sent to the network node.
18. A non-transitory computer-readable medium storing computer instructions for processing data in a distributed storage network, the instructions, when executed by one or more processors of a computing device, cause the one or more processors to perform operations comprising:
detecting a request for a data processing operation originating from a network node, the request comprising a data object identifier of a data object, a shard instance identifier, and availability information identifying multiple versions of the data object;
selecting a shard instance from a plurality of available shard instances by the shard instance identifier, the shard instance identifying a hash function and a subset of storage devices of a plurality of storage devices within the distributed storage network;
generating, for each storage device in the subset of storage devices, a modification request for the data processing operation, the modification request including the data object identifier and a corresponding version of the plurality of versions of the data object; and
the modification request for the data processing operation is sent to each storage device of the subset of storage devices for execution.
19. The non-transitory computer-readable medium of claim 18, wherein sending the modification request to each storage device in the subset of storage devices further causes a data object naming sequence to be generated at each storage device in the subset of storage devices that identifies the data object;
the data object naming sequence includes the data object identifier, sharding information based on the sharding instance identifier, and the corresponding version of the plurality of versions of the data object based on the availability information.
20. The non-transitory computer-readable medium of claim 19, wherein executing the instructions causes the one or more processors to perform operations comprising:
receiving a confirmation of successful execution of the data processing operation associated with the data object from each storage device of the subset of storage devices, the confirmation including the data object naming sequence identifying the data object at the storage device; and
sending a notification to the network node of the successful execution, the notification comprising the data object naming sequence identifying the data object at each storage device in the subset of storage devices.
CN202180093306.4A 2021-02-18 2021-02-18 Distributed naming scheme for network attached storage devices Pending CN116917883A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2021/018513 WO2022177564A1 (en) 2021-02-18 2021-02-18 Distributed naming scheme for network-attached storage devices

Publications (1)

Publication Number Publication Date
CN116917883A true CN116917883A (en) 2023-10-20

Family

ID=74871820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180093306.4A Pending CN116917883A (en) 2021-02-18 2021-02-18 Distributed naming scheme for network attached storage devices

Country Status (2)

Country Link
CN (1) CN116917883A (en)
WO (1) WO2022177564A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2998881B1 (en) * 2014-09-18 2018-07-25 Amplidata NV A computer implemented method for dynamic sharding
US10198319B1 (en) * 2014-12-15 2019-02-05 Amazon Technologies Inc. Computation refinement storage in a data storage system
KR101989074B1 (en) * 2017-08-10 2019-06-14 네이버 주식회사 Migration based on replication log in database sharding environment
US10915554B2 (en) * 2018-05-24 2021-02-09 Paypal, Inc. Database replication system

Also Published As

Publication number Publication date
WO2022177564A1 (en) 2022-08-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination