US20080201335A1 - Method and Apparatus for Storing Data in a Peer to Peer Network - Google Patents

Info

Publication number
US20080201335A1
Authority
US
United States
Prior art keywords
data
plurality
physical nodes
fragments
peer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/023,133
Inventor
Cezary Dubnicki
Leszek Gryz
Krzysztof Lichota
Cristian Ungureanu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Laboratories America Inc
Original Assignee
NEC Laboratories America Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US89066107P priority Critical
Application filed by NEC Laboratories America Inc filed Critical NEC Laboratories America Inc
Priority to US12/023,133 priority patent/US20080201335A1/en
Assigned to NEC LABORATORIES AMERICA, INC. reassignment NEC LABORATORIES AMERICA, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: UNGUREANU, CRISTIAN, DUBNICKI, CEZARY, GRYZ, LESZEK K, LICHOTA, KRZYSZTOF
Priority claimed from US12/038,296 external-priority patent/US8090792B2/en
Publication of US20080201335A1 publication Critical patent/US20080201335A1/en
Application status is Abandoned legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers

Abstract

A fixed prefix peer to peer network has a number of physical nodes. The nodes are logically divided into a number of storage slots. Blocks of data are erasure coded into original and redundant data fragments and the resultant fragments of data are stored in slots on separate physical nodes such that no physical node has more than one original and/or redundant fragment. The storage locations of all of the fragments are organized into a logical virtual node (e.g., a supernode). Thus, the supernode and the original block of data can be recovered even if some of the physical nodes are lost.

Description

  • This application claims the benefit of U.S. Provisional Application No. 60/890,661 filed Feb. 20, 2007 which is incorporated herein by reference.
  • BACKGROUND OF THE INVENTION
  • The present invention relates generally to peer to peer networking and more particularly to storing data in peer to peer networks.
  • Peer to peer networks for storing data may be overlay networks that allow data to be distributively stored in the network (e.g., at nodes). In peer to peer networks, there are links between any two peers (e.g., nodes) that communicate with each other. That is, nodes in the peer to peer network may be considered as being connected by virtual or logical links, each of which corresponds to a path in the underlying network (e.g., a path of physical links). Such a structured peer to peer network employs a globally consistent protocol to ensure that any node can efficiently route a search to some peer that has desired data (e.g., a file, piece of data, packet, etc.). A common type of structured peer to peer network uses a distributed hash table (DHT) in which a variant of consistent hashing is used to assign ownership of each file or piece of data to a particular peer in a way analogous to a traditional hash table's assignment of each key to a particular array slot.
  • However, traditional DHTs do not readily support data redundancy and may compromise the integrity of data stored in systems using DHTs. To overcome these obstacles in existing peer to peer networks, files or pieces of data are N-way replicated, but this results in high storage overhead and often requires multiple hashing functions to locate copies of the data. Further, it is difficult to add support for monitoring data resiliency and automatic rebuilding of missing data.
  • Accordingly, improved systems and methods of organizing and storing data in peer to peer networks are required.
  • BRIEF SUMMARY OF THE INVENTION
  • The present invention generally provides a method of storing data in a fixed prefix peer to peer network having a plurality of physical nodes. A plurality of data fragments are generated by erasure coding a block of data and each of the data fragments is then stored in a different physical node. In one embodiment, the erasure coding divides the block of data into a number of original fragments and a number of redundant fragments are created where the number of redundant fragments is equal to a predetermined network cardinality minus the number of original data fragments. The physical nodes in the peer to peer network are logically divided into storage slots and the data fragments are stored in different slots on different physical nodes. The storage locations of the fragments (e.g., the slots) are logically organized into a virtual node.
  • To generate the data fragments by erasure coding, a network cardinality is determined, the block of data is divided into a number of original fragments, and a number of redundant fragments are created wherein the number of redundant fragments is equal to the network cardinality minus the number of original data fragments.
  • The storage locations of the plurality of data fragments are mapped in a data structure in which the storage locations are the physical nodes in which the plurality of data fragments are stored. In some embodiments, the data structure is a distributed hash table.
  • These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a diagram of an exemplary peer to peer network according to an embodiment of the invention;
  • FIG. 2 is a diagram of an exemplary peer to peer network according to an embodiment of the invention;
  • FIG. 3 is a diagram of an exemplary peer to peer network according to an embodiment of the invention;
  • FIG. 4 is an exemplary supernode composition and component description table 400 according to an embodiment of the present invention;
  • FIG. 5 is a depiction of data to be stored in a peer to peer network;
  • FIG. 6 is a flowchart of a method of storing data in a fixed prefix peer to peer network according to an embodiment of the present invention; and
  • FIG. 7 is a schematic drawing of a controller according to an embodiment of the invention.
  • DETAILED DESCRIPTION
  • The present invention extends the concept of Distributed Hash Tables (DHTs) to create a more robust peer to peer network. The improved methods of storing data described herein allow for a simple DHT organization with built-in support for multiple classes of data redundancy which have a smaller storage overhead than previous DHTs. Embodiments of the invention also support automatic monitoring of data resilience and automatic reconstruction of lost and/or damaged data.
  • The present invention provides greater robustness and resiliency to the DHT-based peer to peer network known as a Fixed Prefix Network (FPN) disclosed in U.S. patent application Ser. No. 10/813,504, filed Mar. 30, 2004 and incorporated herein by reference. Unlike traditional peer to peer networks, FPNs and networks according to the present invention, known as FPNs with Supernodes (FPN/SN), are constructed such that the contributed resources (e.g., nodes) are dedicated to the peer to peer system and the systems are accordingly significantly more stable and scalable.
  • FIGS. 1-3 depict various illustrative embodiments of peer to peer networks utilizing FPN/SNs. FIGS. 1-3 are exemplary diagrams to illustrate the various structures and relationships described below and are not meant to limit the invention to the specific network layouts shown.
  • FIG. 1 is a diagram of an exemplary peer to peer network 100 for use with an embodiment of the present invention. The peer to peer network 100 has a plurality of physical nodes 102, 104, 106, and 108 that communicate with each other through an underlying transport network 110 as is known. There is no restriction on the location, grouping, or number of the physical nodes 102-108 with regard to the present invention. Though depicted in FIG. 1 as four physical nodes 102-108, it is understood that any number of nodes in any arrangement may be utilized. Similarly, the physical nodes 102-108 may vary in actual storage space, processing power, and/or other resources.
  • Physical nodes 102-108 each have associated memories and/or storage areas (not shown) as is known. The memories and/or storage areas of physical nodes 102-108 are each logically divided into a plurality of slots approximately proportional to the amount of storage available to each physical node. That is, the memory and/or storage area of physical node 102 is logically divided into approximately equivalent-sized slots 112 a, 112 b, 112 c, and 112 d, the memory and/or storage area of physical node 104 is logically divided into approximately equivalent-sized slots 114 a, 114 b, 114 c, and 114 d, the memory and/or storage area of physical node 106 is logically divided into approximately equivalent-sized slots 116 a, 116 b, 116 c, and 116 d, and the memory and/or storage area of physical node 108 is logically divided into approximately equivalent-sized (e.g., in terms of storage capacity) slots 118 a, 118 b, 118 c, and 118 d. A physical node may be logically divided in that its memory and/or storage allocation may be allocated as different storage areas (e.g., slots). Physical nodes 102-108 may be divided into any appropriate number of slots, the slots being representative of an amount of storage space in the node. In other words, data may be stored in the nodes 102-108 in a sectorized or otherwise compartmentalized manner. Of course, any appropriate division of the storage and/or memory of physical nodes 102-108 may be used and slots 112 a-d, 114 a-d, 116 a-d, and 118 a-d may be of unequal size. Further, slot size may not be static and may grow or shrink and slots may be split and/or may be merged with other slots.
  • Each physical node 102-108 is responsible for the storage and retrieval of one or more objects (e.g., files, data, pieces of data, data fragments, etc.) in the slots 112 a-d, 114 a-d, 116 a-d, and 118 a-d, respectively. Each object may be associated with a preferably fixed-size hash key of a hash function. In operation, one or more clients 120 may communicate with one or more of physical nodes 102-108 and issue a request for a particular object using a hash key.
  • Slots 112 a-d, 114 a-d, 116 a-d, and 118 a-d may also each be associated with a component of a virtual (e.g., logical) node (discussed in further detail below with respect to FIGS. 2 and 3). Herein, components are not physical entities, but representations of a portion of a virtual node. That is, components may be logical representations of and/or directions to or addresses for a set or subset of data that is hosted in a particular location in a node (e.g., hosted in a slot). Storage locations of data fragments (e.g., data fragments discussed below with respect to FIG. 5) are logically organized into a virtual node.
  • FIG. 2 is a diagram of a portion of an exemplary peer to peer network 200 for use with an embodiment of the present invention. The peer to peer network 200 is similar to peer to peer network 100 and has a plurality of physical nodes 202, 204, 206, 208, 210, and 212 similar to physical nodes 102-108. Physical nodes 202-212 are each logically divided into a plurality of slots approximately proportional to the amount of storage available to each physical node. That is, physical node 202 is divided logically into slots 214 a, 214 b, 214 c, and 214 d, physical node 204 is divided logically into slots 216 a, 216 b, 216 c, and 216 d, physical node 206 is divided logically into slots 218 a, 218 b, 218 c, and 218 d, physical node 208 is divided logically into slots 220 a, 220 b, 220 c, and 220 d, physical node 210 is divided logically into slots 222 a, 222 b, 222 c, and 222 d, and physical node 212 is divided logically into slots 224 a, 224 b, 224 c, and 224 d. For simplicity of discussion and depiction in FIG. 2, since each slot 214 a-d, 216 a-d, 218 a-d, 220 a-d, 222 a-d, and 224 a-d hosts a component, the component corresponding to its host slot is referred to herein with the same reference numeral. For example, the component hosted in slot 214 c of physical node 202 is referred to as component 214 c.
  • A grouping of multiple components is referred to as a virtual node (e.g., a “supernode”). In the example of FIG. 2, supernode 226 comprises components 214 b, 216 c, 218 b, 220 d, 222 a, and 224 a. A virtual node (e.g., supernode) is thus a logical grouping of a plurality of storage locations on multiple physical nodes. The supernode may have any number of components associated with any number of physical nodes in a network, where the number of components in a supernode is the supernode cardinality, and a supernode need not have components from every physical node. However, each component of a supernode must be hosted in a slot on a different physical node. That is, no two components in a supernode should be hosted at the same physical node. The total number of components in a supernode may be given by a predetermined constant, the supernode cardinality. In some embodiments, the supernode cardinality may be in the range of 4-32. The supernode cardinality may be a predetermined (e.g., desired, designed, etc.) number of data fragments.
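  • The constraint that no two components of a supernode share a physical node can be illustrated with a short check. The following is a minimal sketch only and is not part of the disclosed embodiments; the function name `is_valid_supernode` and the (node, slot) tuple representation are assumptions made for illustration, with the sample data taken from supernode 226 of FIG. 2.

```python
from typing import List, Tuple

# Hypothetical representation: (physical node reference numeral, hosting slot label).
Component = Tuple[int, str]


def is_valid_supernode(components: List[Component], cardinality: int) -> bool:
    """Return True if the supernode has exactly `cardinality` components
    and every component is hosted on a different physical node."""
    hosts = [node for node, _slot in components]
    return len(components) == cardinality and len(set(hosts)) == len(hosts)


# Supernode 226 of FIG. 2: one component on each of physical nodes 202-212.
supernode_226 = [
    (202, "214b"), (204, "216c"), (206, "218b"),
    (208, "220d"), (210, "222a"), (212, "224a"),
]
```

  • In this sketch, `is_valid_supernode(supernode_226, 6)` holds, while a grouping that places two components on physical node 202 would be rejected.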
  • In some embodiments, a larger supernode cardinality is chosen to increase flexibility in choosing data classes. In alternative embodiments, a smaller supernode cardinality is chosen to provide greater access to storage locations (e.g., disks) in read/write operations. Here, data classes define a level of redundancy where lower data classes (e.g., data class low) have less redundancy and higher data classes (e.g., data class high) have more redundancy. There may be a number of data classes equal to the predetermined supernode cardinality. The lowest data class is defined as having no redundant fragments and the highest class is defined as having (supernode cardinality − 1) redundant fragments.
  • In an exemplary embodiment, data class low may refer to a single redundant fragment and data class high may refer to four redundant fragments. Of course, any appropriate number of data fragments may be set for data class low and/or data class high. In this exemplary embodiment, data blocks that are classified by a user as data class low will be divided into a number of fragments equal to the supernode cardinality, where there are (supernode cardinality − 1) original fragments and one redundant fragment. Accordingly, one fragment may be lost and the data block may be recreated. Using data class high (e.g., four redundant fragments), a block of data will be divided into fragments such that four of them will be redundant. Thus, four fragments may be lost and the original block of data may be recreated. Fragmentation, especially redundant fragments, is discussed in further detail below with respect to FIG. 5.
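  • The single-redundant-fragment case can be illustrated with XOR parity, one of the simplest erasure codes. The application does not name a particular code, so the use of XOR parity here is an assumption, as are the function names `encode_single_parity` and `recover_block` and the zero-byte padding of the last fragment. The sketch produces (supernode cardinality − 1) original fragments plus one redundant fragment and rebuilds the block after any one fragment is lost.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_single_parity(block: bytes, cardinality: int) -> List[bytes]:
    """Split `block` into (cardinality - 1) original fragments and append
    one redundant XOR-parity fragment (hypothetical 'data class low')."""
    if cardinality < 2:
        raise ValueError("cardinality must be at least 2")
    n = cardinality - 1
    frag_len = -(-len(block) // n)              # ceiling division
    padded = block.ljust(frag_len * n, b"\0")   # assumption: zero padding
    originals = [padded[i * frag_len:(i + 1) * frag_len] for i in range(n)]
    parity = reduce(xor_bytes, originals)
    return originals + [parity]


def recover_block(fragments: List[Optional[bytes]], block_len: int) -> bytes:
    """Rebuild the original block when at most one fragment is None (lost)."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    repaired = list(fragments)
    if missing:
        present = [frag for frag in fragments if frag is not None]
        # The XOR of all surviving fragments equals the lost fragment.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return b"".join(repaired[:-1])[:block_len]
```

  • Tolerating four lost fragments, as in the exemplary data class high, would require a stronger code (e.g., a Reed-Solomon style code), which this sketch does not attempt.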
  • Components of the supernode may be considered peers and may similarly be associated (e.g., in a hash table, etc.), addressed, and/or contacted as peer nodes in a traditional peer to peer network.
  • FIG. 3 depicts a high level abstraction of an exemplary peer to peer network 300 according to an embodiment of the invention. Peer to peer network 300 is similar to peer to peer networks 100 and 200 and has multiple physical nodes 302, 304, 306, and 308. Each of the physical nodes 302-308 is divided into multiple slots as described above. In the particular example of FIG. 3, each of the physical nodes 302-308 has eight slots. As in FIG. 2, each slot 310, 312, 314, 316, 318, 320, 322, or 324 hosts a component 310, 312, 314, 316, 318, 320, 322, or 324. Components 310-324 are each associated with a corresponding supernode and are distributed among the physical nodes 302-308. In this way, eight supernodes are formed, each with one component 310-324 on each of the four physical nodes 302-308. For example, a first supernode is formed with four components—component 310 hosted on physical node 302 (e.g., in a slot 310), component 310 hosted in physical node 304 (e.g., in a slot 310), component 310 hosted in physical node 306 (e.g., in a slot 310), and component 310 hosted in physical node 308 (e.g., in a slot 310). The first supernode, comprising components 310, is shown as dashed boxes. A second supernode comprises the four components 312 hosted in physical nodes 302-308 and is shown as a trapezoid. Of course, these are merely graphical representations to highlight the different components comprising different supernodes and are not meant to be literal representations of what a slot, component, node, or supernode might look like. The remaining six supernodes are formed similarly.
  • To facilitate data storage using the supernodes as described and shown in FIGS. 1-3, the fixed prefix network model of DHTs (e.g., FPN) may be extended to use supernodes. Any advantageous hashing function that maps data (e.g., objects, files, etc.) to a fixed-size hash key may be utilized in the context of the present invention. The hash keys may be understood to be fixed-size bit strings (e.g., 5 bits, 6 bits, etc.) in the space containing all possible combinations of such strings. A subspace of the hashkey space is associated with a group of bits of the larger bit string as is known. For example, a group of hash keys beginning with 110 in a 5 bit string would include all hash keys except those beginning with 000, 001, 010, 011, 100, 101, and 111. That is, the prefix is 110. Such a subspace of the hashkey space may be a supernode and a further specification may be a component of the supernode. The prefix may be fixed for the life of a supernode and/or component. In such embodiments, the peer to peer network is referred to as a fixed-prefix peer to peer network. Other methods of hashing may be used as appropriate.
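  • The mapping from a hash key to a supernode by fixed prefix can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: SHA-1 is chosen only because it yields a fixed-size key (the application permits any suitable hash function), and the names `hash_key_bits`, `find_supernode`, and the example prefix table are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def hash_key_bits(data: bytes, key_bits: int = 160) -> str:
    """Return the first `key_bits` bits of the SHA-1 digest as a '0'/'1' string."""
    digest = hashlib.sha1(data).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:key_bits]


def find_supernode(key: str, prefixes: Dict[str, str]) -> Optional[str]:
    """Return the name of the supernode whose fixed prefix begins `key`,
    or None if no prefix matches."""
    for prefix, name in prefixes.items():
        if key.startswith(prefix):
            return name
    return None


# Hypothetical prefix table; the prefixes are disjoint and cover the key space.
example_prefixes = {"0": "SN-A", "10": "SN-B", "110": "SN-C", "111": "SN-D"}

# All 5-bit keys that fall in the subspace with prefix 110.
keys_with_110 = [f"{k:05b}" for k in range(32) if f"{k:05b}".startswith("110")]
```

  • With 5-bit keys, the subspace for prefix 110 contains the four keys 11000, 11001, 11010, and 11011, each of which `find_supernode` maps to the same supernode in this sketch.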
  • FIG. 4 is an exemplary supernode composition and component description table 400 according to an embodiment of the present invention. The supernode composition and component description table 400 may be used in conjunction with the peer to peer network 200, for example. Each supernode (e.g., supernode 226) is described by a supernode composition (e.g., with supernode composition and component description table 400) comprising the supernode prefix 402, an array 404 of the component descriptions, and a supernode version 406. Since each component has a description as described below, the array 404 size is equal to the supernode cardinality. The supernode version 406 is a sequence number corresponding to the current incarnation of the supernode. Each supernode is identified by a fixed prefix 402 as described above and in U.S. patent application Ser. No. 10/813,504. For example, in hashing and/or storing data in peer to peer network 200 according to supernode composition and component description table 400 in which hash keys are fixed size bit strings, the supernode 226 has a fixed prefix of 01101. Therefore, any data that has a hash key beginning with 01101 will be associated with supernode 226.
  • In operation, each component (e.g., 214 b, 216 c, 218 b, 220 d, 222 a, 224 a, etc.) in the component array 404 is described by a component description comprising a fixed prefix 408, a component index 410, and a component version 412. All components of the supernode (e.g., in array 404) are also assigned the same fixed prefix for their lifetime. The component index 410 of each component corresponds to a location in the supernode array. A component's index is fixed for the component's lifetime and is an identification number pointing to the particular component. A component index is a number between 0 and (supernode cardinality − 1). A component's version is a version number sequentially increased whenever the component changes hosts (e.g., nodes). In some embodiments, described in detail in concurrently filed U.S. patent application Ser. No. ______, entitled “Methods for Operating a Fixed Prefix Peer to Peer Network”, Attorney Docket No. 06083A, incorporated by reference herein, a component may be split or moved from one physical node to another and its version is increased in such instances.
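  • One possible in-memory organization of the supernode composition and component descriptions is sketched below. The field names mirror the prefix, index, and version items of table 400, but the class names, the `host` field, and the helpers `move_to` and `make_supernode` are hypothetical additions for illustration and do not appear in the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComponentDescription:
    fixed_prefix: str   # same prefix as the owning supernode, fixed for life
    index: int          # 0 .. (supernode cardinality - 1), fixed for life
    host: int           # assumption: identifier of the hosting physical node
    version: int = 0    # increased whenever the component changes hosts

    def move_to(self, new_host: int) -> None:
        """Record a change of host; prefix and index stay unchanged."""
        self.host = new_host
        self.version += 1


@dataclass
class SupernodeComposition:
    prefix: str                             # supernode fixed prefix
    components: List[ComponentDescription]  # array sized to the cardinality
    version: int = 0                        # sequence number of the incarnation


def make_supernode(prefix: str, hosts: List[int]) -> SupernodeComposition:
    """Build a composition with one component per distinct physical node."""
    if len(set(hosts)) != len(hosts):
        raise ValueError("each component must be on a different physical node")
    components = [ComponentDescription(prefix, i, h) for i, h in enumerate(hosts)]
    return SupernodeComposition(prefix, components)
```

  • For example, `make_supernode("01101", [202, 204, 206, 208, 210, 212])` yields a six-entry component array with indices 0 through 5, and calling `move_to` on a component raises only that component's version.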
  • Supernode composition and component description table 400 is an example of an organization of the information related to physical nodes, supernodes, and their respective components. Of course, one skilled in the art would recognize other methods of organizing and providing such information, such as storing the information locally on physical nodes in a database, storing the information at a remote location in a communal database, etc.
  • Updated indications of the supernode composition are maintained (e.g., in supernode composition and component description table 400, etc.) to facilitate communication amongst peers. Further, physical nodes associated with the components maintain compositions of neighboring physical and/or virtual nodes. To maintain such compositions, physical nodes associated with components ping peers and neighbors as is known. In this way, a physical node associated with a component may internally ping physical nodes associated with peers in the component's supernode to determine virtual node health and/or current composition. Further, a physical node associated with a component may externally ping physical nodes associated with neighbors (e.g., components with the same index, but belonging to a different supernode) to propagate and/or collect composition information. Of course, other systems and methods of organizing and/or keeping track of supernodes and their components, including version/incarnation information may be used as appropriate.
  • FIG. 5 is a generalized drawing of data that may be stored in peer to peer networks 100, 200, and/or 300. A block 502 of data may be divided into multiple pieces 504 of data according to any conventional manner. In at least one embodiment, the block of data 502 may be fragmented into multiple original pieces (e.g., fragments) 506 and a number of redundant fragments 508 may also be generated. Such fragmentation and/or fragment generation may be accomplished by erasure coding, replication, and/or other fragmentation means.
  • FIG. 6 depicts a flowchart of a method 600 of organizing data in a fixed prefix peer to peer network according to an embodiment of the present invention with particular reference to FIGS. 2 and 5 above. Though discussed with reference to the peer to peer network 200 of FIG. 2, the method steps described herein also may be used in peer to peer networks 100 and 300, as appropriate. The method begins at step 602.
  • In step 604, a network cardinality is determined. Network cardinality may be a predetermined constant for an entire system and may be determined in any appropriate fashion.
  • In step 606, a plurality of data fragments 506-508 are generated. In at least one embodiment, the data fragments 506-508 are generated from a block of data 502 by utilizing an erasure code. Using the erasure code transforms a block 502 of n (here, four) original pieces of data 504 into more than n fragments of data 506-508 (here, four original fragments and two redundant fragments) such that the original block 502 of n pieces (e.g., fragments) of data 504 can be recovered from a subset of those fragments (e.g., fragments 506-508). The fraction of the fragments 506-508 required to recover the original n pieces of data 504 is called the rate r. In some embodiments, optimal erasure codes may be used. An optimal erasure code produces n/r fragments of data where any n fragments may be used to recover the original n pieces of data. In alternative embodiments, near optimal erasure codes may be used to conserve system resources. In the same or alternative embodiments, the block of data 502 is divided into n pieces 506. Based on the original n pieces 506, m redundant fragments 508 are created where (m=supernode cardinality−n) and the fragment size is equal to the size of the original block of data 502 divided by n. It may be understood that the erasure coding and creation of redundant fragments 508 allows recreation of the original block of data 502 with half plus one redundant fragments 508 and/or original fragments 506. In the example shown in FIG. 5, only four total fragments from the group of fragments 506-508 are needed to reconstruct original block of data 502. Of course, any other erasure coding scheme may be used.
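  • The fragment arithmetic of step 606 can be worked through in a few lines. The sketch below is illustrative only; the function name `fragment_plan` and the rounding up of the fragment size when the block does not divide evenly are assumptions. It applies m = supernode cardinality − n, a fragment size of the block size divided by n, and a rate r of n over the total number of fragments.

```python
from fractions import Fraction


def fragment_plan(block_size: int, n_original: int, cardinality: int) -> dict:
    """Compute fragment counts and sizes for one block.

    block_size  -- size of the original block in bytes
    n_original  -- number of original fragments (n)
    cardinality -- supernode cardinality (total fragments per block)
    """
    if not 1 <= n_original <= cardinality:
        raise ValueError("need 1 <= n_original <= cardinality")
    m_redundant = cardinality - n_original
    fragment_size = -(-block_size // n_original)  # assumption: round up
    return {
        "original": n_original,
        "redundant": m_redundant,
        "fragment_size": fragment_size,
        "rate": Fraction(n_original, cardinality),
        "stored_bytes": fragment_size * cardinality,
    }
```

  • For the FIG. 5 example with a hypothetical 4096-byte block, n = 4 and a cardinality of 6 give two redundant fragments of 1024 bytes each, a rate of 2/3, and 6144 bytes stored in total, compared with 12288 bytes for three full replicas.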
  • In step 608, the data fragments 506-508 are stored in different physical nodes 202-212. Each of the data fragments 506, representing the original pieces of the data block 502, and the redundant fragments 508 are stored in separate physical nodes 202-212 using any appropriate methods of storing data in a peer to peer network. In at least one embodiment, data fragments 506-508 are stored in separate slots 214 a-d, 216 a-d, 218 a-d, 220 a-d, 222 a-d, 224 a-d of the physical nodes 202-212. For example, one fragment from fragments 506 and 508 may be stored in each of slots 214 b, 216 c, 218 b, 220 d, 222 a, and 224 a.
  • A hash may be computed based on the original block of data 502. A virtual node (e.g., virtual node 226) is then found that has the same fixed prefix as the prefix of the computed hash. Since virtual node 226 comprises components 214 b, 216 c, 218 b, 220 d, 222 a, and 224 a, the data fragments 506-508 are then stored in the slots 214 b, 216 c, 218 b, 220 d, 222 a, and 224 a corresponding to components 214 b, 216 c, 218 b, 220 d, 222 a, and 224 a.
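  • The placement just described (hash the block, find the virtual node with the matching fixed prefix, store one fragment per component slot) might be sketched as below. This is a simplified illustration under assumptions: SHA-1 stands in for the unspecified hash function, the name `place_fragments` and the dictionary layout are hypothetical, and fragment i is assumed to go to the slot of component index i.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical slot address: (physical node reference numeral, slot label).
Slot = Tuple[int, str]


def place_fragments(
    block: bytes,
    fragments: List[bytes],
    supernodes: Dict[str, List[Slot]],
) -> Dict[Slot, bytes]:
    """Map each fragment to a slot of the supernode whose fixed prefix
    matches the hash of the original block."""
    digest = hashlib.sha1(block).digest()
    key = "".join(f"{byte:08b}" for byte in digest)
    for prefix, slots in supernodes.items():
        if key.startswith(prefix):
            if len(fragments) != len(slots):
                raise ValueError("fragment count must equal supernode cardinality")
            # Fragment i is stored in the slot hosting component index i.
            return dict(zip(slots, fragments))
    raise LookupError("no supernode prefix matches the hash key")


# Two hypothetical supernodes over physical nodes 202-212 of FIG. 2.
example_supernodes = {
    "0": [(202, "214b"), (204, "216c"), (206, "218b"),
          (208, "220d"), (210, "222a"), (212, "224a")],
    "1": [(202, "214a"), (204, "216a"), (206, "218a"),
          (208, "220a"), (210, "222b"), (212, "224b")],
}
```

  • Because each slot list holds one slot per physical node, every fragment of a block lands on a different physical node whichever supernode the hash selects.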
  • In step 610, the storage locations of the data fragments 506-508 are recorded (e.g., mapped, etc.) in a data structure. The data structure may be a hash table, a DHT, a DHT according to the FPN referenced above, the data structures described in co-pending and concurrently filed U.S. patent application Ser. No. ______, entitled “Methods for Operating a Fixed Prefix Peer to Peer Network”, Attorney Docket No. 06083A, incorporated by reference herein, or any other appropriate data structure. The data structure may facilitate organization, routing, look-ups, and other functions of peer to peer networks 100, 200, and 300. Fragments 506-508 may be numbered (e.g., from 0 to a supernode cardinality minus one) and fragments of the same number may be stored (e.g., grouped, arranged, etc.) in a logical entity (e.g., a virtual node component).
  • In step 612, the data structure facilitates organization of information about the data fragments 506-508 into virtual nodes (e.g., supernode 226, supernodes 310-324, etc.). That is, the storage locations (e.g., the slots in the physical nodes) storing each of the original fragments 506 and each of the redundant fragments 508 are organized into and/or recorded as a grouping (e.g., a virtual node/supernode as described above). Accordingly, the fragments 506-508 may be organized into and hosted in supernode 226 as described above so that location, index, and version information about the fragments of data 506-508 may be organized as components of supernode 226.
  • The method ends at step 614.
  • FIG. 7 is a schematic drawing of a controller 700 according to an embodiment of the invention. Controller 700 contains a processor 702 that controls the overall operation of the controller 700 by executing computer program instructions that define such operation. The computer program instructions may be stored in a storage device 704 (e.g., magnetic disk, database, etc.) and loaded into memory 706 when execution of the computer program instructions is desired. Thus, applications for performing the herein-described method steps, such as erasure coding, storing data, and DHT organization, in method 600 are defined by the computer program instructions stored in the memory 706 and/or storage 704 and controlled by the processor 702 executing the computer program instructions. The controller 700 may also include one or more network interfaces 708 for communicating with other devices via a network (e.g., a peer to peer network, etc.). The controller 700 also includes input/output devices 710 (e.g., display, keyboard, mouse, speakers, buttons, etc.) that enable user interaction with the controller 700. Controller 700 and/or processor 702 may include one or more central processing units, read only memory (ROM) devices and/or random access memory (RAM) devices. One skilled in the art will recognize that an implementation of an actual controller could contain other components as well, and that the controller of FIG. 7 is a high level representation of some of the components of such a controller for illustrative purposes.
  • According to some embodiments of the present invention, instructions of a program (e.g., controller software) may be read into memory 706, such as from a ROM device to a RAM device or from a LAN adapter to a RAM device. Execution of sequences of the instructions in the program may cause the controller 700 to perform one or more of the method steps described herein, such as those described above with respect to method 600 and/or erasure coding as described above with respect to FIG. 5. In alternative embodiments, hard-wired circuitry or integrated circuits may be used in place of, or in combination with, software instructions for implementation of the processes of the present invention. Thus, embodiments of the present invention are not limited to any specific combination of hardware, firmware, and/or software. The memory 706 may store the software for the controller 700, which may be adapted to execute the software program and thereby operate in accordance with the present invention and particularly in accordance with the methods described in detail below. However, it would be understood by one of ordinary skill in the art that the invention as described herein could be implemented in many different ways using a wide range of programming techniques as well as general purpose hardware sub-systems or dedicated controllers.
  • Such programs may be stored in a compressed, uncompiled and/or encrypted format. The programs furthermore may include program elements that may be generally useful, such as an operating system, a database management system, and device drivers for allowing the controller to interface with computer peripheral devices, and other equipment/components. Appropriate general purpose program elements are known to those skilled in the art, and need not be described in detail herein.
  • The inventive methods of organizing a peer to peer network described herein improve network resiliency. Since each supernode includes the fragments derived from an original block of data (e.g., by erasure coding) and each of the fragments is thus stored on a separate physical node, the network is less susceptible to failure due to network changes. That is, changes to the peer physical nodes such as failures and node departures are less likely to affect the peer to peer network because of the distributed nature of the data.
  • Accordingly, the inventive methods may be employed on a peer to peer network. A controller (e.g., controller 700) may perform hashing functions to store and/or look up one or more pieces of data in the peer to peer network. The controller may further be configured to recover the stored data should one or more of the physical nodes be lost (e.g., through failure, inability to communicate, etc.). Of course, the physical nodes in the peer to peer network may be configured to perform one or more of the functions of the controller instead.
  • The foregoing Detailed Description is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.
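  • As an illustration of the storage scheme summarized above, the following Python sketch erasure codes a block, stores each fragment on a different physical node of a supernode selected by a fixed hash prefix, and recovers the block after one physical node is lost. It is a minimal sketch under stated assumptions, not the implementation of the invention: a single XOR parity fragment stands in for a general erasure code, keys are assumed to be SHA-1 digests of the block content, block lengths are kept in a local dictionary for simplicity, and the names Network, supernode_for, PREFIX_BITS and K_ORIGINAL are hypothetical.

```python
import hashlib
from functools import reduce
from typing import Dict, List, Optional

PREFIX_BITS = 2                # assumed fixed prefix length, giving 4 supernodes
K_ORIGINAL = 3                 # assumed number of original fragments per block
CARDINALITY = K_ORIGINAL + 1   # XOR parity adds exactly one redundant fragment


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(block: bytes) -> List[bytes]:
    """Split a block into K_ORIGINAL equal fragments plus one XOR parity fragment."""
    size = -(-len(block) // K_ORIGINAL)              # ceiling division
    padded = block.ljust(size * K_ORIGINAL, b"\0")
    originals = [padded[i * size:(i + 1) * size] for i in range(K_ORIGINAL)]
    return originals + [reduce(xor_bytes, originals)]


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the block; at most one fragment may be missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    fragments = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(xor_bytes, present)
    return b"".join(fragments[:K_ORIGINAL])[:length]


def supernode_for(key: bytes) -> int:
    """Select a supernode from the first PREFIX_BITS bits of the key."""
    return key[0] >> (8 - PREFIX_BITS)


class Network:
    def __init__(self, supernodes: Dict[int, List[str]]):
        self.supernodes = supernodes   # prefix -> hosting physical nodes
        self.nodes = {h: {} for hosts in supernodes.values() for h in hosts}
        self.lengths = {}              # key -> original block length

    def store(self, block: bytes) -> bytes:
        key = hashlib.sha1(block).digest()
        hosts = self.supernodes[supernode_for(key)]
        # Fragment i goes to the i-th host, so every fragment is on a different node.
        for i, (host, fragment) in enumerate(zip(hosts, encode(block))):
            self.nodes[host][(key, i)] = fragment
        self.lengths[key] = len(block)
        return key

    def lookup(self, key: bytes) -> bytes:
        hosts = self.supernodes[supernode_for(key)]
        fragments = [self.nodes.get(h, {}).get((key, i)) for i, h in enumerate(hosts)]
        return decode(fragments, self.lengths[key])

    def fail(self, host: str) -> None:
        self.nodes.pop(host, None)     # the node and all its fragments are lost


# Six hypothetical physical nodes; each supernode spans four distinct ones.
supernodes = {p: [f"node{(p + i) % 6}" for i in range(CARDINALITY)]
              for p in range(2 ** PREFIX_BITS)}
net = Network(supernodes)
block = b"hello peer to peer"
key = net.store(block)
net.fail(net.supernodes[supernode_for(key)][1])   # lose one hosting node
recovered = net.lookup(key)                       # rebuilt from the survivors
```

With single parity, losing a second host of the same supernode makes lookup raise ValueError; an erasure code with more redundant fragments would tolerate correspondingly more losses.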

Claims (18)

1. A method of storing data in a fixed prefix peer to peer network having a plurality of physical nodes comprising:
generating a plurality of data fragments by erasure coding a block of data;
storing each of the plurality of data fragments in different physical nodes.
2. The method of claim 1 wherein storing each of the plurality of data fragments in different physical nodes comprises:
logically dividing each of the physical nodes into a plurality of slots; and
storing each of the plurality of data fragments in different slots on different physical nodes.
3. The method of claim 2 further comprising:
associating the different slots on different physical nodes as a virtual node.
4. The method of claim 1 wherein generating a plurality of data fragments by erasure coding a block of data comprises:
determining a network cardinality;
dividing the block of data into a number of original fragments; and
creating a plurality of redundant fragments wherein the number of redundant fragments is equal to the network cardinality minus the number of original data fragments.
5. The method of claim 1 further comprising mapping storage locations of the plurality of data fragments in a data structure wherein the storage locations are the physical nodes in which the plurality of data fragments are stored.
6. The method of claim 5 wherein the data structure is a distributed hash table.
7. A peer to peer network for storing fragments of data comprising:
a plurality of physical nodes each logically divided into a plurality of slots;
a plurality of controllers associated with each of the physical nodes and configured to associate the plurality of physical nodes as one or more logical nodes comprising a grouping of slots wherein each of the one or more logical nodes includes slots from more than one of the physical nodes.
8. The peer to peer network of claim 7 wherein the controllers are further configured to store a plurality of data fragments in the grouping of slots.
9. The peer to peer network of claim 8 wherein the controllers are further configured to map storage locations of the plurality of data fragments in a data structure wherein the storage locations are the physical nodes in which the plurality of data fragments are stored.
10. A machine readable medium having program instructions stored thereon, the instructions capable of execution by a processor and defining the steps of:
generating a plurality of data fragments by erasure coding a block of data;
storing each of the plurality of data fragments in different physical nodes.
11. The machine readable medium of claim 10 wherein the instructions further define the steps of:
logically dividing each of the physical nodes into a plurality of slots; and
storing each of the plurality of data fragments in different slots on different physical nodes.
12. The machine readable medium of claim 10 wherein the instructions further define the step of:
associating the different slots on different physical nodes as a virtual node.
13. The machine readable medium of claim 10 wherein the instructions further define the step of:
mapping storage locations of the plurality of data fragments in a data structure wherein the storage locations are the physical nodes in which the plurality of data fragments are stored.
14. The machine readable medium of claim 13 wherein the instructions further define the step of:
mapping the storage locations of the plurality of data fragments in a distributed hash table.
15. An apparatus for storing data in a fixed prefix peer to peer network having a plurality of physical nodes comprising:
means for generating a plurality of data fragments by erasure coding a block of data;
means for storing each of the plurality of data fragments in different physical nodes.
16. The apparatus of claim 15 wherein the means for storing each of the plurality of data fragments in different physical nodes comprises:
means for logically dividing each of the physical nodes into a plurality of slots; and
means for storing each of the plurality of data fragments in different slots on different physical nodes.
17. The apparatus of claim 16 further comprising:
means for associating the different slots on different physical nodes as a virtual node.
18. The apparatus of claim 15 further comprising:
means for mapping storage locations of the plurality of data fragments in a data structure wherein the storage locations are the physical nodes in which the plurality of data fragments are stored.
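As an informal illustration only, the slot grouping recited in claims 2, 3 and 7 and the fragment arithmetic recited in claim 4 might be sketched in Python as follows. The names Slot, build_virtual_node and redundant_fragments, and the example sizes (five physical nodes, two slots each, cardinality 4), are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Slot:
    physical_node: str   # name of the hosting physical node
    index: int           # slot number within that physical node


def redundant_fragments(cardinality: int, original_fragments: int) -> int:
    """Number of redundant fragments = network cardinality - original fragments."""
    if not 0 < original_fragments <= cardinality:
        raise ValueError("need 0 < original_fragments <= cardinality")
    return cardinality - original_fragments


def build_virtual_node(free_slots: Dict[str, List[int]], cardinality: int) -> List[Slot]:
    """Group one free slot from each of `cardinality` different physical nodes."""
    candidates = [name for name, slots in free_slots.items() if slots]
    if len(candidates) < cardinality:
        raise ValueError("not enough physical nodes with a free slot")
    return [Slot(name, free_slots[name].pop(0)) for name in candidates[:cardinality]]


# Hypothetical network: five physical nodes, each logically divided into two slots.
free_slots = {f"node{i}": [0, 1] for i in range(5)}
virtual_node = build_virtual_node(free_slots, cardinality=4)
extra = redundant_fragments(cardinality=4, original_fragments=3)
```

Because every slot in the group comes from a different physical node, storing one fragment per slot places each fragment on a different physical node.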
US12/023,133 2007-02-20 2008-01-31 Method and Apparatus for Storing Data in a Peer to Peer Network Abandoned US20080201335A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US89066107P 2007-02-20 2007-02-20
US12/023,133 US20080201335A1 (en) 2007-02-20 2008-01-31 Method and Apparatus for Storing Data in a Peer to Peer Network

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US12/023,133 US20080201335A1 (en) 2007-02-20 2008-01-31 Method and Apparatus for Storing Data in a Peer to Peer Network
PCT/US2008/053564 WO2008103568A1 (en) 2007-02-20 2008-02-11 Method and apparatus for storing data in a peer to peer network
PCT/US2008/053568 WO2008103569A1 (en) 2007-02-20 2008-02-11 Methods for operating a fixed prefix peer to peer network
TW097105753A TWI433504B (en) 2007-02-20 2008-02-19 Method and apparatus for storing data in a peer to peer network
US12/038,296 US8090792B2 (en) 2007-03-08 2008-02-27 Method and system for a self managing and scalable grid storage
TW97108198A TWI437487B (en) 2007-03-08 2008-03-07 Method and system for a self managing and scalable grid storage

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/038,296 Continuation-In-Part US8090792B2 (en) 2007-02-20 2008-02-27 Method and system for a self managing and scalable grid storage

Publications (1)

Publication Number Publication Date
US20080201335A1 (en) 2008-08-21

Family

ID=39707530

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/023,141 Active 2029-04-15 US8140625B2 (en) 2007-02-20 2008-01-31 Method for operating a fixed prefix peer to peer network
US12/023,133 Abandoned US20080201335A1 (en) 2007-02-20 2008-01-31 Method and Apparatus for Storing Data in a Peer to Peer Network

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US12/023,141 Active 2029-04-15 US8140625B2 (en) 2007-02-20 2008-01-31 Method for operating a fixed prefix peer to peer network

Country Status (4)

Country Link
US (2) US8140625B2 (en)
AR (1) AR076255A1 (en)
TW (2) TWI433504B (en)
WO (2) WO2008103568A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100064166A1 (en) * 2008-09-11 2010-03-11 Nec Laboratories America, Inc. Scalable secondary storage systems and methods
US20100070698A1 (en) * 2008-09-11 2010-03-18 Nec Laboratories America, Inc. Content addressable storage systems and methods employing searchable blocks
US7716179B1 (en) 2009-10-29 2010-05-11 Wowd, Inc. DHT-based distributed file system for simultaneous use by millions of frequently disconnected, world-wide users
US20100174968A1 (en) * 2009-01-02 2010-07-08 Microsoft Corporation Heirarchical erasure coding
US20110099200A1 (en) * 2009-10-28 2011-04-28 Sun Microsystems, Inc. Data sharing and recovery within a network of untrusted storage devices using data object fingerprinting
US20140129881A1 (en) * 2010-12-27 2014-05-08 Amplidata Nv Object storage system for an unreliable storage medium

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5176835B2 (en) * 2008-09-29 2013-04-03 ブラザー工業株式会社 Monitoring device, the information processing apparatus, information processing method, and program
US8051205B2 (en) * 2008-10-13 2011-11-01 Applied Micro Circuits Corporation Peer-to-peer distributed storage
EP2433226B1 (en) 2009-06-26 2015-08-12 Simplivity Corporation File system
US8478799B2 (en) 2009-06-26 2013-07-02 Simplivity Corporation Namespace file system accessing an object store
CN102577135B (en) * 2009-11-13 2014-12-03 松下电器(美国)知识产权公司 Encoding method, decoding method, coder and decoder
US9436748B2 (en) 2011-06-23 2016-09-06 Simplivity Corporation Method and apparatus for distributed configuration management
CN103229480B (en) * 2011-11-29 2017-10-17 华为技术有限公司 Data processing apparatus and method for distributed storage system, the client
US8886993B2 (en) * 2012-02-10 2014-11-11 Hitachi, Ltd. Storage device replacement method, and storage sub-system adopting storage device replacement method
US9032183B2 (en) 2012-02-24 2015-05-12 Simplivity Corp. Method and apparatus for content derived data placement in memory
US9043576B2 (en) 2013-08-21 2015-05-26 Simplivity Corporation System and method for virtual machine conversion
US8874835B1 (en) 2014-01-16 2014-10-28 Pure Storage, Inc. Data placement based on data properties in a tiered storage device system
US9213485B1 (en) 2014-06-04 2015-12-15 Pure Storage, Inc. Storage system architecture
US8850108B1 (en) 2014-06-04 2014-09-30 Pure Storage, Inc. Storage cluster
US9367243B1 (en) 2014-06-04 2016-06-14 Pure Storage, Inc. Scalable non-uniform storage sizes
US9612952B2 (en) 2014-06-04 2017-04-04 Pure Storage, Inc. Automatically reconfiguring a storage memory topology
US9836234B2 (en) 2014-06-04 2017-12-05 Pure Storage, Inc. Storage cluster
US10114757B2 (en) 2014-07-02 2018-10-30 Pure Storage, Inc. Nonrepeating identifiers in an address space of a non-volatile solid-state storage
US9747229B1 (en) 2014-07-03 2017-08-29 Pure Storage, Inc. Self-describing data format for DMA in a non-volatile solid-state storage
US8874836B1 (en) 2014-07-03 2014-10-28 Pure Storage, Inc. Scheduling policy for queues in a non-volatile solid-state storage
US9766972B2 (en) * 2014-08-07 2017-09-19 Pure Storage, Inc. Masking defective bits in a storage array
US9082512B1 (en) 2014-08-07 2015-07-14 Pure Storage, Inc. Die-level monitoring in a storage cluster
US9558069B2 (en) 2014-08-07 2017-01-31 Pure Storage, Inc. Failure mapping in a storage array
US9495255B2 (en) * 2014-08-07 2016-11-15 Pure Storage, Inc. Error recovery in a storage cluster
US9483346B2 (en) 2014-08-07 2016-11-01 Pure Storage, Inc. Data rebuild on feedback from a queue in a non-volatile solid-state storage
US10021181B2 (en) * 2014-12-22 2018-07-10 Dropbox, Inc. System and method for discovering a LAN synchronization candidate for a synchronized content management system
US9948615B1 (en) 2015-03-16 2018-04-17 Pure Storage, Inc. Increased storage unit encryption based on loss of trust
US9940234B2 (en) 2015-03-26 2018-04-10 Pure Storage, Inc. Aggressive data deduplication using lazy garbage collection
US10082985B2 (en) 2015-03-27 2018-09-25 Pure Storage, Inc. Data striping across storage nodes that are assigned to multiple logical arrays
US10178169B2 (en) 2015-04-09 2019-01-08 Pure Storage, Inc. Point to point based backend communication layer for storage processing
US9817576B2 (en) 2015-05-27 2017-11-14 Pure Storage, Inc. Parallel update to NVRAM
US20170060700A1 (en) * 2015-08-28 2017-03-02 Qualcomm Incorporated Systems and methods for verification of code resiliency for data storage
US9768953B2 (en) 2015-09-30 2017-09-19 Pure Storage, Inc. Resharing of a split secret
US9843453B2 (en) 2015-10-23 2017-12-12 Pure Storage, Inc. Authorizing I/O commands with I/O tokens
TWI584617B (en) 2015-11-18 2017-05-21 Walton Advanced Eng Inc
US10007457B2 (en) 2015-12-22 2018-06-26 Pure Storage, Inc. Distributed transactions with token-associated execution
US10261690B1 (en) 2016-05-03 2019-04-16 Pure Storage, Inc. Systems and methods for operating a storage system
US9672905B1 (en) 2016-07-22 2017-06-06 Pure Storage, Inc. Optimize data protection layouts based on distributed flash wear leveling
US10216420B1 (en) 2016-07-24 2019-02-26 Pure Storage, Inc. Calibration of flash channels in SSD
US10203903B2 (en) 2016-07-26 2019-02-12 Pure Storage, Inc. Geometry based, space aware shelf/writegroup evacuation
US10141050B1 (en) 2017-04-27 2018-11-27 Pure Storage, Inc. Page writes for triple level cell flash memory
US10210926B1 (en) 2017-09-15 2019-02-19 Pure Storage, Inc. Tracking of optimum read voltage thresholds in nand flash devices

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040215622A1 (en) * 2003-04-09 2004-10-28 Nec Laboratories America, Inc. Peer-to-peer system and method with improved utilization
US20050187946A1 (en) * 2004-02-19 2005-08-25 Microsoft Corporation Data overlay, self-organized metadata overlay, and associated methods
US20070208748A1 (en) * 2006-02-22 2007-09-06 Microsoft Corporation Reliable, efficient peer-to-peer storage
US20080005334A1 (en) * 2004-11-26 2008-01-03 Universite De Picardie Jules Verne System and method for perennial distributed back up
US7466810B1 (en) * 2004-12-20 2008-12-16 Neltura Technology, Inc. Distributed system for sharing of communication service resources between devices and users

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7130921B2 (en) * 2002-03-15 2006-10-31 International Business Machines Corporation Centrally enhanced peer-to-peer resource sharing method and apparatus
KR20040084530A (en) * 2003-03-28 2004-10-06 엘지전자 주식회사 Software upgrade method for mobile communication device using infrared
US7418454B2 (en) * 2004-04-16 2008-08-26 Microsoft Corporation Data overlay, self-organized metadata overlay, and application level multicasting
JP2006319909A (en) * 2005-05-16 2006-11-24 Konica Minolta Holdings Inc Data communication method, peer-to-peer network, and information processing apparatus
US8060648B2 (en) * 2005-08-31 2011-11-15 Cable Television Laboratories, Inc. Method and system of allocating data for subsequent retrieval

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040215622A1 (en) * 2003-04-09 2004-10-28 Nec Laboratories America, Inc. Peer-to-peer system and method with improved utilization
US20050135381A1 (en) * 2003-04-09 2005-06-23 Nec Laboratories America, Inc. Peer-to-peer system and method with prefix-based distributed hash table
US20050187946A1 (en) * 2004-02-19 2005-08-25 Microsoft Corporation Data overlay, self-organized metadata overlay, and associated methods
US20080005334A1 (en) * 2004-11-26 2008-01-03 Universite De Picardie Jules Verne System and method for perennial distributed back up
US7466810B1 (en) * 2004-12-20 2008-12-16 Neltura Technology, Inc. Distributed system for sharing of communication service resources between devices and users
US20070208748A1 (en) * 2006-02-22 2007-09-06 Microsoft Corporation Reliable, efficient peer-to-peer storage

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100064166A1 (en) * 2008-09-11 2010-03-11 Nec Laboratories America, Inc. Scalable secondary storage systems and methods
US20100070698A1 (en) * 2008-09-11 2010-03-18 Nec Laboratories America, Inc. Content addressable storage systems and methods employing searchable blocks
CN101676855A (en) * 2008-09-11 2010-03-24 美国日本电气实验室公司 Scalable secondary storage systems and methods
US8335889B2 (en) 2008-09-11 2012-12-18 Nec Laboratories America, Inc. Content addressable storage systems and methods employing searchable blocks
US7992037B2 (en) 2008-09-11 2011-08-02 Nec Laboratories America, Inc. Scalable secondary storage systems and methods
US20100174968A1 (en) * 2009-01-02 2010-07-08 Microsoft Corporation Heirarchical erasure coding
US20110099200A1 (en) * 2009-10-28 2011-04-28 Sun Microsystems, Inc. Data sharing and recovery within a network of untrusted storage devices using data object fingerprinting
US8121993B2 (en) * 2009-10-28 2012-02-21 Oracle America, Inc. Data sharing and recovery within a network of untrusted storage devices using data object fingerprinting
US20110106758A1 (en) * 2009-10-29 2011-05-05 Borislav Agapiev Dht-based distributed file system for simultaneous use by millions of frequently disconnected, world-wide users
US8296283B2 (en) 2009-10-29 2012-10-23 Google Inc. DHT-based distributed file system for simultaneous use by millions of frequently disconnected, world-wide users
US7716179B1 (en) 2009-10-29 2010-05-11 Wowd, Inc. DHT-based distributed file system for simultaneous use by millions of frequently disconnected, world-wide users
US20140129881A1 (en) * 2010-12-27 2014-05-08 Amplidata Nv Object storage system for an unreliable storage medium
US9135136B2 (en) * 2010-12-27 2015-09-15 Amplidata Nv Object storage system for an unreliable storage medium

Also Published As

Publication number Publication date
TWI432968B (en) 2014-04-01
TW200843410A (en) 2008-11-01
TW200847689A (en) 2008-12-01
WO2008103569A1 (en) 2008-08-28
AR076255A1 (en) 2011-06-01
WO2008103568A1 (en) 2008-08-28
US8140625B2 (en) 2012-03-20
TWI433504B (en) 2014-04-01
US20080201428A1 (en) 2008-08-21

Similar Documents

Publication Publication Date Title
JP5639640B2 (en) Of backup data intelligent hierarchy
US9940197B2 (en) Method and apparatus for slice partial rebuilding in a dispersed storage network
US9507788B2 (en) Methods and apparatus for distributed data storage
US9405627B2 (en) Flexible data storage system
US7146377B2 (en) Storage system having partitioned migratable metadata
US7590672B2 (en) Identification of fixed content objects in a distributed fixed content storage system
JP5423896B2 (en) Storage system
US7685459B1 (en) Parallel backup
JP4568502B2 (en) Information processing systems and management device
US7788303B2 (en) Systems and methods for distributed system scanning
US8996611B2 (en) Parallel serialization of request processing
US9454533B2 (en) Reducing metadata in a write-anywhere storage system
US9148174B2 (en) Erasure coding and replication in storage clusters
JP5479490B2 (en) Asynchronous distributed garbage collection for the replication storage cluster
US8560798B2 (en) Dispersed storage network virtual address space
JP6346565B2 (en) Method and apparatus for assigning erasure coded data in the disk storage
CN1258921C (en) Distributive video order program system and its data recording and accessing method
JP5411250B2 (en) Data arrangement according to the instructions of the redundant data storage system
CA2811437C (en) Distributed storage system with duplicate elimination
US8762353B2 (en) Elimination of duplicate objects in storage clusters
EP2993585B1 (en) Distributed object storage system comprising performance optimizations
US20070143359A1 (en) System and method for recovery from failure of a storage server in a distributed column chunk data store
US7464247B2 (en) System and method for updating data in a distributed column chunk data store
CA2676593C (en) Scalable secondary storage systems and methods
US20130232152A1 (en) Listing data objects using a hierarchical dispersed storage index

Legal Events

Date Code Title Description
AS Assignment

Owner name: NEC LABORATORIES AMERICA, INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DUBNICKI, CEZARY;GRYZ, LESZEK K;LICHOTA, KRZYSZTOF;AND OTHERS;REEL/FRAME:020560/0407;SIGNING DATES FROM 20080218 TO 20080225