WO2012138296A1 - Method and system for storing data in a cloud network


Info

Publication number: WO2012138296A1
Authority: WIPO (PCT)
Prior art keywords: data, slices, client system, devices, server
Application number: PCT/SG2011/000138
Other languages: French (fr)
Inventor: Kheng Kok MAR
Original Assignee: Nanyang Polytechnic
Application filed by Nanyang Polytechnic
Priority to SG2013071840A (patent SG193616A1)
Priority to PCT/SG2011/000138 (patent WO2012138296A1)
Publication of WO2012138296A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, servers and terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F 11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F 11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2211/00 Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
    • G06F 2211/10 Indexing scheme relating to G06F11/10
    • G06F 2211/1002 Indexing scheme relating to G06F11/1076
    • G06F 2211/1028 Distributed, i.e. distributed RAID systems with parity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/2101 Auditing as a secondary aspect
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/2151 Time stamp

Definitions

  • This invention relates to processing systems connected to a cloud computing network. More particularly, this invention relates to storing data from a client computer to multiple devices connected to the cloud computing network. Still more particularly, this invention relates to securely storing data from a client computer to multiple devices connected to the cloud computing network in a manner that prevents unauthorized users from obtaining the data and assures that the data will be available if one or more of the devices storing the data are not connected to the network.
  • Cloud computing and/or “cloud” networking is a common way of maximizing the use of computer resources for networked computer systems.
  • In a cloud network, the computer system or server providing a resource is not static; rather, it is assigned at the time a client system requests the resource.
  • One of the problems in storing data using devices in a cloud network is that the network configuration changes dynamically. In other words, devices may connect or disconnect from the system periodically due to communication problems, network policy, routine maintenance, or many other reasons. As such, there is no guarantee to a client system that data stored by one device will be available when the client needs to read the stored data at a later time.
  • To provide availability, cloud computing network providers typically store three or more complete copies of the data on separate network devices.
  • One method, described in US Patent number 5,485,474 issued on 16 January 1996 to Rabin, for storing portions of data on multiple devices is to transform the data to generate n slices of data such that m slices of the data may be used to reconstruct the original data where n>m.
  • The n slices of data are generated by dividing the data into x consecutive portions of m length, where x may be any number.
  • Each of the x portions is then multiplied by an n-by-m transform matrix to generate a resulting vector of n elements.
  • The y-th element of the x-th resulting vector is stored in the x-th position of the y-th slice.
  • For example, the 2nd portion of data is multiplied by the transform matrix to generate a 2nd resulting vector.
  • The first element of the 2nd resulting vector is placed in the second position of the first slice, the second element of the 2nd resulting vector is placed in the second position of the second slice, etc., until all of its elements are placed in a slice.
  • The generated slices are then stored on different devices connected to the network.
  • The original data may then be obtained by receiving m slices from the connected devices and multiplying the slices by an inverse transform matrix to recover the data in the original x portions. This process works well for ensuring that data is available if one or more of the devices storing data are disconnected from the network. However, this method does not prevent an unauthorized party from accessing the slices and obtaining the data.
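  • A minimal sketch of this style of information dispersal is shown below. It uses a Vandermonde matrix over the prime field GF(257) so that any m rows of the transform matrix are invertible; the field choice, function names, and parameters are illustrative assumptions, not the patent's implementation.

```python
P = 257  # prime modulus: every byte value 0..255 is an element of GF(257)

def transform_matrix(n, m):
    # n-by-m Vandermonde matrix over GF(257); any m of its rows form an
    # invertible m-by-m matrix, so any m slices can reconstruct the data
    return [[pow(i + 1, j, P) for j in range(m)] for i in range(n)]

def slice_data(data, n, m):
    """Disperse `data` into n slices, any m of which suffice to rebuild it."""
    pad = (-len(data)) % m
    padded = list(data) + [0] * pad          # pad the last portion to length m
    A = transform_matrix(n, m)
    slices = [[] for _ in range(n)]
    for x in range(0, len(padded), m):       # x-th portion of m bytes
        portion = padded[x:x + m]
        for y in range(n):                   # y-th element goes to y-th slice
            slices[y].append(sum(a * b for a, b in zip(A[y], portion)) % P)
    return slices, pad

def invert(M):
    # Gauss-Jordan inversion of an m-by-m matrix over GF(257)
    m = len(M)
    aug = [row[:] + [int(i == j) for j in range(m)] for i, row in enumerate(M)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col])
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], P - 2, P)   # Fermat inverse of the pivot
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

def reconstruct(received, n, m, pad):
    """Rebuild the data from `received`, a dict slice-index -> slice values."""
    A = transform_matrix(n, m)
    ys = sorted(received)[:m]                # any m slices will do
    inv = invert([A[y] for y in ys])         # invert the m-by-m sub-matrix
    out = []
    for x in range(len(received[ys[0]])):
        column = [received[y][x] for y in ys]
        out += [sum(r * c for r, c in zip(row, column)) % P for row in inv]
    return bytes(out[: len(out) - pad])

slices, pad = slice_data(b"hello cloud storage", n=5, m=3)
assert reconstruct({0: slices[0], 2: slices[2], 4: slices[4]}, 5, 3, pad) \
    == b"hello cloud storage"
```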
  • US Patent Number 7,574,579 issued on 11 August 2009 to Gladwin et al. discloses a system for storing data in a dispersed storage system.
  • The data is transformed into multiple slices of data and each of the slices is stored in a separate device.
  • A management system then stores metadata about the stored data in a database for use in maintaining the system.
  • US Patent Publication 2010/0306578 published on 2 December 2010 in the name of Thornton et al. discloses a system in which data stored in slices on multiple devices connected to a network is re-assembled and checked using a checksum stored on the device along with each data slice.
  • The slices may include data that can be used to re-assemble the original data.
  • As a result, an unauthorized party may be able to reconstruct the data by obtaining some or all of the slices. Further, as error detection data such as a checksum is stored with the data, a change to the original data and the error detection data by an unauthorized party may corrupt the data without a user being aware of the corruption.
  • A second problem is the updating of stored data.
  • To update the data, the entirety of the data must be re-assembled and new slices representing the changed data must be generated.
  • The requirement of re-assembling the data and generating new slices may cause unacceptable delays in both reading and updating of the file.
  • A first advantage of a method and system in accordance with this invention is that data may be securely stored on multiple devices connected to a network.
  • A second advantage of a method and system in accordance with this invention is that the stored data may be updated without the need to re-assemble and re-generate the whole data from the stored slices.
  • The method and system for storing data in accordance with this invention are provided in the following manner.
  • A registry server receives data storage information for a plurality of slices of data representing a set of data.
  • The data storage information includes an identifier for each of the slices, an identifier of the device in the cloud network storing each slice, and error detection data for each of the slices.
  • The registry server stores the data storage information in a memory.
  • The registry server also maintains a list of devices connected to the cloud network that are available to store data.
  • The registry server receives an availability message from a network device indicating the device is available to store data, and stores an identifier of the device in the list of available devices in response to receiving the availability message.
  • The registry server reads a list of namespaces from the availability message. The list of namespaces identifies each namespace that may store data on the device that transmitted the availability message. For purposes of this discussion, a namespace is an identifier for a collection of a plurality of sets of data.
  • The registry server stores an indication in memory of each namespace in the list that may store data on the network device.
  • The registry server determines the Internet Protocol (IP) address of the device sending the message and stores the IP address in memory.
  • The registry server performs an authentication process with the network device in response to receiving the availability message and transmits an acknowledgement to the device in response to the device being successfully authenticated.
  • The registry server transmits communication information to the device in response to a successful authentication.
  • The communication information includes information needed to communicate with client devices.
  • The communication information includes a secret key and/or an encryption key.
  • A network device may perform the following process.
  • The network device transmits an availability message, performs authentication with the registry server, and receives communication information in response to a successful authentication process.
  • The availability message includes a list of namespaces.
  • The list of namespaces includes identifiers of those namespaces that may store data on the device.
  • The availability message also includes an identifier of the device and an IP address of the device. A sketch of this registration exchange is shown below.
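  • The following sketch renders the availability message and the registry's device list as plain data structures. The field and method names are assumptions for illustration; authentication and transport are omitted.

```python
from dataclasses import dataclass

@dataclass
class AvailabilityMessage:
    device_id: str          # identifier of the network device
    ip_address: str         # current (dynamically assigned) IP address
    namespaces: list[str]   # namespaces permitted to store data on the device

class DeviceRegistry:
    """Registry server's list of devices available to store data."""

    def __init__(self) -> None:
        self.available: dict[str, tuple[str, set[str]]] = {}

    def register(self, msg: AvailabilityMessage) -> None:
        # Called after a successful authentication (omitted here): store the
        # device identifier, IP address, and supported namespaces in memory.
        self.available[msg.device_id] = (msg.ip_address, set(msg.namespaces))

    def devices_for(self, namespace: str) -> list[str]:
        # Group of devices that may store data for the given namespace,
        # used when answering a client's write request.
        return [dev for dev, (_ip, ns) in self.available.items()
                if namespace in ns]
```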
  • The registry server performs the following process when a client system performs a write operation to store data to the system.
  • The registry server receives a write data request from the client system to store a set of data.
  • The registry server transmits write information to the client system.
  • The write information includes information about each network device in a group of devices that are available to store the set of data.
  • The registry server determines the group of devices available to store the set of data in response to receiving the write request.
  • The registry server determines the group of devices by reading a namespace for the set of data from the write request. The server then determines, from information stored in memory, each of the devices in the list of devices that is available to store data for the namespace. Each of the devices determined to be available to store data for the namespace is then added to the group.
  • A client system performs the following process to store data on devices in the network.
  • The client system receives a request to store data.
  • The client system generates a write request and transmits the request to the registry server.
  • The client system receives write information from the registry server.
  • The write information includes information for each device in a group of devices that are available to store the set of data.
  • The client system then generates a plurality of slices that represent the data in the set of data. Each of the slices is then transmitted to one of the network devices selected from the group of devices.
  • The storage information for the set of data, including information about each of the slices, is generated and transmitted to the registry server.
  • The slices are generated by applying Rabin's information dispersal algorithm to the data in the set of data.
  • The slices are generated in the following manner.
  • The client system divides the data from the set of data into X consecutive portions of M length, where M is the minimum number of slices needed to re-construct the data and X is determined by dividing the total amount of data in the set by M and rounding up.
  • The conversion transform matrix is N x M in size, where N > M.
  • The client system then inserts the y-th element of each x-th resulting vector into the x-th position of the y-th slice.
  • Error detection information is then determined for each slice and the determined error detection information is inserted into the storage information to be transmitted to the registry server.
  • In some of these embodiments, the conversion transform matrix is generated by the client system using matrix information read from the write information received from the registry server. In others of these embodiments, the conversion transform matrix is read directly from the write information received from the registry server.
  • Padding is added to the end of the last of the X portions of data to make the last portion M length, and the client system inserts an indication of the added padding into the storage information to be sent to the registry server (a worked example of this arithmetic follows).
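  • For example, with M = 7 and a 100-byte set of data, X = ceil(100/7) = 15 portions, and the last portion carries 5 bytes of padding (15 x 7 = 105 bytes in total).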
  • The client system transmits each of the slices to one of the network devices and inserts an identification of the network device storing each of the slices of data into the storage information transmitted to the registry server.
  • The client system may select the network device that receives each of the slices from the list of available devices read from the write information received from the registry server.
  • The client system receives an acknowledgment from a network device in response to transmitting a slice to the device.
  • The client system performs an error recovery in response to the acknowledgement from the device not being received by the client system.
  • The client system determines when a specified amount of each slice has been generated, transmits the specified amount to the selected network device, and repeats the process until all of the data of each slice is transmitted.
  • The client system reads a section of the set of data from the network devices in the following manner.
  • The process begins by the client system receiving a request to retrieve a section of the set of data.
  • The client system generates a read request that includes an identifier for the set of data.
  • The read request is then transmitted to the registry server.
  • The client system receives storage information for the set of data from the server.
  • The storage information includes information for each of the slices, including the network device storing a slice, an order of the slice, and error detection information for the slice.
  • The client system reads the information for each slice from the storage information and transmits a request for each slice to each network device storing each slice.
  • The request identifies the portions of the slices needed to re-construct the requested data.
  • The client system receives the requested portions of the slices from the devices in the network. The client system then determines whether portions from a pre-determined number of slices are received in response to the request. If portions from the pre-determined number of slices have been received, the set of data is re-assembled from the slices. In accordance with some embodiments, padding is removed from the re-assembled data.
  • The client system reassembles the data in the following manner.
  • The client system reads matrix information from the storage information received from the registry server and generates an inverse conversion transform matrix from the information.
  • Each of the received slices is then multiplied by the inverse conversion transform matrix to obtain each of the X consecutive portions of data of the set of data that include the requested section of data.
  • Error detection information for each slice is read from the storage information received from the registry server, and error checking of the portion of data received for each slice is performed in response to receiving the portions of each slice from the network devices.
  • A slice is disregarded if an error is detected and more than the pre-determined number of slices have been received. Otherwise, an error recovery process may be performed by the client system.
  • The client system stores data appended to the end of the set of data in the following manner.
  • The client system receives a request to append the data to the end of the set of data.
  • The client system then generates a storage request and transmits the request to the registry server.
  • The client system requests the portion of each of the slices that can be used to generate the last of the X portions of the set of data.
  • The client system then assembles the last of the X portions of data and removes the padding.
  • The client system determines if the data to be appended is less than the amount of padding. If so, the client system adds the data to append to the end of the data in the last of the X portions to obtain a new last portion.
  • Otherwise, a first portion of the appended data equal to the amount of padding is added to the last portion, and a normal write operation is performed for the remainder of the appended data.
  • The client system then multiplies the new last portion by a conversion transform matrix to determine an append vector.
  • The client system determines each device storing each of the slices and transmits each element of the append vector to the device storing the corresponding slice, with an offset for the last position in the slice.
  • The client system then generates error detection data for each of the updated slices and transmits updated storage information to the registry server so that the error detection data stored for each of the slices can be updated.
  • The registry server receives the error detection information for each of the updated slices and updates the error detection information stored for each of the corresponding slices. A sketch of this append path follows.
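  • Below is a hedged sketch of this append path, reusing transform_matrix and P from the dispersal sketch earlier; it rebuilds only the last portion, so the whole set of data is never reassembled. The overflow handed to a normal write is returned rather than written, and all names are illustrative.

```python
def append_to_last_portion(last_portion, pad, new_data, n, m):
    """Fold appended bytes into the padded last portion of the data set.

    last_portion: the reassembled last M-length portion (with padding),
    pad: number of padding bytes at its end, new_data: bytes to append.
    Returns one updated element per slice, the new padding size, and any
    overflow that must go through a normal write operation.
    """
    data = list(last_portion[:m - pad]) + list(new_data)
    overflow = b""
    if len(new_data) > pad:
        # Only enough appended data to fill the padding stays in the last
        # portion; the remainder is written normally as new slice elements.
        overflow = bytes(data[m:])
        data = data[:m]
    new_pad = (-len(data)) % m
    data += [0] * new_pad
    A = transform_matrix(n, m)
    # Element y of the append vector overwrites the last element of slice y
    # (sent to the device holding that slice, with the slice-end offset).
    append_vector = [sum(a * b for a, b in zip(A[y], data)) % P
                     for y in range(n)]
    return append_vector, new_pad, overflow
```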
  • The client system updates data in the set of data in the following manner.
  • The client system receives a request to update a portion of data in the data set.
  • The client system determines each of the consecutive portions of M length that include the updated data, where M is the minimum number of slices needed to reconstruct the data from the data set.
  • Each of the consecutive portions including updated data is then multiplied by a conversion transform matrix to determine an updated vector.
  • The client system also requests storage information from the registry server.
  • The client system reads the storage information for each of the devices storing each of the slices.
  • The client system transmits each element of each updated vector to the device storing the corresponding slice, to store the element in the proper position of the slice.
  • The client system then generates the error detection information for each of the updated slices and transmits the error detection information to the registry server to update the error detection information stored for the updated slices.
  • The registry server then updates the storage information accordingly. The sketch below shows the per-portion update step.
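  • The following minimal sketch, again reusing transform_matrix and P from the dispersal sketch, shows how one updated M-length portion is re-transformed; element y of the resulting updated vector replaces position x of slice y on whichever device stores that slice.

```python
def updated_vector(portion, n, m):
    # Multiply the updated M-length portion by the N x M conversion
    # transform matrix; the result holds one replacement element per slice.
    A = transform_matrix(n, m)
    return [sum(a * b for a, b in zip(A[y], portion)) % P for y in range(n)]
```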
  • FIG. 1 illustrating a client system and registry server connected to a cloud computing network in accordance with an embodiment of this invention
  • Figure 2 illustrating an exemplary processing system such as those in a client system, registry server, and network devices in accordance with the shown embodiment of this invention
  • FIG. 3 illustrating a block diagram of applications being executed by a client computer in accordance with an embodiment of this invention
  • FIG. 4 illustrating a representation of the information stored by a registry server in accordance with an embodiment of this invention
  • FIG. 5 illustrating a flow diagram of a process performed by a network device to register with a registry server in accordance with an embodiment of this invention
  • FIG. 6 illustrating a flow diagram of a process performed by a registry server to register a network device as available to store data in accordance with an embodiment of this invention
  • FIG. 7 illustrating a flow diagram of a process performed by a client system to register with a registry server in accordance with this invention
  • FIG. 8 illustrating a flow diagram of a process performed by a registry server to register a client system in accordance with this invention
  • FIG. 9 illustrating a flow diagram of a process performed by a client system to store data from a data set to devices connected to the network in accordance with an embodiment of this invention
  • FIG. 10 illustrating a flow diagram of a process performed by a registry server to respond to a request to store data received from a client in accordance with an embodiment of this invention
  • FIG. 11 illustrating a flow diagram of a process performed by a registry server to store information for a set of data stored by devices connected to the network in accordance with an embodiment of this invention
  • Figure 12 illustrating a flow diagram of a process performed by a client system to generate slices representing the set of data to be stored by the network devices in accordance with an embodiment of this invention
  • FIG. 13 illustrating a flow diagram of a process performed by a client system to store slices representing the data to network devices in accordance with an embodiment of this invention
  • Figure 14 illustrating a flow diagram of a process performed by a client system to read a set of data stored on network devices in accordance with an embodiment of this invention
  • Figure 15 illustrating a flow diagram of a process performed by a registry server to provide a client system with storage information to read a set of data stored on network devices in accordance with an embodiment of this invention
  • Figure 16 illustrating a flow diagram of a process performed by a client system for appending data to an end of a set of data stored on the network devices in accordance with an embodiment of this invention
  • FIG. 17 illustrating a flow diagram of a process performed by a client system to update data in a set of data stored on the network devices in accordance with an embodiment of this invention.
  • Figure 18 illustrating a flow diagram of a process performed by a registry server to update information for a set of data stored by devices connected to the network in accordance with an embodiment of this invention.
  • This invention relates to processing systems connected to a cloud computing network. More particularly, this invention relates to storing data from a client computer to multiple devices connected to the cloud computing network. Still more particularly, this invention relates to securely storing data from a client computer to multiple devices connected to the cloud computing network in a manner that prevents unauthorized users from obtaining the data and assures that the data will be available if one or more of the devices storing the data are not connected to the network.
  • A cloud computing network is a group of processing devices communicatively connected over a network such as the Internet to share resources.
  • A user may not be a proprietor of the network and/or may not have control of network resources and/or of accessibility of data stored by the network resources. Further, a user in accordance with this invention may use resources from one or more cloud computing networks.
  • Figure 1 illustrates an embodiment of this invention in which client system 115 of a user communicatively connects to a cloud computing network 100 and a registry server 110.
  • Client system 115 is a typical processing system such as a desktop computer, laptop computer, or other computer terminal that connects to network 100 via a conventional wire connection, wireless connection or any other method.
  • Client computer 115 executes applications that perform the client system processes in accordance with this invention.
  • Although only one client device 115 is shown, any number of client systems may be connected without departing from this invention.
  • The client device may also be connected within a cloud computing network without departing from this invention.
  • Registry server 110 is a processing system that connects to network 100 via a conventional wire connection, wireless connection or any other method.
  • Registry server 110 executes applications for managing the storage of data from client system 115 to devices in network 100 in accordance with embodiments of this invention.
  • Preferably, communications between devices in network 100 and registry server 110 are protected by Secure Sockets Layer (SSL).
  • However, other means for securing transmissions may be used without departing from this invention.
  • Cloud network 100 includes network devices 130-132 and 135-137.
  • Network devices 130-132 and 135-137 are processing systems that provide resources to client systems, such as client system 115, over network 100 and are connected to network 100 via a conventional wire connection, wireless connection or any other method. Resources may include but are not limited to storage, processing time, and applications.
  • Network devices 130-132 and 135-137 may be connected as separate systems in network 100 or connected as parts of separate cloud computing networks. As shown in Figure 1, network devices 130- 132 are connected in first cloud computing network 120 and network devices 135-137 are connected in second cloud computing network 125.
  • Figure 2 illustrates an exemplary processing system 200 that represents the processing systems in registry server 110; client system 115; and processing systems 130- 132 and 135-137 that execute instructions to perform the processes described below in each system in accordance with this invention.
  • The instructions may be stored and/or performed as hardware, firmware, or software without departing from this invention.
  • The exact configuration of each processing system may be different, the exact configuration executing processes in accordance with this invention may vary, and processing system 200 shown in Figure 2 is provided by way of example only.
  • Processing system 200 includes Central Processing Unit (CPU) 205.
  • CPU 205 is a processor, microprocessor, or any combination of processors and microprocessors that execute instructions to perform the processes in accordance with the present invention.
  • CPU 205 connects to memory bus 210 and Input/Output (I/O) bus 215.
  • Memory bus 210 connects CPU 205 to memories 220 and 225 to transmit data and instructions between the memories and CPU 205.
  • I/O bus 215 connects CPU 205 to peripheral devices to transmit data between CPU 205 and the peripheral devices.
  • I/O bus 215 and memory bus 210 may be combined into one bus or subdivided into many other busses and the exact configuration is left to those skilled in the art.
  • A non-volatile memory 220, such as a Read Only Memory (ROM), is connected to memory bus 210.
  • Non-volatile memory 220 stores instructions and data needed to operate various sub-systems of processing system 200 and to boot the system at start-up.
  • A volatile memory 225, such as Random Access Memory (RAM), is also connected to memory bus 210. Volatile memory 225 stores the instructions and data needed by CPU 205 to perform software instructions for processes such as the processes for providing a system in accordance with this invention.
  • I/O device 230 is any device that transmits and/or receives data from CPU 205.
  • Keyboard 235 is a specific type of I/O device that receives user input and transmits the input to CPU 205.
  • Display 240 receives display data from CPU 205 and displays images on a screen for a user to see.
  • Memory 245 is a device that transmits and receives data to and from CPU 205 for storing data to a media.
  • Network interface 250 connects CPU 205 to a network for transmission of data to and from other processing systems.
  • Figure 3 illustrates applications 300 executed by client system 115 to perform the processes for storing data in devices of a cloud computing network in accordance with an embodiment of this invention.
  • Application 305 is a software process that generates, uses, and stores data.
  • Client library 310 is a group of processes or objects that provide the file I/O methods for storing and reading data in accordance with the invention.
  • Client library 310 is a Java library containing a collection of Java classes, a C# library, or any other collection of applications or objects that are used to perform the I/O methods.
  • However, client library 310 may also be implemented as kernel-level components without departing from this invention.
  • Client library 310 includes slicer/combiner 311, key management support module 315, data cache 320, key cache 325, and adaptation module 330.
  • Slicer/combiner 311 is a software module that generates slices of data to be stored on the network devices. The slices represent an original set of data. Slicer/combiner 311 also converts the slices of data into the original sets of data when the slices are read from memory.
  • Key management support 315 is a software module that manages assignment of the keys needed to communicate and perform authentication with devices in a cloud computing network. Preferably, the keys are used in SSL communication in accordance with the described embodiment of this invention. However, the key management system may be used to manage keys for other types of communication protocols used to communicate with the network devices in accordance with other embodiments of this invention.
  • Data cache 320 manages the data received from the set of data during conversion processes.
  • Key cache 325 temporarily stores the keys managed by key management support 315.
  • Adaptation module 330 is one or more software modules that generate the necessary messages in the proper protocol to communicate with a device in a particular cloud computer network.
  • For example, the adaptation module may use the Windows Azure REST interface to store slices as blobs in the Azure blob service, and the Amazon S3 interface may be used to store the slices as objects in an S3 network. A sketch of such an adaptation layer appears below.
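  • One plausible shape for adaptation module 330 is an adapter interface with one implementation per cloud back end; the class and method names below are assumptions for illustration, not the patent's API or the actual Azure/S3 client calls.

```python
from abc import ABC, abstractmethod

class CloudAdapter(ABC):
    """Translates generic slice operations into one provider's protocol."""

    @abstractmethod
    def put_slice(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get_slice(self, name: str, start: int, length: int) -> bytes: ...

class AzureBlobAdapter(CloudAdapter):
    # Would issue Windows Azure REST calls to store each slice as a blob.
    def put_slice(self, name: str, data: bytes) -> None:
        raise NotImplementedError("Azure REST call goes here")

    def get_slice(self, name: str, start: int, length: int) -> bytes:
        raise NotImplementedError("Azure REST call goes here")

class S3Adapter(CloudAdapter):
    # Would issue Amazon S3 requests to store each slice as an object.
    def put_slice(self, name: str, data: bytes) -> None:
        raise NotImplementedError("S3 request goes here")

    def get_slice(self, name: str, start: int, length: int) -> bytes:
        raise NotImplementedError("S3 request goes here")
```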
  • Native file system 335 represents the file systems in memory local to client system 115 and cloud storage 340 is the storage available over cloud computing network 100.
  • However, one or more of the slices may be stored in the native file system instead of devices in cloud computing network 100 without departing from this invention. The processes performed by these modules are set forth below.
  • FIG. 4 illustrates a representation of a table maintained by registry server 110 to manage the reading and writing of data from a client system to the cloud computing network.
  • Master table 400 is a table or other type of data structure that stores the information needed by registry server 110 to manage the storage of data from client systems to devices on the network.
  • Master table 400 preferably resides in a volatile memory, such as RAM, a data cache, or the like.
  • The volatile memory is readily accessible by a processor of registry server 110 to minimize delays in providing information about stored data during operation.
  • the table or other structure may be stored in other types of memory without departing from this invention.
  • a transaction log may be kept to recreate table 400 should the stored information be lost for some reason.
  • Table 400 includes a FileInfo entry 405 for each set of data stored by connected client systems to devices in the cloud computing network in accordance with this invention. For purposes of this discussion, sets of data are discussed as being stored by client systems to the network devices. One skilled in the art will recognize that a set of data may be a file, document, object, or other data structure used to store data. FileInfo entry 405 includes all of the information required to read, write and reconstruct a set of data stored to the network devices.
  • FileInfo entry 405 should include a checksum field indicating a checksum for the set of data or other error detection data; a namespace field indicating the namespace in which the set of data is stored; a paddingsize field indicating the amount of padding added to the end of the set of data; a shares field indicating the number of slices that are used to store the data on the network; a quorum field indicating the minimum number of slices needed to reconstruct the set of data; and fileslice fields 410 that store information about each slice used to store the set of data on the network devices.
  • Other fields may be added to FileInfo entry 405 to store other information.
  • For example, information regarding the conversion transform matrix and/or inverse conversion transform matrix may be stored in the entry 405.
  • Other examples are also shown in Figure 4.
  • Fileslice field 410 stores the information for a slice of data representing data in the set of data stored.
  • FileInfo entry 405 includes a fileslice field for each of the slices stored to network devices representing data of the stored set of data.
  • Fileslice field 410 includes a serverID field indicating the device in the network storing the slice; a sliceID field that indicates the position in the arrangement of slices; a sliceChecksum field that stores a checksum, error detection hash or other error detection data for the slice; and a sliceName field storing an identifier of the slice. A sketch of these entries as data structures follows.
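  • The sketch below renders FileInfo entry 405 and fileslice field 410 as data structures; the types and snake_case spellings are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class FileSlice:
    server_id: str        # device in the network storing the slice
    slice_id: int         # position of the slice in the arrangement of slices
    slice_checksum: str   # checksum or keyed hash for the slice
    slice_name: str       # unique identifier of the slice (e.g. a GUID)

@dataclass
class FileInfo:
    checksum: str             # error detection data for the whole set of data
    namespace: str            # namespace in which the set of data is stored
    padding_size: int         # padding added to the end of the set of data
    shares: int               # N: number of slices stored on the network
    quorum: int               # M: minimum slices needed for reconstruction
    slices: list[FileSlice]   # one entry per stored slice
```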
  • Figure 5 illustrates a flow diagram of process 500 performed by a network device to alert registry server 110 that the device is connected to the system and available to store data in accordance with an embodiment of this invention.
  • Registry server 110 maintains a list or other data structure that indicates the devices connected to the network that are available to store data.
  • The list or data structure may also include a listing of the namespaces that may store data on each particular device.
  • As noted above, a namespace is a grouping of sets of data, such as a folder or directory, that serves to link sets of data in a user-defined manner.
  • Namespaces may be used to segregate data from different organizational units and may be used to manage the scalability of the data stored. The exact use of the namespaces is left as a design choice of those skilled in the art.
  • Process 500 begins in step 510 when a device connects to the network.
  • In step 520, the device transmits a first availability message to registry server 110.
  • The first availability message may contain an identifier for the device, a list of namespaces supported, and a current IP address of the device.
  • The namespaces supported may be determined from a configuration file or some other data structure storing the list that is read by the network device when generating the message.
  • The IP address of a device is often not static and is assigned at the time of connection to the network.
  • Thus, registry server 110 requires both the IP address and device identifier to facilitate transfers of data between the client system 115 and the network devices.
  • A network device may periodically transmit subsequent availability messages to registry server 110 to verify that the device is still connected to the cloud computing network. These subsequent availability messages may include any information.
  • In step 530, the device performs an authentication process initiated by registry server 110 in response to receiving the availability message.
  • Process 500 ends after step 540 in which the device receives an acknowledgment from registry server 110 in response to a successful authentication.
  • The acknowledgement may include data needed for communications with registry server 110 and client system 115 over the network.
  • The acknowledgement message may include secret keys, encryption/decryption keys, or other data depending on the protocol for communication between the network devices, client systems, and registry server 110.
  • Figure 6 illustrates a flow diagram of process 600 performed by a registry server to register a network device connecting to the network.
  • Process 600 begins in step 610 when registry server 110 receives an availability message from a device connected to the network.
  • Registry server 110 performs an authentication process with the network device in step 620.
  • If the authentication is successful, registry server 110 transmits an acknowledgment to the network device in step 630.
  • The acknowledgement may include data needed for communications with registry server 110 and client system 115 over the network.
  • The acknowledgement message may include secret keys, encryption/decryption keys, or other data depending on the protocol for communication between the network devices, client systems, and registry server 110.
  • Registry server 110 then determines the namespaces that may store data on the device in step 640. The identifier, list of namespaces, and IP address of the device are then stored in memory in step 650 for use in transmitting data between the client systems and network devices in accordance with this invention. Process 600 then ends.
  • Figures 7 and 8 illustrate flow diagrams of the processes performed by client system 115 and registry server 110 to register a client system.
  • Figure 7 illustrates process 700 that is a process performed by a client system to register with registry server 110 in accordance with an embodiment of this invention.
  • Process 700 begins in step 710 when the client system transmits a connection request to registry server 110.
  • Registry server 110 performs an authentication process with the client system in step 720. Any number of authentication processes may be used and the exact authentication process used is a design choice left to one skilled in the art.
  • The client system receives network information, including data needed for communications between registry server 110 and client system 115 over the network, in step 730.
  • The network information may include secret keys, encryption/decryption keys, or other data depending on the protocol for communication between the network devices, client systems, and registry server 110.
  • Process 700 then ends.
  • FIG. 8 illustrates process 800 performed by registry server 110 in response to a connection request received from a client system in accordance with an embodiment of this invention.
  • Process 800 begins in step 810 when a connection request is received from a client system.
  • Registry server 110 performs an authentication process with the client system in step 820. If the authentication is successful, registry server 110 retrieves network information stored in memory in step 830.
  • The network information may include data needed for communications between registry server 110 and client system 115 over the network.
  • The network information may include secret keys, encryption/decryption keys, or other data depending on the protocol for communication between the network devices, client systems, and registry server 110.
  • The network information is then transmitted to the client system in step 840 and process 800 ends.
  • FIG. 9 illustrates a process performed by a client system to write a set of data to the network devices in accordance with an embodiment of this invention.
  • Client system 115 performs process 900.
  • Process 900 begins in step 910 by transmitting a write request to registry server 110.
  • The write request preferably includes an identifier for the set of data and the namespace in which the set of data is to be stored.
  • The write request may also include the size of the set of data if known at the time of the write operation.
  • Client system 115 receives write information from registry server 110 in step 920.
  • The process performed by registry server 110 to provide the write information is described below with respect to Figure 10.
  • The write information includes a list of network devices available to store the data and the IP addresses for communicating with the devices.
  • The write information may further include information for generating the slices of data, such as conversion transform matrix information, and/or padding information for adding padding data to the set of data.
  • In step 930, client system 115 generates slices of data to store on the network devices. Slices of data are generated to allow the data to be stored on multiple devices, both to reduce the amount of data stored on a single device and to add security by preventing any one network device from storing all of the data in a particular set of data.
  • The simplest manner of generating slices is to divide the set of data into portions each including a specified amount of data.
  • The drawback with this method is that all of the slices are needed to reconstruct the data.
  • Instead, the slices are generated in such a manner that each slice is a representation of the data from the data set. As such, only a specified number of the slices are needed to reconstruct the data.
  • Each slice is transmitted to one of the network devices in step 940.
  • The network devices that receive each slice are selected from the available network devices provided in the write information received from registry server 110.
  • Preferably, the network devices are selected such that the slices are well dispersed and no one device stores more than M slices, where N is the total number of slices and M is a number less than N that is the minimum number of slices needed to reconstruct the set of data.
  • The storage information for the set of data that will be transmitted to the registry server is then generated in step 950.
  • The storage information includes information regarding the set of data, including an identifier, a namespace, size and padding, information regarding the generation of the slices, and information about each slice.
  • The information about each slice includes an identifier of the slice, the position of the slice in the data, the device storing the slice, and error detection information about the slice.
  • Notably, the error detection information is stored on the registry server instead of being transmitted with the slices as is common in the art. The error detection information will be described in more detail below.
  • FIG. 10 illustrates process 1000 performed by registry server 110 in response to receiving a write request from a client system in accordance with an embodiment of this invention.
  • Process 1000 begins in step 1010 when a write request is received from the client system.
  • Registry server 110 reads the namespace identifier for the set of data from the request in step 1020.
  • Registry server 110 searches the list of available devices stored in memory to determine each of the devices available to store information for the namespace and other namespace information in step 1030.
  • Write information to transmit to the client system is then generated in step 1040.
  • the write information includes a list of devices available to store data for the namespace as well as other information for communicating with the devices in the list.
  • Registry server 110 may return a list including each device found that is available to store data for the namespace. Alternatively, registry server 110 may perform a selection algorithm to select only a portion of the available devices to include in the list. In other embodiments, all of the network devices that may store data for the namespace are returned along with usage information to allow a client system to perform load balancing analysis and/or assignment of the network devices. The exact method of selection of the devices is left as a design choice to those skilled in the art. Process 1000 then ends after step 1050 with the transmission of the write information to the client system.
  • FIG. 11 illustrates a flow diagram of process 1100 performed by registry server 110 in response to receiving storage information for a set of data in accordance with an embodiment of this invention.
  • Process 1100 begins in step 1110 when storage information for a set of data is received from a client system.
  • Registry server 110 then stores the information in step 1120 and process 1100 ends.
  • Preferably, registry server 110 generates a new FileInfo entry 405 in master table 400 (Figure 4) and populates the fields with the storage information received.
  • However, other storage methods may be used without departing from this invention.
  • As discussed above, the set of data should preferably be divided into slices. More preferably, slices of data should be generated that represent portions of the data from the data set such that only a portion of the slices are needed to reconstruct the data set.
  • One method of generating the slices is an Information Dispersal Algorithm (IDA) such as the algorithm described by Rabin in US Patent number 5,485,474.
  • In this algorithm, the set of data is represented by N slices and only M slices are required for reconstruction of the original set of data.
  • A conversion transform matrix of N rows and M columns is used to perform the transformation of the set of data into the N slices. N and M are selected such that N > M.
  • The set of data is divided into X consecutive portions of data of M length.
  • X is determined by dividing the total file size, T, by M and rounding up to account for an incomplete portion that may be padded with data.
  • F is the original set of data, divided into the X portions represented as rows, where b_i is the i-th byte of F:

    F = (b_1, b_2, b_3, ..., b_M), (b_{M+1}, b_{M+2}, ..., b_{2M}), ..., (b_{T-M+1}, ..., b_T)

  • The slices are obtained by multiplying each M-byte portion by the N-by-M conversion transform matrix A, giving C = A * F^T. Each row of C corresponds to a slice of data to be stored. Reconstruction of the data will be discussed below with regard to Figure 14, illustrating a process for reading the data.
  • FIG. 12 illustrates a process performed by the client system to generate slices to be stored using the IDA described above in accordance with one embodiment of this invention.
  • Process 1200 begins in step 1210 by determining a conversion transform matrix.
  • Preferably, the conversion transform matrix is read from the write information received from registry server 110.
  • Alternatively, the write information may contain only parameter information, and the client system performs a process to generate the conversion transform matrix.
  • The registry server may store more than one conversion transform matrix and provide one conversion transform matrix to the client system for a specific set of data. However, the exact manner in which the conversion transform matrix is provided is left as a design choice of those skilled in the art.
  • As discussed above, the conversion transform matrix is an N-by-M matrix.
  • N is the number of slices that are to be produced and M is the number of slices required to reconstruct the entire set of data.
  • N is greater than M.
  • The ratio of N to M may be much larger if the data is to be made available from a much smaller set of slices. However, the ratio of N to M may be smaller if more security is desired.
  • Preferably, the size of the matrix is 10 by 7 for the slices to be manageable. However, it is left as a design choice as to the exact size of the conversion transform matrix and the value of each element in the matrix.
  • The client system receives the set of data to store.
  • The client system divides the set of data into X consecutive portions of data of M length.
  • M is determined by the number of slices needed to reconstruct the data, and X is determined by dividing the total size of the set of data by M (T/M), rounded up to the next whole number to account for a padded portion.
  • The process of generating the slices begins in step 1220 by reading an x-th one of the X portions of data.
  • The selected x-th portion is multiplied by the conversion transform matrix to generate the x-th resulting vector in step 1240.
  • For example, the 1st portion is multiplied by the conversion transform matrix to generate the first resulting vector and the 2nd portion is multiplied by the conversion transform matrix to generate the second resulting vector.
  • Each y-th element of the x-th resulting vector is then read from the x-th resulting vector and stored in the x-th position of the y-th slice in step 1250.
  • For example, the 1st element from the first resulting vector is stored in the 1st position of the 1st slice,
  • the 2nd element of the 1st resulting vector is stored in the 1st position of the 2nd slice,
  • and the 3rd element of the 1st resulting vector is stored in the 1st position of the 3rd slice, etc.
  • The error detection data for each slice is generated in step 1270.
  • The error detection data for each slice is then inserted into the storage information to be sent to registry server 110 in step 1280.
  • Preferably, the error detection data for each slice is a hash.
  • More preferably, the error detection information is generated by a keyed hash, as sketched below.
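  • A minimal sketch of per-slice error detection via a keyed hash, assuming HMAC-SHA256 as the keyed hash and a key distributed through the registry's communication information; the patent does not specify either choice.

```python
import hashlib
import hmac

def slice_checksum(slice_bytes: bytes, key: bytes) -> str:
    # Keyed hash stored at the registry server (not alongside the slice),
    # so tampering with a stored slice is detectable on read.
    return hmac.new(key, slice_bytes, hashlib.sha256).hexdigest()
```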
  • The conversion transform matrix and/or information for generating the conversion transform matrix is then stored in the storage information to be sent to registry server 110 in step 1290, and process 1200 ends.
  • However, step 1290 may be omitted if registry server 110 provides the conversion transform matrix, as in the described embodiment.
  • FIG. 13 illustrates a process for transmitting the slices to the network devices performed by the client system in accordance with an embodiment of step 940 of process 900.
  • Preferably, the data for each slice is maintained in a buffer providing a queue, and portions of the slice are periodically transmitted to each of the selected network devices.
  • The data for each slice is written to the buffer as the data for each slice is determined from the X portions of data.
  • Alternatively, the data for each slice may be streamed to the network devices as the data for each slice is generated.
  • Process 1300 begins in step 1305 by selecting a network device to receive each of the slices.
  • The devices are selected from the list of available devices transmitted to the client system in the write information from registry server 110.
  • A slice identifier may also be generated at this time and stored in the storage information.
  • Preferably, the slice identifier is unique and does not give an indication of the ordering of the slices.
  • In the described embodiment, a GUID of the slice is used as the identifier.
  • However, other naming conventions can be used without departing from this invention.
  • A slice is then selected in step 1310.
  • In step 1320, the buffer storing the data for the selected slice is read.
  • The client system determines if a minimum amount of data is available in step 1330. The minimum amount of data may be one byte or more depending on the requirements of the system and is left as a design choice. If there is a minimum amount of data, the data for the slice is transmitted to the device in step 1340. If not, the process repeats from step 1310 to select another slice. After the portion of the slice is transmitted, the client system waits to receive an acknowledgement from the network device selected to store the slice in step 1350. If no acknowledgement is received, an error recovery process is performed in step 1360.
  • The error recovery process may be to transmit the portion or the entirety of the slice to another available device, or may require regenerating the slice from the data for re-transmission. Other error recovery methods may be used without departing from this invention. Otherwise, the client system determines if the entire slice has been sent in step 1370. If the entire slice has been sent, the client system stores the identifier of the network device storing the slice in the storage information in step 1380. If the entire slice has not been sent, the process is repeated from step 1310. After step 1380, the client system determines if transmission of all of the slices is complete in step 1390. If not, process 1300 is repeated from step 1310. Otherwise, process 1300 ends. A simplified sketch of this transmission loop follows.
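  • Below is a simplified sketch of the per-slice transmission loop, collapsing the buffering into fixed-size chunks; send, pick_other_device, and the retry-once recovery policy are assumptions, not the patent's protocol.

```python
def transmit_slice(slice_bytes: bytes, device: str, send, pick_other_device,
                   chunk_size: int = 4096) -> str:
    """Send one slice to its device in chunks; returns the device that
    ultimately stored the slice (recorded in the storage information).

    `send(device, chunk)` returns True when an acknowledgement is received.
    """
    for off in range(0, len(slice_bytes), chunk_size):
        chunk = slice_bytes[off:off + chunk_size]
        if not send(device, chunk):
            # No acknowledgement: one possible recovery is to transmit the
            # portion (or the whole slice) to another available device.
            device = pick_other_device()
            if not send(device, chunk):
                raise IOError("slice transmission failed")
    return device
```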
  • FIG. 14 illustrates process 1400 performed by a client device to retrieve a section of a set of data stored on the network.
  • Process 1400 begins in step 1405 by receiving a request to retrieve a section of a set of data stored by devices in the network.
  • For purposes of this discussion, a section means any subset of the set of data, up to and including the entirety of the set of data.
  • The client system requests storage information for the set of data from registry server 110 in step 1410.
  • The client receives the storage information from the registry server in step 1420.
  • The storage information includes all of the information necessary to retrieve each of the slices and re-generate the data.
  • This information may include the conversion transform matrix or information to construct the conversion transform matrix used to generate the slices; the device storing each of the slices; and error detection information for each of the slices.
  • The identity and IP address of each device storing each slice, and any other information needed to access the slice, is read from the storage information in step 1430.
  • Next, the elements of the slices needed to re-construct the requested section of data are determined.
  • The elements required are determined by identifying each x-th one of the X consecutive portions of M length of the set of data that include the requested data and requesting each x-th element of the slices.
  • X is determined by dividing the total amount of data in the set, T, by M (T/M), and M is the minimum number of slices needed to reconstruct the data.
  • The x-th portions needed are determined by finding the portions that include the offset of the requested data. For example, suppose a file has a total length of 100 bytes, M is 5, and the requested offset is bytes 23-43.
  • The 5th-9th portions are needed, as bytes 23-25 are in the 5th portion, bytes 26-30 are in the 6th portion, bytes 31-35 are in the 7th portion, bytes 36-40 are in the 8th portion, and bytes 41-43 are in the 9th portion.
  • Thus, the 5th-9th elements of M slices are required. A small sketch of this offset arithmetic follows.
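  • The sketch below reproduces this offset arithmetic; it returns 1-based portion indices to match the example above, and the function name is an assumption.

```python
def needed_portions(first_byte: int, last_byte: int, m: int) -> range:
    # Portion p (1-based) holds bytes (p-1)*m + 1 .. p*m, so the portions
    # touching a 1-based byte range follow from integer division.
    first = (first_byte - 1) // m + 1
    last = (last_byte - 1) // m + 1
    return range(first, last + 1)

# Bytes 23-43 of a 100-byte file with M = 5 need portions (and hence
# slice elements) 5 through 9, as in the example above.
assert list(needed_portions(23, 43, 5)) == [5, 6, 7, 8, 9]
```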
  • The client system then generates a request for the required elements of each slice and transmits the request for each slice to each network device storing each slice in step 1440.
  • The client system then receives the requested elements of the slices from the network devices in response to the request in step 1450 and uses the error detection information to determine whether the elements of the slices received are correct.
  • The client system determines whether at least a minimum number, M, of slices are correctly received, where M is the minimum number of slices needed to re-construct the data. If not, an error recovery process may be performed in step 1470. Process 1400 then either repeats from step 1440 after the error recovery process is performed or ends.
  • the identifier and/or position of each of the M slices in the N total number of slices is determined in step 1475.
  • An inverse conversion transform matrix is then generated based upon the slices received in step 1480.
  • the inverse conversion transform matrix is generated by selecting, from the conversion transform matrix, the row corresponding to the position of each individual slice in the M received slices to form an M x M sub-matrix, and generating the inverse matrix of that sub-matrix. For example, if the 2nd, 4th, and 6th slices are received, the 2nd, 4th, and 6th rows of the conversion transform matrix are selected and an inverse matrix is generated from the matrix including these rows.
  • Re-assembly of the set of data is performed in step 1490.
  • the data may be reassembled by multiplying the received elements of the M slices by the inverse conversion transform matrix to determine each of the elements of the original X consecutive portions of data that include the requested data, as sketched below. If the X consecutive portions include the last portion of the set of data, the padding added to the end of the last portion is then removed based upon the padding information from the received storage information in step 1490, and process 1400 ends.
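To make steps 1475-1490 concrete, the sketch below selects the rows of the conversion transform matrix for the slices actually received, inverts the resulting M x M sub-matrix, and recovers one original portion. It is an illustration only: for brevity it works over the prime field GF(257), where Python's built-in modular inverse applies, whereas the IDA described with Figure 12 below operates over GF(2^8).

```python
P = 257

def inverse_mod_p(mat):
    """Gauss-Jordan inversion of a square matrix over GF(P)."""
    n = len(mat)
    aug = [list(row) + [int(i == j) for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % P != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)
        aug[col] = [x * inv % P for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] % P != 0:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def reconstruct_portion(A, received):
    """received maps a 0-based slice position to that slice's element for one
    xth position; exactly M entries are expected, and any M slices will do."""
    positions = sorted(received)                     # step 1475: which slices came back
    inv = inverse_mod_p([A[p] for p in positions])   # step 1480: invert the sub-matrix
    elems = [received[p] for p in positions]
    return [sum(iv * e for iv, e in zip(row, elems)) % P
            for row in inv]                          # step 1490: one original portion
```

Receiving the 2nd, 4th and 6th slices, as in the example above, corresponds to received = {1: e2, 3: e4, 5: e6} with 0-based keys.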
  • Figure 15 illustrates process 1500 performed by registry server 110 to respond to a read request received from a client system in accordance with an embodiment of this invention.
  • Process 1500 begins in step 1510 by receiving the read request from the client.
  • the registry server then reads the identifier for the set of data from the request in step 1520 and retrieves the storage information for the identified set of data in step 1530. This may be done by reading the FileInfo entry 405 for the identified set of data from master table 400 maintained in a volatile memory in accordance with some embodiments of this invention.
  • the storage information to transmit to the client system including the necessary information for retrieving the data is then generated from the read data in step 1540 and transmitted to the client in step 1550.
  • Process 1500 then ends.
  • Figure 16 illustrates a flow diagram of process 1600 performed by a client system to append data to the end of a set of data stored on the network devices in accordance with an embodiment of this invention.
  • An advantage of process 1600 is that the entirety of the set of data does not need to be retrieved and/or stored in memory. This reduces the memory footprint in the client system and reduces network traffic.
  • Process 1600 is performed after the client system has re-constructed at least the last of the X consecutive portions of the data set from the relevant portions of the slices read from the network devices using a read process, such as process 1400 described above and shown in Figure 14. As described in process 1400, any padding added to the last portion is removed during the read process. Otherwise, process 1600 must include a process for removing the padding.
  • Process 1600 begins in step 1602 by determining whether the last portion of data includes padding. This may be done by determining whether the amount of data in the last portion is equal to M, where M is the minimum number of slices needed to re-construct data from the set of data. Alternatively, the storage data may be read to determine if padding information is included for the set of data. If the last portion does not include padding, a conventional write operation, such as the operation described in process 1200, is performed on the appended data, adding elements to the end of each of the N slices, in step 1603. If the last portion includes padding, process 1600 continues by determining whether the amount of data to be appended is less than the amount of padding added to the end of the file in step 1605.
  • the amount of padding added to the file is determined from the padding information read from storage information received by the client system from registry server 110. If the amount of data to append is greater than the amount of padding added, an amount of data equal to the amount of padding is read from the beginning of the appended data and added to the last of the X consecutive portions of data in step 1610. A normal write operation is then performed for the remainder of the data to append in step 1615. If the amount of data to append is less than or equal to the amount of padding, the data to be appended is added to the end of the last portion in step 1620.
  • In step 1630, the last portion with the appended data formed in either step 1620 or step 1615 is multiplied, as a vector, by the conversion transform matrix used to generate the original slices to generate an append vector.
  • the client system reads the matrix from the storage data received from registry server 110. In other embodiments, the client system may generate the conversion transform matrix from matrix information read from the storage information.
  • in step 1640, the yth element is read from the append vector. For example, the 1st element is read in a first iteration, the 2nd element is read in the second iteration, etc.
  • the storage information is then read to determine the network device storing the yth slice in step 1650.
  • a request to store the yth element as the current last element of the yth slice is transmitted to the network device storing the yth slice.
  • the 1st element of the append vector is stored as the current last element of the 1st slice,
  • the 2nd element of the append vector is stored as the current last element of the 2nd slice, etc.
  • the request includes an offset indicating the position of the current last element in the stored slice.
  • error detection information for the yth slice is generated using the yth element as the last element in step 1670 and updated in the storage information for the yth slice.
  • the storage information is transmitted to the registry in step 1680.
  • the client system determines whether all of the elements of the append vector have been transmitted to the N slices in step 1690. If not, process 1600 is repeated from step 1640 for the next yth element of the append vector. If all of the elements have been transmitted, process 1600 ends. A condensed sketch of this append path follows.
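Condensing steps 1602-1690, the sketch below computes the append vector for the refilled last portion; each yth entry then becomes the new last element of the yth slice. As in the read sketch above, GF(257) stands in for the patent's GF(2^8) arithmetic, and the re-padding of a still-short last portion is our assumption, since the flow diagram leaves it implicit.

```python
P = 257

def gf_matvec(A, v):
    """Multiply the N x M conversion transform matrix A by an M-vector v."""
    return [sum(a * b for a, b in zip(row, v)) % P for row in A]

def append_vector(last_data, new_bytes, M, A):
    """last_data: de-padded bytes of the final portion (len < M means the
    stored portion was padded). Returns (vector, leftover); leftover goes
    through a normal write such as process 1200 (step 1615)."""
    pad_len = M - len(last_data)               # step 1602: was padding added?
    if pad_len == 0:
        return None, new_bytes                 # step 1603: conventional write
    head, leftover = new_bytes[:pad_len], new_bytes[pad_len:]  # steps 1605-1620
    portion = list(last_data + head)
    portion += [0] * (M - len(portion))        # assumed re-padding if still short
    return gf_matvec(A, portion), leftover     # step 1630: the append vector
```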
  • Figure 17 illustrates update process 1700 performed by a client system to update data in the set of data stored on the network devices in accordance with an embodiment of this invention.
  • Process 1700 is performed after the client system has re-constructed the relevant portions of the data from the relevant portions of the slices read from the network devices using a read process, such as process 1400 described above and shown in Figure 14; and these portions of data have subsequently been amended in some way.
  • the client system maintains the storage information received from the registry server 110 in memory. However, if the storage information is not maintained in memory, the client system must transmit a request and receive the storage information for the set of data from registry server 110.
  • Process 1700 begins in step 1710 by determining each one of the X consecutive portions of the set of data that include updated data.
  • these portions may be determined by looking for changes in the data at each Mth offset of the data, counting the offsets and recording each offset containing updated data, where M is the minimum number of slices needed to reconstruct the data.
  • for example, assume the 3rd and 4th of the X consecutive portions of the set of data contain amended information.
  • in step 1715, the yth one of the X portions containing updated data is selected.
  • in the example, the 3rd of the X portions is selected first.
  • the yth portion, as a vector, is multiplied by the conversion transform matrix to generate an update vector.
  • the conversion transform matrix is read from the data storage information.
  • alternatively, the conversion transform matrix may be produced by the client system from information read from the storage information received from registry server 110. An xth element is then obtained from the update vector in step 1730.
  • in the example, the update vector is generated by multiplying the 3rd portion by the conversion transform matrix. The client system then selects the first element of the update vector.
  • the network device storing the xth slice is then read from the storage information in step 1740.
  • in the example, the device storing the 1st slice is read.
  • the client system then transmits a request to the network device storing the xth slice in step 1750.
  • the request indicates that the xth element is to be stored at an offset for the yth element of the slice.
  • in the example, the request indicates the 1st element of the update vector is to be stored as the 3rd element of the 1st slice.
  • the error detection information for the xth slice, with the xth element from the update vector as its yth element, is then generated in step 1760.
  • the error detection information is then transmitted to registry server 110 to replace the error detection information for the xth slice in step 1770.
  • the client system determines whether all of the elements in the update vector have been transmitted in step 1780. If not, process 1700 is repeated from step 1730 until all of the elements of the update vector have been transmitted. Otherwise, the client system determines whether all of the portions including updated data have been processed in step 1790. If not, process 1700 is repeated from step 1715 for each of the remaining portions. If so, process 1700 ends. The sketch below condenses this update flow.
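The core of process 1700 compresses to a few lines. The sketch below finds the dirty M-length portions, re-transforms each one, and yields the per-slice store requests of steps 1730-1770. It assumes the old and new versions of the set of data have equal length (updates, not appends), and GF(257) again stands in for the field arithmetic.

```python
P = 257

def gf_matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) % P for row in A]

def changed_portions(old, new, M):
    """Step 1710: 1-based indices y of the M-length portions that differ."""
    return [i // M + 1 for i in range(0, len(new), M)
            if old[i:i + M] != new[i:i + M]]

def update_requests(old, new, A, M):
    """Yield (x, y, element): store `element` as the yth element of the xth
    slice on the device holding that slice (steps 1730-1750)."""
    for y in changed_portions(old, new, M):          # step 1715: next dirty portion
        portion = list(new[(y - 1) * M: y * M].ljust(M, b"\x00"))
        vec = gf_matvec(A, portion)                  # the update vector
        for x, element in enumerate(vec, start=1):
            yield x, y, element
```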
  • Figure 18 illustrates a flow diagram of process 1800 performed by registry server 110 to update the storage information in accordance with an embodiment of this invention.
  • Process 1800 begins in step 1810 by receiving updated storage information from a client system.
  • registry server 110 reads the information from the update request and updates the proper field in the proper FileInfo entry 405 in step 1820.
  • Process 1800 then ends.
  • the above is a description of embodiments of a method and system for storing data in a cloud computing network. It is expected that those skilled in the art can and will design alternative embodiments of this invention as set forth in the following claims.

Abstract

This invention relates to a system for storing data on a plurality of devices in a cloud network. The system comprises a server, a processing unit and a media readable by the processing unit. Instructions for directing the processing unit are stored in the media. The instructions comprise receiving data storage information for a plurality of slices of data representing a set of data, wherein the data storage information includes an identifier of each of the plurality of slices, an identifier of one of said plurality of devices in said cloud network storing the data, and error detection data for each of the slices, and storing the data storage information in a memory.

Description

METHOD AND SYSTEM FOR STORING DATA IN A CLOUD NETWORK
Field of the Invention
This invention relates to processing systems connected to a cloud computing network. More particularly, this invention relates to storing data from a client computer to multiple devices connected to the cloud computing network. Still more particularly, this invention relates to securely storing data from a client computer to multiple devices connected to the cloud computing network in a manner that prevents unauthorized users from obtaining the data and assures that the data will be available if one or more of the devices storing data is not connected to the network.
Prior Art
"Cloud" computing and/or "cloud" networking is common way of maximizing the use of computer resources for networked computer systems. In a cloud network, the computer system or server providing a resource is not static and as such is assigned at the time a client system requests a resource. One of the problems in storing data using devices in a cloud network is that the network configuration changes dynamically. In other words, devices may connect or disconnect from the system periodically due to communication problems, network policy, routine maintenance, or many other reasons. As such, there is no guarantee to a client system that data stored by one device will be available when the client needs to read the stored data at a later time. To address availability of the data, cloud computing network providers typically store three or more copies of the data in its entirety on separate network devices. Furthermore, there are only a limited number of ways to secure the data from being read and used by unauthorized parties. This typically requires storing an encrypted version of data. Although modern encryption is known to be difficult to break, there is the possibility that the encryption key can be compromised. Once encryption is broken into, the entire data is exposed.
One method, described in US Patent number 5,485,474 issued on 16 January 1996 to Rabin, for storing portions of data on multiple devices is to transform the data to generate n slices of data such that m slices of the data may be used to reconstruct the original data, where n > m. The n slices of data are generated by dividing the data into x consecutive portions of m length, where x may be any number. Each of the x portions is then multiplied by an n by m transform matrix to generate a vector. Each yth element of the resulting vector is stored in the xth position of the yth slice. For example, the 2nd portion of data is multiplied by the transform matrix to generate a 2nd resulting vector. The first element of the 2nd resulting vector is placed in the second position of the first slice, the second element of the 2nd resulting vector is placed in the second position of the second slice, etc., until all n elements are placed in a slice. The generated slices are then stored on different devices connected to the network. The original data may then be obtained by receiving m slices from the connected devices and multiplying the slices by an inverse transform matrix to determine the data in the original x portions of data. This process works well for ensuring that data is available if one or more of the devices storing data are disconnected from the network. However, this method does not prevent an unauthorized party from accessing the slices and obtaining the data.
US Patent Number 7,574,579 issued on 11 August 2009 to Gladwin et al. discloses a system for storing data in a dispersed storage system. In the described system, the data is transformed into multiple slices of data and each of the slices is stored in a separate device. A management system then stores metadata about the stored data in a database for use in maintaining the system. US Patent Publication 2010/0306578 published on 2 December 2010 in the name of Thornton et al. discloses a system in which data stored in slices on multiple devices connected to a network is re-assembled and checked using a checksum stored on the device along with each data slice. One problem with the above systems is that the slices may include data that can be used to re-assemble the data. Thus, an unauthorized party may be able to reconstruct the data by obtaining some or all of the slices. Further, as error detection data such as a checksum is stored with the data, a change to the original data and error detection data by an unauthorized party may corrupt the data without a user being aware of the corruption.
A second problem is the updating of stored data. In order to store modified data, the entirety of the data must be re-assembled and new slices representing the changed data must be generated. For a large file, the requirement of re-assembling the data and generating new slices may cause unacceptable delays in both reading and updating of the file.
Summary of the Invention
The above and other problems are solved and an advance in the art is made by the method and system for storing data over a cloud network in accordance with the present invention. A first advantage of a method and system in accordance with this invention is that data may be securely stored on multiple devices connected to a network. A second advantage of a method and system in accordance with this invention is that the data stored may be updated without the need to re-assemble and re-generate the whole data from the stored slices. The method and system for storing data in accordance with this invention are provided in the following manner. In accordance with an embodiment of this invention, a registry server receives data storage information for a plurality of slices of data representing a set of data. The data storage information includes an identifier for each of the slices, an identifier of the device in the cloud network storing the data, and error detection data for each of the slices. The registry server stores the data storage information in a memory.
In accordance with some embodiments of this invention, the registry server also maintains a list of devices connected to the cloud network that are available to store data. In accordance with some of these embodiments, the registry server receives an availability message from a network device indicating the device is available to store data; and stores an identifier of the device in the list of available devices in response to receiving the availability message. In accordance with still further embodiments, the registry server reads a list of namespaces from the availability message. The list of namespaces identifies each namespace that may store data on the device that transmitted the availability message. For purposes of this discussion, a namespace is an identifier for a collection of a plurality of sets of data. The registry server stores an indication in memory of each namespace in the list that may store data on the network device. In accordance with other of these embodiments, registry server determines the Internet Protocol (IP) address of the device sending the message and stores the IP address in memory.
In accordance with some embodiments of this invention, the registry server performs an authentication process with the network device in response to receiving the availability message and transmits an acknowledgement to the device in response to the device being successfully authenticated. In accordance with some of these embodiments, the registry server transmits communication information to the device in response to a successful authentication. The communication information includes information needed to communicate with client devices. In accordance with some of these embodiments the communication information includes a secret key and/or an encryption key. In accordance with some embodiments, a network device may perform the following process. The network devices transmit an availability message, perform authentication with the registry server, and receive communication information in response to a successful authentication process. In accordance with some of these embodiments, the availability message includes a list of namespaces. The list of namespaces includes identifiers of those namespaces that may store data on the device. In accordance with other of these embodiments, the availability message includes an identifier of the device and an IP address of the device.
In accordance with some embodiments of this invention, the registry server performs the following process when a client system performs a write operation to store data to the system. The registry server receives a write data request from the client system to store a set of data. In response to the write request, the registry server transmits write information to the client system. The write information includes information about each network device in a group of devices that are available to store the set of data. In accordance with some of these embodiments, the registry server determines the group of devices available to store the set of data in response to receiving the write request. In accordance with further of these embodiments, the registry server determines the group of devices by reading a namespace for the set of data from the write request. The server then determines each of the devices in the list of devices that is available to store data for the namespace from information stored in memory. Each of the devices determined to be available to store data for the namespace is then added to the group.
In accordance with some embodiments of this invention, a client system performs the following process to store data on devices in the network. The client system receives a request to store data. In response to the request, the client system generates a write request and transmits the request to the registry server. In response to the write request, the client system receives write information from the registry server. The write information includes information for each device in a group of devices that are available to store the set of data. The client system then generates a plurality of slices that represent the data in the set of data. Each of the slices is then transmitted to one of the network devices selected from the group of devices. The storage information for the set of data including information about each of the slices is generated and transmitted to the registry server.
In accordance with some embodiments of this invention, the slices are generated by applying Rabin's information dispersal algorithm to the data in the set of data. In accordance with some of these embodiments, the slices are generated in the following manner. The client system divides the data from the set of data into X consecutive portions of M length, where M is the minimum number of slices needed to re-construct data and X is determined by dividing the total amount of data in the set by M. Each of the X consecutive portions of data is then multiplied by a conversion transform matrix to determine a resulting vector of y elements in length, where y = N. The conversion transform matrix is N x M in size and N > M. The client system then inserts each yth element of each of the X resulting vectors in the xth position of the yth one of the slices. In further of these embodiments, error detection information is then determined for each slice and the determined error detection information is inserted into the storage information to be transmitted to the registry server. In accordance with some of these embodiments, the conversion transform matrix is generated by the client system using matrix information read from the write information received from the registry server. In other of these embodiments, the conversion transform matrix is read from the write information received from the registry server. In accordance with some of these embodiments, padding is added to the end of the last of the X portions of data to make the last portion M in length and the client system inserts an indication of the padding added into the storage information to be sent to the registry server.
In accordance with some of these embodiments, the client system transmits each of the slices to one of the network devices and inserts an identification of the network device storing each of the slices of data into the storage information transmitted to the registry server. The client may select the network device that receives each of the slices from the list of available devices read from the write information received from the registry server. In accordance with some embodiments, the client system receives an acknowledgment from a network device in response to transmitting a slice to the device. In accordance with some of these embodiments, the client system performs an error recovery in response to the acknowledgement from the device not being received by the client system.
In accordance with some embodiments, the client system determines when a specified amount of each slice has been generated and transmits the specified amount to the selected network device and repeats the process until all of the data of each slice is transmitted.
In accordance with some embodiments, the client system reads a section of the set of data from the network devices in the following manner. The process begins by the client system receiving a request to retrieve a section of the set of data. In response to the request, the client system generates a read request that includes an identifier for the set of data. The read request is then transmitted to the registry server. In response to the read request, the client system receives storage information for the set of data from the server. The storage information includes information for each of the slices including the network device storing a slice, an order of the slice, and error detection information for the slice. The client system reads the information for each slice from the storage information and transmits a request for each slice to each network device storing each slice. The request includes the portions of the slices needed to re-construct the requested data. In response to the request, the client system receives the requested portions of the slices from the devices in the network. The client system then determines whether portions from a pre-determined number of slices are received in response to the request. If portions from the pre-determined number of slices have been received, the set of data is re-assembled from the slices. In accordance with some embodiments, padding is removed from re-assembled data.
In accordance with some embodiments of this invention, the client system reassembles the data in the following manner. The client system reads matrix information from the storage information received from the registry server and generates an inverse conversion transform matrix from the information. Each of the received slices is then multiplied by the inverse conversion transform matrix to obtain each of the X consecutive portions of data of the set of data that include the requested section of data. In accordance with some of these embodiments, error detection information for each slice is read from the storage information received from the registry server and error checking of the portion of data received for each slice is performed in response to receiving the portions of each slice from the network devices. In accordance with some of these embodiments, a slice is disregarded if an error is detected and more than the pre-determined number of slices have been received. Otherwise, an error recovery process may be performed by the client system.
In accordance with some embodiments of this invention, the client system stores data appended to the end of the set of data in the following manner. The client receives a request to append the data to the end of the set of data. The client system then generates a storage request and transmits the request to the registry server. The client system then requests the portion of each of the slices that can be used to generate the last of the X portions of the set of data. The client system then assembles the last of the X portions of data and removes the padding. The client then determines if the data to be appended is less than the amount of padding. If so, the client system adds the data to append to the end of the data in the last of the X portions to obtain a new last portion. Otherwise, a first portion of the appended data equal to the amount of padding is added to the last portion and a normal write operation is performed for the remainder of the appended data. The client system then multiplies the new last portion by a conversion transform matrix to determine an append vector. The client then determines each device storing each of the slices and transmits each element of the append vector to the device storing the corresponding slice with an offset for the last position in the slice. The client system then generates error detection data for each of the updated slices and transmits updated storage information to the registry server to replace the error detection data stored for each of the updated slices.
In accordance with these embodiments, the registry server receives the error detection information for each of the updated slices, and updates the error detection information of each of the corresponding slices with the error detection information of the element as the last element of the corresponding slice.
In accordance with some embodiments of this invention, the client system updates data in the set of data in the following manner. The client system receives a request to update a portion of data in the data set. The client system then determines each of the consecutive portions of M length that include the updated data, where M is the minimum number of slices needed to reconstruct the data from the data set. Each of the consecutive portions including updated data is then multiplied by a conversion transform matrix to determine an updated vector. The client system also requests storage information from the registry server. In response to receiving the storage information, the client system reads the storage information for each of the devices storing each of the slices. The client system then transmits a request for each element of each updated vector to the device storing the corresponding slice to store the element in the proper position of the slice. The client system then generates the error detection information for each of the updated slices and transmits the error detection information to the registry server to update the error detection information stored for the updated slices. The registry then updates the storage information accordingly.
Brief Description of the Drawings
The above and other features and advantages in accordance with this invention are described in the following detailed description and are shown in the following drawings:
Figure 1 illustrating a client system and registry server connected to a cloud computing network in accordance with an embodiment of this invention; Figure 2 illustrating an exemplary processing system such as those in a client system, registry server, and network devices in accordance with the shown embodiment of this invention;
Figure 3 illustrating a block diagram of applications being executed by a client computer in accordance with an embodiment of this invention;
Figure 4 illustrating a representation of the information stored by a registry server in accordance with an embodiment of this invention;
Figure 5 illustrating a flow diagram of a process performed by a network device to register with a registry server in accordance with an embodiment of this invention;
Figure 6 illustrating a flow diagram of a process performed by a registry server to register a network device as available to store data in accordance with an embodiment of this invention;
Figure 7 illustrating a flow diagram of a process performed by a client system to register with a registry server in accordance with this invention;
Figure 8 illustrating a flow diagram of a process performed by a registry server to register a client system in accordance with this invention;
Figure 9 illustrating a flow diagram of a process performed by a client system to store data from a data set to devices connected to the network in accordance with an embodiment of this invention;
Figure 10 illustrating a flow diagram of a process performed by a registry server to respond to request to store data received from a client in accordance with an embodiment of this invention;
Figure 11 illustrating a flow diagram of a process performed by a registry server to store information for a set of data stored by devices connected to the network in accordance with an embodiment of this invention;
Figure 12 illustrating a flow diagram of a process performed by a client system to generate slices representing the set of data to be stored by the network devices in accordance with an embodiment of this invention;
Figure 13 illustrating a flow diagram of a process performed by a client system to store slices representing the data to network devices in accordance with an embodiment of this invention;
Figure 14 illustrating a flow diagram of a process performed by a client system to read a set of data stored on network devices in accordance with an embodiment of this invention; Figure 15 illustrating a flow diagram of a process performed by a registry server to provide a client system with storage information to read a set of data stored on network devices in accordance with an embodiment of this invention;
Figure 16 illustrating a flow diagram of a process performed by a client system for appending data to an end of a set of data stored on the network devices in accordance with an embodiment of this invention;
Figure 17 illustrating a flow diagram of a process performed by a client system to update data in a set of data stored on the network devices in accordance with an embodiment of this invention; and
Figure 18 illustrating a flow diagram of a process performed by a registry server to update information for a set of data stored by devices connected to the network in accordance with an embodiment of this invention.
Detailed Description
This invention relates to processing systems connected to a cloud computing network. More particularly, this invention relates to storing data from client computer to multiple devices connected to the cloud computing network. Still more particularly, this invention relates to securely storing data from a client computer to multiple devices connected to the cloud computing network in a manner that prevents unauthorized users from obtaining the data and assures that the data will be available if one or more of the devices storing data is not connected to the network.
This invention relates to storing data to devices in a cloud computing network. For purposes of this discussion, a cloud computing network is a group of processing devices communicatively connected over a network such as the Internet to share resources. In accordance with this invention, a user may not be a proprietor of the network and/or may not have control of network resources and/or of accessibility of data stored by the network resources. Further, a user in accordance with this invention may use resources from one or more cloud computing networks.
Figure 1 illustrates an embodiment of this invention in which client system 115 of a user communicatively connects to a cloud computing network 100 and a registry server 110. Client system 115 is a typical processing system such as a desktop computer, laptop computer, or other computer terminal that connects to network 100 via a conventional wire connection, wireless connection or any other method. Client computer 115 executes applications that perform the client system processes in accordance with this invention. One skilled in the art will recognize that although only one client device 115 is shown, any number of client systems may be connected without departing from this invention. Furthermore, the client device may be connected within a cloud computing network without departing from this invention. Registry server 110 is a processing system that connects to network 100 via a conventional wire connection, wireless connection or any other method. Registry server 110 executes applications for managing the storage of data from client system 115 to devices in network 100 in accordance with embodiments of this invention. Preferably, communications between devices in network 100 and registry server 110 are protected by SSL. However, other means for securing transmissions may be used without departing from this invention.
Cloud network 100 includes network devices 130-132 and 135-137. Network devices 130-132 and 135-137 are processing systems that provide resources to client systems, such as client system 115, over network 100 and are connected to network 100 via a conventional wire connection, wireless connection or any other method. Resources may include but are not limited to storage, processing time, and applications. Network devices 130-132 and 135-137 may be connected as separate systems in network 100 or connected as parts of separate cloud computing networks. As shown in Figure 1, network devices 130-132 are connected in first cloud computing network 120 and network devices 135-137 are connected in second cloud computing network 125. One skilled in the art will recognize that the exact number and configurations of network devices in a network, and the exact number and configurations of cloud computing networks in network 100, are a design choice left to those skilled in the art. Figure 2 illustrates an exemplary processing system 200 that represents the processing systems in registry server 110; client system 115; and processing systems 130-132 and 135-137 that execute instructions to perform the processes described below in each system in accordance with this invention. One skilled in the art will recognize that the instructions may be stored and/or performed as hardware, firmware, or software without departing from this invention. One skilled in the art will recognize that the exact configuration of each processing system may be different and the exact configuration executing processes in accordance with this invention may vary; processing system 200 shown in Figure 2 is provided by way of example only. Processing system 200 includes Central Processing Unit (CPU) 205. CPU 205 is a processor, microprocessor, or any combination of processors and microprocessors that execute instructions to perform the processes in accordance with the present invention. CPU 205 connects to memory bus 210 and Input/Output (I/O) bus 215. Memory bus 210 connects CPU 205 to memories 220 and 225 to transmit data and instructions between the memories and CPU 205. I/O bus 215 connects CPU 205 to peripheral devices to transmit data between CPU 205 and the peripheral devices. One skilled in the art will recognize that I/O bus 215 and memory bus 210 may be combined into one bus or subdivided into many other busses and the exact configuration is left to those skilled in the art.
A non-volatile memory 220, such as a Read Only Memory (ROM), is connected to memory bus 210. Non-volatile memory 220 stores instructions and data needed to operate various sub-systems of processing system 200 and to boot the system at start-up. One skilled in the art will recognize that any number of types of memory may be used to perform this function. A volatile memory 225, such as Random Access Memory (RAM), is also connected to memory bus 210. Volatile memory 225 stores the instructions and data needed by CPU 205 to perform software instructions for processes such as the processes for providing a system in accordance with this invention. One skilled in the art will recognize that any number of types of memory may be used to provide volatile memory and the exact type used is left as a design choice to those skilled in the art.
I/O device 230, keyboard 235, display 240, memory 245, network interface 250 and any number of other peripheral devices connect to I/O bus 215 to exchange data with CPU 205 for use in applications being executed by CPU 205. I/O device 230 is any device that transmits and/or receives data from CPU 205. Keyboard 235 is a specific type of I/O device that receives user input and transmits the input to CPU 205. Display 240 receives display data from CPU 205 and displays images on a screen for a user to see. Memory 245 is a device that transmits and receives data to and from CPU 205 for storing data to a media. Network interface 250 connects CPU 205 to a network for transmission of data to and from other processing systems.
Figure 3 illustrates applications 300 executed by client system 115 to perform the processes for storing data in devices of a cloud computing network in accordance with an embodiment of this invention. Application 305 is a software process that generates, uses, and stores data. Client library 310 is a group of processes or objects that provide the file I/O methods for storing and reading data in accordance with the invention. Client library 310 is a Java library containing a collection of Java classes, a C# library or any other collection of applications or objects that are used to perform the I/O methods. One skilled in the art will recognize that client library 310 may also be implemented as kernel level components without departing from this invention.
Client library 310 includes slicer/combiner 311, key management support module 315, data cache 320, key cache 325, and adaptation module 330. Slicer/combiner 311 is a software module that generates slices of data to be stored on the network devices. The slices represent an original set of data. Slicer/combiner 311 also converts the slices of data into the original sets of data when the slices are read from memory. Key management support 315 is a software module that manages assignment of the keys needed to communicate and perform authentication with devices in a cloud computing network. Preferably, the keys are used in SSL communication in accordance with the described embodiment of this invention. However, the key management system may be used to manage keys for other types of communication protocols used to communicate with the network devices in accordance with other embodiments of this invention. Data cache 320 manages the data received from the set of data during conversion processes. Key cache 325 temporarily stores the keys managed by key management support 315. Adaptation module 330 is one or more software modules that generate the necessary messages in the proper protocol to communicate with a device in a particular cloud computing network. One skilled in the art will recognize that if more than one cloud computing network is supported, more than one adaptation module 330 may be needed. For example, an adaptation module may use the Windows Azure REST interface to store slices as a blob in the Azure blob service, and the Amazon S3 interface may be used to store the slices as objects in an S3 network.
Native file system 335 represents the file systems in memory local to client system 115 and cloud storage 340 is the storage available over cloud computing network 100. One skilled in the art will recognize that in some embodiments one or more of the slices may be stored in the native file system instead of devices in cloud computing network 100 without departing from this invention. The processes performed by these modules are set forth below.
Figure 4 illustrates a representation of a table maintained by registry server 110 to manage the reading and writing of data from a client system to the cloud computing network. Master table 400 is a table or other type of data structure that stores the information needed by registry server 110 to manage the storage of data from client systems to devices on the network. Master table 400 preferably resides in a volatile memory, such as RAM, a data cache or the like. The volatile memory is readily accessible by a processor of registry server 110 to minimize delays in providing information about stored data during operation. However, those skilled in the art will recognize that the table or other structure may be stored in other types of memory without departing from this invention. In an embodiment having table 400 stored in volatile memory, a transaction log may be kept to recreate table 400 should the stored information be lost for some reason. One skilled in the art will recognize that other error recovery methods may be used without departing from this invention. Table 400 includes a FileInfo entry 405 for each set of data stored by connected client systems to devices in the cloud computing network in accordance with this invention. For purposes of this discussion, sets of data are discussed as being stored by client systems to the network devices. One skilled in the art will recognize that a set of data may be a file, document, object, or other data structure used to store data. FileInfo entry 405 includes all of the information required to read, write and reconstruct a set of data stored to the network devices. In accordance with embodiments of the invention, FileInfo entry 405 should include a checksum field indicating a checksum for the set of data or other error detection data; a namespace field indicating the namespace in which the set of data is stored; a paddingsize field indicating the amount of padding added to the end of the set of data; a shares field indicating the number of slices that are used to store the data on the network; a quorum field indicating the minimum number of slices needed to reconstruct the set of data; and FileSlice fields 410 that store information about each slice used to store the set of data on the network devices. As shown in Figure 4, and as one skilled in the art will recognize, other fields may be added to FileInfo entry 405 to store other information. For example, information regarding the conversion transform matrix and/or inverse conversion transform matrix may be stored in entry 405. Other examples are also shown in Figure 4.
FileSlice field 410 stores the information for a slice of data representing data in the set of data stored. FileInfo entry 405 includes a FileSlice field for each of the slices stored to network devices representing data of the stored set of data. FileSlice field 410 includes a serverID field indicating the device in the network storing the slice; a sliceID field that indicates the position of the slice in the arrangement of slices; a sliceChecksum field that stores a checksum, error detection hash or other error detection data for the slice; and a sliceName field storing an identifier of the slice. One way these entries might be mirrored in code is sketched below. The use of the information in table 400 will be described in the processes described below and shown in the flowcharts of Figures 5-18. Figure 5 illustrates a flow diagram of process 500 performed by a network device to alert registry server 110 that the device is connected to the system and available to store data in accordance with an embodiment of this invention. Registry server 110 maintains a list or other data structure that indicates the devices connected to the network that are available to store data. The list or data structure may also include a listing of the namespaces that may store data on each particular device. For purposes of this discussion, a namespace is a grouping of sets of data, such as a folder or directory, that serves to link sets of data in a user defined manner. One skilled in the art will recognize that namespaces may be used to segregate data from different organizational units and may be used to manage the scalability of the data stored. The exact use of the namespaces is left as a design choice of those skilled in the art.
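By way of illustration, the FileInfo/FileSlice layout just described might be mirrored in code as below. The field names follow the text; the types and the dictionary keyed by a set-of-data identifier are our assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FileSlice:                 # FileSlice field 410
    serverID: str                # device in the network storing the slice
    sliceID: int                 # position of the slice in the arrangement
    sliceChecksum: str           # error detection data for the slice
    sliceName: str               # identifier of the slice

@dataclass
class FileInfo:                  # FileInfo entry 405
    checksum: str                # error detection data for the whole set
    namespace: str               # namespace in which the set is stored
    paddingsize: int             # padding added to the end of the set
    shares: int                  # N: number of slices stored
    quorum: int                  # M: minimum slices needed to reconstruct
    slices: List[FileSlice] = field(default_factory=list)

master_table: Dict[str, FileInfo] = {}   # master table 400, keyed by data set id
```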
Process 500 begins in step 510 when a device connects to the network. In step 520, the device transmits a first availability message to registry server 110. The first availability message may contain an identifier for the device, a list of namespaces supported and a current IP address of the device. The namespaces supported may be determined from a configuration file or some other data structure storing the list that is read by the network device when generating the message. The IP address of a device is often not static and is assigned at the time of connection to the network. Thus, registry server 110 requires both the IP address and device identifier to facilitate transfers of data between the client system 115 and the network devices. One skilled in the art will recognize that after the first availability message, a network device may periodically transmit subsequent availability messages to registry server 110 to verify that the device is still connected to the cloud computing network. These subsequent availability messages may include any of the information included in the first availability message.
In step 530, the device performs an authentication process initiated by registry server 110 in response to receiving the availability message. One skilled in the art will recognize that any authentication process used is not important to this invention and is omitted for brevity. Process 500 ends after step 540 in which the device receives an acknowledgment from registry server 110 in response to a successful authentication. The acknowledgement may include data needed for communications with registry server 110 and client system 115 over the network. For example, the acknowledgement message may include secret keys, encryption/decryption or other data depending on the protocol for communication between the network devices, client systems, and registry server 110. Figure 6 illustrates a flow diagram of process 600 performed by a registry server to register a network device connecting to the network. Process 600 begins in step 610 when registry server 110 receives an availability message from a device connected to the network. In response to receiving the availability message, registry server 110 performs an authentication process with the network device in step 620. In response to a successful authentication, the registry server transmits an acknowledgment to the network device in step 630. The acknowledgement may include data needed for communications with registry server 110 and client system 115 over the network. For example, the acknowledgement message may include secret keys, encryption/decryption or other data depending on the protocol for communication between the network devices, client systems, and registry server 110. Further, registry server 110 then determines the namespaces that may store data on the device in step 640. The identifier, list of namespaces, and IP address of the device are then stored in memory in step 650 for use in transmitting data between the client systems and network devices in accordance with this invention. Process 600 then ends.
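As an illustration only, the first availability message of processes 500 and 600 could be encoded as follows. The patent names the fields (device identifier, supported namespaces, current IP address) but no wire format, so JSON is assumed here:

```python
import json

def make_availability_message(device_id, namespaces, ip_address):
    """Build the first availability message of step 520 (assumed JSON shape)."""
    return json.dumps({
        "deviceId": device_id,          # identifier for the device
        "namespaces": namespaces,       # namespaces that may store data here
        "ipAddress": ip_address,        # assigned at connection time
    })

msg = make_availability_message("dev-042", ["acct", "hr"], "10.0.3.17")
registry_view = json.loads(msg)         # what registry server 110 stores (step 650)
```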
In order to store data to the network, a client system must also register with the registry server in accordance with some embodiments of this invention. Figures 7 and 8 illustrate flow diagrams of the processes performed by client system 115 and registry server 110 to register a client system. Figure 7 illustrates process 700 performed by a client system to register with registry server 110 in accordance with an embodiment of this invention.
Process 700 begins in step 710. In step 710, the client system transmits a connection request to registry server 110. In response to receiving the connection request, registry server 110 performs an authentication process with the client system in step 720. Any number of authentication processes may be used and the exact authentication process used is a design choice left to one skilled in the art. In response to a successful authentication, the client system receives network information including data needed for communications between registry server 110 and client system 115 over the network in step 730. For example, the network information may include secret keys, encryption/decryption or other data depending on the protocol for communication between the network devices, client systems, and registry server 110. After step 730, process 700 ends.
Figure 8 illustrates process 800 performed by registry server 110 in response to a connection request received from a client system in accordance with an embodiment of this invention. Process 800 begins in step 810 when a connection request is received from a client system. In response to receiving the request, registry server 110 performs an authentication process with the client system in step 820. If the authentication is successful, registry server retrieves network information stored in memory in step 830. The network information may include data needed for communications between registry server 110 and client system 115 over the network. For example, the network information may include secret keys, encryption/decryption or other data depending on the protocol for communication between the network devices, client systems, and registry server 110. The network information is then transmitted to the client system in step 840 and process 800 ends.
This invention relates to the storage of data from a client system to devices in a cloud computing network. Figures 9-18 describe processes performed by a client system and the registry server to read and write data to and from network devices. Figure 9 illustrates a process performed by a client system to write a set of data to the network devices in accordance with an embodiment of this invention. In response to receiving a request to write a set of data, such as a file or other data structure, to storage, client system 115 performs process 900. Process 900 begins in step 910 by transmitting a write request to registry server 110. The write request should preferably include an identifier for the set of data and a namespace in which the set of data is to be stored. In some embodiments, the write request may also include the size of the set of data if known at the time of the write operation.
In response to the write request, client system 115 receives write information from registry server 110 in step 920. The process performed by registry server 110 to provide the write information is described below with respect to Figure 10. The write information includes a list of network devices available to store the data and the IP address for communicating with the devices. The write information may further include information for generating the slices of data, such as conversion transform matrix information, and/or padding information for adding padding data to the set of data. In step 930, client system 115 generates slices of data to store on the network devices. Slices of data are generated to allow the data to be stored on multiple devices to reduce the amount of data stored on a single device as well as to add security for the data by preventing any one network device from storing all of the data in a particular set of data. The simplest manner of generating slices is to divide the set of data into portions including a specified amount of data. However, the drawback with this method is that all of the slices are needed to reconstruct the data. Thus, if one of the slices is corrupted or a device storing a slice is unavailable to the network, the set of data cannot be reconstructed. Furthermore, the data is not adequately secure as a third party could intercept transmission of all of the slices and reconstruct the set of data. Therefore, in some embodiments, the slices are generated in such a manner that each slice is a representation of the data from the data set. As such, only a specified number of the slices are needed to reconstruct the data. Further, data security is enhanced because a third party must know the process used to generate the slices in order to re-construct the data from the slices. A description of an Information Dispersal Algorithm (IDA) used to generate the slices in some embodiments of this invention is described below with respect to Figure 12.
After the slices are generated, each slice is transmitted to one of the network devices in step 940. The network devices that receive each slice are selected from the available network devices provided in the write information received from registry server 110. Preferably, the network devices are selected such that the slices are well dispersed, with no one device storing M or more slices, where N is the total number of slices and M is a number less than N that is the minimum number of slices needed to reconstruct the set of data. One skilled in the art will recognize that any number of selection algorithms can be used to select each device to store each slice and the algorithm used is a design choice left to those skilled in the art; one illustrative scheme is sketched below.
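One selection scheme satisfying the dispersion preference above, offered as an illustration rather than as the patent's method, deals the slices round-robin across the available devices and refuses configurations in which any single device would hold enough slices to reconstruct the data:

```python
def assign_slices(n_slices, devices, quorum_m):
    """Map each of the N slice indices to a device, round-robin, refusing
    configurations where some device would end up holding M or more slices."""
    per_device = -(-n_slices // len(devices))        # ceiling division
    if per_device >= quorum_m:
        raise ValueError("too few devices: one would hold a reconstructable set")
    return {i: devices[i % len(devices)] for i in range(n_slices)}

# For the N = 10, M = 7 sizing mentioned later, three devices hold at most
# four slices each, well below the reconstruction quorum.
print(assign_slices(10, ["devA", "devB", "devC"], 7))
```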
The storage information for the set of data that will be transmitted to the registry server is then generated in step 950. The storage information includes information regarding the set of data including an identifier, a namespace, size and padding, information regarding the generation of the slices, and information about each slice. The information about each slice includes an identifier of the slice, the position of the slice in the data, the device storing the slice, and error detection information about the slice. To enhance security, and to provide for the appending and updating processes described below, the error detection information is stored on the registry instead of being transmitted with the slices as is common in the art. The error detection information will be described in more detail below. After the storage information is generated, the client system transmits the storage information to registry server 110 in step 960 and process 900 ends. The process performed by registry server 110 to store the information received from the client system is described below with respect to Figure 11. Figure 10 illustrates process 1000 performed by registry server 110 in response to receiving a write request from a client system in accordance with an embodiment of this invention. Process 1000 begins in step 1010 when a write request is received from the client system. In response to receiving the request, registry server 110 reads the namespace identifier for the set of data from the request in step 1020. Registry server 110 then searches the list of available devices stored in memory to determine each of the devices available to store information for the namespace and other namespace information in step 1030. Write information to transmit to the client system is then generated in step 1040. The write information includes a list of devices available to store data for the namespace as well as other information for communicating with the devices in the list. Registry server 110 may return a list including each device found that is available to store data for the namespace. Alternatively, registry server 110 may perform a selection algorithm to select only a portion of the available devices to include in the list. In other embodiments, all of the network devices that may store data for the namespace are returned along with usage information to allow a client system to perform load balancing analysis and/or assignment of the network devices. The exact method of selection of the devices is left as a design choice to those skilled in the art. Process 1000 then ends after step 1050 with the transmission of the write information to the client system.
Figure 11 illustrates a flow diagram of process 1100 performed by registry server 110 in response to receiving storage information for a set of data in accordance with an embodiment of this invention. Process 1100 begins in step 1110 when storage information for a set of data is received from a client system. Registry server 110 then stores the information in step 1120 and process 1100 ends. Preferably, registry server 110 generates a new FileInfo entry 405 in master table 400 (Figure 4) and populates the fields with the storage information received. However, other storage methods may be used without departing from this invention. A sketch of such an entry follows.
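For concreteness, the storage information might be modeled as in the following hypothetical Python sketch; the field names are illustrative, mirroring the items enumerated for the storage information above, rather than drawn verbatim from the patent:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class SliceInfo:
        slice_id: str     # opaque identifier, e.g. a GUID
        position: int     # position of the slice among the N slices
        device_id: str    # device storing the slice
        check: bytes      # error detection data, held by the registry server

    @dataclass
    class FileInfo:
        data_id: str        # identifier of the set of data
        namespace: str      # namespace the set of data belongs to
        size: int           # total size T of the set of data
        padding: int        # bytes of padding appended to the last portion
        matrix_info: bytes  # conversion transform matrix, or parameters to build it
        slices: List[SliceInfo] = field(default_factory=list)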
As discussed above with respect to step 930, the set of data should be divided into slices. More preferably, slices of data should be generated that represent portions of the data from the data set such that only a portion of the slices are needed to reconstruct the data set. One method of generating the slices is an Information Dispersal Algorithm (IDA) such as the algorithm described by Rabin in US Patent number 5,485,474. In the described IDA, the set of data is represented by N slices and only M slices are required for reconstruction of the original set of data. A conversion transform matrix of N rows and M columns is used to perform the transformation of the set of data into the N slices. N and M are selected such that N > M. To start the algorithm, the set of data is divided into X consecutive portions of data of M length. X is determined by dividing the total file size, T, by M and rounding up to account for an incomplete portion that may be padded with data. The following simple example where T = N is provided. In the example, F is the original set of data divided into consecutive M-byte portions, and bi denotes the ith byte of F. Thus F = (b1, b2, b3, ..., bm)(bm+1, bm+2, ..., b2m) ... (bN-m+1, ..., bN).
Let A be the transform matrix
        | a11  a12  ...  a1m |
        | a21  a22  ...  a2m |
    A = |  .    .         .  |
        | aN1  aN2  ...  aNm |
and B be the input matrix of F and C be the output matrix (the output slices). Thus, we can write the following matrix equation:
    | a11  a12  ...  a1m |   | b1    bm+1  ...  bN-m+1 |   | c11  c12  ...  c1X |
    | a21  a22  ...  a2m |   | b2    bm+2  ...  bN-m+2 |   | c21  c22  ...  c2X |
    |  .    .         .  | x |  .     .          .     | = |  .    .         .  |
    | aN1  aN2  ...  aNm |   | bm    b2m   ...  bN     |   | cN1  cN2  ...  cNX |

That is, A x B = C, where each column of B is one M-byte portion of F.
To obtain an element of C, we multiply a row of A with a column of B. For example, c11 = a11b1 + a12b2 + ... + a1mbm, with all arithmetic performed in GF(2^8).
Each row of C corresponds to a slice of data to be stored. Reconstruction of the data is discussed below with regard to Figure 14, which illustrates a process for reading the data. A sketch of the encoding arithmetic follows.
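As a minimal sketch of this arithmetic in Python — assuming GF(2^8) with the AES reduction polynomial 0x11B, which the patent does not specify — one column of C can be computed from one M-byte portion of F as follows:

    def gf_mul(a: int, b: int) -> int:
        """Multiply two bytes in GF(2^8); reduction polynomial 0x11B assumed."""
        p = 0
        for _ in range(8):
            if b & 1:
                p ^= a
            carry = a & 0x80
            a = (a << 1) & 0xFF
            if carry:
                a ^= 0x1B
            b >>= 1
        return p

    def encode_portion(A, portion):
        """Multiply the N x M transform matrix A by one M-byte portion of F.

        Addition in GF(2^8) is XOR, so c_i = a_i1*b_1 ^ a_i2*b_2 ^ ... ^ a_iM*b_M.
        The i-th byte of the result is appended to the i-th slice.
        """
        out = []
        for row in A:
            c = 0
            for a, b in zip(row, portion):
                c ^= gf_mul(a, b)
            out.append(c)
        return out

    # e.g. a 2 x 3 toy matrix applied to a 3-byte portion:
    # encode_portion([[1, 2, 3], [4, 5, 6]], b"abc") yields one byte per slice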
Figure 12 illustrates a process performed by the client system to generate slices to be stored using the IDA described above in accordance with one embodiment of this invention. One skilled in the art will recognize that other methods may be used to generate the slices without departing from this invention. Process 1200 begins in step 1210 by determining a conversion transform matrix. In accordance with the described embodiment, the conversion transform matrix is read from the write information received from registry server 110. However, in other embodiments, the write information may only contain parameter information and the client system performs a process to generate the conversion transform matrix. Furthermore, registry server 110 may store more than one conversion transform matrix and provide one conversion transform matrix to the client system for a specific set of data. However, the exact manner in which the conversion transform matrix is provided is left as a design choice to those skilled in the art.
The conversion transform matrix is an N by M matrix. N is the number of slices to be produced and M is the number of slices required to reconstruct the entire set of data. N is greater than M. One skilled in the art will recognize that the ratio of N to M may be made larger if the data is to be recoverable from a much smaller set of slices, or smaller if more security is desired. In the described embodiment, the size of the matrix is 10 by 7 so that the slices remain manageable. However, the exact size of the conversion transform matrix and the value of each element in the matrix are left as design choices. One possible construction is sketched below.
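Since the patent leaves the matrix values open, the sketch below shows one hedged possibility: a Vandermonde matrix over GF(2^8), reusing gf_mul from the previous sketch. With distinct generators x_i, any M of its N rows form an invertible M x M sub-matrix, which is exactly the property the IDA requires:

    def build_transform_matrix(n: int, m: int):
        """Hypothetical N x M conversion transform matrix: row i is
        [1, x_i, x_i^2, ..., x_i^(m-1)] with x_i = i, distinct in GF(2^8)."""
        matrix = []
        for i in range(1, n + 1):
            row, v = [], 1
            for _ in range(m):
                row.append(v)
                v = gf_mul(v, i)
            matrix.append(row)
        return matrix

    A = build_transform_matrix(10, 7)  # the 10-by-7 size of the described embodiment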
The client system receives the set of data to store. The client system divides the set of data into X consecutive portions of data of M length. M is determined by the number of slices needed to reconstruct the data, and X is determined by dividing the total size, T, of the set of data by M (T/M) and rounding up to the next whole number to account for a padded final portion. One skilled in the art will recognize that the portions may be generated as the data for the set to store is received, without the client system knowing the exact number of X portions being generated. The process of generating the slices begins in step 1220 by reading an xth one of the X portions of the data. The selected xth portion is multiplied by the conversion transform matrix to generate the xth resulting vector in step 1240. For example, the 1st portion is multiplied by the conversion transform matrix to generate the first resulting vector, and the 2nd portion is multiplied by the conversion transform matrix to generate the second resulting vector. Each yth element of the xth resulting vector is then read and stored in the xth position of the yth slice in step 1250. For example, the 1st element from the first resulting vector is stored in the 1st position of the 1st slice, the 2nd element of the 1st resulting vector is stored in the 1st position of the 2nd slice, the 3rd element of the 1st resulting vector is stored in the 1st position of the 3rd slice, etc. After all of the elements of the resulting vector have been added to a corresponding slice, process 1200 determines whether all of the X portions have been processed in step 1260. If all of the portions have not been processed, process 1200 repeats from step 1220 for the next portion. These steps are gathered in the sketch below.
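Putting steps 1220-1260 together, a compact sketch — reusing gf_mul and encode_portion from the earlier sketches; the zero-byte padding is an assumption, as the text does not fix the padding value — is:

    def generate_slices(data: bytes, A):
        """Divide the data into M-byte portions, transform each portion,
        and scatter the result across the N slices (steps 1220-1260)."""
        n, m = len(A), len(A[0])
        pad = (-len(data)) % m                # bytes needed to fill the last portion
        padded = data + b"\x00" * pad         # zero padding assumed
        slices = [bytearray() for _ in range(n)]
        for offset in range(0, len(padded), m):   # each M-byte portion in turn
            vector = encode_portion(A, padded[offset:offset + m])
            for y, element in enumerate(vector):
                slices[y].append(element)     # xth position of the yth slice
        return [bytes(s) for s in slices], pad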
After all of the portions have been processed, the error detection data for each slice is generated in step 1270. The error detection data for each slice is then inserted into the storage information to be sent to registry server 110 in step 1280. Preferably, the error detection data for each slice is a hash. In the described embodiment, the error detection information is generated by a keyed hash, as sketched below. One skilled in the art will appreciate that the exact method of generating and storing the error detection information is left as a design choice. The conversion transform matrix and/or information for generating the conversion transform matrix is then stored in the storage information to be sent to registry server 110 in step 1290, and process 1200 ends. One skilled in the art will recognize that step 1290 may be omitted if registry server 110 provides the conversion transform matrix, as in the described embodiment.

Figure 13 illustrates a process for transmitting the slices to the network devices performed by the client system in accordance with an embodiment of step 940 of process 900. In accordance with process 1300, the data for each slice is maintained in a buffer providing a queue, and portions of the slice are periodically transmitted to each of the selected network devices. The data for each slice is written to the buffer as the data for each slice is determined from the X portions of data. Thus, the data for each slice may be streamed to the network devices as the data for each slice is generated. One skilled in the art will recognize that any number of different transmitting algorithms may be used to transmit the data in each slice to the network devices without departing from this invention. Process 1300 begins in step 1305 by selecting a network device to receive each of the slices. The devices are selected from the list of available devices transmitted to the client system in the write information from registry server 110. A slice identifier may also be generated at this time and stored in the storage information. Preferably, the slice identifier is unique and does not give an indication of the ordering of the slices. In some embodiments, a GUID of the slice is used as the identifier. However, other naming conventions can be used without departing from this invention.
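Both the keyed hash of step 1270 and the GUID-style slice identifier of step 1305 can be sketched with standard primitives. HMAC-SHA256 and a version-4 UUID are concrete stand-ins only, since the text calls merely for "a keyed hash" and a GUID:

    import hashlib
    import hmac
    import uuid

    def slice_identifier() -> str:
        # A random GUID is unique and reveals nothing about slice ordering.
        return str(uuid.uuid4())

    def slice_check(key: bytes, slice_bytes: bytes) -> bytes:
        # Keyed hash over the slice contents; per the text this is stored on
        # the registry server rather than alongside the slice itself.
        return hmac.new(key, slice_bytes, hashlib.sha256).digest()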
A slice is then selected in step 1310. In step 1320, the buffer storing the data for the slice is read. The client system then determines whether a minimum amount of data is available in step 1330. The minimum amount of data may be one byte or more depending on the requirements of the system and is left as a design choice. If the minimum amount of data is available, the data for the slice is transmitted to the device in step 1340. If not, the process repeats from step 1310 to select another slice. After the portion of the slice is transmitted, the client system waits to receive an acknowledgement from the network device selected to store the slice in step 1350. If no acknowledgement is received, an error recovery process is performed in step 1360. The error recovery process may involve transmitting the portion or the entirety of the slice to another available device, or may require regenerating the slice from the data for re-transmission. Other error recovery methods may be used without departing from this invention. Otherwise, the client system determines whether the entire slice has been sent in step 1370. If the entire slice has been sent, the client system stores the identifier of the network device storing the slice in the storage information in step 1380. If the entire slice has not been sent, the process is repeated from step 1310. After step 1380, the client system determines whether transmission of all of the slices is complete in step 1390. If not, process 1300 is repeated from step 1310. Otherwise, process 1300 ends. A sketch of this buffered transmission loop follows.
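A minimal sketch of the buffering in process 1300 follows; SliceBuffer, transmit, await_ack, and recover are hypothetical names standing in for the queue and network steps, and the 1 KB minimum is an arbitrary placeholder for the design choice noted above:

    from collections import deque

    MIN_CHUNK = 1024  # placeholder for the minimum amount of data in step 1330

    class SliceBuffer:
        """Queue holding the bytes of one slice as they are generated."""
        def __init__(self):
            self.queue = deque()
            self.closed = False        # set once the slice is fully generated
        def put(self, data: bytes):
            self.queue.extend(data)
        def take(self) -> bytes:
            chunk = bytes(self.queue)
            self.queue.clear()
            return chunk
        def ready(self) -> bool:
            return len(self.queue) >= MIN_CHUNK or (self.closed and len(self.queue) > 0)

    def stream_slices(buffers, transmit, await_ack, recover):
        """buffers maps slice_id -> SliceBuffer; the three callables stand in
        for steps 1340 (send), 1350 (acknowledgement), and 1360 (recovery).
        Assumes each buffer is eventually closed by the slice generator."""
        done = set()
        while done != set(buffers):
            for sid, buf in buffers.items():
                if sid in done or not buf.ready():
                    continue               # step 1330: not enough data yet
                chunk = buf.take()
                transmit(sid, chunk)       # step 1340
                if not await_ack(sid):
                    recover(sid, chunk)    # step 1360
                if buf.closed and not buf.queue:
                    done.add(sid)          # steps 1370-1380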
Figure 14 illustrates process 1400 performed by a client device to retrieve a portion of a set of data stored on the network. Process 1400 begins in step 1405 by receiving a request to retrieve a section of a set of data stored by devices in the network. In accordance with this embodiment, a section means any subset of the entire set of data, including the entirety of the set of data. In response to receiving the request, the client system requests storage information for the set of data from registry server 110 in step 1410. The client then receives the storage information from the registry server in step 1420. The storage information includes all of the information necessary to retrieve each of the slices and re-generate the data. This information may include the conversion transform matrix or information to construct the conversion transform matrix used to generate the slices; the device storing each of the slices; and error detection information for each of the slices. The identity and IP address of each device storing each slice and any other information needed to access the slice are read from the storage information in step 1430. In step 1435, the elements of the slices needed to re-construct the requested section of data are determined. In accordance with the described embodiment, this is done by determining each xth one of the X consecutive portions of M length of the set of data that includes the requested data and requesting each xth element of the slices, where X is determined by dividing the total amount of data, T, in the set by M (T/M) and M is the minimum number of slices needed to reconstruct the data. The xth portions needed are determined by finding the portions that contain the offsets of the requested data. For example, suppose a file has a total length of 100 bytes, M is 5, and the requested data is bytes 23-43. In this case, the 5th-9th portions are needed, as bytes 23-25 are in the 5th portion, bytes 26-30 are in the 6th portion, bytes 31-35 are in the 7th portion, bytes 36-40 are in the 8th portion, and bytes 41-43 are in the 9th portion. To generate the 5th-9th portions, the 5th-9th elements of M slices are required. This offset arithmetic is sketched below. The client system then generates a request for the required elements of each slice and transmits the requests to each network device storing each slice in step 1440. The client system then receives the requested elements of the slices from the network devices in response to the request in step 1450 and uses the error detection information to determine whether the elements of the slices received are correct. In step 1460, the client system determines whether at least a minimum number, M, of slices are correctly received, where M is the minimum number of slices needed to re-construct the data. If not, an error recovery process may be performed in step 1470. Process 1400 then either repeats from step 1440 after the error recovery process is performed or ends.
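The offset arithmetic of step 1435 reduces to two integer divisions; the following small sketch reproduces the worked example above using 1-based positions, matching the text:

    def portions_for_range(first_byte: int, last_byte: int, m: int):
        """Return the 1-based indices of the M-byte portions that cover
        bytes first_byte..last_byte (1-based, inclusive)."""
        first = (first_byte - 1) // m + 1
        last = (last_byte - 1) // m + 1
        return list(range(first, last + 1))

    # The text's example: 100-byte file, M = 5, bytes 23-43 requested.
    assert portions_for_range(23, 43, 5) == [5, 6, 7, 8, 9]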
If at least the requested elements of M slices are received in step 1460, the identifier and/or position of each of the M slices in the N total number of slices is determined in step 1475. An inverse conversion transform matrix is then generated based upon the slices received in step 1480. In the described embodiment, the inverse conversion transform matrix is generated by selecting from the conversion transform matrix each row corresponding to the position of one of the M received slices to form an M x M sub-matrix, and generating the inverse of that sub-matrix. For example, if the 2nd, 4th, and 6th slices are received, the 2nd, 4th, and 6th rows of the conversion transform matrix are selected and an inverse matrix is generated from the matrix including these rows. A sketch of this step follows.
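A sketch of step 1480 follows, reusing gf_mul from the earlier sketch. Gauss-Jordan elimination over GF(2^8) is one standard way to invert the sub-matrix; the patent does not prescribe a method, so this is an assumption:

    def gf_inv(a: int) -> int:
        """Multiplicative inverse in GF(2^8): a^254, since a^255 = 1 for a != 0."""
        result, exponent = 1, 254
        while exponent:
            if exponent & 1:
                result = gf_mul(result, a)
            a = gf_mul(a, a)
            exponent >>= 1
        return result

    def inverse_from_received(A, positions):
        """Pick the rows of A matching the positions of the M received slices
        (e.g. rows for the 2nd, 4th, 6th, ... slices) and invert the resulting
        M x M sub-matrix by Gauss-Jordan elimination over GF(2^8)."""
        m = len(positions)
        aug = [list(A[p]) + [1 if i == j else 0 for j in range(m)]
               for i, p in enumerate(positions)]
        for col in range(m):
            pivot = next(r for r in range(col, m) if aug[r][col])  # nonzero pivot
            aug[col], aug[pivot] = aug[pivot], aug[col]
            scale = gf_inv(aug[col][col])
            aug[col] = [gf_mul(scale, v) for v in aug[col]]
            for r in range(m):
                if r != col and aug[r][col]:
                    factor = aug[r][col]
                    aug[r] = [v ^ gf_mul(factor, w) for v, w in zip(aug[r], aug[col])]
        return [row[m:] for row in aug]   # the inverse conversion transform matrix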
Re-assembly of the set of data is performed in step 1490. The data may be reassembled by multiplying the received elements of the M slices by the inverse matrix to recover each of the elements of the original X consecutive portions of data that include the requested data. If the retrieved portions include the last portion of the set of data, the padding added to the end of the last portion is then removed based upon the padding information from the received storage information in step 1490, and process 1400 ends.

Figure 15 illustrates process 1500 performed by registry server 110 to respond to a read request received from a client system in accordance with an embodiment of this invention. Process 1500 begins in step 1510 by receiving the read request from the client. The registry server then reads the identifier for the set of data from the request in step 1520 and retrieves the storage information for the identified set of data in step 1530. This may be done by reading the FileInfo entry 405 for the identified set of data from master table 400 maintained in a volatile memory in accordance with some embodiments of this invention. The storage information to transmit to the client system, including the necessary information for retrieving the data, is then generated from the read data in step 1540 and transmitted to the client in step 1550. Process 1500 then ends.

Figure 16 illustrates a flow diagram of process 1600 performed by a client system to append data to the end of a set of data stored on the network devices in accordance with an embodiment of this invention. The advantage of process 1600 is that the entirety of the set of data does not need to be retrieved and/or stored in memory. This reduces the memory footprint in the client system and reduces network traffic. Process 1600 is performed after the client system has re-constructed at least the last of the X consecutive portions of the data set from the relevant portions of the slices read from the network devices using a read process, such as process 1400 described above and shown in Figure 14. As described in process 1400, any padding added to the last portion is removed during the read process. Otherwise, process 1600 must include a process for removing the padding.
Process 1600 begins in step 1602 by determining whether the last portion of data includes padding. This may be done by determining whether the amount of data in the last portion is equal to M, where M is the minimum number of slices needed to re-construct data from the set of data. Alternatively, the storage information may be read to determine whether padding information is included for the set of data. If the last portion does not include padding, a conventional write operation, such as the operation described in process 1200, is performed on the appended data, adding elements to the end of each of the N slices in step 1603. If the last portion includes padding, process 1600 continues by determining whether the amount of data to be appended is less than the amount of padding added to the end of the file in step 1605. The amount of padding added to the file is determined from the padding information read from the storage information received by the client system from registry server 110. If the amount of data to append is greater than the amount of padding added, an amount of data equal to the amount of padding is read from the beginning of the appended data and added to the last of the X consecutive portions of data in step 1610. A normal write operation is then performed for the remainder of the data to append in step 1615. If the amount of data to append is less than or equal to the amount of padding, all of the data to be appended is added to the end of the last portion in step 1620. This branching is captured in the sketch below.
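The branch in steps 1602-1620 can be captured compactly. This sketch assumes the padding has already been stripped from the reconstructed last portion, as process 1400 does:

    def plan_append(last_portion: bytes, pad_len: int, new_data: bytes, m: int):
        """Decide how appended data is folded into the last portion.

        Returns (updated_last_portion, remainder): the remainder, if any,
        goes through a normal write such as process 1200 (step 1615), and
        a None updated portion means step 1603 applies to all the new data.
        """
        if pad_len == 0 and len(last_portion) == m:
            return None, new_data                 # step 1603: no padding, plain write
        fill = min(pad_len, len(new_data))        # bytes that fit where padding was
        updated = last_portion + new_data[:fill]  # steps 1610 / 1620
        return updated, new_data[fill:]           # remainder handled in step 1615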
In step 1630, the last portion with the appended data formed in either step 1615 or step 1620 is multiplied, as a vector, by the conversion transform matrix used to generate the original slices, producing an append vector. In the described embodiment, the client system reads the matrix from the storage information received from registry server 110. In other embodiments, the client system may generate the conversion transform matrix from matrix information read from the storage information. In step 1640, the yth element from the append vector is read. For example, the 1st element is read in the first iteration, the 2nd element is read in the second iteration, etc.
The storage information is then read to determine the network device storing the yth slice in step 1650. In step 1660, a request to store the yth element as the current last element of the yth slice is transmitted to the network device storing the yth slice. For example, the 1st element of the append vector is stored as the current last element of the 1st slice, the 2nd element of the append vector is stored as the current last element of the 2nd slice, etc. The request includes an offset indicating the position of the current last element in the stored slice. After the yth element is stored, error detection information for the yth slice is generated using the yth element as the last element in step 1670 and updated in the storage information for the yth slice. The storage information is transmitted to the registry server in step 1680. One skilled in the art will note that the updates of storage information for each slice may be stored and sent at one time without departing from the invention. The client system determines whether all of the elements of the append vector have been transmitted to the N slices in step 1690. If not, process 1600 is repeated from step 1640 for the next yth element of the append vector. If all of the elements have been transmitted, process 1600 ends.
Figure 17 illustrates update process 1700 performed by a client system to update the data in the set of data stored on the network devices in accordance with an embodiment of this invention. Process 1700 is performed after the client system has re-constructed the relevant portions of the data from the relevant portions of the slices read from the network devices using a read process, such as process 1400 described above and shown in Figure 14, and these portions of data have subsequently been amended in some way. In this embodiment, the client system maintains the storage information received from registry server 110 in memory. However, if the storage information is not maintained in memory, the client system must transmit a request and receive the storage information for the set of data from registry server 110. Process 1700 begins in step 1710 by determining each one of the X consecutive portions of the set of data that includes updated data. These portions may be determined by looking for changes in the data at each M-byte offset, counting the offsets, and recording each offset containing updated data, where M is the minimum number of slices needed to reconstruct the data. In accordance with an example, the 3rd and 4th of the X consecutive portions of the set of data contain amended information. In step 1715, the yth one of the X portions containing updated data is selected; in accordance with the above example, the 3rd of the X portions is selected. In step 1720, the yth portion is multiplied, as a vector, by the conversion transform matrix to generate an update vector. In accordance with the described embodiment, the conversion transform matrix is read from the storage information. In accordance with other embodiments, the conversion transform matrix may be produced by the client system from information read from the storage information received from registry server 110. An xth element is then obtained from the update vector in step 1730. In accordance with the above example, the update vector is generated by multiplying the 3rd portion and the conversion transform matrix. The client system then selects the first element of the update vector.
The network device storing the xth slice is then read from the storage information in step 1740. In accordance with the described example, the device storing the 1st slice is read. The client system then transmits a request to the network device storing the xth slice in step 1750. The request indicates that the xth element is to be stored at an offset corresponding to the yth element of the slice. In accordance with the above example, the request indicates the 1st element of the update vector is to be stored as the 3rd element of the 1st slice. The error detection information for the xth slice, with the xth element from the update vector as the yth element, is then generated in step 1760. The error detection information is then transmitted to registry server 110 to replace the error detection information for the xth slice in step 1770. One skilled in the art will note that the updates of storage information for each slice may be stored and sent at one time without departing from the invention. The client system determines whether all of the elements in the update vector have been transmitted in step 1780. If not, process 1700 is repeated from step 1730 until all of the elements of the update vector have been transmitted. Otherwise, the client system determines whether all of the portions including updated data have been processed in step 1790. If not, process 1700 is repeated from step 1715 for each of the remaining portions. If so, process 1700 ends. The update path is gathered in the sketch below.
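The update path of process 1700 can be sketched end to end, reusing encode_portion from the earlier sketches; the request tuples are a hypothetical stand-in for the messages sent to the devices, and old and new are assumed to be the same length since an update does not change the size of the data:

    def changed_portions(old: bytes, new: bytes, m: int):
        """Step 1710: 1-based indices of the M-byte portions whose bytes changed."""
        return [i // m + 1 for i in range(0, len(new), m)
                if old[i:i + m] != new[i:i + m]]

    def update_requests(A, new: bytes, portions, m: int):
        """Steps 1715-1750: recompute the update vector for each changed
        portion y and emit one (slice position x, offset y, byte) request."""
        requests = []
        for y in portions:                           # e.g. the 3rd and 4th portions
            vector = encode_portion(A, new[(y - 1) * m: y * m])
            for x, element in enumerate(vector, start=1):
                requests.append((x, y, element))     # store at offset y of slice x
        return requests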
Process 1800 is a flow diagram of a process performed by registry server 110 to update storage information in accordance with an embodiment of this invention. Process 1800 begins in step 1810 by receiving updated storage information from a client system. In response to receiving the information, registry server 110 reads the information from the update request and updates the proper field in the proper FileInfo entry 405 in step 1820. Process 1800 then ends.

The above is a description of embodiments of a method and system for storing data in a cloud computing network. It is expected that those skilled in the art can and will design alternative embodiments of this invention as set forth in the following claims.

Claims

What is claimed is:
1. A system for storing data on a plurality of devices in a cloud network comprising:
a server comprising:
a processing unit;
instructions for directing said processing unit to:
receive data storage information for a plurality of slices of data representing a set of data wherein said data storage information includes an identifier of each of said plurality of slices, an identifier of one of said plurality of devices in said cloud network storing said data, and error detection data for each of said slices, and
store said data storage information in a memory; and
a media readable by said processing unit to store said instructions.
2. The system of claim 1 wherein said instructions for directing said processing unit of said server further comprise:
instructions for directing said processing unit to:
maintain a list of said plurality of devices connected to said cloud network available to store data.
3. The system of claim 2 wherein said instructions for directing said processing unit of said server further comprise:
instructions for directing said processing unit to:
receive a first availability message from one of said plurality of network devices indicating said one of said plurality of devices is available to store data; and
store an identifier of said one of said plurality of devices in said list of available devices.
4. The system of claim 3 wherein said instructions for directing said processing unit of said server further comprise:
instructions for directing said processing unit to:
read a list of namespaces from said first availability message wherein said list of namespaces identifies each namespace that may store data on said one of said plurality of devices and a namespace is an identifier for a collection of a plurality of sets of data,
store an indication in memory of each namespace in said list of namespaces that may store data on said one of said plurality of network devices.
5. The system of claim 3 wherein said instructions for directing said processing unit of said server further comprise:
instructions for directing said processing unit to:
determine an Internet Protocol (IP) address for said one of said plurality of devices in response to receiving said first availability message, and
store said IP address in said memory.
6. The system of claim 2 wherein said instructions for directing said processing unit of said server further comprise:
instructions for directing said processing unit to:
perform an authentication process with said one of said plurality of devices in response to receiving said first availability message.
7. The system of claim 6 wherein said instructions for directing said processing unit of said server further comprise:
instructions for directing said processing unit to:
transmit an acknowledgement to said one of said plurality of devices in response to said one of said plurality of devices being successfully authenticated.
8. The system of claim 7 wherein said instructions for directing said processing unit of said server further comprise:
instructions for directing said processing unit to:
transmit communication information to said one of said plurality of devices.
9. The system of claim 8 wherein said communication information includes a secret key.
10. The system of claim 3 further comprising:
a plurality of processing devices connected to said cloud network;
each of said plurality of processing devices comprises:
a processing unit,
a memory for storing data,
instructions for directing said processing unit to:
transmit a first availability message to said server in response to connecting to said network;
perform authentication with said server; and
receive communication information from said server in response to a successful authentication; and
a media readable by said processing unit for storing said instructions.
11. The system of claim 10 wherein said first availability message includes a list of namespaces wherein said list of namespaces identifies each namespace that may store data on said one of said plurality of devices and a namespace is an identifier for a collection of a plurality of sets of data.
12. The system of claim 10 wherein said first availability message includes an identifier of said device and an Internet Protocol address of said device.
13. The system of claim 2 wherein said instructions for directing said processing unit in said server further comprise:
instructions for directing said processing unit to:
receive a write data request for said set of data from said client system, and
transmit write information including information about each of said plurality of devices in a group of devices that are available to store said set of data to said client system.
14. The system of claim 13 wherein said instructions for directing said processing unit in said server further comprise:
instructions for directing said processing unit to:
determine said group of devices available to store said set of data.
15. The system of claim 14 wherein said instructions to determine said group comprise:
instructions for directing said processing unit to:
read a namespace for said set of data from said request,
determine each of said plurality of devices in said list of devices that is available to store data for said namespace,
add each of said plurality of devices determined to be available to store data for said namespace to said group.
16. The system of claim 13 wherein said information for each of said plurality of devices in a group of devices that are available to store said set of data to said client system includes an Internet protocol address for the device and a device identifier.
17. The system of claim 13 further comprising:
a client system comprising:
a processing unit;
instructions for directing said processing unit to:
receive a request to store said set of data to memory,
generate said write request in response to receiving said request to store said set of data,
transmit said write request to said server,
receive said write information including said information for each of said plurality of devices in a group of devices that are available to store said set of data from said server,
generate said plurality of slices representing said data in said set of data,
transmit each of said plurality of slices to one of said plurality of devices in said group,
generate storage information for each of said plurality of slices, and
transmit said storage information to said server; and
a media readable by said processing unit for storing said instructions.
18. The system of claim 17 wherein said instructions for directing said processing unit of said client system to generate said plurality of slices comprise:
instructions for directing said processing unit to: apply Rabin's information dispersal algorithm to said data in said set of data.
19. The system of claim 17 wherein said instructions for directing said processing unit of said client system to generate said plurality of slices comprise:
instructions for directing said processing unit to:
divide data from said set of data into X consecutive portions of M length where M is determined by a number of a plurality of slices needed to re-construct the data and X is equal to the total amount of data in the set of data divided by M,
multiply each of xth one of said X consecutive portions of data by a conversion transform matrix to determine an xth resulting vector, and
insert each yth element of each xth resulting vector in an xth position in a yth one of said plurality of slices.
20. The system of claim 19 wherein said instructions for directing said processing unit of said client system to generate said plurality of slices further comprise:
instructions for directing said processing unit to:
calculate error detection information for each of said plurality of slices, and
write said error detection information for each of said plurality of slices to error detection information for each of said plurality of slices in said storage information transmitted to said server.
21. The system of claim 19 wherein said instructions for directing said processing unit of said client system to generate said plurality of slices further comprise:
instructions for directing said processing unit to:
generate said conversion transform matrix from matrix information read from said write information received from said server, and insert said conversion transform matrix into said storage information transmitted to said server.
22. The system of claim 19 wherein said instructions for directing said processing unit of said client system to generate said plurality of slices further comprise:
instructions for directing said processing unit to: read said conversion transform matrix from said write information received from said server.
23. The system of claim 19 wherein said instructions for directing said processing unit of said client system to generate said plurality of slices further comprise:
instructions for directing said processing unit to:
add padding data to an end of a last of said X consecutive portions of data,
insert an indication of said padding into said storage information to send to said server.
24. The system of claim 17 wherein said instructions for directing said processing unit in said client system comprise:
instructions for directing said processing unit to:
store each one of said plurality of slices to one of said plurality of devices in said network, and
insert an identification of said one of said plurality of devices storing each one of said plurality of slices into said storage information.
25. The system of claim 24 wherein said instructions for directing said processing unit in said client system to store each of said plurality of slices comprise:
instructions for directing said processing unit to:
determine one of said plurality of devices in said network to store one of said plurality of slices from a list of available devices provided in said write information received from said server.
26. The system of claim 25 wherein said instructions for directing said processing unit in said client system to store each of said plurality of slices further comprise:
instructions for directing said processing unit to:
transmit said one of said plurality of slices to said one of said plurality of devices in response to said determination.
27. The system of claim 26 wherein said instructions for directing said processing unit in said client system to store each of said plurality of slices further comprise:
instructions for directing said processing unit to: receive an acknowledgment from said one of said plurality of devices in response to transmitting said one of said plurality of slices to said one of said plurality of devices.
28. The system of claim 27 wherein said instructions for directing said processing unit in said client system to store each of said plurality of slices comprise:
instructions for directing said processing unit to:
perform an error recovery process in response to not receiving said acknowledgement.
29. The system of claim 26 wherein said instructions for directing said processing unit in said client system to store each of said plurality of slices comprise:
instructions for directing said processing unit to:
determine a specified amount of data of said one of said plurality of slices is available, and
transmit said specified amount of data of said one of said plurality of slices to said one of said plurality of network devices in response to a determination that said specified amount of data is available.
30. The system of claim 17 wherein said instructions for directing said processing unit in said client system comprise:
instructions for directing said processing unit to:
receive a request to retrieve a section of said set of data,
generate a read request that includes an identifier for said set of data, transmit said read request to said server,
receive storage information for said set of data from said server wherein said storage information includes information for each of said plurality of slices including a one of said plurality of devices storing a slice, an order of said slice, and error information for said slice,
read said information for each of said plurality of slices,
transmit a request for a specified portion of each of said plurality of slices needed to reconstruct said section of said set of data to said one of said plurality of devices for each of said plurality of slices,
receive said specified portions of at least some of said plurality of slices from said plurality of devices, and determine whether said specified portions for at least a predetermined number of slices have been received,
re-assemble said data from said at least some of said plurality of slices received.
31. The system of claim 30 wherein said instructions for directing said processing unit in said client system comprise:
instructions for directing said processing unit to:
remove padding from said data responsive to re-assembling said data.
32. The system of claim 30 wherein said instructions for directing said processing unit in said client system to reassemble said data comprise:
instructions for directing said processing unit to:
determine each of said plurality of slices received, read said conversion transform matrix from said storage information received from said server,
select portions of said conversion transform matrix that correspond to each of said plurality of slices received,
generate an inverse conversion transform matrix from said selected portions of said conversion transform matrix, and
multiply each of said plurality of slices with said inverse conversion transform matrix to obtain a plurality of portions of said section of said data set.
33. The system of claim 30 wherein said instructions for directing said processing unit in said client system comprise:
instructions for directing said processing unit to:
read error information for each of said plurality of slices from said storage information received from said server, and
perform error checking using said error information on said portion of data from each of said plurality of slices received.
34. The system of claim 33 wherein said instructions for directing said processing unit in said client system comprise:
instructions for directing said processing unit to: disregard said portion of a slice responsive to a determination that said slice includes an error and greater than said predetermined number of slices have been received.
35. The system of claim 33 wherein said instructions for directing said processing unit in said client system comprise:
instructions for directing said processing unit to:
perform error recovery responsive to a determination that a slice includes an error.
36. The system of claim 17 wherein said instructions for directing said processing unit in said client system comprise:
instructions for directing said processing unit to:
receive a request to append data to an end of said set of data, request storage information from said server,
retrieve a last of X consecutive M length portions of said set of data from said plurality of devices where M is a minimum number of slices needed to re-construct data and X is determined from a total amount of said set of data divided by M,
read padding information from said storage information,
remove said padding from said last of X portions of data,
add data to append to an end of said data in said last of X portions to obtain a new last portion,
multiply said new last portion by said conversion transform matrix to determine an append vector,
read storage information for each of said plurality of network devices storing each of said plurality of slices, and
transmit a request for each yth element of said append vector to one of said plurality of devices storing a yth one of said plurality of slices with an offset to a last position in said slice.
37. The system of claim 36 wherein said instructions for directing said processing unit in said client system comprise:
instructions for directing said processing unit to: determine updated error detection information for each of said plurality of slices responsive to adding said yth elements to each of said plurality of slices, and
transmit said updated error detection information for each of said plurality of slices to said server for storage as error detection data.
38. The system of claim 37 wherein said instructions for directing said processing unit of said server comprise:
instructions for directing said processing unit to:
receive said updated error detection information for said each of said plurality of slices, and
update said error detection information of said each of said plurality of slices with said updated error detection information received for each of said plurality of slices.
39. The system of claim 17 wherein said instructions for directing said processing unit in said client system comprise:
instructions for directing said processing unit to:
receive a request to update a portion of data in said data set, determine each xth one of X consecutive M length portions of data that includes updated data, wherein M is a minimum number of slices needed to re-construct data of said set of data and X is determined by dividing a total amount of data in said set of data by M,
multiply each xth one of said X consecutive portions that includes updated data by a conversion transform matrix to determine an xth update vector,
read storage information for one of said plurality of devices storing each of said plurality of slices, and
transmit a request for each yth element of each said xth update vector to one of said plurality of devices storing a yth one of said plurality of slices with an offset to an xth position of said slice.
40. The system of claim 39 wherein said instructions for directing said processing unit in said client system comprise:
instructions for directing said processing unit to:
determine updated error detection information for each of said plurality of slices, and transmit said updated error detection information for each of said plurality of slices to said server .
41. The system of claim 40 wherein said instructions for directing said processing unit of said server comprise:
instructions for directing said processing unit to:
receive said updated error detection information for each of said plurality of slices, and
update said error detection information of said each of said plurality of slices with said updated error detection information received for each of said plurality of slices.
42. A system for storing data on a plurality of devices in a cloud network comprising:
circuitry in a server configured to receive data storage information for a plurality of slices of data representing a set of data wherein said data storage information includes an identifier of each of said plurality of slices, an identifier of one of said plurality of devices in said cloud network storing said data, and error detection data for each of said slices; and
circuitry in said server configured to store said data storage information in a memory.
43. The system of claim 42 further comprising:
circuitry in said server configured to maintain a list of said plurality of devices connected to said cloud network available to store data.
44. The system of claim 43 further comprising:
circuitry in said server configured to receive a first availability message from one of said plurality of network devices indicating said one of said plurality of devices is available to store data; and
circuitry in said server configured to store an identifier of said one of said plurality of devices in said list of available devices.
45. The system of claim 44 further comprising:
circuitry in said server configured to read a list of namespaces from said first availability message wherein said list of namespaces identifies each namespace that may store data on said one of said plurality of devices and a namespace is an identifier for a collection of a plurality of sets of data; and
circuitry in said server configured to store an indication in memory of each namespace in said list of namespaces that may store data on said one of said plurality of network devices.
46. The system of claim 44 further comprising:
circuitry in said server configured to determine an Internet Protocol (IP) address for said one of said plurality of devices in response to receiving said first availability message; and
circuitry in said server configured to store said IP address in said memory.
47. The system of claim 43 further comprising:
circuitry in said server configured to perform an authentication process with said one of said plurality of devices in response to receiving said first availability message.
48. The system of claim 47 further comprising:
circuitry in said server configured to transmit an acknowledgement to said one of said plurality of devices in response to said one of said plurality of devices being successfully authenticated.
49. The system of claim 48 further comprising:
circuitry in said server configured to transmit communication information to said one of said plurality of devices.
50. The system of claim 49 wherein said communication information includes a secret key.
51. The system of claim 44 further comprising:
circuitry in each of a plurality of network devices configured to transmit a first availability message to said server in response to connecting to said network;
circuitry in each of said plurality of network devices configured to perform authentication with said server; and
circuitry in each of said plurality of network devices configured to receive communication information from said server in response to a successful authentication.
52. The system of claim 51 wherein said availability message includes a list of namespaces wherein said list of namespaces identifies each namespace that may store data on said one of said plurality of devices and a namespace is an identifier for a collection of a plurality of sets of data.
53. The system of claim 51 wherein said first availability message includes an identifier of said device and an Internet Protocol address of said device.
54. The system of claim 43 further comprising:
circuitry in said server configured to receive a write data request for said set of data from said client system; and
circuitry in said server configured to transmit write information including information about each of said plurality of devices in a group of devices that are available to store said set of data to said client system.
55. The system of claim 54 further comprising:
circuitry in said server configured to determine said group of devices available to store said set of data.
56. The system of claim 55 wherein said circuitry configured to determine said group comprises:
circuitry in said server configured to read a namespace for said set of data from said request;
circuitry in said server configured to determine each of said plurality of devices in said list of devices that is available to store data for said namespace; and circuitry in said server configured to add each of said plurality of devices determined to be available to store data for said namespace to said group.
57. The system of claim 54 wherein said information for each of said plurality of devices in a group of devices that are available to store said set of data to said client system includes an Internet protocol address for the device and a device identifier.
58. The system of claim 54 further comprising:
circuitry in a client system configured to receive a request to store said set of data to memory;
circuitry in said client system configured to generate said write request in response to receiving said request to store said set of data;
circuitry in said client system configured to transmit said write request to said server;
circuitry in said client system configured to receive said write information including said information for each of said plurality of devices in a group of devices that are available to store said set of data from said server;
circuitry in said client system configured to generate said plurality of slices representing said data in said set of data;
circuitry in said client system configured to transmit each of said plurality of slices to one of said plurality of devices in said group;
circuitry in said client system configured to generate storage information for each of said plurality of slices; and
circuitry in said client system configured to transmit said storage information to said server.
59. The system of claim 58 wherein said circuitry in said client system configured to generate said plurality of slices comprises:
circuitry in said client system configured to apply Rabin's information dispersal algorithm to said data in said set of data.
60. The system of claim 58 wherein said circuitry in said client system configured to generate said plurality of slices comprises:
circuitry in said client system configured to divide data from said set of data into X consecutive portions of M length where M is determined by a number of a plurality of slices needed to re-construct the data and X is equal to the total amount of data in the set of data divided by M;
circuitry in said client system configured to multiply each xth one of said X consecutive portions of data by a conversion transform matrix to determine an xth resulting vector; and
circuitry in said client system configured to insert each yth element of each xth resulting vector in an xth position in a yth one of said plurality of slices.
61. The system of claim 60 wherein said circuitry in said client system configured to generate said plurality of slices further comprises: circuitry in said client system configured to calculate error detection information for each of said plurality of slices responsive to generating said slices, and
circuitry in said client system configured to write said error detection information for each of said plurality of slices to said storage information transmitted to said server.
62. The system of claim 60 wherein said circuitry in said client system configured to generate said plurality of slices further comprises:
circuitry in said client system configured to generate said conversion transform matrix from matrix information read from said write information received from said server; and
circuitry in said client system configured to insert said conversion transform matrix into said storage information transmitted to said server.
63. The system of claim 60 wherein said circuitry in said client system configured to generate said plurality of slices further comprises:
circuitry in said client system configured to read said conversion transform matrix from said write information received from said server.
64. The system of claim 60 wherein said circuitry in said client system configured to generate said plurality of slices further comprises:
circuitry in said client system configured to add padding data to an end of a last of said X consecutive portions of data; and
circuitry in said client system configured to insert an indication of said padding into said storage information to send to said server.
65. The system of claim 58 further comprising:
circuitry in said client system configured to store each one of said plurality of slices to one of said plurality of devices in said network; and
circuitry in said client system configured to insert an identification of said one of said plurality of devices storing each one of said plurality of slices into said storage information.
66. The system of claim 65 wherein said circuitry in said client system configured to store each of said plurality of slices comprises: circuitry in said client system configured to determine one of said plurality of devices in said network to store one of said plurality of slices from a list of available devices provided in said write information received from said server.
67. The system of claim 66 wherein said circuitry in said client system configured to store each of said plurality of slices further comprises:
circuitry in said client system configured to transmit said one of said plurality of slices to said one of said plurality of devices in response to said determination.
68. The system of claim 67 wherein said circuitry in said client system configured to store each of said plurality of slices further comprises:
circuitry in said client system configured to receive an acknowledgment from said one of said plurality of devices in response to transmitting said one of said plurality of slices to said one of said plurality of devices.
69. The system of claim 68 wherein said circuitry in said client system configured to store each of said plurality of slices comprises:
circuitry in said client system configured to perform error recovery in response to not receiving said acknowledgement.
70. The system of claim 67 wherein said circuitry in said client system configured to store each of said plurality of slices comprises:
circuitry in said client system configured to determine a specified amount of data of said one of said plurality of slices is available; and
circuitry in said client system configured to transmit a portion of said one of said plurality of slices in response to a determination that said specified amount of data is available.
71. The system of claim 58 further comprising:
circuitry in said client system configured to receive a request to retrieve a section of said set of data,
circuitry in said client system configured to generate a read request that includes an identifier for said set of data,
circuitry in said client system configured to transmit said read request to said server; circuitry in said client system configured to receive storage information for said set of data from said server wherein said storage information includes information for each of said plurality of slices including a one of said plurality of devices storing a slice, a order of said slice, and error information for said slice;
circuitry in said client system configured to read said information for each of said plurality of slices;
circuitry in said client system configured to transmit a request for a specified portion of each of said plurality of slices needed to reconstruct said section of said set of data to said one of said plurality of devices for each of said plurality of slices; circuitry in said client system configured to receive said specified portions of at least some of said plurality of slices from said plurality of devices;
circuitry in said client system configured to determine whether said specified portions for at least a predetermined number of slices have been received; and
circuitry in said client system configured to re-assemble said data from said at least some of plurality of slices received.
72. The system of claim 71 further comprising:
circuitry in said client system configured to remove padding from said data responsive to re-assembling said data.
73. The system of claim 71 wherein said circuitry in said client system configured to re-assemble said data comprises:
circuitry in said client system configured to determine each of said plurality of slices received;
circuitry in said client system configured to read said conversion transform matrix from said storage information received from said server;
circuitry in said client system configured to select portions of said conversion transform matrix that correspond to each of said plurality of slices received;
circuitry in said client system configured to generate an inverse conversion transform matrix from said selected portions of said conversion transform matrix; and circuitry in said client system configured to multiply each of said plurality of slices with said inverse conversion transform matrix to obtain a plurality of portions of said section of said data set.
74. The system of claim 71 further comprising: circuitry in said client system configured to read error information for each of said plurality of slices from said storage information received from said server; and circuitry in said client system configured to perform error checking using said error information on said portion of data from each of said plurality of slices received.
75. The system of claim 74 further comprising:
circuitry in said client system configured to disregard a slice responsive to a determination that said slice includes an error and greater than said predetermined number of slices have been received.
76. The system of claim 74 further comprising:
circuitry in said client system configured to perform error recovery responsive to a determination that a slice includes an error.
77. The system of claim 58 further comprising:
circuitry in said client system configured to receive a request to append data to an end of said set of data;
circuitry in said client system configured to request storage information from said server;
circuitry in said client system configured to retrieve a last of X consecutive M length portions of said set of data from said plurality of devices where M is a minimum number of slices needed to re-construct data and X is determined from a total amount of said set of data divided by M;
circuitry in said client system configured to read padding information from said storage information;
circuitry in said client system configured to remove said padding from said last of X portions of data;
circuitry in said client system configured to add data to append to an end of said data in said last of X portions to obtain a new last portion;
circuitry in said client system configured to multiply said new last portion by said conversion transform matrix to determine an append vector;
circuitry in said client system configured to read storage information for each of said plurality of network devices storing each of said plurality of slices; and
circuitry in said client system configured to transmit a request for each yth element of said append vector to one of said plurality of devices storing a yth one of said plurality of slices with an offset to a last position in said slice.
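The append path of claim 77 only re-encodes the final M-length portion. A sketch under the same real-valued assumptions as above; `devices[y].write` is a hypothetical transport stub, not an API from the source, and re-padding plus overflow into additional portions is omitted for brevity.

```python
import numpy as np

def append_to_slices(last_portion: bytes, new_data: bytes, pad_len: int,
                     a: np.ndarray, devices: list, slice_len: int) -> None:
    m = a.shape[1]
    # strip the recorded padding, then append the new bytes
    extended = last_portion[:len(last_portion) - pad_len] + new_data
    # re-pad to a full M-length portion and compute the append vector
    block = extended[:m].ljust(m, b"\0")
    vec = a @ np.frombuffer(block, dtype=np.uint8).astype(float)
    for y, element in enumerate(vec):
        # each yth element overwrites the last position of the yth slice
        devices[y].write(offset=slice_len - 1, value=element)
```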
78. The system of claim 77 further comprising:
circuitry in said client system configured to determine error detection information for each of said plurality of slices updated with one of said yth elements of said append vector; and
circuitry in said client system configured to transmit said error detection information for each of said plurality of slices to said server for storage.
79. The system of claim 78 further comprising:
circuitry in said server configured to receive said updated error detection information for each of said plurality of slices; and
circuitry in said server configured to update said error detection information of each of said plurality of slices with said updated error detection information received.
80. The system of claim 58 further comprising:
circuitry in said client system configured to receive a request to update a portion of data in said data set;
circuitry in said client system configured to determine each xth one of X consecutive M length portions of data that include updated data, wherein M is a minimum number of slices needed to re-construct data of said set of data and X is determined by dividing a total amount of data in said set of data by M;
circuitry in said client system configured to multiply each xth one of said X consecutive portions that include updated data by a conversion transform matrix to determine an xth update vector;
circuitry in said client system configured to read storage information for one of said plurality of devices storing each of said plurality of slices; and
circuitry in said client system configured to transmit a request for each yth element of each said xth update vector to one of said plurality of devices storing a yth one of said plurality of slices with an offset to an xth position of said slice.
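Claim 80's update path re-encodes only the portions that contain changed data and writes single elements back at the matching slice offset. A sketch, again with a hypothetical `devices[y].write` stub:

```python
import numpy as np

def push_updates(portions: np.ndarray, dirty: list[int],
                 a: np.ndarray, devices: list) -> None:
    """portions: shape (X, M); dirty: indices x of portions holding updated data."""
    for x in dirty:
        update_vector = a @ portions[x]          # the xth update vector
        for y, element in enumerate(update_vector):
            # the yth element goes to the device holding the yth slice,
            # at offset x (the xth position of that slice)
            devices[y].write(offset=x, value=element)
```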
81. The system of claim 80 further comprising:
circuitry in said client system configured to determine updated error detection information for each of said plurality of slices; and
circuitry in said client system configured to transmit said updated error detection information for each of said plurality of slices to said server for storage.
82. The system of claim 81 further comprising:
circuitry in said server configured to receive said updated error detection information for each of said plurality of slices; and
circuitry in said server configured to update said error detection information of each of said plurality of slices with said updated error detection information received.
83. A method for storing data from a client system to a plurality of devices in a cloud network comprising:
receiving a request to store a set of data to memory in a client system;
generating a write request in response to receiving said request to store said set of data in said client system;
transmitting said write request to a server;
receiving write information from said server in said client system wherein said write information includes information for each of said plurality of devices in a group of devices that are available to store said set of data;
generating a plurality of slices representing said data in said set of data in said client system;
transmitting each of said plurality of slices from said client system to one of said plurality of devices in said group;
generating storage information for each of said plurality of slices in said client system wherein said storage information includes an identifier of each of said plurality of slices, an identifier of one of said plurality of devices in said cloud network storing said data, and error detection data for each of said slices;
transmitting said storage information from said client system to said server;
receiving said storage information in said server; and
storing said storage information in a memory readable by said server.
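The storage information of claim 83 is essentially one record per slice. A minimal sketch of that record as the client might build it; the field names are illustrative, not from the source:

```python
from dataclasses import dataclass

@dataclass
class SliceRecord:
    slice_id: str           # identifier of the slice
    device_id: str          # device in the cloud network storing it
    error_detection: bytes  # per-slice error detection data, e.g. a checksum

# the client sends the full list of SliceRecords to the server,
# which persists it as the storage information for the data set
```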
84. The method of claim 83 further comprising:
maintaining a list of said plurality of devices connected to said cloud network available to store data in said server.
85. The method of claim 84 further comprising:
receiving a first availability message in said server from one of said plurality of network devices indicating said one of said plurality of devices is available to store data; and
storing an identifier of said one of said plurality of devices in said list of available devices maintained by said server.
86. The method of claim 85 further comprising:
reading a list of namespaces from said first availability message wherein said list of namespaces identifies each namespace that may store data on said one of said plurality of devices and a namespace is an identifier for a collection of a plurality of sets of data; and
storing an indication of each namespace in said list of namespaces that may store data on said one of said plurality of network devices.
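Claims 85 and 86 describe how the server tracks available devices and the namespaces each will accept. A sketch of that bookkeeping; the message shape and class names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class AvailabilityMessage:
    device_id: str
    ip_address: str
    namespaces: list[str]  # namespaces permitted to store data on this device

class DeviceRegistry:
    def __init__(self) -> None:
        self.available: dict[str, AvailabilityMessage] = {}

    def register(self, msg: AvailabilityMessage) -> None:
        # store the device identifier, its IP address, and its namespace list
        self.available[msg.device_id] = msg

    def group_for(self, namespace: str) -> list[str]:
        # devices eligible to hold slices for a given namespace (cf. claim 97)
        return [d for d, m in self.available.items() if namespace in m.namespaces]
```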
87. The method of claim 85 further comprising:
determining an Internet Protocol (IP) address for said one of said plurality of devices in response to receiving said first availability message; and
storing said IP address in said server.
88. The method of claim 84 further comprising:
performing an authentication process between said server and said one of said plurality of devices in response to receiving said first availability message.
89. The method of claim 88 further comprising:
transmitting an acknowledgement from said server to said one of said plurality of devices in response to said one of said plurality of devices being successfully authenticated.
90. The method of claim 89 further comprising:
transmitting communication information from said server to said one of said plurality of devices.
91. The method of claim 90 wherein said communication information comprises a secret key.
92. The method of claim 85 further comprising:
transmitting a first availability message from said one of said plurality of devices to said server in response to connecting to said network; and
receiving communication information from said server in said one of said plurality of devices in response to a successful authentication.
93. The method of claim 92 wherein said first availability message includes a list of namespaces wherein said list of namespaces identifies each namespace that may store data on said one of said plurality of devices and a namespace is an identifier for a collection of a plurality of sets of data.
94. The method of claim 92 wherein said first availability message includes an identifier of said device and an Internet Protocol address of said device.
95. The method of claim 84 further comprising:
receiving a write data request for said set of data from said client system in said server; and
transmitting write information including information about each of said plurality of devices in a group of devices that are available to store said set of data from said server to said client system.
96. The method of claim 95 further comprising:
determining said group of devices available to store said set of data by said server.
97. The method of claim 96 wherein said step of determining said group comprises:
reading a namespace for said set of data from said write request;
determining each of said plurality of devices in said list of devices that is available to store data for said namespace; and
adding each of said plurality of devices determined to be available to store data for said namespace to said group.
98. The method of claim 95 wherein said information for each of said plurality of devices in a group of devices that are available to store said set of data to said client system includes an Internet protocol address for the device and a device identifier.
99. The method of claim 83 wherein said step of generating said plurality of slices comprises:
applying Rabin's information dispersal algorithm to said data in said set of data.
100. The method of claim 83 wherein said step of generating said plurality of slices comprises:
dividing data from said set of data into X consecutive portions of M length where M is determined by a number of slices needed to re-construct the data and X is equal to the total amount of data in the set of data divided by M;
multiplying each xth one of said X consecutive portions of data by a conversion transform matrix to determine an xth resulting vector; and
inserting each yth element of each xth resulting vector in an xth position in a yth one of said plurality of slices.
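Claim 100 is the core dispersal step: X portions of length M, each multiplied by an N x M conversion transform matrix, with the yth element of each result landing at the xth position of the yth slice. A minimal numpy sketch using a Vandermonde matrix (any M of its N rows form an invertible matrix); real deployments of Rabin-style dispersal typically use GF(2^8) arithmetic rather than floats:

```python
import numpy as np

def make_transform(n: int, m: int) -> np.ndarray:
    # N x M Vandermonde matrix: any M rows are invertible as an M x M
    # matrix, so any M of the N slices can reconstruct the data
    return np.vander(np.arange(1, n + 1, dtype=float), m, increasing=True)

def make_slices(portions: np.ndarray, a: np.ndarray) -> np.ndarray:
    # portions: shape (X, M); result: shape (N, X), where
    # slices[y, x] is the yth element of the xth resulting vector
    return a @ portions.T
```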
101. The method of claim 100 wherein said step of generating said plurality of slices further comprises:
calculating updated error detection information for each of said plurality of slices; and
writing said updated error detection information for each of said plurality of slices in said storage information transmitted to said server.
102. The method of claim 100 wherein said step of generating said plurality of slices further comprises:
generating said conversion transform matrix from matrix information read from said write information received from said server; and
inserting said conversion transform matrix into said storage information transmitted to said server.
103. The method of claim 100 wherein said step of generating said plurality of slices further comprises:
reading said conversion transform matrix from said write information received from said server.
104. The method of claim 100 wherein said step of generating said plurality of slices further comprises:
adding padding data to an end of a last of said X consecutive portions of data; and
inserting an indication of said padding into said storage information to send to said server.
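Claim 104's padding step simply rounds the data up to a whole number of M-length portions and records how much was added, as sketched below:

```python
def pad_to_portions(data: bytes, m: int) -> tuple[bytes, int]:
    # bytes needed to complete the last M-length portion
    pad_len = (-len(data)) % m
    return data + b"\0" * pad_len, pad_len

def remove_padding(data: bytes, pad_len: int) -> bytes:
    # the pad length travels in the storage information sent to the server
    return data[:len(data) - pad_len] if pad_len else data
```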
105. The method of claim 83 further comprising:
storing each one of said plurality of slices to one of said plurality of devices in said network; and
inserting an identification of said one of plurality of devices storing each one of said plurality of slices into said storage information generated by said client system.
106. The method of claim 105 wherein said step of storing each of said plurality of slices comprises:
determining one of said plurality of devices in said network to store one of said plurality of slices from a list of available devices provided in said write information received from said server.
107. The method of claim 106 wherein said step of storing each of said plurality of slices further comprises:
transmitting said one of said plurality of slices from said client system to said one of said plurality of devices in response to said determination.
108. The method of claim 107 wherein said step of storing each of said plurality of slices further comprises:
receiving an acknowledgment in said client system from said one of said plurality of devices in response to transmitting said one of said plurality of slices to said one of said plurality of devices.
109. The method of claim 108 wherein said step of storing each of said plurality of slices comprises:
performing error recovery in response to not receiving said acknowledgement.
110. The method of claim 107 wherein said step of storing each of said plurality of slices comprises:
determining a specified amount of data of said one of said plurality of slices is available; and
transmitting said specified amount of data of said one of said plurality of slices in response to a determination that said specified amount of data is available.
111. The method of claim 83 further comprising:
receiving a request to retrieve a section of said set of data in a client system;
generating a read request in said client system wherein said read request includes an identifier for said set of data;
transmitting said read request from said client system to said server;
receiving storage information for said set of data from said server in said client system wherein said storage information includes information for each of said plurality of slices including one of said plurality of devices storing a slice, an order of said slice, and error information for said slice;
reading said information for each of said plurality of slices from said storage information;
transmitting a request for a specified portion of each of said plurality of slices needed to reconstruct said section of said set of data from said client system to said one of said plurality of devices for each of said plurality of slices;
receiving said specified portions of at least some of said plurality of slices from said plurality of devices in said client system;
determining whether said specified portions for at least a predetermined number of slices have been received by said client system; and
re-assembling said data from said at least some of said plurality of slices received.
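For the partial read of claim 111, the client only needs the slice positions that cover the requested byte range. A small sketch of that index arithmetic:

```python
def positions_for_section(start: int, end: int, m: int) -> range:
    """Slice positions covering bytes [start, end) of the data set.

    Byte i of the original data lives in portion i // M, and portion x
    is spread across position x of every slice, so the client requests
    positions floor(start/M) .. ceil(end/M) - 1 from each device.
    """
    return range(start // m, -(-end // m))  # -(-e // m) is ceil(e / m)
```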
112. The method of claim 111 further comprising:
removing padding from said data responsive to re-assembling said data.
113. The method of claim 111 wherein said step of reassembling said data comprises:
determining each of said plurality of slices received by said client system;
reading said conversion transform matrix from said storage information received from said server;
selecting portions of said conversion transform matrix that correspond to each of said plurality of slices received by said client system;
generating an inverse conversion transform matrix from said selected portions of said conversion transform matrix; and
multiplying each of said plurality of slices with said inverse conversion transform matrix to obtain a plurality of portions of said data of said data set.
114. The method of claim 111 further comprising:
reading error information for each of said plurality of slices received by said client system from said storage information received from said server by said client system; and
performing error checking in said client system using said error information on each of said plurality of slices received.
115. The method of claim 114 further comprising:
disregarding a slice responsive to a determination that said slice includes an error and greater than said predetermined number of slices have been received.
116. The method of claim 114 further comprising:
performing error recovery in said client system responsive to a determination that a slice includes an error.
117. The method of claim 83 further comprising:
receiving a request to append data to an end of said set of data in a client system;
requesting storage information be transmitted from said server to said client system;
retrieving a last of X consecutive M length portions of said set of data where M is a minimum number of slices needed to re-construct data and X is determined from a total amount of said set of data divided by M;
reading padding information from said storage information;
removing said padding from said last of X portions of data;
adding data to append to an end of said data in said last of X portions to obtain a new last portion;
multiplying said new last portion by said conversion transform matrix to determine an append vector;
reading storage information for each of said plurality of network devices storing each of said plurality of slices; and
transmitting a request for each yth element of said append vector to one of said plurality of devices storing a yth one of said plurality of slices with an offset to a last position in said slice.
118. The method of claim 117 further comprising:
determining error detection information for each yth element of each of said plurality of slices responsive to adding each element of said resulting vectors to corresponding slices in said client system; and
transmitting said error detection information for each of said plurality of slices from said client system to said server .
119. The method of claim 118 further comprising:
receiving said error detection information for each of said plurality of slices; and
updating said error detection information of each of said plurality of slices with said received error detection information.
120. The method of claim 83 further comprising:
receiving a request in said client system to update a portion of data in said data set;
determining each xth one of X consecutive M length portions of data that include updated data, wherein M is a minimum number of slices needed to reconstruct data of said set of data and X is determined by dividing a total amount of data in said set of data by M;
multiplying each xth one of said X consecutive portions that include updated data by a conversion transform matrix to determine an xth update vector;
reading storage information for one of said plurality of devices storing each of said plurality of slices; and
transmitting a request for each yth element of each said xth update vector from said client system to one of said plurality of devices storing a yth one of said plurality of slices with an offset to an xth position of said slice.
121. The method of claim 120 further comprising:
determining updated error detection information for each of said plurality of slices responsive to each of said elements of said update vector in said client system; and
transmitting said updated error detection information for each of said plurality of slices from said client system to said server for storage as error detection data for each of said plurality of slices.
122. The method of claim 121 further comprising:
receiving said updated error detection information for each of said plurality of slices in said server; and
updating said error detection information of each of said plurality of slices stored in said server with said updated error detection information.
PCT/SG2011/000138 2011-04-04 2011-04-04 Method and system for storing data in a cloud network WO2012138296A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
SG2013071840A SG193616A1 (en) 2011-04-04 2011-04-04 Method and system for storing data in a cloud network
PCT/SG2011/000138 WO2012138296A1 (en) 2011-04-04 2011-04-04 Method and system for storing data in a cloud network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SG2011/000138 WO2012138296A1 (en) 2011-04-04 2011-04-04 Method and system for storing data in a cloud network

Publications (1)

Publication Number Publication Date
WO2012138296A1 true WO2012138296A1 (en) 2012-10-11

Family

ID=46969451

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2011/000138 WO2012138296A1 (en) 2011-04-04 2011-04-04 Method and system for storing data in a cloud network

Country Status (2)

Country Link
SG (1) SG193616A1 (en)
WO (1) WO2012138296A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10983957B2 (en) 2015-07-27 2021-04-20 Sas Institute Inc. Distributed columnar data set storage
WO2021101798A1 (en) * 2019-11-18 2021-05-27 Sas Institute Inc. Distributed columnar data set storage and retrieval

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020048284A1 (en) * 2000-02-18 2002-04-25 Moulton Gregory Hagan System and method for data protection with multidimensional parity
KR20100016057A (en) * 2007-05-04 2010-02-12 Microsoft Corporation Mesh-managing data across a distributed set of devices
KR20100122197A (en) * 2009-05-12 2010-11-22 Clunet Co., Ltd. Cloud computing network system and file distrubuting method of the same
KR20110028968A (en) * 2009-09-14 2011-03-22 Korea University Industry-Academic Cooperation Foundation Method for verifying the integrity of a user's data in remote computing and system thereof



Also Published As

Publication number Publication date
SG193616A1 (en) 2013-11-29

Similar Documents

Publication Publication Date Title
US10303549B2 (en) Dispersed storage network with access control and methods for use therewith
EP3438903B1 (en) Hierarchical network system, and node and program used in same
US11657171B2 (en) Large network attached storage encryption
RU2473112C2 (en) Creation and deployment of distributed extensible applications
US8761167B2 (en) List range operation dispersed storage network frame
US8788831B2 (en) More elegant exastore apparatus and method of operation
US8751788B2 (en) Payment encryption accelerator
AU2018430192A1 (en) Blockchain system and method
CN110011981B (en) Trusted cloud storage method and system based on block chain
US20080077803A1 (en) System and method for cryptographic data management
US10652350B2 (en) Caching for unique combination reads in a dispersed storage network
US9898474B1 (en) Object sharding in a host-side processing device for distributed storage
US9652487B1 (en) Programmable checksum calculations on data storage devices
EP3744071B1 (en) Data isolation in distributed hash chains
US10476663B1 (en) Layered encryption of short-lived data
CN109845183A (en) For from client device to the method for cloud storage system storing data block
CN113886743B (en) Method, device and system for refreshing cache resources
Giri et al. A survey on data integrity techniques in cloud computing
CN113411404A (en) File downloading method, device, server and storage medium
CN104348624A (en) Method and device for authenticating credibility through Hash operation
US11190353B2 (en) Computer implemented methods and systems for managing a cryptographic service
US20110154015A1 (en) Method For Segmenting A Data File, Storing The File In A Separate Location, And Recreating The File
WO2012138296A1 (en) Method and system for storing data in a cloud network
US11356254B1 (en) Encryption using indexed data from large data pads
US20080091955A1 (en) System and method for rotating data in crypto system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11862908

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11862908

Country of ref document: EP

Kind code of ref document: A1