GB2574076A - Distributed data storage - Google Patents

Info

Publication number
GB2574076A
Authority
GB
United Kingdom
Prior art keywords
node
data block
nodes
data
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1815423.7A
Other versions
GB201815423D0 (en
GB2574076B (en
Inventor
Mavaddat Matin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nationwide Building Society
Nationwide Building Soc
Original Assignee
Nationwide Building Society
Nationwide Building Soc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nationwide Building Society
Priority to GB1815423.7A
Publication of GB201815423D0
Publication of GB2574076A
Application granted
Publication of GB2574076B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/104Peer-to-peer [P2P] networks
    • H04L67/1074Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
    • H04L67/1076Resource dissemination mechanisms or network resource keeping policies for optimal resource availability in the overlay network
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/12Applying verification of the received information
    • H04L63/123Applying verification of the received information received data contents, e.g. message integrity
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0894Escrow, recovery or storing of secret information, e.g. secret key escrow or cryptographic key storage
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/14Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using a plurality of keys or algorithms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107File encryption
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/104Peer-to-peer [P2P] networks
    • H04L67/1074Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
    • H04L67/1078Resource delivery mechanisms
    • H04L67/108Resource delivery mechanisms characterised by resources being split in blocks or fragments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0816Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
    • H04L9/085Secret sharing or secret splitting, e.g. threshold schemes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/50Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Storage Device Security (AREA)

Abstract

A method of sending a data block 12 from a first node 10 to a plurality of other nodes 30, 40 for storage. The first node has a first node distribution key 22. The method further comprises computing a set of other nodes using the first node distribution key and an identifier 19 of the data block as inputs into a distribution function 29. The method also comprises sending the data block from the first node to T nodes from the computed set 24 of other nodes for storage. The method may also comprise encrypting the data block to form an encrypted data block 18 before distribution to the other T nodes for storage. The encryption may be performed with an encryption key 16 that also has a corresponding encryption key identifier 11. Computing 20 the set of nodes may also be based on the encryption key identifier. The encryption key may be at least as long as the length, L, of the data block and encryption performed through a one-time pad type modular addition or XOR operation of the data block and encryption key. The data block may also be pre-encrypted using a pre-encryption private key.

Description

Distributed data storage
Field of the invention
The present invention relates to distributed data storage. More particularly, embodiments of the present invention relate to systems and methods that allow a node in a system to securely store data in a cloud of other nodes even when some of these other nodes may be malicious or compromised in some way.
Introduction
People are increasingly using computers for business and personal use. Such users are finding they have increasing amounts of computer based personal information and data. While some kinds of data may typically be stored at a user computing device, it is desirable to have a back-up copy of this information. This means that if a user’s information is deleted for any reason, the user is able to reobtain their information and continue using their computing device as if the information had never been lost.
Typically user data can be backed up using a trusted third party server to store the information. However, this places a large load requirement on servers which may have to store large amounts of information for multiple users, as well as on data networks and the user computing devices themselves. Furthermore, governments are increasingly adding regulations on how secure data should be stored, increasing the burden on parties running secure servers for such purposes. As such, it is beneficial to look for alternative methods of storage. This may particularly be the case for historical personal information, such as old financial transactions, that a user/secure server may no longer be required by law to store but the user may wish to access.
While it is possible to store data in “the cloud” i.e. at several other computing devices, these computing devices are often untrusted or at least might be subject to compromise. Data can be encrypted before it is sent to the cloud. However, this carries a risk since most encryption methods are at least semi-vulnerable to offline attacks or other brute force attacks. Such attacks may become more of a risk over time, for example as quantum computers, able to factor efficiently and hence break RSA encryption and other encryption schemes, become more viable.
Furthermore, malicious parties may attempt to work out where in “the cloud” particular data was sent and request it back in an attempt to acquire user data which they should not be able to access. As such, there is a risk of offline attack from third parties as well as from the computers in the cloud storing data.
It would be desirable to provide a method of securely storing data across multiple nodes, for example in a cloud configuration, in a manner that prevents third parties from working out where the data is stored, for example in which of a large number of possible nodes. In particular, it would be desirable to store the data in a manner that makes it less vulnerable to offline and brute force attacks than other cloud storage methods. Storing the data in the cloud in such a manner may provide further benefits of reducing the burden on secure servers while still allowing a user to store their data securely.
While some embodiments of the invention can be used to implement cloud arrangements for back-up storage, methods and systems described in this document may also be used for primary storage for example where a node uses the described techniques for primary storage instead of or as well as local storage for particular data.
Summary of invention
The invention generally relates to distributed data storage. Embodiments of the invention provide for a new type of infrastructure that assures confidentiality, integrity and availability of data using commodity hardware (i.e. user computing devices) and peer to peer network communication in an untrusted environment in which malicious and unreliable computing devices might be present.
Benefits of being able to use commodity hardware in people's homes via peer to peer networks to create cloud storage arrangements include improved scalability and lower cost. However, such cloud storage arrangements present a number of security and reliability obstacles. For example, assurance is required that the integrity, confidentiality and availability of stored data cannot be compromised even in the presence of malicious, colluding actors and unreliable nodes.
The present invention seeks to address problems and limitations of the related prior art.
In accordance with one aspect of the present invention a method of sending a data block from a first node to a plurality of other nodes for storage is provided. The first node comprises, has or has access to a first node distribution key. The method comprises encrypting the data block to form an encrypted data block, the encrypted data block having an identifier. The method further comprises computing a set of other nodes using the first node distribution key and the identifier of the encrypted data block. The method also comprises sending the encrypted data block from the first node to T nodes from the set of other nodes for storage at the T nodes.
The set of nodes can be computed in a manner that means it would be hard, difficult or infeasible (including computationally hard, difficult or infeasible) for any party that does not have access to the first node distribution key to recreate the set of nodes. In other words the set of nodes cannot easily be recreated without access to the first node distribution key even if the algorithm/function used to compute the set is known. The computing of a set of nodes can be a computation that means it would be (computationally) hard/difficult/infeasible for the computation to be performed without the distribution key. In other words, a third party who does not have access to the distribution key would not be able to compute the set of nodes. The set of nodes may appear random to any party that does not have access to the distribution key.
The set of nodes can be computed in any suitable manner for example using a secure pseudo-random number generator, a block-cipher, series of hash functions or any other suitable one way function.
In the example described above, the first node can be any computing device for example a user computing device such as a mobile computing device (such as a mobile telephone, tablet device or laptop), a desktop computing device, and so forth.
The first node distribution key can be any suitable secret key known by or available to the first node. In some examples, the distribution key is also or instead stored at a secure server so that it can be accessed when the first node loses information. The distribution key may be directly stored by the first node/secure server or it may be derived from another suitable secret key using a known or conventional key derivation technique whenever it is needed.
The data block can be encrypted using various known encryption techniques. The identifier of the encrypted data block is also referred to as an encrypted data block identifier. The identifier of the encrypted data block is a reference number or identifier that is unique from the perspective of the first node in respect of this and other data blocks encrypted and distributed in the same way, at least by the first node. When the term reference number is used there is no restriction on the reference number being numerical and any suitable characters can also form part of the reference number. Any other such data blocks/data chunks owned by the first node will have a different identifier. The identifier can take any suitable form and may form part of the encrypted data block or may be merely associated with the encrypted data block.
The other nodes of the set the encrypted data block is sent to can be other user computing devices that together form part of a cloud of available nodes. Additionally, one or more of the nodes may be a server. In described embodiments, when a server is used as a node it will generally be treated in the same fashion as other user device nodes and not considered to be any more or less secure. The set of nodes can be created in any suitable fashion that meets the requirements listed above. While the term set of nodes is used, the set of nodes could also be considered to be a list of nodes or another group of nodes. When the term list is used, no ordering should be assumed. Example methods of how to select the nodes are discussed below.
The number of nodes the encrypted data block is sent to, T, can be selected in various suitable ways. In some embodiments, T is simply a fixed constant for example set in advance or agreed between the nodes. In other examples T can be calculated based on the probability of nodes being offline and so data stored at these nodes being inaccessible.
The encrypted data block can be sent along with the identifier of the encrypted data block so that the other node can identify the encrypted data block and return it when requested. In other examples an identifier for the data block (discussed in more detail later) is sent instead of, or as well as, the identifier of the encrypted data block. However, any other suitable method of allowing the other node to identify the encrypted data block could also be used. The first node can send its public key alongside any encrypted data block to allow later verification.
The encrypted data block is sent to more than one node to allow for the risk or expectation that some of the nodes encrypted data blocks are sent to will be offline, or will have lost the data, or will no longer exist, when the first node later attempts to recover the data. Furthermore, it is assumed that some of the nodes may be offline or otherwise unavailable when the first node sends data to that node. In such a case, the data sent to the node may not be stored. However, the encrypted data block is preferably stored at all nodes that receive the encrypted data block. As such, T is used here to add some assurance that the data will be returned to the first node if the first node requests its data.
Encrypting the data block can comprise encrypting the data block using one or more encryption keys wherein each encryption key has a corresponding encryption key identifier. The method can further comprise, for each of the one or more encryption keys: computing a set of other nodes using the first node distribution key and the encryption key identifier; and sending the encryption key from the first node to T nodes from the set of other nodes for storage.
The encryption keys can take any suitable form. For example, any suitable form of symmetric-key encryption may be used to encrypt the data block and the encryption keys can be secret/private keys for this form of encryption. Example symmetric-key encryption algorithms include Twofish, Serpent, AES, Blowfish, CAST5, Kuznyechik, RC4, 3DES, Skipjack, Safer+/++ and IDEA. Each encryption key may be used to encrypt the data block using a different encryption algorithm. Therefore, the encryption keys do not all need to have the same form. Preferably the encryption keys are used to both encrypt and decrypt data. However, in other examples, the data could be encrypted using a public key and the private key used for decryption could be the encryption key sent to other nodes for storage.
As with the encrypted data block identifier, the encryption key identifiers can take any suitable form and can be stored as part of the encryption keys or separately from the encryption keys. The encryption key identifiers should be unique from the perspective of the first node and should differ from each other and from the encrypted data block identifier. The encryption key identifiers can also be referred to as encryption key reference numbers although this should not be considered to limit them to numerical characters.
Each encryption key can be sent along with the corresponding encryption key identifier so the other node can identify the encryption key and return it when requested. In other examples an identifier for the data block (discussed in more detail later) is sent instead of, or as well as, the identifier of the encryption keys. However, any other suitable method of allowing the other node to identify the encryption keys could also be used. The first node can send its public key alongside any encryption key to allow later verification.
Each encryption key is sent to T other nodes, where the set of nodes each encryption key is sent to is determined separately so that the sets of nodes are different. The set of nodes each encryption key is sent to can be edited to remove any nodes that the encrypted data block or previous encryption keys are sent to.
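The editing of node sets described above can be sketched as follows. This is only an illustrative sketch: the function name `assign_disjoint`, the item labels and the node names are hypothetical, and the candidate lists are assumed to have been computed already from the distribution key. Presumably the aim is that no single node ends up holding both the encrypted data block and one of its keys.

```python
def assign_disjoint(candidate_sets: dict[str, list[str]], t: int) -> dict[str, list[str]]:
    """Pick t nodes per item, skipping nodes already chosen for earlier items.

    candidate_sets maps an item label (the encrypted data block or an
    encryption key) to its computed candidate nodes, in processing order.
    """
    used: set[str] = set()
    placement: dict[str, list[str]] = {}
    for item_id, candidates in candidate_sets.items():
        # Drop any node that already stores the block or a previous key.
        fresh = [node for node in candidates if node not in used]
        if len(fresh) < t:
            raise ValueError(f"not enough unused candidate nodes for {item_id}")
        placement[item_id] = fresh[:t]
        used.update(fresh[:t])
    return placement


# Hypothetical candidate sets: "n2" was computed for both items but is
# removed from the key's set because the block was already placed there.
example = assign_disjoint(
    {"block": ["n1", "n2", "n3"], "key-1": ["n2", "n4", "n5", "n6"]},
    t=2,
)
```

In practice each candidate list would need to be longer than T so that enough nodes remain after removal.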
The data block can comprise a string of length L; at least one of the one or more encryption keys can comprise a random or pseudo-random string of at least length L; and encrypting the data block can comprise using the at least one of the one or more encryption keys as a one-time pad to encrypt the data block.
The strings that form the data block and the encryption keys can take any suitable form i.e. they can be decimal, binary, hexadecimal strings etc. However, it is preferable that the strings that form the encryption keys have the same form as the string that forms the data block. The length L is likely to be determined by the length of the data block. The size of the data block could be fixed and the first node can split any data it wishes to store into blocks of this fixed size. Alternatively, the data could all be stored as a single data block and the size of the data block can merely be the size of the data the first node wishes to store. However, how L is determined is not that important. Instead the length is mentioned since for one-time pad cryptography it is preferable that the encryption key is at least the same length as the data to be encrypted. It is possible for the encryption key to be longer than the data being encrypted and for the other characters in the encryption key to not be used for encryption.
One-time pad encryption is beneficial since it is resistant to offline attacks by any node that does not have both the encrypted data block and the encryption key. This increases the security of the stored data.
Using an encryption key as a one-time pad can comprise performing modular addition between corresponding bits of the data block and the encryption key. Using an encryption key as a one-time pad can also comprise performing an XOR operation between corresponding bits of the data block and the encryption key.
While other one-time pad cryptographic methods may be known, this is a standard operation used for one-time pad cryptography. Modular addition can be used with any suitable form of string. The XOR operation can be used when the strings that form the data block and encryption keys are binary strings. In such a case, the decryption process also involves performing an XOR operation.
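A minimal sketch of the two one-time pad operations follows, assuming the data block and key are byte strings. The function names are hypothetical, and the choice of working byte by byte with a modulus of 256 is an assumption made for illustration rather than something the description specifies.

```python
import secrets


def otp_xor(data: bytes, key: bytes) -> bytes:
    """XOR the data with the key; applying it twice restores the data."""
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the data")
    # zip stops at the end of the data, so surplus key bytes are unused.
    return bytes(d ^ k for d, k in zip(data, key))


def otp_add(data: bytes, key: bytes, modulus: int = 256) -> bytes:
    """Encrypt by modular addition of corresponding symbols."""
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the data")
    return bytes((d + k) % modulus for d, k in zip(data, key))


def otp_subtract(cipher: bytes, key: bytes, modulus: int = 256) -> bytes:
    """Decrypt a modular-addition pad by modular subtraction."""
    if len(key) < len(cipher):
        raise ValueError("one-time pad key must be at least as long as the data")
    return bytes((c - k) % modulus for c, k in zip(cipher, key))


block = b"old transaction record"
# A fresh random key at least as long as the block (length L).
pad = secrets.token_bytes(len(block))
encrypted = otp_xor(block, pad)
recovered = otp_xor(encrypted, pad)
```

The key must be used for one data block only; reusing a pad across blocks removes the property that makes the scheme resistant to offline attack.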
As mentioned above, preferably a third party re-computing a set of nodes without the first node distribution key is computationally infeasible.
Computationally infeasible means that it is infeasible, difficult or hard for a third party to recreate the set of nodes without access to the first node distribution key. In other words it would be computationally difficult or computationally hard for a third party to determine the set of nodes without access to the first node distribution key. More particularly, a third party who did not have access to the first node distribution key would be unable to recreate the set of nodes. This prevents any third party working out where the encrypted data block and/or the encryption keys have been sent for storage.
The above limitation preferably applies to both the sets of nodes that the encryption keys are being sent to and the set of nodes the encrypted data block is being sent to. However, in other situations it may just apply to one of these sets of nodes.
The computing of a set of other nodes can comprise using a secure pseudo-random number generator wherein a seed used by the secure pseudo-random number generator is generated using the first node distribution key and the identifier of either the encrypted data block or the one or more encryption keys.
While a secure pseudo-random number generator is mentioned, where necessary a general pseudo-random number generator could be used. The secure pseudo-random number generator can output numerical or any other form of characters. The random number generator is a pseudo-random number generator to ensure that the set can be recreated when the same seed is used. The use of a secure pseudo-random number generator with a known seed therefore achieves the recreatable sets that cannot be recreated by a third party that does not have access to the first node distribution key. In one example, the seed that is a function of the identifier and the distribution key is the only seed of the secure pseudo-random number generator. The function of the identifier and distribution key for each encrypted block/encryption key should be unique compared to the seed created from the identifier of other encrypted blocks/encryption keys. This ensures that the set of T nodes created is different for each encrypted block/key.
Preferably, the secure/computationally secure pseudo-random number generator is used to determine the sets of nodes that both the encrypted data block and the encryption keys are sent to. However, in other examples, the secure pseudo-random number generator can be used for determining just a set of nodes that the encrypted data block is sent to or just a set of nodes the encryption keys are sent to.
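One possible way to realise a seeded, recreatable node selection is sketched below. It is an assumption-laden illustration, not the method of the description: HMAC-SHA256 is swapped in both as the function that combines the distribution key with the identifier into a seed and as the keyed generator that expands the seed, and the function name, node names and key value are hypothetical.

```python
import hashlib
import hmac


def compute_node_set(distribution_key: bytes, identifier: bytes,
                     nodes: list[str], t: int) -> list[str]:
    """Deterministically pick t distinct nodes from the known nodes."""
    if t > len(nodes):
        raise ValueError("t cannot exceed the number of available nodes")
    # Seed is a function of the distribution key and the identifier
    # (assumption: HMAC-SHA256 is used as that function).
    seed = hmac.new(distribution_key, identifier, hashlib.sha256).digest()
    # Sort so the result does not depend on the caller's list order.
    remaining = sorted(nodes)
    chosen: list[str] = []
    counter = 0
    while len(chosen) < t:
        # Expand the seed into a stream of 256-bit pseudo-random values.
        stream = hmac.new(seed, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
        # Modulo bias is negligible for a 256-bit value and small node counts.
        index = int.from_bytes(stream, "big") % len(remaining)
        chosen.append(remaining.pop(index))
    return chosen


known_nodes = [f"node-{i}" for i in range(100)]
first = compute_node_set(b"hypothetical distribution key", b"block-id-1", known_nodes, 5)
again = compute_node_set(b"hypothetical distribution key", b"block-id-1", known_nodes, 5)
```

Because the same key and identifier always yield the same seed, the first node can recompute the set later when it wants its data back, while a party without the key cannot reproduce the seed.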
The data block can have a data block identifier. The encrypted data block and the one or more encryption keys can be provided with a reference representing their position in an ordered list; and the identifier for each of the encrypted data block and the encryption keys can be a hash function of the data block identifier and the reference representing the position of the encrypted data block or the one or more encryption keys within the ordered list.
When more than one data block is used, it is helpful to have identifiers for the data blocks that enable the data blocks to be referred to and identified. These identifiers can take any suitable form provided each data block has a different identifier. In one example these identifiers can be consecutive numbers. In another example they can be randomly generated numbers. In yet another example, they can be a hash function of the contents of the data block.
Placing the encrypted data block and the encryption keys in an ordered list does not necessarily involve moving them in storage. Instead it involves providing the encrypted data block and encryption keys with a reference where each encrypted data block/encryption key has a unique reference. In some examples these references can also be called numbers although this should not necessarily limit them to numerical characters. Preferably the references are sequential or increase following a predictable and known formula.
Before the hash function is performed, the reference representing the position of the encrypted data block/encryption key within the ordered list and the data block identifier are combined. The reference for each data block/encryption key can be combined with the data block identifier in any suitable way provided the result is unique for each data block/encryption key. For example the two can be appended together. Once the two have been combined they can be sent to a hash function and the results of this can be used as the identifier for the encrypted data block/encryption key.
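The combination and hashing step can be sketched as below. SHA-256, the fixed-width encoding of the position reference and the function name are all assumptions chosen for illustration; the description leaves the hash function and the way of combining open.

```python
import hashlib


def derive_identifier(data_block_id: bytes, position: int) -> str:
    """Hash the data block identifier appended with a position reference.

    Position 0 is taken here to be the encrypted data block and positions
    1, 2, ... its encryption keys (an assumed convention).
    """
    # A fixed-width suffix keeps the combined value unambiguous, so two
    # different (identifier, position) pairs never produce the same input.
    combined = data_block_id + position.to_bytes(4, "big")
    return hashlib.sha256(combined).hexdigest()


# Identifiers for an encrypted block and two keys of one data block.
identifiers = [derive_identifier(b"data-block-7", pos) for pos in range(3)]
```

Since the position references follow a known sequence, the first node only needs the data block identifier to regenerate every derived identifier.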
The first node can have a pre-encryption key and the method can further comprise pre-encrypting the data block using the pre-encryption key before encrypting the data block.
The pre-encryption key can be a public key; and pre-encrypting the data block can comprise using a public-private key encryption algorithm to pre-encrypt the data block. Alternatively, the pre-encryption key can be a private key and pre-encrypting the data block can comprise using a symmetric key encryption algorithm to pre-encrypt the data block.
The pre-encryption is preferably performed before the data block is encrypted using the encryption keys. However, in other examples it could also be performed after the data block is encrypted using encryption keys. In one example where the pre-encryption is performed after encrypting the data block using the encryption keys, the encryption keys also undergo pre-encryption using the same pre-encryption algorithm as the data block.
The symmetric key algorithm mentioned could be any suitable symmetric key encryption algorithm such as Twofish, Serpent, AES, Blowfish, CAST5, Kuznyechik, RC4, 3DES, Skipjack, Safer+/++ and IDEA. Similarly, if public-private key encryption is used, any suitable public-private key encryption algorithm could be used, such as ElGamal, elliptic curve cryptography, lattice-based cryptography, the McEliece cryptosystem, multivariate cryptography, the Paillier cryptosystem, RLCE, RSA, the Cramer-Shoup cryptosystem or any other suitable cryptographic system.
The private key of the pre-encryption algorithm or the private key that corresponds to the public key of the pre-encryption algorithm can be stored at the first node and/or at a secure server. Alternatively, in some scenarios it may be sent to other nodes in the system for storage.
The method can further comprise post-encrypting the encrypted data block and/or the one or more encryption keys after encrypting the data block using a public key of the node from the set of other nodes to which that encrypted data block or the one or more encryption keys is being sent.
Each encrypted data block/encryption key is sent to multiple nodes. In order to perform the post-encryption a copy of the encrypted data block/key can be created for each node and this copy can be encrypted using the public key of the node it is going to be sent to. The node that receives the encrypted data block/key can decrypt the data block/encryption key using its private key before it stores the encrypted data block/key. The additional post encryption helps reduce the risk of a node intercepting communications from a first node in order to obtain multiple data blocks/ encryption keys.
T can be calculated based upon an acceptable probability of the data block being permanently unavailable and an acceptable probability of the data block being temporarily unavailable.
The acceptable probability of a data block being temporarily or permanently unavailable can be determined based on the form of data being stored. This is something that a skilled person setting up the system would need to decide based on the users and importance of the data. T can also depend upon the probability of a node being offline or otherwise unable to receive the encrypted data block and/or encryption keys when they are sent to that node. Specific methods for calculating T are provided in more detail in the Detailed Description.
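By way of illustration only, and not as the calculation set out in the Detailed Description, the following minimal sketch chooses T under a simplifying assumption that is not taken from this specification: each node is independently permanently unavailable with probability `p_perm` and independently temporarily unavailable with probability `p_temp`, and a data block is unavailable only when every one of the T nodes is unavailable. The function names and parameters are hypothetical.

```python
def copies_needed(p_node: float, acceptable: float) -> int:
    """Smallest T such that p_node ** T <= acceptable (independence assumed)."""
    if not (0.0 < p_node < 1.0 and 0.0 < acceptable < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    t = 1
    while p_node ** t > acceptable:
        t += 1
    return t


def choose_t(p_perm: float, p_temp: float,
             accept_perm: float, accept_temp: float) -> int:
    """T large enough to satisfy both the permanent and the temporary target."""
    return max(copies_needed(p_perm, accept_perm),
               copies_needed(p_temp, accept_temp))
```

For example, with `p_perm = 0.25`, `p_temp = 0.5` and both acceptable probabilities set to 0.1, the temporary-unavailability target dominates and `choose_t` returns 4.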
The method can further comprise pre-processing the data block in order to allow a user to verify that the data block has not been modified.
Any known or standard form of pre-processing to allow verification would be suitable. The verification would be performed once the first node has reobtained its data from storage, when it wants to confirm that none of the nodes storing the data has modified the data in any way.
The method can be performed on multiple data blocks and the pre-processing can comprise arranging the data blocks in a hash tree before encryption of the data blocks.
Arranging the data blocks in a hash tree does not necessarily involve performing any physical rearrangement of the data blocks. Instead, arranging the data blocks in the hash tree can simply involve processing the data blocks so they are stored in a hash tree wherein each block in the hash tree is provided with a hash of the previous block in the tree. The hash tree can also be called a private block-chain. When a node wishes to verify that the blocks have not been modified, the node can retake the hashes and compare them to the original hashes. If the hashes differ then the node has reason to suspect that at least one block has been modified.
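The arrangement described above, in which each block carries a hash of the previous block, can be sketched as follows. This is a minimal illustration only: the choice of SHA-256, the all-zero starting value and the function names are assumptions, and the final hash (`head`) is assumed to be retained by the first node so that the last block can also be checked.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed starting value for the first block


def entry_hash(prev_hash: bytes, data: bytes) -> bytes:
    """Hash of one entry: the stored previous hash followed by the block data."""
    return hashlib.sha256(prev_hash + data).digest()


def build_chain(blocks):
    """Pair each block with the hash of the previous entry; return entries and head."""
    entries, prev = [], GENESIS
    for data in blocks:
        entries.append((prev, data))
        prev = entry_hash(prev, data)
    return entries, prev


def chain_is_intact(entries, head: bytes) -> bool:
    """Retake the hashes and compare them to the originally stored hashes."""
    expected = GENESIS
    for stored_prev, data in entries:
        if stored_prev != expected:
            return False
        expected = entry_hash(stored_prev, data)
    return expected == head
```

If any block is altered after the chain is built, the retaken hash no longer matches the hash stored with the following block (or the retained head), so `chain_is_intact` returns `False`.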
The pre-processing can comprise before encrypting the data block performing a hash function on the data block and using the results of the hash function as an identifier for the data block.
Any suitable secure hash function can be used to compute the hash of the data block. The data block identifier may be made available publicly, so it should not be possible to recreate the contents of the data block from the hash. The pre-processing can be performed before or after any pre-encryption. Although the pre-processing is preferably done before the data block is encrypted, it could also be done after the data block is encrypted. In such a case, the encryption keys may also undergo pre-processing.
As mentioned later, the identifier for the data block may be published on a public register. Alternatively, the first node will retain this data or store it at a secure server. As such, the first node will have access to the data block identifier. When the data block is returned the first node can take a new hash of the data block and compare this to the identifier to confirm the data block has not been modified.
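As a minimal sketch of this check, assuming SHA-256 as the secure hash function (the specification does not mandate a particular function) and hypothetical function names, the identifier can be derived and later compared as follows.

```python
import hashlib
import hmac


def block_identifier(data_block: bytes) -> str:
    """Use the hash of the data block contents as its identifier."""
    return hashlib.sha256(data_block).hexdigest()


def is_unmodified(returned_block: bytes, identifier: str) -> bool:
    """Take a new hash of the returned block and compare it to the identifier."""
    return hmac.compare_digest(block_identifier(returned_block), identifier)
```

A returned block that differs from the original in even a single byte produces a different hash, so the comparison fails.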
The method can further comprise: receiving from at least one of the other nodes from each of the sets of other nodes that received data in the form of either the encrypted data block or an encryption key an acknowledgement of receipt of the data; and publishing the received acknowledgements as a transaction on a public ledger.
The public ledger can be considered public in that it is accessible by both the first node and each of the other nodes to which data is sent. As such, public in this case does not mean the general public can access the ledger, just that each node in the system can access the ledger. The public ledger is also known as a public register.
The first node may receive an acknowledgment from every node it sent data to. However, this is not always the case. This is because, in some examples, the first node will send data to a node that is offline. Alternatively the data transfer will be interrupted in some other way. As such, the acknowledgments of receipt will only be received from nodes that actually received data.
Publishing the acknowledgments on the public register allows all the nodes and, where applicable, a registrar to monitor how much data each node is distributing for storage. This allows the registrar to take action against a node if it is publishing too much data. As such, the public register can be used to prevent any node flooding the system.
The public ledger can be a block chain. The use of a block chain allows the validity of the register to be monitored even when the register is not stored centrally.
The first node can have a node identifier; and the method can further comprise: publishing the first node identifier in the transaction on the public ledger along with the acknowledgments.
The node identifier of the first node can be a public key of the first node. Alternatively it can be a string or other identifier that allows identification of the first node but which cannot be considered to be a key. For example each node in the system may be provided with a node identifier and the first node identifier can be this node identifier. Publishing the node identifier in its transaction allows the first node and other nodes to identify which transactions belong to the first node. The node identifier could also be considered to be a node number. However, when the term node number is used it should not be considered to limit the node identifier to numerical characters.
This helps when the first node is recovering its data. It also allows the registrar or any other node to identify when a particular node is trying to flood the system by sending too many data blocks and/or encryption keys to storage.
The data block can have a data block identifier; and the method can further comprise: publishing the data block identifier in the transaction on the public ledger along with the acknowledgments.
The data block identifier can be as described previously. For example, the data block identifier can be a hash of the contents of the data block to allow for later verification. Publishing the data block identifier on the public register allows a first node to reobtain these identifiers when it loses all its data. The data block identifier can also be called a data block reference number. However, this should not be considered to limit the data block identifier to numerical characters.
Publishing the acknowledgements can comprise: publishing the acknowledgements received from each set of other nodes as separate transactions.
More specifically, this involves the first node publishing a separate transaction on the public register for the acknowledgements returned with respect to each encrypted data block/encryption key. As such, the acknowledgements returned with respect to the encrypted data block can be published as a first transaction, the acknowledgments returned with respect to a first encryption key can be published as a second transaction, and so on.
This ensures the public register can be used to monitor both (a) how many data blocks/encryption keys a first node is sending for storage and (b) how many nodes a first node sends each data block/encryption key to. This enables more accurate monitoring of the nodes and makes it easier to identify when a node is trying to flood the system with junk data blocks.
Publishing the acknowledgements on the public ledger can comprise publishing up to M acknowledgements on the public ledger for each set of other nodes and discarding any other acknowledgements. M can be a minimum number of nodes that must store data in order to achieve a desired level of availability for the data.
As mentioned above an encrypted data block or an encryption key is sent to one set of T nodes. As such, publishing M acknowledgements for each set of T nodes comprises publishing M acknowledgements for each encrypted data block or encryption key.
Preferably T is chosen so that if an encrypted data block/encryption key is sent to T nodes at least M acknowledgments will always be returned. If fewer than M acknowledgments are returned, the skilled person would understand that fewer than M acknowledgments are published for that set of T nodes.
The M acknowledgements published on the register can be chosen in any suitable fashion. For example they can represent the first M acknowledgements returned to the first node. Alternatively they can be chosen at random. M can be calculated based on the probability of nodes being either permanently or temporarily offline when the first node requests its data back. This is discussed in more detail in the detailed description.
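The two selection options mentioned above (the first M acknowledgements returned, or M chosen at random) can be sketched as follows. The function name and arguments are hypothetical, and the acknowledgements are assumed to be held in the order in which they were received.

```python
import random


def select_acknowledgements(acks, m: int, at_random: bool = False):
    """Return up to m acknowledgements to publish; the remainder are discarded."""
    acks = list(acks)
    if at_random:
        # sample() cannot draw more items than exist, so cap at len(acks)
        return random.SystemRandom().sample(acks, min(m, len(acks)))
    return acks[:m]
```

If fewer than M acknowledgements were returned, all of them are selected.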
The method can further comprise at each of the other nodes to which data is sent: sending an acknowledgement of receipt of data to the first node; searching the public ledger for the acknowledgement of receipt; storing the data if the acknowledgement of receipt is found on the public ledger; and deleting the data if the acknowledgement of receipt is not found on the public ledger.
The other nodes will send acknowledgements of receipt when they receive data. If they do not receive data they will not send an acknowledgment of receipt. The acknowledgements of receipt may take any suitable form. However, preferably, they do not allow the identity of the node that received data to be determined.
The other node which received data may find the acknowledgement on the public ledger in any suitable way. In the simplest example, the other node simply searches the public register for its acknowledgment for a predefined period of time after it has sent the acknowledgment. In other examples it can search transactions using an identifier of the first node and only look at these transactions for the acknowledgment. The other node may search all transactions but in one example it only searches for transactions that were posted after it sent the acknowledgment of receipt.
Only publishing a limited number of acknowledgements and deleting data if that data does not correspond to a published transaction has the advantage that it is clear how much data each node is distributing. This also prevents a node distributing more data than necessary in order to flood the storage.
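The store-or-delete decision made by a receiving node once its search period has elapsed can be sketched as follows. The data structures and the function name are hypothetical; how the ledger is actually searched is outside the scope of this sketch, which simply takes the set of acknowledgements found on the ledger as input.

```python
def settle_after_wait(stored: dict, pending: dict, ledger_acks: set) -> list:
    """Keep data whose acknowledgement was published; delete the rest.

    stored:      data identifier -> data held by this node
    pending:     acknowledgement sent by this node -> data identifier
    ledger_acks: acknowledgements found on the ledger during the search period
    Returns the identifiers of the data that were deleted.
    """
    deleted = []
    for ack, data_id in pending.items():
        if ack not in ledger_acks:
            stored.pop(data_id, None)
            deleted.append(data_id)
    pending.clear()
    return deleted
```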
For each encrypted data block and/or encryption key: sending the encrypted data block or the encryption key from the first node to a set of other nodes for storage can further comprise sending a proof of storage request along with the encrypted data block or the encryption key wherein the proof of storage request comprises a token; and receiving an acknowledgement of receipt from a particular node can comprise receiving the token sent to that particular node. A different token can be sent to each node.
While a different token can be sent to each of the other nodes, in some examples the same token may be sent to some of the other nodes or all of the other nodes.
The skilled person would understand that the first node may not receive T acknowledgments. Instead only some of the T other nodes may receive the data they are sent and hence return an acknowledgment.
The token in each storage request can take any suitable form provided the token is only known by the first node before it is sent to the other node. In other words, the token can be a secret that the first node sends to the other node. In one example the token or secret is randomly or pseudo-randomly generated. Preferably the token should be long enough to ensure a reasonable probability that it does not overlap with any other tokens.
The use of a token allows the first node to tie the acknowledgement of receipt back to the storage request it sent without revealing any information in the storage request that would allow the receiving node to work out where other data blocks were sent. It also prevents nodes from sending false acknowledgements when they have not received any data since the first node would be able to determine that such acknowledgments were not tied to any storage request.
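A minimal sketch of this token handling is given below, assuming a different random token per node generated with Python's `secrets` module. The function names and the 32-byte token length are illustrative assumptions.

```python
import secrets


def make_storage_requests(node_ids, token_bytes: int = 32) -> dict:
    """Create one secret token per node; returns token -> node identifier."""
    return {secrets.token_hex(token_bytes): node_id for node_id in node_ids}


def valid_acknowledgements(outstanding: dict, returned_tokens) -> list:
    """Keep only acknowledgements whose token ties back to a storage request."""
    return [outstanding[t] for t in returned_tokens if t in outstanding]
```

An acknowledgement carrying a token that the first node never issued does not match any outstanding storage request and is ignored.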
The proof of storage request can further comprise a public key of the first node. This allows the receiving node to identify the first node sending the data and storage request.
Receiving an acknowledgement of receipt can comprise receiving an additional string along with the token; and publishing the acknowledgements on the public ledger comprises publishing the received additional strings on the public ledger.
As with the token, the additional string can take any suitable form provided it is known only by the other node before it is sent to the first node. As such, the additional string can be another secret or an additional secret. Once again the additional string can be random. Preferably the additional string should be long enough to ensure a reasonable probability that it does not overlap with any other additional strings. The additional string may also be referred to as an additional number. This should not be considered to limit the string to numerical characters.
The use of the additional string allows the other nodes to easily identify their acknowledgments of receipt on the public register.
Sending a proof of storage request can comprise encrypting the proof of storage request sent to a particular node with the public key of that particular node before sending.
In other words, if the proof of storage request is sent to a specific node, the public key of that specific node is used to encrypt the proof of storage request. Each proof of storage request may be sent to a different node and encrypted with the public key of the corresponding node it is being sent to.
This ensures that only the node that the proof of storage request is for can access the request and hence the token. This makes it harder for other nodes to intercept messages and then pretend that the data has been stored at the desired node.
Sending a proof of storage request can further comprise signing the proof of storage request using a private key of the first node before sending the proof of storage request.
The proof of storage request can be signed using any suitable known or standard signing algorithm. This signing enables other nodes to verify that it is indeed the first node which sent the proof of storage request and hence the encrypted data block or encryption key.
The method can further comprise: monitoring a public ledger for transactions that indicate data at a node is now permanently unavailable, wherein data at a node is permanently unavailable when the node storing the data is either permanently offline or has lost its data; and, if either an encrypted data block or an encryption key has been sent to more than T nodes at which data is permanently unavailable, redistributing that encrypted data block or encryption key to a new set of other nodes using the method of any previous claim.
As before, the public ledger is public in that it is accessible by both the first node and each of the other nodes to which data is sent. The public ledger does not necessarily need to be accessible to the general public. In one example, the public ledger here is the same public ledger as previously mentioned.
Redistributing the encrypted data block or encryption key can comprise creating new random encryption keys, encrypting the data block using these new encryption keys and then distributing the new encrypted data block and encryption keys as previously described. This means nodes that falsely claim to have lost their data in the hope of obtaining more than one encrypted data block or encryption key do not receive this additional data.
Alternatively, only the data block or encryption key at more than T’ other nodes may be redistributed. In this case the set of nodes the data block or encryption key is being sent to may be edited to ensure that it does not include any nodes that the encrypted data block or encryption keys had previously been sent to.
T’ can be determined in any suitable way. In one example T’ is agreed in advance between the nodes and may be a fixed number. Alternatively T’ may depend upon T and M or may be re-agreed between nodes as T and M change.
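The threshold test that triggers redistribution can be sketched as follows. The function name is hypothetical, `t_prime` stands for T’, and the set of permanently unavailable nodes is assumed to have been collected from the relevant transactions on the public ledger.

```python
def needs_redistribution(node_set, unavailable_nodes, t_prime: int) -> bool:
    """True when more than t_prime nodes holding the data are permanently unavailable."""
    lost = sum(1 for node in node_set if node in unavailable_nodes)
    return lost > t_prime
```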
The method may further comprise receiving, at a registrar for the public ledger, a heartbeat message from the first node and each of the other nodes to which data is sent at regular time intervals; and, if the registrar has failed to receive the heartbeat message from a particular node for a defined length of time, publishing a transaction on the public register indicating the particular node is permanently offline and is thus permanently unavailable.
The registrar can be a central registrar or it could be any other scheme for administrating the public ledger. The heartbeat messages can be received at any suitable time interval, for example every 24 hours. The defined length of time to not receive a heartbeat message can be chosen depending on the form of the node. For example it may be longer for a desktop computer than a mobile phone. In one example suitable for a mobile phone, it may be five days.
The use of heartbeat messages allows a registrar to easily identify any permanently offline nodes. This enables nodes to determine when data is stored at nodes unable to return that data so that they can redistribute that data as described above. This reduces the risk of data stored in the system of nodes becoming permanently unavailable and so unrecoverable.
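The registrar-side check can be sketched as follows. The function name, the use of timestamps in seconds and the per-node timeout table are assumptions made for illustration; the five-day default mirrors the mobile phone example above.

```python
DAY = 24 * 60 * 60  # seconds


def permanently_offline(last_heartbeat: dict, now: float,
                        timeouts: dict, default_timeout: float = 5 * DAY) -> list:
    """Nodes whose last heartbeat is older than their defined length of time.

    last_heartbeat: node identifier -> time the last heartbeat was received
    timeouts:       node identifier -> timeout for that form of node (optional)
    """
    return sorted(
        node for node, seen in last_heartbeat.items()
        if now - seen > timeouts.get(node, default_timeout)
    )
```

The registrar would then publish a transaction on the public register for each node returned.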
The method may further comprise publishing, by any node in the system that has lost data, a transaction on the public register indicating that the node has lost its data. In one example a node publishing on the public ledger that it has lost data comprises: the node informing the registrar that it has lost data; and the registrar publishing, on the public ledger, the transaction that indicates the node has lost data.
This enables other nodes to identify when their data is stored at a node that has lost data. Using a registrar to publish the transaction allows these transactions to be monitored and any suspicious transactions to be flagged.
The method may further comprise: obtaining by the first node the encrypted data block sent for storage. This can be done by: re-computing the set of other nodes using the first node distribution key and the identifier of the encrypted data block; sending a request for the encrypted data block to at least one of the nodes from the set of other nodes; obtaining the encrypted data block from at least one of the set of other nodes; and decrypting the encrypted data block.
The skilled person would understand that the set of other nodes can be created in the same fashion as originally used to create the set of other nodes. Therefore the set of other nodes is deterministic when the encrypted data block identifier and the first node distribution key is known but cannot be easily created by any other node or third party in the system.
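The deterministic property described above can be illustrated with a keyed ranking of the candidate nodes. This is a sketch only: the construction shown (ranking nodes by an HMAC-SHA-256 value, in the style of rendezvous hashing) is an assumption chosen for illustration and is not stated to be the computation used in this specification. The function name and arguments are hypothetical.

```python
import hashlib
import hmac


def compute_node_set(distribution_key: bytes, data_id: bytes,
                     all_nodes, t: int) -> list:
    """Pick t nodes by ranking every node with a keyed hash of (data_id, node)."""
    def score(node_id: str) -> bytes:
        message = data_id + b"|" + node_id.encode()
        return hmac.new(distribution_key, message, hashlib.sha256).digest()

    return sorted(all_nodes, key=score)[:t]
```

Calling the function again with the same distribution key and identifier yields the same set, whereas a party without the key cannot evaluate the ranking.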
Obtaining the encrypted data block comprises receiving the data block from at least one of the nodes in the set of nodes. The encrypted data block will only be received from a node after a request for the data block has been sent to that node. While the method may comprise only requesting the encrypted data block from a single node, in one example a request for the encrypted data block is sent to all of the T nodes from the set that the encrypted data block was originally sent to. Not all the nodes that receive a request for the encrypted data block will necessarily return the data block. This may be because they are offline or unavailable but it may also be because they have been somehow compromised and so do not respond to the request. Alternatively, a node may have lost its data and may respond to the request for the encrypted data block without returning the encrypted data block.
The above method allows the first node to obtain any encrypted data blocks it has sent for storage. This should allow it to recreate its data when it has lost its data.
The method may further comprise obtaining by the first node the one or more encryption keys sent for storage. This can be done by, for each of the one or more encryption keys: re-computing the set of other nodes using the first node distribution key and the encryption key identifier; sending a request for the encryption key to at least one of the nodes from the set of other nodes; and obtaining the encryption key from at least one of the set of other nodes.
Obtaining the one or more encryption keys can comprise receiving each of the one or more encryption keys from at least one node in the set of nodes to which that encryption key was sent. Each encryption key only needs to be received from a single node. However, in some examples, each encryption key may be received from multiple nodes. A node may not necessarily respond to a request for an encryption key either because the node is somehow unavailable or because it has lost its data or it has been compromised and so refuses all requests. In such a case, the encryption key will need to be obtained from another node at which it was stored.
As with the above, the sets of other nodes are recreated in the same fashion as the sets were originally created. While the method may comprise only requesting the encryption key(s) from a single node, in one example a request for the encryption key(s) is sent to all of the T nodes from the sets of other nodes that stored the encryption key(s).
Reobtaining the encryption key(s) as well as the encrypted data block allows the first node to use the encryption key(s) to decrypt the encrypted data block.
Decrypting the encrypted data block can comprise using at least one of the one or more encryption keys as a one-time pad to decrypt the data block.
The use of one-time pad cryptography means that no offline attack can be performed to obtain the contents of the data block unless a node has the encrypted data block and all the encryption keys used as one-time pads. Given that the nodes to which the encrypted data block and encryption keys have been sent cannot be determined by a third party without the distribution key, it would be difficult for a malicious node to request this information. As such, the use of one-time pad cryptography makes the invention resistant to offline attacks. Preferably the one-time pad decryption is used when the data block was originally encrypted using one-time pad encryption. As such, the data block and encryption keys can take the form discussed above with respect to the one-time pad encryption.
Using an encryption key as a one-time pad to decrypt the encrypted data block can comprise performing an inverse of modular addition between corresponding bits of the data block and the encryption key. Alternatively or in addition, using an encryption key as a one-time pad to decrypt the encrypted data block can comprise performing an XOR operation between corresponding bits of the data block and the encryption key.
The inverse modular addition can be a modular arithmetic subtraction. When an XOR operation is used, the encrypted data block and the encryption keys should be formed of binary strings. The above ensures that the decryption algorithm correctly decrypts the encrypted data block.
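Both options can be sketched as follows. The sketch operates on bytes rather than individual bits, takes the modulus for the modular arithmetic as 256, and uses hypothetical function names; these are assumptions made for illustration.

```python
def _check(block: bytes, key: bytes) -> None:
    # A one-time pad must be exactly as long as the data it protects.
    if len(block) != len(key):
        raise ValueError("one-time pad must be the same length as the data block")


def otp_xor(block: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so this both encrypts and decrypts."""
    _check(block, key)
    return bytes(b ^ k for b, k in zip(block, key))


def otp_add(block: bytes, key: bytes) -> bytes:
    """Encrypt by modular addition (mod 256) of corresponding bytes."""
    _check(block, key)
    return bytes((b + k) % 256 for b, k in zip(block, key))


def otp_subtract(block: bytes, key: bytes) -> bytes:
    """Decrypt by the inverse operation, modular subtraction (mod 256)."""
    _check(block, key)
    return bytes((b - k) % 256 for b, k in zip(block, key))
```

Where several encryption keys were applied in turn, applying the corresponding inverse operation with each key recovers the original data block; for XOR the order in which the keys are applied does not matter.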
Sending a request for the data block or an encryption key and obtaining the data block or an encryption key can comprise: iteratively repeating steps comprising (a) requesting the data block or encryption key from one node in the set of other nodes and (b) failing to receive the data block or encryption key in response to the request, until a node provides the data block or encryption key in response to the request; and, after obtaining the data block or encryption key from a node, ceasing to send any further requests for the data block or encryption key.
A request for the encrypted data block is sent to the nodes in the set of nodes the encrypted data block was originally sent to. A request for each encryption key is sent to the set of nodes to which that encryption key was originally sent. In some examples, both the encrypted data block and the encryption key(s) are requested using the above iterative process. However, in other examples only one of the encrypted data block and encryption key(s) is requested using the iterative process above.
Once the requested data, i.e. the encrypted data block and/or encryption key(s), has been received, no further requests need to be sent for that data. As such, it is preferable to work through the set of nodes to which the data was sent, and individually request data from each node until the data is returned. This prevents multiple copies of the same data being returned.
It is preferable that the encrypted data block and encryption keys are only returned from a single node to ensure that the first node does not obtain too much data. Sending a request to each node in the set of nodes that stored the encrypted data block/encryption key in turn and then stopping sending requests once the encrypted data block/encryption key is received prevents the encrypted data block/encryption key being returned from multiple nodes. This limits the amount of data the first node receives and has to process when it is recreating its data.
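The iterative request process can be sketched as follows. The `request` callable is a hypothetical stand-in for whatever transport is used; it is assumed to return the data, to return `None` when the node responds without the data, and to raise `ConnectionError` when the node cannot be reached.

```python
def fetch_from_first_responder(node_set, request):
    """Ask each node in turn and stop as soon as one returns the data.

    Returns (node, data) for the first node that provides the data,
    or None if no node in the set could provide it.
    """
    for node in node_set:
        try:
            data = request(node)
        except ConnectionError:
            continue  # node offline or unreachable: try the next one
        if data is not None:
            return node, data  # cease sending further requests
    return None
```

Because the loop returns on the first successful response, nodes later in the set are never contacted and only one copy of the data is received.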
The first node can have a pre-encryption key and the method can further comprise performing a secondary decryption of the data block after decrypting the data block.
The pre-encryption key can be a public key and the first node can have a private key corresponding to the public key; and performing a secondary decryption can comprise using the private key to decrypt the data block. Alternatively, the pre-encryption key can be a private key; and performing a secondary decryption can comprise using a symmetric key decryption algorithm to decrypt the data block.
This secondary decryption is designed to undo the pre-encryption previously applied to the data block. As such, the secondary decryption algorithm can correspond to the pre-encryption algorithm used for pre-encryption in that it is the corresponding decryption algorithm. The decryption parts of the algorithms listed as suitable for pre-encryption are therefore suitable for this secondary decryption.
This secondary decryption ensures the data block can be read by the first node.
Obtaining the data block or encryption key from at least one of the plurality of other nodes can comprise receiving the data block or encryption key further encrypted using a public key of the first node; and the method can further comprise: decrypting the further encrypted data block or encryption key using a private key of the first node.
The private key for the further encryption can be the same private key as used for the pre-encryption and secondary decryption, or a different one. In some examples, the private keys for each encryption are different but they are derived from the same secret or other key. In relation to this example, the method can further comprise, at a node that receives a request for a data block or encryption key from the first node: encrypting the data block or encryption key using the public key of the first node; and sending the data block or encryption key to the first node.
This provides an additional layer of security in case a node or any other third party is pretending to be the first node in order to obtain data blocks and/or encryption keys.
The method can further comprise, at a node that receives a request for a data block or encryption key from the first node: verifying the identity of the first node; and sending the data block or encryption key to the first node after the identity of the first node is verified.
Any suitable node or identity verification scheme can be used to verify the identity of the first node. Having the other node verify the identity of the first node ensures that requested data is only ever sent to the first node.
The method can be performed on multiple data blocks and the method can further comprise: after decrypting the data blocks rearranging the data blocks in a hash tree; and using the hash tree to verify the data blocks have not been modified during storage.
This can be done by comparing the hashes that were originally stored in the hash tree to newly taken hashes. If the hashes differ then the data block for which the hashes differ and the data block containing the differing hashes can be re-requested from other nodes.
A verification process such as the one above can help the first node identify when the nodes storing data have modified the data.
The method can further comprise: performing a hash function on the data block; and comparing the results of the hash function to the data block identifier to verify that the data block has not been modified during storage.
When the data block identifier is a hash of the data block, then it is possible to retake the hash of the data block at this stage and compare it to the data block identifier. This allows the first node to verify that the data block has not been modified. This is particularly beneficial when the data block identifiers are stored in the public register as the first node also has confidence that the data block identifiers have not been modified.
The method can further comprise: using the node identifier of the first node to identify relevant transactions on the public ledger; extracting from the relevant transactions the data block identifier; and using the block identifier to determine the identifier of the encrypted data block and/or the encryption key identifiers.
Extracting the data block identifiers from the public register means the first node can reobtain these identifiers even when it has lost all its data. Using the data block identifier to determine the identifier for the encrypted data block and/or encryption key(s) also allows this information to be obtained. This has further advantages when the data block identifiers are a hash function of the contents of the data block since it allows the first node to verify the contents of the data block.
Using the data block identifier to determine the identifier of the encrypted data block and/or the encryption key identifiers can comprise: providing the encrypted data block and the encryption keys to be obtained with a reference representing their position in an ordered list; and using a hash function of the data block identifier and the reference representing the position of the encrypted data block or the encryption key within the ordered list to obtain the identifier for each of the encrypted data block and the encryption keys.
As such, the encrypted data block identifier and/or the encryption key identifier(s) are obtained in an identical fashion to before. This means they will be identical to the previously obtained identifiers and so can be used to recreate the sets of nodes to which encrypted data blocks/encryption keys were sent.
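The derivation of these identifiers can be sketched as follows. The use of SHA-256, the way the position is combined with the data block identifier, and the convention that position 0 refers to the encrypted data block and positions 1 onwards refer to the encryption keys are all assumptions made for illustration; the function names are hypothetical.

```python
import hashlib


def derived_identifier(data_block_id: str, position: int) -> str:
    """Hash the data block identifier together with a position in the ordered list."""
    return hashlib.sha256(f"{data_block_id}:{position}".encode()).hexdigest()


def all_identifiers(data_block_id: str, n_keys: int) -> list:
    """Identifier of the encrypted data block (position 0) and of each key (1..n_keys)."""
    return [derived_identifier(data_block_id, i) for i in range(n_keys + 1)]
```

Because the derivation depends only on the data block identifier and the position, repeating it during recovery yields the same identifiers that were used when the data was first distributed.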
The skilled person would understand that the above method can be used with any suitable apparatus. As such, the present invention also relates to an apparatus arranged to carry out any of the methods described above. The present invention also relates to a computer program which, when executed by a processor, causes the processor to carry out any of the methods described above and a computer-readable medium storing such a computer program.
In accordance with a second aspect of the invention a first node is provided. The first node comprises an encryption module configured to encrypt a data block to form an encrypted data block, the encrypted data block having an identifier. The first node also comprises a node determining module configured to compute a set of other nodes using a first node distribution key and the identifier of the encrypted data block. The first node also comprises a distribution module configured to send the encrypted data block from the first node to T nodes from the set of other nodes for storage.
In the example described above, the first node can be any computing device for example a user computing device such as a mobile computing device (such as a mobile telephone, tablet device or laptop), or a desktop computing device, and so forth.
The first node can be considered a first node configured to implement the method described above. More specifically, the first node is a first node having several modules which can be used to implement different aspects of these methods. The modules may be hardware or software. In some examples, the individual modules may be combined so that the functions of the encryption module and the node determining module may be performed by the same combined module.
The data block can be encrypted using any suitable encryption algorithm or encryption function that may form part of the encryption module. The identifier of the encrypted data block is also referred to as an encrypted data block identifier or encrypted data block reference number. When the term reference number is used there is no restriction on the reference number being numerical and any suitable characters can also form part of the reference number.
The identifier of the encrypted data block is a reference number or identifier that is unique from the perspective of the first node in respect of this and other data blocks encrypted and distributed in the same way at least by the first node. Any other such data blocks/data chunks owned by the first node will have a different identifier. The identifier can take any suitable form and may form part of the encrypted data block or may be merely associated with the encrypted data block.
The node determining module can compute the set in a manner that means it would be hard/difficult/infeasible/computationally hard/computationally difficult/computationally infeasible for any party that does not have access to the first node distribution key to recreate the set of nodes. In other words, the set of nodes cannot easily be recreated without access to the first node distribution key even if the algorithm/function used to compute the set is known. The node determining module can compute the set of nodes such that it would be (computationally) hard/difficult/infeasible for the computation to be performed without the distribution key. In other words, a third party who does not have access to the distribution key would not be able to compute the set of nodes. The set of nodes may appear random to any party that does not have access to the distribution key.
The other nodes of the set the encrypted data block is sent to can be other user computing devices that together form part of a cloud of available nodes. Additionally, one or more of the nodes may be a server. In described embodiments, when a server is used as a node it will generally be treated in the same fashion as other user device nodes and not considered to be any more or less secure. The set of nodes can be created in any suitable fashion that meets the requirements listed above. Example methods of how to select the nodes are discussed below.
The number of nodes the encrypted data block is sent to, T, can be selected in various suitable ways. In some embodiments, T is simply a fixed constant for example set in advance or agreed between the nodes. In other examples T can be calculated based on the probability of nodes being offline and so data stored at these nodes being inaccessible.
The encrypted data block is sent to more than one node to allow for the risk or expectation that some of the nodes encrypted data blocks are sent to will be offline, or will have lost the data, or will no longer exist, when the first node later attempts to recover the data. Furthermore, it is assumed that some of the nodes may be offline or otherwise unavailable when the first node sends data to that node. In such a case, the data sent to the node may not be stored. However, the encrypted data block is preferably stored at all nodes that receive the encrypted data block. As such, T is used here to add some assurance that the data will be returned to the first node if the first node requests its data.
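One possible way of calculating T from node availability is sketched below. This is not the calculation set out in the description; it assumes, purely for illustration, that each node is unavailable independently with a single combined probability `p_unavailable`, and it returns the smallest T for which at least one of the T nodes is expected to be available with probability `target_availability`. The function name and both parameter names are hypothetical.

```python
import math

def replicas_needed(p_unavailable: float, target_availability: float) -> int:
    """Smallest T such that 1 - p_unavailable**T >= target_availability.

    Assumption: nodes fail independently and share one unavailability probability.
    """
    if not 0.0 < p_unavailable < 1.0:
        raise ValueError("p_unavailable must be strictly between 0 and 1")
    if not 0.0 < target_availability < 1.0:
        raise ValueError("target_availability must be strictly between 0 and 1")
    # Solve p**T <= 1 - target for T and round up to a whole number of nodes.
    t = math.ceil(math.log(1.0 - target_availability) / math.log(p_unavailable))
    return max(1, t)

# Example: if each node is unavailable 30% of the time and the data should be
# retrievable with probability 0.999, six nodes are needed under this model.
t_example = replicas_needed(0.3, 0.999)
```

Under this simplified model, a higher chance of nodes being unavailable or a stricter availability target both increase T.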
The encryption module can be configured to encrypt the data block using one or more encryption keys wherein each encryption key has a corresponding encryption key identifier. For each of the one or more encryption keys: the node determining module can be further configured to compute a set of other nodes using the first node distribution key and the identifier of the encryption key; and the distribution module can be further configured to send the encryption key from the first node to T nodes from the set of other nodes for storage.
The encryption keys can take any suitable form. For example, any suitable form of symmetric-key encryption may be used to encrypt the data block and the encryption keys can be secret/private keys for this form of encryption. Example symmetric-key encryption algorithms include Twofish, Serpent, AES, Blowfish, CAST5, Kuznyechik, RC4, 3DES, Skipjack, Safer+/++ and IDEA. Each encryption key may be used to encrypt the data block using a different encryption algorithm. Therefore, the encryption keys do not all need to have the same form. Preferably the encryption keys are used to both encrypt and decrypt data. However, in other examples, the data could be encrypted using a public key and the private key used for decryption could be the encryption key sent to other nodes for storage.
As with the encrypted data block identifier, the encryption key identifiers can take any suitable form and can be stored as part of the encryption keys or separately from the encryption keys. The encryption key identifiers should be unique from the perspective of the first node and should differ from each other and from the encrypted data block identifier.
Each encryption key is sent to T other nodes where the set of nodes each encryption key is sent to is determined separately so that the set of nodes is different. The set of nodes each encryption key is sent to can be edited to remove any nodes that the encrypted data block or previous encryption keys are sent to.
The node determining module can be configured to compute a set of other nodes using a secure pseudo-random number generator wherein a seed of the secure pseudo-random number generator is generated using the first node distribution key and the identifier of either the encrypted data block or the encryption key.
While a secure pseudo-random number generator is mentioned, where necessary a general pseudo-random number generator could be used. The secure pseudo-random number generator can output numerical or any other form of characters. The random number generator is a pseudo-random number generator to ensure that the set can be recreated when the same seed is used. The use of a secure pseudo-random number generator with a known seed therefore achieves the recreatable sets that cannot be recreated by a third party that does not have access to the first node distribution key. In one example, the seed that is a function of the identifier and the distribution key is the only seed of the secure pseudo-random number generator. The function of the identifier and distribution key for each encrypted block/encryption key should be unique compared to the seed created from the identifier of other encrypted blocks/encryption keys. This ensures that the set of T nodes created is different for each encrypted block/key.
Preferably, the secure/computationally secure pseudo-random number generator is used to determine the sets of nodes that both the encrypted data block and the encryption keys are sent to. However, in other examples, the secure pseudo-random number generator can be used for determining just a set of nodes that the encrypted data block is sent to or just a set of nodes the encryption keys are sent to.
The first node can further comprise a hash module configured to arrange the data blocks in a hash tree before the encryption module encrypts the data block.
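By way of illustration only, a seeded generator of this kind could be sketched as follows. The sketch assumes HMAC-SHA256 both for deriving the seed from the distribution key and the identifier and for producing the output stream in counter mode; neither choice is taken from the description, and the function name `compute_node_set` and the representation of nodes as integer indices are hypothetical.

```python
import hashlib
import hmac

def compute_node_set(distribution_key: bytes, identifier: str,
                     num_nodes: int, set_size: int) -> list:
    """Return a reproducible, key-dependent ordered list of distinct node indices."""
    if set_size > num_nodes:
        raise ValueError("set_size cannot exceed the number of nodes")
    # Assumption: the seed is HMAC-SHA256(distribution key, identifier).
    seed = hmac.new(distribution_key, identifier.encode("utf-8"),
                    hashlib.sha256).digest()
    chosen = []
    counter = 0
    while len(chosen) < set_size:
        # Each output block is HMAC-SHA256(seed, counter), so the stream is
        # fully determined by the seed and can be regenerated later.
        block = hmac.new(seed, counter.to_bytes(8, "big"),
                         hashlib.sha256).digest()
        # The modulo reduction introduces a slight bias, accepted in this sketch.
        index = int.from_bytes(block, "big") % num_nodes
        if index not in chosen:
            chosen.append(index)
        counter += 1
    return chosen

key = b"first-node-distribution-key"
set_a = compute_node_set(key, "encrypted-block-id", num_nodes=1000, set_size=5)
set_b = compute_node_set(key, "encrypted-block-id", num_nodes=1000, set_size=5)
set_other_key = compute_node_set(b"another-key", "encrypted-block-id",
                                 num_nodes=1000, set_size=5)
```

With the same key and identifier the set is reproduced exactly, whereas a party holding a different key obtains an unrelated set.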
Arranging the data blocks in a hash tree does not necessarily involve performing any physical rearrangement of the data blocks. Instead, arranging the data blocks in the hash tree can simply involve processing the data blocks so they are stored in a hash tree wherein each block in the hash tree is provided with a hash of the previous block in the tree. The hash tree can also be called a private block-chain. When a node wishes to verify that the blocks have not been modified, the node can retake the hashes and compare them to the original hashes. If the hashes differ then the node has reason to suspect that at least one block has been modified.
The first node can further comprise an acknowledgment receiving module configured to receive an acknowledgement of receipt from at least one of the other nodes from each of the sets of other nodes; and an acknowledgment processing module configured to publish the received acknowledgments on a public ledger.
The public ledger can be considered public in that it is accessible by both the first node and each of the other nodes to which data is sent. As such, public in this case does not mean the general public can access the ledger, just that each node in the system can access the ledger. The public ledger is also known as a public register.
The first node may receive an acknowledgment from every node it sent data to. However, this is not always the case. This is because, in some examples, the first node will send data to a node that is offline. Alternatively the data transfer will be interrupted in some other way. As such, the acknowledgments of receipt will only be received from nodes that actually received data.
The first node can also comprise a storage request module configured to generate proof of storage requests wherein each proof of storage request comprises a token. The data distribution module can be configured to send a proof of storage request along with the encrypted data block and/or the one or more encryption keys; and the acknowledgment receiving module can be configured to receive the tokens sent in the proof of storage requests.
The token in each storage request can take any suitable form provided the token is only known by the first node before it is sent to the other node. In other words, the token can be a secret that the first node sends to the other node. In one example the token or secret is randomly or pseudo-randomly generated. Preferably the token should be long enough to ensure a reasonable probability that it does not overlap with any other tokens.
The use of a token allows the first node to tie the acknowledgement of receipt back to the storage request it sent without revealing any information in the storage request that would allow the receiving node to work out where other data blocks were sent. It also prevents nodes from sending false acknowledgements when they have not received any data since the first node would be able to determine that such acknowledgments were not tied to any storage request.
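The role of the token can be illustrated with a short sketch in which the first node records each token it issues and accepts only acknowledgments carrying a token it issued to that node. The class `StorageRequestTracker`, its method names, and the 32-byte token length are assumptions made for illustration.

```python
import secrets

class StorageRequestTracker:
    """Tracks tokens issued in proof of storage requests (hypothetical helper)."""

    def __init__(self):
        self._pending = {}          # token -> (node_id, chunk_id)
        self.acknowledged = set()   # (node_id, chunk_id) pairs confirmed so far

    def create_request(self, node_id: str, chunk_id: str) -> str:
        # The token is a random secret known only to the first node until sent.
        token = secrets.token_hex(32)
        self._pending[token] = (node_id, chunk_id)
        return token

    def accept_acknowledgment(self, node_id: str, token: str) -> bool:
        expected = self._pending.get(token)
        # Reject tokens that were never issued or were issued to another node.
        if expected is None or expected[0] != node_id:
            return False
        self.acknowledged.add(expected)
        return True

tracker = StorageRequestTracker()
token = tracker.create_request("node-17", "chunk-1")
```

An acknowledgment that does not carry an issued token cannot be tied to any storage request and is therefore discarded, and the token itself reveals nothing about where other data chunks were sent.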
The node determining module can be further configured to re-compute the set of other nodes using the first node distribution key and the identifier of the encrypted data block. The first node can further comprise a data requesting module configured to send a request for the encrypted data block to at least one of the nodes from the set of other nodes and receive the encrypted data block from at least one of the nodes from the set of other nodes; and a decryption module configured to decrypt the encrypted data block.
The skilled person would understand that the set of other nodes can be created in the same fashion as originally used to create the set of other nodes. Therefore the set of other nodes is deterministic when the encrypted data block identifier and the first node distribution key are known but cannot be easily created by any other node or third party in the system.
Obtaining the encrypted data block comprises receiving the data block from at least one of the nodes in the set of nodes. The encrypted data block will only be received from a node after a request for the data block has been sent to that node. While the method may comprise only requesting the encrypted data block from a single node, in one example a request for the encrypted data block is sent to all of the T nodes from the set that the encrypted data block was originally sent to. Not all the nodes that receive a request for the encrypted data block will necessarily return the data block. This may be because they are offline or unavailable but it may also be because they have been somehow compromised and so do not respond to the request. Alternatively, a node may have lost its data and may respond to the request for the encrypted data block without returning the encrypted data block.
The above allows the first node to obtain any encrypted data blocks it has sent for storage. This should allow it to recreate its data when it has lost its data.
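The recovery behaviour described above, in which some nodes are offline and others respond without returning the data, can be sketched as follows. The network is simulated with an in-memory dictionary; the names `recover_chunk`, `simulated_fetch` and `simulated_nodes` are hypothetical, and signalling an offline node with `ConnectionError` is an assumption of the sketch.

```python
def recover_chunk(chunk_id, node_ids, fetch):
    """Ask each node in turn and return the first chunk actually returned."""
    for node_id in node_ids:
        try:
            chunk = fetch(node_id, chunk_id)
        except ConnectionError:
            continue  # node is offline or unavailable; try the next one
        if chunk is not None:
            return chunk
        # The node responded but no longer holds the chunk; try the next one.
    return None

# Simulated nodes standing in for the recreated set of other nodes.
simulated_nodes = {
    "node-a": None,                        # offline
    "node-b": {},                          # online but has lost its data
    "node-c": {"chunk-1": b"ciphertext"},  # online and still holding the chunk
}

def simulated_fetch(node_id, chunk_id):
    store = simulated_nodes[node_id]
    if store is None:
        raise ConnectionError(f"{node_id} is offline")
    return store.get(chunk_id)

recovered = recover_chunk("chunk-1", ["node-a", "node-b", "node-c"], simulated_fetch)
```

The sketch requests from one node at a time for simplicity; sending the request to all T nodes at once, as in the example given above, would differ only in how the responses are collected.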
For each of the one or more encryption keys: the node determining module can be further configured to re-compute the set of other nodes using the first node distribution key and the identifier of the encryption key; the data requesting module can be configured to send a request for the encryption key to at least one of the nodes from the set of other nodes and receive the encryption key from at least one of the nodes from the set of other nodes; and the decryption module can be configured to decrypt the encrypted data block using the returned encryption keys.
Obtaining the one or more encryption keys can comprise receiving each of the one or more encryption keys from at least one node in the set of nodes to which that encryption key was sent. Each encryption key only needs to be received from a single node. However, in some examples, each encryption key may be received from multiple nodes. A node may not necessarily respond to a request for an encryption key either because the node is somehow unavailable or because it has lost its data or it has been compromised and so refuses all requests. In such a case, the encryption key will need to be obtained from another node at which it was stored.
As with the above, the sets of other nodes are recreated in the same fashion as the sets were originally created. While the method may comprise only requesting the encryption key(s) from a single node, in one example a request for the encryption key(s) is sent to all of the T nodes from the sets of other nodes that stored the encryption key(s).
Reobtaining the encryption key(s) as well as the encrypted data block allows the first node to use the encryption key(s) to decrypt the encrypted data block.
Brief description of the drawings
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 shows an overview of a system which enables a first node to distribute data blocks and associated encryption keys via a network for distributed storage at other nodes in the system;
Figures 2a and 2b show an example of how one-time pad encryption may be used to encrypt a data block in the system of Figure 1. In Figure 2a the data block is encrypted using a single encryption key. In Figure 2b the data block is encrypted using multiple encryption keys;
Figure 3 shows how a first node of Figure 1 may determine to which nodes the data block and encryption keys should be sent;
Figure 4 shows how identifiers may be calculated for an encrypted data block and associated encryption keys;
Figure 5a shows how a data block can undergo pre-encryption before it is encrypted using the encryption keys illustrated in Figure 2;
Figure 5b shows how an encrypted data block and associated encryption keys can undergo post-encryption before they are sent to the nodes at which they will be stored;
Figure 6 shows how a data block of the previous Figures can be arranged in a block-chain of other data blocks to allow subsequent verification;
Figure 7 shows how the other nodes receiving data can return acknowledgments of receipt which can then be published on a public register;
Figure 8 shows how the first node can create proof of storage requests and these proof of storage requests can be used by other nodes to generate the acknowledgments of receipt;
Figure 9 shows how a registrar can monitor heartbeat signals from all the nodes in the system and use these heartbeat signals to update the public register if a node goes permanently offline;
Figure 10 shows how a first node may recreate a data block from the data stored at other nodes when the first node has lost its data;
Figure 11 shows how the block-chain of Figure 6 can be recreated by the first node in order to enable the first node to verify the data blocks it received from the other nodes;
Figure 12 is a flow diagram showing a method for distributing data from a first node for example in the system of Figure 1 and according to the other previous figures;
Figure 13 is a flow diagram showing a method by which a node receiving a data chunk from a first node in the system of Figure 1 determines whether to store or delete the data chunk;
Figure 14 is a flow diagram showing how a first node can recover distributed data chunks when it wishes to obtain data that was previously sent for storage;
Figure 15 schematically illustrates an example of a computer apparatus or device suitable for implementing nodes and/or servers as described in this document.
Detailed description
In the description that follows and in the figures, certain embodiments of the invention are described. However, it will be appreciated that the invention is not limited to the embodiments that are described and that some embodiments may not include all of the features that are described below. It will be evident, however, that various modifications and changes may be made herein without departing from the broader spirit and scope of the invention as set forth in the appended claims.
Referring to Figure 1 there is illustrated a system comprising a plurality of nodes 10, 30-1 - 30-N connected via a network 50. Each node represents a user computing device such as a mobile computing device or a desktop computing device. Each node may belong to the same or different users. Although ideally the nodes 10, 30-1 - 30-N, 40-1 - 40-N comprise user computing devices, one or more of the nodes 30-1 - 30-N, 40-1 - 40-N may also comprise a server or other data management system. In relation to the embodiments discussed herein, such a server or data management system will be treated in the same fashion as the nodes representing user devices and thus can also be considered to be a user device.
In the system illustrated in Figure 1, the first node 10 sends a data block 12 to be stored at at least some of the other nodes 30-1 - 30-N. The data block 12 is sent via the network 50. Before sending the data block 12, the first node 10 processes the data block 12 to ensure that none of the other nodes 30-1 - 30-N storing the data block 12 can determine the information contained within the data block 12.
In particular, before the first node 10 sends the data block 12 to the other nodes 30-1 - 30-N for storage, the first node 10 encrypts the data block 12 to form an encrypted data block 18. The data block 12 is encrypted using an encryption module 15 that performs an encryption function 17. The encryption function 17 takes as inputs the data block 12 and one or more encryption keys 16-1. The encryption function 17/encryption module 15 uses the one or more encryption keys 16-1 to encrypt the data block 12 to form an encrypted data block 18.
The encrypted data block 18 is then sent to a distribution module 28 that performs a distribution function 29 to send the encrypted data block 18 via the network 50 to at least some of the other nodes 30-1 - 30-N. A set of other nodes 24-1 to which the encrypted data block 18 is sent is determined by a node determining module 20 that performs a node determining function 21. This node determining function 21 determines the set of other nodes 24-1 to which to send the encrypted data block 18 in a manner that is recreatable at the first node 10 but appears random to the other nodes 30-1 - 30-N, 40-1 - 40-N. In other words, the node determining function 21 determines the set of other nodes 24-1 to which to send the encrypted data block 18 in a manner that is recreatable using data known by the first node 10 but which would be difficult/hard/infeasible without this data. How this is achieved is discussed in more detail below.
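As one illustration of an encryption function taking a data block and one or more encryption keys, the following sketch uses one-time pad style encryption, which is the example named for Figures 2a and 2b. It assumes that each key is a random pad of the same length as the data block and that the pads are applied by successive XOR operations; these details and the function names are assumptions of the sketch.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_with_pads(data_block: bytes, num_keys: int):
    """Encrypt a data block with num_keys random pads applied by XOR."""
    # Assumption: each key is a fresh random pad as long as the data block.
    keys = [secrets.token_bytes(len(data_block)) for _ in range(num_keys)]
    encrypted = data_block
    for key in keys:
        encrypted = xor_bytes(encrypted, key)
    return encrypted, keys

def decrypt_with_pads(encrypted: bytes, keys: list) -> bytes:
    """Recover the data block; all pads are needed, in any order."""
    data = encrypted
    for key in keys:
        data = xor_bytes(data, key)
    return data

plaintext = b"example data block"
ciphertext, pads = encrypt_with_pads(plaintext, num_keys=3)
restored = decrypt_with_pads(ciphertext, pads)
```

In this arrangement the encrypted data block and every pad are all required to recover the data block, which is consistent with treating them as separate data chunks stored at different nodes.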
As mentioned previously, the encrypted data block 18 is being sent from a first node 10. A distribution key 22 is stored at the first node 10. This distribution key 22 is a secret or private key only known by the first node 10. Although the distribution key 22 can be considered to be a private key, in that it is only known by the first node 10, there is no requirement for there to be a corresponding public key.
The encrypted data block 18 has an identifier 19 known as the encrypted block identifier 19 or as the identifier of the encrypted data block. This encrypted data block identifier 19 can be stored separately from the encrypted data block 18 and associated with the encrypted data block, or can be stored as part of the encrypted data block 18. Other manners of storing the encrypted data block identifier 19 and associating the encrypted data block identifier 19 with encrypted data block 18 may also be suitable. Either way, the
The node determining module 20 takes as an input the first node distribution key 22 and the encrypted data block identifier 19 and uses the node determining function 21 to calculate a set of other nodes 24-1. The node determining function 21 creates a set of other nodes 24-1 that can be recreated provided the first node distribution key 22 and the encrypted block identifier 19 are known but which would be difficult/hard/infeasible for any other node 30-1 - 30-N, 40-1 - 40-N that does not know these to recreate. This is explained in more detail later.
In some embodiments, the other nodes 30-1 - 30-N, 40-1 - 40-N may have an associated node identifier 32-1 - 32-N, 42-1 - 42-N wherein the node identifier 32-1 - 32-N, 42-1 - 42-N is associated with the node 30-1 - 30-N, 40-1 - 40-N. First node 10 may also have an associated node identifier 35. The node identifiers 32-1 - 32-N, 42-1 - 42-N, 35 may be associated with the nodes 30-1 - 30-N, 40-1 - 40-N, 10 in any suitable way. The node identifiers 32-1 - 32-N, 42-1 - 42-N, 35 associated with the nodes can be a group of characters of length W, where W is set based upon the number of other nodes 30-1 - 30-N, 40-1 - 40-N in the system and is large enough to ensure that each node 30-1 - 30-N, 40-1 - 40-N can be allocated a unique identifier.
The node determining function 21 may output characters modulo W such that the output is groups of characters where each group of characters has a size W. Each group of characters in the set can correspond to the identifier 32-1 - 32-N, 42-1 - 42-N, 35 of a node.
Alternatively, the set of other nodes 24-1 may comprise a binary/decimal string or a string of characters in any other suitable format 240. The string of binary/decimal or other format characters 240 can then be split such that the first W characters of the string 240 are considered to represent the identifier 32-1 of first node 30-1 to which data should be sent, the second W characters of the string 240 represent the identifier 32-2 of the second node 30-2 to which data should be sent etc. The skilled person would of course understand that for whatever reason, some of the characters of the string 240 may be skipped before/after and between the characters that represent the identifiers 32-1 - 32-N, 42-1 - 42-N of the nodes 30-1 - 30-N, 40-1 - 40-N. As such, the terms “first W characters”, “second W characters” etc. refer to the characters being used to determine node identifiers 32-1 - 32-N, 42-1 - 42-N and not necessarily the first characters in the set 24-1.
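Splitting a string into groups of W characters can be sketched as follows. The sketch assumes that groups which do not correspond to a known node identifier, or which repeat an identifier already selected, are skipped; this skipping rule, the function name `split_into_node_ids` and the example identifiers are assumptions made for illustration.

```python
def split_into_node_ids(output: str, width: int, known_ids: set) -> list:
    """Split a generator output string into W-character node identifiers."""
    node_ids = []
    # Step through the string in consecutive groups of `width` characters,
    # ignoring any incomplete group at the end.
    for start in range(0, len(output) - width + 1, width):
        candidate = output[start:start + width]
        # Assumption: skip groups that match no node or repeat an earlier node.
        if candidate in known_ids and candidate not in node_ids:
            node_ids.append(candidate)
    return node_ids

# With W = 2, the string "0417990417" is read as "04", "17", "99", "04", "17".
selected = split_into_node_ids("0417990417", 2, {"04", "17", "23"})
```

Here "99" is skipped because no node has that identifier, and the second occurrences of "04" and "17" are skipped as repeats, leaving the ordered identifiers "04" and "17".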
Once the nodes in the set of nodes 24-1 - 24-N have been determined, the address of the nodes 24-1 - 24-N may be looked up, for example at a rendezvous server.
While the above provides example ways to determine nodes 30-1 - 30-N, 40-1 - 40-N from a set of other nodes 24-1, other ways to perform such a step can be imagined and can be used in place of the above technique without deviating from the scope of the invention.
Once the set of other nodes 24-1 has been determined it is used as an input to the distribution module 28 that distributes the encrypted data block 18 using the distribution function 29. To this end either the distribution module 28 or the node determining module 20 takes the set 24-1 of nodes and an input T which is a number representing the number of other nodes 30-1 - 30-N, 40-1 - 40-N to which the encrypted data block 18 should be sent and determines T nodes 30-1 - 30-N from the set of nodes 24-1. The distribution module 28 takes these T nodes and sends the encrypted data block 18 to these T nodes 30-1 - 30-N via the network 50.
The distribution module 28 can also send the T nodes the encrypted data block identifier 19 so that the other nodes 30-1 - 30-N, 40-1 - 40-N have a reference for the data they are storing. In other examples an identifier 14 for the data block 12 (discussed in more detail later) is sent instead of, or as well as, the identifier 19 of the encrypted data block 18. However, any other suitable method of allowing the other nodes 30-1 - 30-N to identify the encrypted data block 18 could also be used. The first node 10 can send its public key 78 alongside any encrypted data block 18 to allow later verification.
The T other nodes 30-1 - 30-N can be the first T other nodes 30-1 - 30-N from the set of other nodes 24-1. Alternatively, the T other nodes 30-1 - 30-N can be chosen randomly from the set of other nodes 24-1. However, other ways of selecting T nodes from the set 24-1 can be envisaged. The number T can be determined based on a required probability of the data being available and a probability of one of the other nodes 30-1 - 30-N, 40-1 - 40-N being permanently unavailable and a probability of one of the other nodes 30-1 - 30-N, 40-1 - 40-N being temporarily unavailable. Preferably T is also determined based on the probability of a node being unavailable when data is sent to it. Further details on how T can be calculated are discussed later.
Distributing an encrypted data block 18 in the manner described above ensures that only the first node 10 which is sending the encrypted data block 18 for distribution knows where the encrypted data block 18 is stored. This prevents a malicious third party from working out the stored location of the encrypted data block 18 and requesting the data block from one of the nodes 30-1 - 30-N storing the encrypted data block. When multiple data blocks are sent, this also prevents a malicious node getting a complete picture of where all of a first node’s 10 data is stored.
In one example, as well as sending the encrypted data block 18 for storage the at least one encryption key 16-1 can also be sent to other nodes 40-1 - 40-N for storage. When both the encrypted data block 18 and at least one encryption key 16-1 are sent for storage the encrypted data block 18 and the at least one encryption key 16-1 can be considered to be data chunks that form the unencrypted data block 12. The nodes to which the encryption key 16-1 is sent are selected independently from the nodes to which the encrypted data block 18 is sent but using the same technique.
More specifically, the encryption key 16-1 has an encryption key identifier 11-1. This encryption key identifier 11-1 may form part of the encryption key 16-1 or may be stored separately from the encryption key 16-1 and associated with the encryption key 16-1 in any suitable manner.
When the encryption key 16-1 is sent to a plurality of other nodes 40-1 - 40-N for storage, the node determining module uses the node determining function to determine a set of nodes 24-2 to which the encryption key 16-1 should be sent. This set of nodes 24-2 is determined separately from the set of nodes 24-1 to which the encrypted data block 18 is to be sent.
When node determining function 21 of the node determining module 20 is determining the set of nodes 24-2 to which to send the encryption key 16-1, the node determining function 21 takes as an input the first node distribution key 22 and the encryption key identifier 11-1. The node determining function 21 then creates a set of other nodes 24-2 that can be recreated provided the first node distribution key 22 and the encryption key identifier 11-1 are known but which would be difficult for any node 30-1 - 30-N, 40-1 - 40-N lacking these pieces of information to recreate. Suitable functions are described in more detail later.
In one example, the set of nodes 24-2 to which the encryption key 16-1 is sent is edited to ensure it does not contain any of the same nodes that the encrypted data block 18 is going to be sent to. This can be done by comparing the sets 24-1 and 24-2 and removing from set 24-2 any entries which also appear in set 24-1. This editing can be done by the node determining module 20 or the data distribution module 28. This editing is done before the T nodes from set 24-2 are selected to ensure that any data chunk 18, 16-1 is always sent to T other nodes 30-1 - 30-N, 40-1 - 40-N. However, it may be done after the T nodes from set 24-1 are selected so that only nodes that the encrypted data block 18 is actually being sent to are removed from set 24-2. In such a case the node determining module 20 may take T as an input and use this to select the T nodes from a set instead of this being done at the distribution module. This editing ensures that no one node receives both the encrypted data block 18 and the encryption key 16-1.
Once the set of other nodes 24-2 to which the encryption key 16-1 should be sent has been determined it is used as an input to a distribution module 28 that distributes the encryption key 16-1 using a distribution function 29. The distribution module 28 can distribute the encryption key 16-1 along with its encryption key identifier 11-1 so the other node 30-1 - 30-N, 40-1 - 40-N has a reference for the data it is storing. In other examples an identifier 14 for the data block 12 (discussed in more detail later) is sent instead of, or as well as, the identifier 11-1 of the encryption key 16-1. However, any other suitable method of allowing the other nodes 30-1 - 30-N, 40-1 - 40-N to identify the encryption key 16-1 could also be used. The first node 10 can send its public key 78 alongside any encryption key 16-1 to allow later verification.
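The editing of a later set against earlier sets, followed by selection of T nodes, can be sketched as below. The sketch assumes the sets are ordered lists of node identifiers and that the first T entries are selected; the function names `remove_overlap` and `first_t` and the example identifiers are hypothetical.

```python
def remove_overlap(later_set: list, earlier_sets: list) -> list:
    """Remove from later_set any node that appears in any earlier set."""
    seen = set()
    for earlier in earlier_sets:
        seen.update(earlier)
    # Preserve the original order of the remaining nodes.
    return [node for node in later_set if node not in seen]

def first_t(nodes: list, t: int) -> list:
    """Select the first T nodes of an (already edited) set."""
    if len(nodes) < t:
        raise ValueError("not enough nodes left in the set to select T nodes")
    return nodes[:t]

set_1 = ["n3", "n7", "n1", "n9"]          # nodes for the encrypted data block
set_2 = ["n7", "n4", "n2", "n1", "n8"]    # nodes for the encryption key

# Edit the second set before selecting T nodes from it, so that T nodes
# are still available after the overlapping entries are removed.
edited_set_2 = remove_overlap(set_2, [set_1])
block_nodes = first_t(set_1, 2)
key_nodes = first_t(edited_set_2, 2)
```

Because the editing happens before selection, the encryption key is still sent to T nodes, none of which receives the encrypted data block.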
To distribute the encryption key 16-1 either the distribution function 29 or node determining module 20 takes the set of other nodes and a number T, wherein T represents the number of other nodes 30-1 - 30-N, 40-1 - 40-N to which the encryption key 16-1 should be sent and selects T nodes from the set 24-2. In general, the encrypted data block 18 and the encryption key 16-1 should be sent to the same number, T, of nodes although the actual nodes to which the encrypted data block 18 and the encryption key 16-1 should be sent are determined separately and are likely to differ. The distribution function takes the selected T other nodes 40-1 - 40-N from the set of nodes 24-2 and sends the encryption key 16-1 to these T other nodes 40-1 - 40-N via the network 50. The distribution module 28 also distributes the encrypted data block 18 as outlined above. The number of nodes T to which to send the encryption key 16-1 can be determined in multiple ways and some example ways are discussed in more detail later.
-38While it is possible that the data block 12 is encrypted using only one encryption key 16-1, it is also possible to encrypt the data block 12 using multiple encryption keys 16-1 - 16-N. Once again, in this scenario the encrypted data block 18 and the multiple encryption keys 16-1 - 16-N can be considered to be data chunks that form the unencrypted data block 12. When multiple encryption keys 16-1 - 16-N are used, each encryption key can be sent to T nodes from a corresponding set of other nodes 24-2 - 24-N+1 such that the set of nodes 24-2 - 24-N+1 to send each encryption key 16-1 - 16-N to are determined separately using the method outlined above.
Sending each encryption key 16-1 - 16-N for storage can also comprise sending the corresponding encryption key identifier 11-1 - 11-N to the nodes 30-1 - 30-N, 40-1 - 40-N storing the encryption key 16-1 - 16-N. This provides the nodes 30-1 - 30-N, 40-1 - 40-N with a reference for the data they are storing. In other examples an identifier 14 for the data block 12 (discussed in more detail later) is sent instead of, or as well as, the identifiers 11-1 - 11-N of the encryption keys 16-1 - 16-N. However, any other suitable method of allowing the other nodes 30-1 - 30-N, 40-1 - 40-N to identify the encryption keys 16-1 - 16-N could also be used. The first node 10 can send its public key 78 alongside any encryption keys 16-1 - 16-N to allow later verification.
The sets of nodes 24-2 - 24-N+1 can be edited to ensure they do not contain any of the same nodes mentioned in a previous set 24-1 - 24-N. This can be done by editing each set 24-2 - 24-N+1 to remove any nodes mentioned in a previous set 24-1 - 24-N. For example, the set of nodes 24-1 the encrypted data block 18 is sent to can be considered the first set, the set of nodes 24-2 the encryption key 16-1 is sent to can be considered the second set, and the set of nodes 24-3 the encryption key 16-2 is sent to can be considered the third set. The second set of nodes 24-2 is edited based on the first set of nodes 24-1 by deleting any entries from the second set of nodes 24-2 that also occur in the first set of nodes 24-1. The third set of nodes 24-3 is edited based on both the first set of nodes 24-1 and the second set of nodes 24-2 to remove any nodes that are also in one of these other sets 24-1, 24-2. Such a technique ensures that no one node receives both the encrypted data block 18 and one or more of the encryption keys 16-1 - 16-N. This editing can be done by the node determining module 20 or the data distribution module 28. This editing is done before the T nodes from the later set 24-2 - 24-N+1 are selected to ensure that any data chunk 18, 16-1 - 16-N is always sent to T other nodes 30-1 - 30-N, 40-1 - 40-N. However, it may be done before the T nodes from the earlier set 24-1 - 24-N are selected to prevent the unnecessary removal of nodes from a set 24-2 - 24-N+1. In such a case the node determining module 20 may take T as an input and use this to select the T nodes from a set instead of this being done at the distribution module.
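By way of illustration only, the editing of the sets of nodes described above might be sketched in Python as follows. The function name `edit_sets`, the use of integer node identifiers and the choice of simply taking the first T remaining entries are assumptions made for the sketch and are not taken from the description.

```python
def edit_sets(candidate_sets, t):
    """Remove from each later set any node that appears in an earlier set,
    then keep the first t nodes that remain (hypothetical helper)."""
    seen = set()      # every node mentioned in an earlier set
    selected = []
    for candidates in candidate_sets:
        # Delete entries that already occur in a previous set.
        edited = [node for node in candidates if node not in seen]
        if len(edited) < t:
            raise ValueError("fewer than T nodes remain after editing")
        selected.append(edited[:t])
        seen.update(candidates)
    return selected


# First set: encrypted data block; second and third sets: encryption keys.
sets_of_nodes = [[1, 2, 3, 4], [2, 5, 6, 1, 7], [5, 8, 9, 3, 10]]
print(edit_sets(sets_of_nodes, t=2))
```

In this sketch no node identifier appears in more than one of the returned lists, so no single node would receive both the encrypted data block and an encryption key.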
Sending the encryption key(s) 16-1 - 16-N to other nodes 40-1 - 40-N for storage means that the first node 10 can obtain the encryption key(s) 16-1 - 16-N from the other nodes 40-1 - 40-N if the first node 10 loses its data. Furthermore, given that a third party cannot work out where the encryption key(s) 16-1 - 16-N have been stored, it is not possible for any third party or any other node 30-1 - 30-N, 40-1 - 40-N to request back both the encrypted data block 18 and the encryption key(s) 16-1 - 16-N. Furthermore, using multiple encryption keys 16-1 - 16-N means any third party or other node 30-1 - 30-N, 40-1 - 40-N has more information to find in order to be able to decrypt the encrypted data block 18. As such, this allows a first node 10 to send its data for backup storage while ensuring that none of the nodes 30-1 - 30-N, 40-1 - 40-N providing this backup storage or any third party can decrypt the stored data.
Figure 2 shows an example of how encryption key 16-1 can be used to encrypt the data block 12. In Figure 2a only a single encryption key 16-1 is used. In Figure 2b multiple encryption keys 16-1 - 16-N are used. While Figure 2b shows three encryption keys 16-1 - 16-3, the skilled person would understand that any number of encryption keys is suitable provided at least one encryption key is used.
Figure 2a shows the data block 12 in more detail. The data block 12 contains a string of data 13 wherein the string 13 has a length L in that it is made up of L characters. The string 13 can take any suitable form. Therefore, the string can be a binary string, a hexadecimal string, an alphanumeric string, a Unicode string or a string in any other suitable data format. The characters that make up the string vary depending upon the form of the string e.g. if the string is a binary string the characters are 0s and 1s while if the string is a hexadecimal string the characters are hexadecimal characters.
Figure 2a also shows an encryption key 16-1. In this example, the encryption key 16-1 is also a string 160-1 of length L in that it is made up of L characters. In one example the string 160-1 that forms the encryption key 16-1 takes the same form as the string 13 that forms the data block 12 e.g. if the string 13 that forms the data block 12 is a binary string then the string 160-1 that forms the encryption key 16-1 is also a binary string etc. The string 160-1 that forms the encryption key 16-1 is a random string in that each character of the string 160-1 is chosen randomly or pseudo-randomly.
The encryption function 17 used to encrypt the data block 12 is one-time pad encryption. More specifically, the encryption module 15 uses the encryption key 16-1 as a one-time pad to encrypt the data block 12 to form the encrypted data block 18. In one example using the encryption key 16-1 as a one-time pad comprises performing modular addition between corresponding characters of the string 160-1 forming the encryption key 16-1 and the string 13 forming the data block 12. For example, a modular addition is performed between the first character of the string 160-1 forming the encryption key 16-1 and the first character of the string 13 forming the data block 12, a modular addition is performed between the second character of the string 160-1 forming the encryption key 16-1 and the second character of string 13 forming the data block 12 etc. up until the Lth character where a modular addition is performed between the Lth character of the string 160-1 forming the encryption key and the Lth character of the string 13 forming the data block.
In one example where the strings 13 and 160-1 are binary strings, performing the one-time pad encryption comprises performing an XOR operation. As such, using the encryption key 16-1 to encrypt the data block 12 comprises performing an XOR operation between each bit of the string 13 that forms the data block 12 and the corresponding bit of the string 160-1 that forms the encryption key 16-1.
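The one-time pad operations described above can be sketched, purely as an illustration, in Python. The helper names `otp_xor` and `otp_mod_add`, the use of `secrets.token_bytes` to generate the key and the hexadecimal alphabet are assumptions of the sketch; the description does not prescribe any particular implementation.

```python
import secrets

HEX = "0123456789abcdef"  # assumed alphabet for the modular-addition variant


def otp_xor(data: bytes, key: bytes) -> bytes:
    """XOR each byte of the data with the corresponding byte of the key.
    Applying the same key a second time restores the original data."""
    if len(data) != len(key):
        raise ValueError("key must have the same length L as the data")
    return bytes(d ^ k for d, k in zip(data, key))


def otp_mod_add(text: str, key: str, decrypt: bool = False) -> str:
    """Character-wise modular addition (subtraction when decrypting)
    over the hexadecimal alphabet."""
    if len(text) != len(key):
        raise ValueError("key must have the same length L as the text")
    sign = -1 if decrypt else 1
    return "".join(
        HEX[(HEX.index(c) + sign * HEX.index(k)) % len(HEX)]
        for c, k in zip(text, key)
    )


block = b"data block 12"
key = secrets.token_bytes(len(block))  # random key of length L
encrypted = otp_xor(block, key)
assert otp_xor(encrypted, key) == block

assert otp_mod_add("1f", "f2") == "01"
assert otp_mod_add("01", "f2", decrypt=True) == "1f"
```

For binary strings, addition modulo 2 and XOR are the same operation, which is why the XOR variant is a special case of the modular addition.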
The use of one-time pad cryptography ensures that none of the other nodes 30-1 - 30-N which are storing either the encrypted data block 18 or the encryption key 16-1 is able to perform any offline attack which would allow it to obtain the unencrypted data block 12. As such, the use of one-time pad cryptography enhances the security of the stored data.
In Figure 2b multiple encryption keys 16-1 - 16-3 are used to encrypt the data block 12. As mentioned previously, although Figure 2b shows three encryption keys, the skilled person would understand that this number of encryption keys is merely exemplary and more or fewer encryption keys could be used. In this example, each encryption key 16-1 - 16-3 is a string 160-1 - 160-3 of length L in that it is made up of L characters. In one example each string 160-1 - 160-3 takes the same form as the string 13 that forms the data block 12 e.g. if the string 13 is a hexadecimal string then the strings 160-1 - 160-3 that form the encryption keys 16-1 - 16-3 are also hexadecimal strings etc. Each string 160-1 - 160-3 of the strings forming the encryption keys 16-1 - 16-3 is a random string in that each character of the string is chosen randomly or pseudo-randomly.
In the example shown in Figure 2b using the encryption function 17 to encrypt the data block 12 comprises using modular addition between each character of the string 13 forming the data block 12 and the corresponding characters of the strings 160-1 - 160-3 forming the encryption keys 16-1 - 16-3. This modular addition is described in more detail above with respect to using a single encryption key. The result of these modular additions is a single encrypted data block 18. Therefore, preferably each encryption key 16-1 - 16-3 is used to perform the modular addition on the same data block 18 either consecutively or simultaneously. Preferably, performing the modular addition does not comprise forming multiple data blocks and encrypting each data block with a separate encryption key. This modular addition is therefore equivalent to using each encryption key 16-1 - 16-3 as a one-time pad to encrypt the data block 12.
The result of such an encryption is an encrypted data block 18 that can only be decrypted using all of the encryption keys 16-1 - 16-3. As mentioned above, these encryption keys 16-1 - 16-3 and the encrypted data block 18 are each separately sent to T other nodes 40-1 - 40-N for storage wherein the T other nodes each encryption key 16-1 - 16-3 and the encrypted data block 18 are sent to can only be worked out from the first node data distribution key 22 and the encryption key and encrypted data block identifiers 11-1 - 11-3, 19. As such, no other node 30-1 - 30-N, 40-1 - 40-N can work out where the encryption keys 16-1 - 16-3 and the encrypted data block 18 have been sent.
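As a hedged illustration of encrypting one data block with several one-time pad keys, the following Python sketch applies each key in turn by XOR. The names `encrypt_with_keys` and `decrypt_with_keys` are hypothetical and the byte-wise XOR is only one of the modular additions contemplated above.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_with_keys(block: bytes, n_keys: int):
    """Generate n_keys random keys of the same length as the block and
    apply them consecutively, yielding a single encrypted block."""
    keys = [secrets.token_bytes(len(block)) for _ in range(n_keys)]
    encrypted = reduce(xor_bytes, keys, block)
    return encrypted, keys


def decrypt_with_keys(encrypted: bytes, keys) -> bytes:
    """Recover the block; every key must be supplied."""
    return reduce(xor_bytes, keys, encrypted)


block = b"data block 12"
encrypted, keys = encrypt_with_keys(block, 3)
assert decrypt_with_keys(encrypted, keys) == block
# With one key withheld the result is still masked by that key.
assert decrypt_with_keys(encrypted, keys[:2]) == xor_bytes(block, keys[2])
```

The second assertion shows why all keys are needed: omitting any one key leaves the data combined with a random string of length L.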
The use of multiple encryption keys 16-1 - 16-3, each one being used as a one-time pad, ensures that multiple encryption keys 16-1 - 16-3 and the encrypted data block 18 have to be obtained by any party wishing to decrypt the encrypted data block 18. Furthermore, unless all the encryption keys 16-1 - 16-3 and the encrypted data block 18 can be obtained no offline attack can be used to obtain the unencrypted data block 12. Given that the nodes to which the encryption keys 16-1 - 16-3 and the encrypted data block 18 were sent cannot be determined without the distribution key 22 and the relevant identifiers 11-1 - 11-3, 14, a third party or any other node 30-1 - 30-N, 40-1 - 40-N who does not have access to this information would struggle to request the relevant data and thus decrypt the encrypted data block 18. This allows the first node 10 to store data at other nodes 30-1 - 30-N, 40-1 - 40-N securely.
Figure 3 shows an example first node 10 which can be used in accordance with the system shown in Figure 1. For simplicity and clarity the first node 10 shown in Figure 3 does not include all the elements shown in Figure 1. However, the skilled person would understand that all such elements can be included with the first node 10 shown in Figure 3. As discussed above, the node determining function 21 determines sets of nodes 24-1 - 24-N+1 to which the encrypted data block 18 and the encryption keys 16-1 - 16-N should be sent. The encrypted data block 18 and the encryption keys 16-1 - 16-N can each be referred to as a data chunk that makes up the unencrypted data block 12.
In the example shown in Figure 3, the node determining function 21 is a secure pseudo-random number generator (PSRN) 23 (otherwise known as a cryptographically secure pseudo-random number generator) that uses either (a) the first node distribution key 22 and the relevant identifier 19, 11-1 - 11-N as a seed; or (b) a hash function of the first node distribution key 22 and the relevant identifier 19, 11-1 - 11-N as a seed. The relevant identifier 19, 11-1 - 11-N is the identifier of the data chunk 18, 16-1 - 16-N that is being distributed i.e. the identifier 19 of the encrypted data block 18 or the identifier 11-1 - 11-N of the encryption keys 16-1 - 16-N. While the term secure pseudo-random number generator is used and it is preferable that any suitable pseudo-random number generator used can be considered to be a cryptographically secure pseudo-random number generator, the skilled person would understand that other pseudo-random number generators may also be suitable and that the secure pseudo-random number generator 23 could be replaced with another pseudo-random number generator.
More specifically, as shown in Figure 3 when the node determining function 21 is a secure pseudo-random number generator 23, the seed 25 for the secure pseudo-random number generator 23 is a function 27-1 calculated by a function module 26. The function 27-1 is a function of the first node distribution key 22 and the relevant identifier 19, 11-1 - 11-N i.e. the identifier 19 of the encrypted data block 18 or the identifier 11-1 - 11-N of the encryption keys 16-1 - 16-N dependent on which data chunk is currently being processed by the node determining function 21/secure pseudo-random number generator 23. This function 27-1 can take any suitable form provided that for each data chunk 18, 16-1 - 16-N the seed is distinct from the seeds for other random data chunks 18, 16-1 - 16-N when the
relevant identifiers 19, 11-1 - 11-N differ. For example the function 27-1 could be a function that appends the relevant identifier 19, 11-1 - 11-N to the beginning or end of the first node distribution key 22. Alternatively, the function 27-1 could modify the first node distribution key 22 using the relevant identifier 19, 11-1 - 11-N for example by adding or subtracting the relevant identifier 19, 11-1 - 11-N to/from the first node distribution key 22, or by scaling the first node distribution key 22 by the relevant identifier 19, 11-1 - 11-N.
In one example, once the function module 26 has combined the first node distribution key 22 and the identifier 19, 11-1 - 11-N of the data chunk the result is used as seed 25. The fact that the first node distribution key 22 is unknown by any other party such as other nodes 30-1 - 30-N, 40-1 - 40-N ensures that the seed 25 cannot be recreated by any other party. The use of the data chunk identifier 19, 11-1 - 11-N in calculating the seed 25 ensures that the seed 25 for each data chunk 18, 16-1 - 16-N is unique so each data chunk 18, 16-1 - 16-N is distributed to a different set of nodes 30-1 - 30-N, 40-1 - 40-N.
In another example, once the function module 26 has performed function 27-1 to combine the first node distribution key 22 with the identifier 19, 11-1 - 11-N of the encrypted data block 18 or encryption key 16-1 - 16-N, it then performs a hash function 27-2 on the result and uses the result of hash function 27-2 as the seed 25 for the secure pseudo-random number generator 23. The use of a hash function 27-2 means that even if the seed 25 was leaked, it would not be possible for a third party or another node 30-1 - 30-N, 40-1 - 40-N to work out the first node distribution key 22 and thus be able to calculate where other data chunks 18, 16-1 - 16-N will be sent.
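A minimal sketch of the seed derivation is given below for illustration. It assumes that the combining function appends an 8-byte big-endian identifier to the distribution key and that the hash function is SHA-256; neither choice is specified in the description, and the name `derive_seed` is hypothetical.

```python
import hashlib


def derive_seed(distribution_key: bytes, chunk_identifier: int) -> bytes:
    """Combine the secret distribution key with a data chunk identifier
    and hash the result to obtain a seed for the generator."""
    # Combining step (assumed): append the identifier to the key.
    combined = distribution_key + chunk_identifier.to_bytes(8, "big")
    # Hashing step (assumed SHA-256): the seed does not reveal the key.
    return hashlib.sha256(combined).digest()


key = b"first-node-distribution-key"  # placeholder secret for the example
seed_for_block = derive_seed(key, 19)
seed_for_key = derive_seed(key, 11)
assert seed_for_block != seed_for_key
assert derive_seed(key, 19) == seed_for_block
```

Because the same key and identifier always give the same seed, the first node can later recompute where each data chunk was sent without storing the list of nodes.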
The secure pseudo-random number generator 23 can output groups of characters modulo W, such that each group of characters is W characters long and represents the identifier of a node. As such, the output of the pseudo-random number generator is in a form that allows the skilled person to access the identifiers 32-1 - 32-N, 42-1 - 42-N of the nodes 30-1 - 30-N, 40-1 - 40-N from the set 24-1 - 24-N without further processing.
Alternatively, the output of the secure pseudo-random number generator 23 can be a string 240-1 - 240-N of any suitable format characters (e.g. binary numbers, decimal numbers, hexadecimal numbers etc.) that form set 24-1 - 24-N. As mentioned above, in order to turn this string 240-1 - 240-N into numbers 32-1 - 32-N, 42-1 - 42-N that represent the other nodes 30-1 - 30-N, 40-1 - 40-N, the string can be divided into groups of W characters wherein W is the length of the identifiers 32-1 - 32-N, 42-1 - 42-N representing the nodes
30-1 - 30-N, 40-1 - 40-N. The string 240-1 - 240-N can be divided into these groups of characters using any known or standard technique. The above paragraph is one example but other techniques for taking the output of the pseudo-random number generator 23 and turning it into a set of nodes 24-1 - 24-N+1 can also be imagined.
Using a secure pseudo-random number generator 23 as the node determining function 21 ensures the data chunks 18, 16-1 - 16-N are distributed in a manner that appears random to other nodes 30-1 - 30-N, 40-1 - 40-N but can be recreated by the first node 10.
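The following Python sketch illustrates, under stated assumptions, how a seeded generator could yield a reproducible set of node identifiers. A SHA-256 hash in counter mode stands in for the secure pseudo-random number generator, nodes are assumed to be numbered from 0 to `total_nodes - 1`, and each group of W bytes is reduced modulo the number of nodes. None of these choices is taken from the description, and the modulo step introduces a small bias that a production design would need to address.

```python
import hashlib


def keystream(seed: bytes):
    """Deterministic byte stream: SHA-256 of seed plus counter (a stand-in
    for the secure pseudo-random number generator)."""
    counter = 0
    while True:
        yield from hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1


def select_nodes(seed: bytes, total_nodes: int, count: int, width: int = 4):
    """Split the stream into groups of `width` bytes, map each group to a
    node identifier and collect `count` distinct identifiers."""
    if count > total_nodes:
        raise ValueError("cannot select more nodes than exist")
    stream = keystream(seed)
    chosen = []
    while len(chosen) < count:
        group = bytes(next(stream) for _ in range(width))
        node_id = int.from_bytes(group, "big") % total_nodes
        if node_id not in chosen:  # skip repeated identifiers
            chosen.append(node_id)
    return chosen


nodes = select_nodes(b"example seed", total_nodes=100, count=5)
assert nodes == select_nodes(b"example seed", total_nodes=100, count=5)
print(nodes)
```

The selection looks random to anyone without the seed, but the holder of the seed can regenerate exactly the same list.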
Figure 4 shows the data block 12 and the data chunks i.e. the encrypted data block 18 and the encryption keys 16-1 - 16-N from Figure 1. Although Figure 4 does not show much of the other detail from Figure 1 and Figure 3, the skilled person would understand that the details shown in Figure 4 can be used in combination with the details from Figure 1 and Figure 3. Furthermore, although Figure 4 shows three encryption keys 16-1 - 16-3, the skilled person would understand that this number is merely an example and more or fewer encryption keys could be used.
As mentioned previously, the encrypted data block 18 and the encryption keys 16-1 - 16-3 used to encrypt that encrypted data block 18 can be considered to be data chunks that form part of data block 12. As shown in Figure 4 the data chunks i.e. the encrypted data block 18 and the encryption keys 16-1 - 16-3 are arranged in an ordered list 62 and each data chunk is provided with a reference 60-1 - 60-4 that represents its position within the ordered list 62. For example, in Figure 4, the encrypted data block 18 is considered the first data chunk in the list 62 and is hence provided with the reference 60-1 “1” while the third encryption key 16-3 is considered the fourth item in the ordered list 62 and hence is provided with the reference 60-4 “4”. Arranging the data chunks 18, 16-1 - 16-3 in an ordered list 62 does not require any physical rearranging of the data chunks 18, 16-1 - 16-3; instead this step merely refers to the process of assigning each data chunk 18, 16-1 - 16-3 a corresponding reference. Similarly, although in the example shown the encrypted data block 18 is the first item in the ordered list 62, it could be placed anywhere in the ordered list 62 so it could be a middle item or even the final item of the list. This step can be performed by identifier module 64.
As also shown in Figure 4, the unencrypted data block 12 has a data block identifier 14. The data block identifier 14 can be associated with the data block 12 in any suitable way.
For example, the data block identifier 14 could be part of the string 13. Alternatively, the data block identifier 14 could be associated with the data block 12 but not stored as part of the data block 12.
In accordance with the example shown in Figure 4, the identifiers 19, 11-1 - 11-3 of the data chunks 18, 16-1 - 16-3 are calculated/determined from the data block identifier 14 by an identifier module 64 that performs an identifier calculation 66. In particular, for each data chunk 18, 16-1 - 16-3 the identifier calculation 66 takes as an input the data block identifier 14 and the reference 60-1 - 60-4 representing the position of the data chunk 18, 16-1 - 16-3 within the ordered list 62 and returns the data chunk identifier 19, 11-1 - 11-3. In one example, the identifier calculation 66 simply adds the position reference 60-1 - 60-4, i.e. the reference representing the position of the data chunk 18, 16-1 - 16-3 in the ordered list 62, to the data block identifier 14. However, the skilled person would understand that many other identifier calculations are possible, for example: subtracting the position reference 60-1 - 60-4 from the data block identifier 14; appending the position reference 60-1 - 60-4 to the data block identifier 14; or multiplying the data block identifier 14 by the position reference 60-1 - 60-4. More specifically, any calculation that results in data chunk identifiers 19, 11-1 - 11-3 that are different for each data chunk 18, 16-1 - 16-3 may be suitable.
In the example mentioned above, the results of the identifier calculation 66 are directly used as the data chunk identifiers 19, 11-1 - 11-3. However, it is also possible for the result of the identifier calculation 66 to be passed to hashing function 68, also at the identifier module 64, in order to generate the data chunk identifiers 19, 11-1 - 11-3. In this example, the identifier calculation 66 described above is performed to generate a unique reference for each data chunk 18, 16-1 - 16-3. This reference is then fed into a hash function 68 which produces the data chunk identifiers 19, 11-1 - 11-3 from the unique references.
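For illustration, the identifier calculation and the optional hashing step might be sketched as follows. The function name `chunk_identifiers`, the use of integer identifiers and the choice of SHA-256 for the hashing step are assumptions of the sketch.

```python
import hashlib


def chunk_identifiers(block_identifier: int, n_chunks: int, hashed: bool = False):
    """Derive one identifier per data chunk from the data block identifier
    and the chunk's position (1, 2, ...) in the ordered list."""
    identifiers = []
    for position in range(1, n_chunks + 1):
        # Identifier calculation (one of the listed options): add the position.
        reference = block_identifier + position
        if hashed:
            # Optional hashing step (assumed SHA-256 over the decimal text).
            reference = hashlib.sha256(str(reference).encode()).hexdigest()
        identifiers.append(reference)
    return identifiers


# One encrypted data block plus three encryption keys gives four chunks.
assert chunk_identifiers(1000, 4) == [1001, 1002, 1003, 1004]
print(chunk_identifiers(1000, 4, hashed=True))
```

With simple addition, two data blocks whose identifiers are close together can produce the same chunk reference (block 1000 at position 2 and block 1001 at position 1 both give 1002), so a design using this option would need block identifiers spaced far enough apart, or could use the appending option instead.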
Using the data block identifier 14 to calculate data chunk identifiers 19, 11-1 - 11-N has the advantage that if the first node 10 loses its data, it needs less information about each data chunk 18, 16-1 - 16-N to determine the other nodes 30-1 - 30-N, 40-1 - 40-N to which they were sent.
In another example, the data chunk identifiers 19, 11-1 - 11-N can simply be consecutive numbers starting from one or any other predefined number. Use of simple data chunk identifiers 19, 11-1 - 11-N in this form means the first node 10 does not have to do any work to establish the data chunk identifiers 19, 11-1 - 11-N which can make them easier to re-establish when a node loses data.
In order to enhance the security of the system/method and ensure that no other nodes 30-1 - 30-N, 40-1 - 40-N can read the contents of data block 12, it is possible to perform a pre-encryption on the data block 12 before it goes to encryption module 15 for encryption by the encryption function 17. An example of the pre-encryption is shown in Figure 5a. Figure 5a shows first node 10 which may be the same as the first node 10 from the previous Figures. However, Figure 5a only shows the features relevant to the pre-encryption stage. That said, the skilled person would understand that all the steps described above with respect to Figures 1 to 4 can be performed along with the pre-encryption.
As shown in the example in Figure 5a, before the data block 12 is passed to encryption module 15, it can pass through pre-encryption module 70. Pre-encryption module 70 performs an encryption algorithm known as pre-encryption algorithm 72 on the data block 12 to form data block 76 which is a pre-encrypted data block.
In one example, pre-encryption algorithm 72 uses a key 74 to encrypt data block 12. Key 74 can be a private key or a secret key in that it is known only by the first node 10. In other words, pre-encryption algorithm 72 can be a symmetric-key encryption algorithm that uses the same key i.e. key 74 for encryption and decryption. Any suitable symmetric-key encryption algorithm can be used such as Twofish, Serpent, AES, Blowfish, CAST5, Kuznyechik, RC4, 3DES, Skipjack, Safer+/++ and IDEA. When key 74 is a private key there is not necessarily any corresponding public key. However, in some examples, key 74 can be a private key and there can be a corresponding public key 78 which is not used as part of pre-encryption algorithm 72. In such an example, public key 78 is used for verifying the identity of the first node 10 if this is ever required.
In another example pre-encryption algorithm 72 can use a public key 78 to encrypt the data block 12. The private key 74 is used if and when the first node 10 needs to decrypt the pre-encrypted data block 76. In this example, the pre-encryption algorithm 72 can be any suitable public-private key encryption algorithm or system such as ElGamal, elliptic curve cryptography, lattice-based cryptography, McEliece cryptosystem, multivariate cryptography, Paillier cryptosystem, RLCE, RSA, and Cramer-Shoup cryptosystem or any other suitable cryptographic system. In this example the public key 78 may also be used to verify the identity of the first node 10 if this is ever required.
Once the pre-encryption module 70 has run pre-encryption algorithm 72, the resultant pre-encrypted data block 76 is passed to the encryption module 15 and the method proceeds as described with respect to previous Figures.
In all of the examples discussed above, it is noted that the first node 10 now has two secret keys, private/secret key 74 and distribution key 22. These two keys can be completely independent of each other. However, in another example they can be connected. For example distribution key 22 may be derived from private/secret key 74 in any suitably secure fashion using any suitable secure key derivation technique such as key derivation functions: PBKDF1, PBKDF2, PKCS or any other suitable key derivation function. Alternatively, both distribution key 22 and private key 74 can be derived from another private key held by the node. This reduces the number of keys the first node 10 needs to store. Furthermore, as discussed later the first node 10 will need the distribution key 22 and, where pre-encryption is used, the private/secret key 74 to re-access its data. It is possible to store such secret data at a trusted and secure third party server. This ensures such data can be accessed when the first node 10 loses all its data. Reducing the number of secret/private keys the first node 10 uses reduces the amount of data it cannot lose or which needs to be stored securely at a trusted third party.
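As an illustrative sketch only, both keys could be derived from a single master secret using PBKDF2 as provided by Python's standard `hashlib.pbkdf2_hmac`. The salt labels, the iteration count, the 32-byte output length and the name `derive_keys` are assumptions chosen for the example rather than values taken from the description.

```python
import hashlib


def derive_keys(master_secret: bytes, iterations: int = 100_000):
    """Derive a distribution key and a pre-encryption key from one master
    secret, using a different (assumed) salt label for each purpose."""
    distribution_key = hashlib.pbkdf2_hmac(
        "sha256", master_secret, b"distribution-key", iterations, dklen=32
    )
    pre_encryption_key = hashlib.pbkdf2_hmac(
        "sha256", master_secret, b"pre-encryption-key", iterations, dklen=32
    )
    return distribution_key, pre_encryption_key


dist_key, pre_key = derive_keys(b"master secret held by the first node")
assert dist_key != pre_key
assert (dist_key, pre_key) == derive_keys(b"master secret held by the first node")
```

In this arrangement only the master secret needs to be kept safe, since both derived keys can be recomputed from it on demand.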
The use of a pre-encryption algorithm further enhances the security of the data block 12 since even if the other nodes 30-1 - 30-N, 40-1 - 40-N somehow managed to obtain all of the data chunks 18, 16-1 - 16-3 they would still not be able to read the data in encrypted data block 18.
Figure 5b shows a post-encryption algorithm that can be performed on a data chunk 86 i.e. encrypted data block 18 and/or encryption keys 16-1 - 16-N. In Figure 5b, the post-encryption is being performed on a single data chunk 86 but the skilled person would understand that this is merely exemplary and that all data chunks 18, 16-1 - 16-N may undergo post-encryption. This example is not illustrated in Figure 5b to reduce the complexity of the Figure.
Figure 5b shows a post-encryption module 80 that performs an encryption algorithm known as post-encryption function 82. Post-encryption module 80 receives the data chunk 86 to be encrypted and set of nodes 24 representing the nodes 30-1 - 30-N to which the data chunk 86 should be sent. This set of nodes 24 may already be edited to contain only T nodes. The post-encryption module 80 obtains from each node 30-1 - 30-N in the set of nodes 24 a public key 85-1 - 85-N. These public keys 85-1 - 85-N may be obtained directly from the nodes 30-1 - 30-N or from any party, including parties at the first node 10, who have access to them. Post-encryption module 80 copies the data chunk 86 so that there is a copy of the data chunk 86 for each node 30-1 - 30-N to which the data chunk 86 will be sent. Post-encryption module 80 then performs an encryption function 82 on each copy of the data chunk 86, wherein each copy is encrypted using the public key 85-1 - 85-N of the node 30-1 - 30-N to which it will be sent. This results in the formation of post-encrypted data chunks 90-1 - 90-N. Nodes 30-1 - 30-N have a private key 89-1 - 89-N which corresponds to their public key 85-1 - 85-N which they can then use to decrypt the copy of the data chunk 86 they receive using a decryption module 87-1 - 87-N.
The addition of this post encryption ensures that only the node 30-1 - 30-N for which a data chunk 86 is intended can access the content of the data chunk 86. This reduces the risk of a node or any other third party intercepting data chunks 18, 16-1 - 16-N intended for several other nodes 30-1 - 30-N.
The number of nodes T to which a data chunk 18, 16-1 - 16-N is to be sent can be determined in a number of ways. In the simplest example, T can simply be a constant that is agreed between the nodes 10, 30-1 - 30-N, 40-1 - 40-N.
However, in the main example, T is determined based upon a redundancy factor, rf, which is calculated based on the probability of data at a node being permanently unavailable and the probability of data at a node being temporarily unavailable. Example pseudo-code for calculating the redundancy factor, rf, and T is provided in Appendix One. A technique for calculating T is also discussed below.
Data at a node is considered permanently unavailable if the node: (a) is malicious, (b) has lost its data, or (c) is permanently offline. Data at a node is considered temporarily unavailable if the node is offline when the first node 10 wishes to send data to or obtain data from that node.
The probability of a data chunk being permanently unavailable, i.e. all the nodes to which it was sent are permanently unavailable, is given by:
$$PU_{chunk} = \frac{\binom{PU_{node} \times T_t}{rf}}{\binom{T_t}{rf}}$$
wherein PUchunk is the probability of a data chunk being permanently unavailable, PUnode is the probability of a node being permanently unavailable, Tt is the total number of nodes available in the system including online and offline nodes and rf is the redundancy factor.
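The following Python sketch is offered only as an illustration of how such a probability might be evaluated. It assumes that the expression above is read as a hypergeometric ratio, namely the number of ways of choosing rf nodes from the PUnode x Tt permanently unavailable nodes divided by the number of ways of choosing rf nodes from all Tt nodes. The function names, the rounding of PUnode x Tt to a whole number of nodes and the search for the smallest acceptable rf are assumptions of the sketch and are not the pseudo-code of Appendix One.

```python
from math import comb


def chunk_permanently_unavailable(pu_node: float, total_nodes: int, rf: int) -> float:
    """Probability that all rf nodes holding a chunk are permanently
    unavailable, under the assumed hypergeometric reading."""
    bad_nodes = round(pu_node * total_nodes)  # assumed rounding
    return comb(bad_nodes, rf) / comb(total_nodes, rf)


def smallest_redundancy_factor(pu_node: float, total_nodes: int, acceptable: float) -> int:
    """Smallest rf whose chunk-loss probability is at or below `acceptable`."""
    for rf in range(1, total_nodes + 1):
        if chunk_permanently_unavailable(pu_node, total_nodes, rf) <= acceptable:
            return rf
    raise ValueError("no redundancy factor meets the acceptable probability")


# 100 nodes, 10% of which are assumed permanently unavailable.
assert chunk_permanently_unavailable(0.1, 100, 1) == 0.1
assert smallest_redundancy_factor(0.1, 100, 0.001) == 3
```

In this example storing a chunk at three nodes brings the probability of permanent loss to roughly 0.07%, below the 0.1% threshold used.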
The probability of a data chunk being temporarily unavailable, i.e. the chunk is unavailable but at least one node is only temporarily unavailable is given by:
PV chunk
where TUchunk is the probability of a data chunk being temporarily unavailable, and TUnode is the probability of a node being temporarily unavailable.
The probability of a node being temporarily or permanently unavailable can be estimated or determined from past behaviour of the nodes.
In the embodiment where the encryption keys 16-1 - 16-N are also sent to nodes, all of the data chunks 18, 16-1 - 16-N e.g. encryption keys 16-1 - 16-N and the encrypted data block 18, that form a data block 12 need to be available for a data block 12 to be considered available. As such, when one data chunk 18, 16-1 - 16-N is unavailable, the data block 12 is unavailable. Therefore, the probability of a data block 12 being unavailable is given by:
$$PU_{block} = \frac{\binom{PU_{chunk} \times T_c}{N_c}}{\binom{T_c}{N_c}} + \sum_{n=1}^{N_c - 1} \frac{\binom{PU_{chunk} \times T_c}{N_c - n}\binom{T_c - PU_{chunk} \times T_c}{n}}{\binom{T_c}{N_c}}$$
where PUblock is the probability of a block being permanently unavailable, Tc is the total number of chunks distributed i.e. the number of data blocks distributed multiplied by the number of data chunks that form a block excluding any redundancy, and Nc is the number of chunks in a data block excluding any redundancy.
The probability of a data block 12 being temporarily unavailable is given by:
TUbi0Ck
TUchunk x TA
where TUblock is the probability of a block being temporarily unavailable, Ochunk = Tc - PL - TL, PL is the number of permanently unavailable data chunks and TL is the number of temporarily unavailable data chunks.
In all of the above equations the bracketed notation represents the binomial coefficient, i.e.
$$\binom{a}{b} = \frac{a!}{b!\,(a - b)!}$$
Given the above equations, rf can be calculated based on an acceptable probability for permanent and temporary loss of a data block and/or data chunk. The acceptable probability can be determined based on the importance of the data and also the importance of being able to access the data at the required time. For example, if the data is significant and likely to be needed for immediate use the acceptable probability of both permanent and temporary unavailability could be low e.g. less than 0.001%. However, if the data is important but a user is likely to be happy to wait for return of the data, then the acceptable probability of permanent unavailability may be chosen to be low e.g. less than 0.001% but the acceptable probability of temporary unavailability may be higher e.g. 1% or greater. An example pseudo-code for calculating rf given the above equations is shown in Appendix One. However, the skilled person would understand that other alternative methods for calculating rf from the above equations would also be suitable.
The redundancy factor, rf, above considers the number of nodes at which data needs to be stored. However, the nodes may also be offline when data is sent to the nodes. If data is sent to T nodes it is important that it will be stored at, at least, rf of these nodes. This is done using the equation:
$$F(q) = \sum_{n=0}^{q} \frac{\binom{T_t - O_f}{M + n}\binom{O_f}{q - n}}{\binom{T_t}{M + q}}$$
which is then solved for Min(q) where F(q) > PR, where PR is the acceptable probability of M nodes being able to store the data chunk if T nodes are queried, and M is the redundancy factor rf. T can then be found from this equation since T = M + q. PR can be a pre-defined probability based on what is considered an acceptable probability of M nodes storing the data chunk. This can be pre-defined by the nodes among themselves based on the importance of data being stored. In one example PR is 0.999. However, the value of PR can vary dependent upon the importance of the data.
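A hedged Python sketch of solving for the smallest q is given below. It assumes that F(q) is the hypergeometric probability that at least M of the T = M + q queried nodes are online, and that Of denotes the number of nodes that are offline at the time of sending; that reading of Of, together with the function names, is an assumption of the sketch and not the pseudo-code of Appendix One.

```python
from math import comb


def prob_at_least_m_online(total: int, offline: int, m: int, q: int) -> float:
    """F(q): probability that at least m of the m + q queried nodes are
    online, when `offline` of the `total` nodes are offline (assumed)."""
    online = total - offline
    favourable = sum(
        comb(online, m + n) * comb(offline, q - n) for n in range(q + 1)
    )
    return favourable / comb(total, m + q)


def nodes_to_query(total: int, offline: int, m: int, pr: float = 0.999) -> int:
    """Return T = M + q for the smallest q with F(q) > PR."""
    for q in range(total - m + 1):
        if prob_at_least_m_online(total, offline, m, q) > pr:
            return m + q
    raise ValueError("not enough online nodes to reach the required probability")


# 10 nodes, 5 assumed offline, and one stored copy required (M = 1).
assert prob_at_least_m_online(10, 5, 1, 0) == 0.5
assert nodes_to_query(10, 5, 1, pr=0.9) == 3
assert nodes_to_query(10, 5, 1) == 6
```

With half of the nodes offline, querying three nodes gives a better than 90% chance that at least one stores the chunk, while the stricter 0.999 threshold requires six.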
Calculating the numbers of nodes T to which each data chunk 18, 16-1 - 16-N is to be sent can prevent a data chunk 18, 16-1 - 16-N from being sent to too many nodes 30-1 - 30-N, 40-1 - 40-N. It also makes it easier to edit sets 24-1 - 24-N+1 and, as discussed later, allows monitoring of how much data a node 10, 30-1 - 30-N, 40-1 - 40-N is storing to prevent flooding of the system.
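By way of illustration only, the search for Min(q), and hence T, might be sketched in Python as follows. The sketch assumes that the equation for F(q) is a hypergeometric sum in which the first binomial term counts online nodes and the second counts offline nodes; the names total_nodes, offline_nodes, prob_at_least_m_online and nodes_to_query are hypothetical and do not appear in the description above.

```python
from math import comb


def prob_at_least_m_online(total_nodes: int, offline_nodes: int, m: int, q: int) -> float:
    """F(q): probability that at least m of the m + q queried nodes are online.

    Assumption: total_nodes and offline_nodes play the roles of the two
    population sizes in the equation for F(q).
    """
    online_nodes = total_nodes - offline_nodes
    numerator = sum(
        comb(online_nodes, m + n) * comb(offline_nodes, q - n)
        for n in range(q + 1)
    )
    return numerator / comb(total_nodes, m + q)


def nodes_to_query(total_nodes: int, offline_nodes: int, m: int, pr: float):
    """Return T = M + Min(q) such that F(q) > pr, or None if no q suffices."""
    for q in range(total_nodes - m + 1):
        if prob_at_least_m_online(total_nodes, offline_nodes, m, q) > pr:
            return m + q
    return None
```

For instance, with ten nodes of which five are offline and M = 1, a single extra node (q = 1) raises the probability from 0.5 to 35/45, so nodes_to_query(10, 5, 1, 0.7) gives T = 2.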
Data block 12 is sent from the first node 10 to several other nodes 30-1 - 30-N, 40-1 - 40-N for storage. Therefore, once the data block 12 is returned from the other nodes 30-1 - 30-N, 40-1 - 40-N it is desirable to be able to verify, at least probabilistically, that the other nodes 30-1 - 30-N, 40-1 - 40-N have not modified the content of the data block 12. When one-time pad cryptography is used, this verification can be somewhat implicit since if one node 30-1 - 30-N, 40-1 - 40-N modifies the data chunk 18, 16-1 - 16-N it is storing, then decryption of the encrypted data block 18 will no longer be possible without returning meaningless data. However, it is also possible to use additional verification methods and use of these methods also allows verification when the encryption keys 16-1 - 16-N are not distributed for storage. How such verification methods are used to verify the returned data is discussed later. The present discussion instead focuses on the pre-processing required before data chunks 18, 16-1 - 16-N are sent to other nodes 30-1 - 30-N, 40-1 - 40-N for storage to enable such verification.
One form of pre-processing that can be performed to allow later verification is the use of a hash tree or private block-chain. In such a case the block-chain is private in that it is only accessible to the first node 10. Figure 6 shows an example of such a hash tree or private block-chain. Figure 6 shows only the arrangement of data blocks 120-1 - 120-N, such as data block 12, into a hash tree or block-chain 126; it does not show any of the subsequent data processing that is performed on the data blocks 120-1 - 120-N. However, the skilled person would understand that once the data blocks shown in Figure 6 have been arranged in the block-chain 126, they can be processed using any of the techniques described above and sent to other nodes 30-1 - 30-N, 40-1 - 40-N for storage. The data blocks can be arranged in the block-chain 126 by hash module 122.
Figure 6 shows four data blocks 120-1 - 120-N. While four data blocks are shown in Figure 6, the skilled person would understand that more or fewer data blocks can be used and four data blocks are illustrated merely as an example. The data blocks 120-1 - 120-N together make up information 110 which the first node 10 wants to send to other nodes 30-1 - 30-N, 40-1 - 40-N for storage. This information 110 can be split into the data blocks 120-1 - 120-N in any suitable way. For example, the information 110 can simply be split into a desired number of data blocks of the same size without any other processing of the information 110. In another example, the desired size of the data blocks may be known and the information 110 split into as many data blocks of the desired size as required without any other processing.
Once the information 110 has been split into data blocks 120-1 - 120-N, the data blocks are arranged in a hash tree 126 which can otherwise be known as a private block-chain. Arranging the data blocks 120-1 - 120-N in the hash tree does not necessarily involve any physical rearrangement of the data blocks 120-1 - 120-N. To place the data blocks 120-1 - 120-N in a private block-chain 126, the data blocks 120-1 - 120-N are first considered to form an ordered list. Once again, placing the data blocks 120-1 - 120-N in an ordered list does not necessarily involve rearranging or physically moving the data blocks 120-1 - 120-N; instead it involves establishing a relationship between the data blocks 120-1 - 120-N. In the example shown in Figure 6 data block 120-1 can be considered to be the first data block in the block chain and data block 120-N can be considered to be the final data block in the block chain.
Once the data blocks 120-1 - 120-N have been placed in the ordered list, it is possible to turn the list into the private block-chain/hash tree 126. This involves taking a hash 128-1 - 128-N-1 of the contents of each data block other than the final data block, i.e. data blocks 120-1 - 120-N-1 (where 120-N-1 is data block 120-3 in Figure 6), and storing the results of that hash 128-1 - 128-N-1 in the subsequent data block 120-2 - 120-N in the ordered list. For example, a hash is taken of data block 120-1 and this hash 128-1 is stored in data block 120-2, a hash 128-2 is taken of data block 120-2 and this is stored in data block 120-3, etc. The hash can be taken using hash function 124 at hash module 122. Figure 6 shows the data blocks 120-1 - 120-N as inputs to the hash module 122 and hashes 128-1 - 128-N as outputs. The lines are labelled A to C to reflect how the hash module 122 takes the hash of a data block 120-1 - 120-N and places it in a subsequent data block 120-1 - 120-N. In some examples the hash 128-2 of the second data block 120-2 can be taken on the entire contents of the data block including the hash 128-1 from the first data block 120-1. However, in other examples the hash 128-2 of the second data block 120-2 is taken on the contents of the second data block 120-2 but excluding any contents added as a hash 128-1 of the first data block 120-1. The hash function 124 used to calculate the hash can be any suitable hash function such as SHA1, SHA2, SHA3 or any other suitable hash function. In one example, the output of the hash function 124 is smaller than the contents of the data block 120-1 - 120-N being hashed. This prevents a significant increase in the size of the data blocks 120-1 - 120-N when they are arranged in the hash tree.
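By way of illustration only, the chaining of hashes through an ordered list of data blocks might be sketched in Python as follows. The sketch assumes SHA-256 (a member of the SHA2 family) and the variant in which each hash is taken over the entire contents of a block including the hash it carries; the dictionary layout and the names build_private_chain and verify_chain are hypothetical.

```python
import hashlib


def build_private_chain(blocks: list) -> list:
    """Store in each block the hash of the whole preceding block.

    The first block carries an empty hash field because nothing precedes it.
    """
    chain = []
    prev_hash = b""
    for payload in blocks:
        chain.append({"prev_hash": prev_hash, "payload": payload})
        # Hash the entire block contents, including the hash it carries.
        prev_hash = hashlib.sha256(prev_hash + payload).digest()
    return chain


def verify_chain(chain: list) -> bool:
    """Check that every stored hash matches the block that precedes it."""
    for earlier, later in zip(chain, chain[1:]):
        expected = hashlib.sha256(earlier["prev_hash"] + earlier["payload"]).digest()
        if later["prev_hash"] != expected:
            return False
    return True
```

In this sketch a change to any block other than the final one is detected, since no hash of the final block is stored in the chain.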
When the hash tree 126 described above is used in conjunction with a pre-encryption stage, the pre-encryption can be performed before or after the data blocks 120-1 - 120-N have been arranged in the private block-chain. Similarly, while in the example above it is assumed that encryption function 17 is performed after arranging the data blocks 120-1 - 120-N in the hash tree, the skilled person would understand that it is also possible to perform the encryption function 17 before the data blocks 120-1 - 120-N are arranged in the private block-chain.
When multiple data blocks 120-1 - 120-N are used, the data block identifiers 14 can simply be consecutive numbers such as the numbers representing their position in the ordered list. However, they may also be chosen in a more complicated fashion to make them harder for a third party to guess.
Given that multiple data blocks 120-1 - 120-N are likely to be sent for storage, arranging them first in the private block-chain is an efficient way to allow for verification. However, in some cases information 110 is formed into only a single data block 12. This may be because information 110 is of a suitable size to send as a single data block 12 or because the first node 10 requires all the information 110 to be stored as a single data block 12 for any reason.
An additional or alternative verification technique can be used in a single data block 12 scenario, as well as when multiple data blocks 12 are used. In this case, the hash module 122 and hash function 124 work out the hash of the data block 12 and the hash of the data block is used as the data block identifier 14. This allows the first node 10 to verify the contents of the data block 12 from the data block identifier 14.
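By way of illustration only, using the hash of a data block as its identifier might be sketched in Python as follows. SHA-256 is assumed, and the function names block_identifier and matches_identifier are hypothetical.

```python
import hashlib
import hmac


def block_identifier(data_block: bytes) -> str:
    """Derive the identifier directly from the contents of the data block."""
    return hashlib.sha256(data_block).hexdigest()


def matches_identifier(returned_block: bytes, identifier: str) -> bool:
    """Recompute the hash of a returned block and compare it with the identifier."""
    return hmac.compare_digest(block_identifier(returned_block), identifier)
```

A returned block that has been altered in any way would then fail the comparison against the identifier held by the first node.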
The skilled person would understand that the above pre-processing methods for data verification are not the only such methods that can be used and any alternative data pre-processing for verification can also be used. In such a case, any suitable processing to enable later use of the data verification method can be performed on the data chunks 18, 16-1 - 16-N or data blocks 120-1 - 120-N before sending data chunks 18, 16-1 - 16-N to other nodes 30-1 - 30-N, 40-1 - 40-N for storage. For example, the data blocks 120-1 - 120-N could be verified using message authentication codes (MACs) or by publicly publishing a hash of each data block 120-1 - 120-N such that the first node 10 can retrieve these when necessary.
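By way of illustration only, verification using a message authentication code might be sketched in Python as follows. The sketch assumes HMAC with SHA-256 and a secret key held only by the first node; the function names make_tag and verify_tag are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, data_block: bytes) -> bytes:
    """Compute a MAC over the data block before it is sent for storage."""
    return hmac.new(key, data_block, hashlib.sha256).digest()


def verify_tag(key: bytes, returned_block: bytes, tag: bytes) -> bool:
    """Check a returned data block against the MAC retained by the first node."""
    return hmac.compare_digest(make_tag(key, returned_block), tag)


# Example use: the key and tag stay with the first node.
key = secrets.token_bytes(32)
tag = make_tag(key, b"data block contents")
```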
The above describes how data and in particular data block 12 can be distributed from a first node 10 to a plurality of other nodes 30-1 - 30-N, 40-1 - 40-N for storage in such a fashion that no third party observer or other node 30-1 - 30-N, 40-1 - 40-N can work out where the data block 12 has been sent. Such a technique enables a node 10 to distribute both a data block 12 and the encryption keys 16-1 - 16-N to nodes 30-1 - 30-N, 40-1 - 40-N without a third party or other node 30-1 - 30-N, 40-1 - 40-N being able to recover the data block 18 and the encryption keys 16-1 - 16-N.
While the above focuses on the first node 10 distributing data, the skilled person would understand that each node 30-1 - 30-N, 40-1 - 40-N can act as a first node 10 when distributing its own data and in such a scenario first node 10 would act as an other node. When distributing data in such a fashion it is beneficial to have a method to monitor how much data each node is distributing to prevent a malicious node flooding the other nodes with a large number of data chunks 18, 16-1 - 16-N. This is because malicious flooding can overwhelm any storage capacity at other nodes 30-1 - 30-N, 40-1 - 40-N and thus limit what non-malicious other nodes 10, 30-1 - 30-N, 40-1 - 40-N can store. While it is desirable to monitor how much data each node is storing, it is also important that no detail on where data has been sent is revealed in order to avoid compromising the security of the data block 12.
Discussed below is a technique to allow monitoring of nodes 10, 30-1 - 30-N, 40-1 - 40-N to prevent flooding and also to allow each node 10, 30-1 - 30-N, 40-1 - 40-N to know when another node goes offline. Such a technique would be particularly beneficial in relation to the above described data storage because it does not involve revealing at which nodes 30-1 - 30-N, 40-1 - 40-N data has been stored. However, the technique below could also be combined with other known techniques for determining where to send data for storage and still have an advantageous effect.
Figure 7 shows a first node 10 that is distributing data to several other nodes 30-1 - 30-N, 40-1 - 40-N and receiving acknowledgments 202 from the other nodes 30-1 - 30-N, 40-1 - 40-N when the other nodes 30-1 - 30-N, 40-1 - 40-N receive data. Figure 7 does not show how encrypted data block 18 is formed or how the other nodes 30-1 - 30-N, 40-1 - 40-N to which data is sent are determined. However, the skilled person would understand that Figure 7 can be used with the data distribution method discussed above with respect to Figures 1 to 6.
In Figure 7 data chunks 210-1 - 210-N are sent from first node 10 to other nodes 30-1 - 30-N, 40-1 - 40-N for storage. The data chunks 210-1 - 210-N can be the encrypted data block 18 and encryption keys 16-1 - 16-N discussed above. The data chunks 210-1 - 210-N are each sent to T other nodes for storage wherein the nodes that each data chunk is to be sent to are determined by a node determining module (not shown) such as node determining module 20. A distribution module 228 distributes each data chunk 210-1 - 210-N to the T other nodes using distribution function 229. Distribution module 228 can be distribution module 28 and distribution function 229 can be distribution function 29.
When a node from other nodes 30-1 - 30-N, 40-1 - 40-N receives a data chunk 210-1 - 210-N from the first node 10, it responds by providing the first node 10 with an acknowledgment of receipt 214-1 - 214-N, 216-1 - 216-N. In one example, these acknowledgments of receipt 214-1 - 214-N are encrypted using a public key 78 of the first node 10 before sending. It is noted that while distribution module 228 sends a data chunk e.g. 210-1 from data chunks 210-1 - 210-N to T other nodes e.g. nodes 30-1 - 30-N, such a data chunk 210-1 may not be received by T other nodes. This is because some of the nodes 30-1 - 30-N to which distribution module 228 sends the data chunk 210-1 may be offline or the data transfer may fail for other reasons. As such, first node 10 does not necessarily receive T acknowledgments 214-1 - 214-N, 216-1 - 216-N for each data chunk 210-1 - 210-N.
First node 10 receives the acknowledgements 214-1 - 214-N, 216-1 - 216-N from the other nodes 30-1 - 30-N, 40-1 - 40-N at an acknowledgment receiving module 204. If these acknowledgments 214-1 - 214-N, 216-1 - 216-N have been encrypted using a public key 78 of the first node 10, the acknowledgement receiving module 204 decrypts them using a corresponding private key 74 of the first node 10. The acknowledgment receiving module 204 then passes these on to an acknowledgment processing module 206 for processing. As shown in Figure 7, server 250 hosts a public ledger 252. Although server 250 is shown as a single entity, the skilled person would understand that server 250 can be a distributed server. Therefore, public ledger 252 can be a distributed public ledger 252 such as a block chain. The use of a block chain as the public ledger 252 is explained in more detail later. Once acknowledgment processing module 206 has received acknowledgments 214-1 - 214-N, 216-1 - 216-N it publishes these acknowledgments 214-1 - 214-N, 216-1 - 216-N in a transaction 254-1 on the public ledger 252. The public ledger 252 is public in that it is accessible to all the nodes in the system i.e. at least first node 10 and other nodes 30-1 - 30-N, 40-1 - 40-N. The public ledger 252 is not necessarily accessible to a third party who is not a node in the system. Publishing the acknowledgments on the public ledger 252 enables all nodes 10, 30-1 - 30-N, 40-1 - 40-N to monitor how much data is being sent for storage and take action if the system is flooded.
As mentioned previously, the first node 10 may have an associated identifier 35 which can be considered to be a node identifier. First node 10 may also have a public key 78 that can be used to verify the identity of the node 10. In one example, when the acknowledgment processing module 206 publishes the acknowledgments 214-1 - 214-N, 216-1 - 216-N in a transaction 254-1 on the public ledger 252, it also publishes some information that allows identification of the node 10 i.e. one or both of the associated identifier 35 and the public key 78 in that transaction 254-1. This enables the first node 10 to search for the transaction 254-1 it has published on the public ledger 252. It also allows any node or other party monitoring the public ledger 252 to identify if any node 10, 30-1 - 30-N, 40-1 - 40-N is publishing an unusually large number of transactions on the public ledger 252.
The acknowledgment processing module 206 may also or alternatively publish the data block identifier 14 as part of the transaction 254-1 containing the acknowledgments 214-1 - 214-N, 216-1 - 216-N. The data block identifier 14 is the identifier of the data block 12 associated with the data chunks 18, 16-1 - 16-N being stored. In one example, the data block identifier 14 is a hash of the contents of the data block. In another example, as well as publishing the data block identifier 14, the acknowledgement processing module 206 also publishes a hash of the data block 12. As discussed later, publishing the block identifier 14 has the advantage that when the first node 10 has lost its data it can easily recover the data block identifier 14. In other examples the data chunk identifiers 19, 11-1 - 11-N may be published instead of the data block identifier 14. However, given the data chunk identifiers 19, 11-1 - 11-N can be calculated from the data block identifier 14, this is not essential. Use of either the data block identifier 14 or the data chunk identifiers 19, 11-1 - 11-N would allow the first node 10 to recover these pieces of information after complete data loss.
While the above assumes all the acknowledgments 214-1 - 214-N, 216-1 - 216-N related to a data block 12 are published in the same transaction 254-1, this is not necessarily the case. In particular, as established above, when more than one data chunk 18, 16-1 - 16-N is sent out for a data block 12 each data chunk 18, 16-1 - 16-N is sent to T other nodes 30-1 - 30-N, 40-1 - 40-N for storage. In such a scenario, either the acknowledgement receiving module 204 or the acknowledgment processing module 206 can sort the acknowledgments of receipt 214-1 - 214-N, 216-1 - 216-N based on the data chunk 18, 16-1 - 16-N to which they relate. The acknowledgments 214-1 - 214-N, 216-1 - 216-N related to each data chunk 18, 16-1 - 16-N are then published in separate transactions 254-1 - 254-N.
Figure 7 shows an example where a first data chunk 18 is sent to nodes 30-1 - 30-N and second data chunk 16-1 is sent to nodes 40-1 - 40-N. The skilled person would understand that more data chunks 16-2 - 16-N can be sent to nodes for storage and that these are not illustrated in Figure 7 for clarity. Returning to Figure 7, acknowledgments 214-1 - 214-N are received from nodes 30-1 - 30-N in relation to the first data chunk 18 while acknowledgments 216-1 - 216-N are received from nodes 40-1 - 40-N in relation to the second data chunk 16-1. While in Figure 7 all nodes 30-1 - 30-N, 40-1 - 40-N return an acknowledgment 214-1 - 214-N, 216-1 - 216-N, the skilled person would understand that this is not necessarily the case. In particular, one or more of nodes 30-1 - 30-N, 40-1 - 40-N may be offline or the communication of either the data chunk 18, 16-1 or the acknowledgement 214-1 - 214-N, 216-1 - 216-N may be disrupted for some reason. The acknowledgment receiving module 204 receives the acknowledgments 214-1 - 214-N, 216-1 - 216-N from the nodes 30-1 - 30-N, 40-1 - 40-N. These acknowledgments are then published by the acknowledgment processing module 206 on the public ledger 252 as two transactions 254-1, 254-2 wherein the first transaction 254-1 corresponds to the acknowledgements 214-1 - 214-N received from the first T nodes 30-1 - 30-N with respect to the first data chunk 18 and the second transaction 254-2 corresponds to the acknowledgments 216-1 - 216-N received from the second T nodes 40-1 - 40-N with respect to the second data chunk 16-1. The acknowledgments 214-1 - 214-N, 216-1 - 216-N are associated with the relevant data chunk 18, 16-1 by either the acknowledgment receiving module 204 or the acknowledgment processing module 206.
The above features have the advantage that it is possible to see how many other nodes 30-1 - 30-N, 40-1 - 40-N a first node 10 is sending a data chunk to. This makes it easier to identify when a node, such as first node 10, is trying to flood the system by sending a data chunk to too many other nodes 30-1 - 30-N, 40-1 - 40-N. Other nodes 30-1 - 30-N, 40-1 - 40-N or a registrar 260 can monitor the public ledger 252 and determine when a transaction 254-1 - 254-N contains too many entries then decide whether to take action.
As mentioned previously, although first node 10 sends each data chunk 18, 16-1 - 16-N to T other nodes 30-1 - 30-N, 40-1 - 40-N it does not necessarily receive an acknowledgment 214-1 - 214-N, 216-1 - 216-N from all of the other nodes 30-1 - 30-N, 40-1 - 40-N. This can be because one or more of the other nodes it sends data to is offline or because the data transfer of either the data chunk 18, 16-1 - 16-N or the acknowledgment 214-1 - 214-N, 216-1 - 216-N failed. To avoid having each transaction 254-1 - 254-N contain a different number of acknowledgments, the acknowledgment processing module 206 can publish only M acknowledgments for each set of T nodes 30-1 - 30-N, 40-1 - 40-N to which data was sent. M is chosen based on the minimum number of nodes that must store data in order to achieve a desired level of availability for the data. The acknowledgment processing module 206 can select the M nodes 30-1 - 30-N, 40-1 - 40-N for which to publish an acknowledgment 214-1 - 214-N, 216-1 - 216-N in a transaction 254-1 - 254-N in any suitable way. For example, it could publish the first M acknowledgments 214-1 - 214-N, 216-1 - 216-N received or it could randomly choose M acknowledgements 214-1 - 214-N to publish.
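By way of illustration only, the selection of M acknowledgments for publication might be sketched in Python as follows. The function name select_acknowledgments is hypothetical, and raising an error when fewer than M acknowledgments have arrived is an assumption of the sketch, since that case is not addressed above.

```python
import random


def select_acknowledgments(received: list, m: int, randomise: bool = False) -> list:
    """Pick exactly M acknowledgments to publish for one data chunk.

    received is the list of acknowledgments in order of arrival.
    """
    if len(received) < m:
        # Assumption: the caller handles this case, e.g. by resending the chunk.
        raise ValueError("fewer than M acknowledgments received")
    if randomise:
        return random.sample(received, m)  # random choice of M acknowledgments
    return received[:m]  # the first M acknowledgments received
```

Either branch yields transactions of uniform size, which is what allows an observer of the ledger to treat any larger transaction as suspicious.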
When only M acknowledgments per data chunk 18, 16-1 - 16-N are published, any nodes 30-1 - 30-N, 40-1 - 40-N who received a data chunk 18, 16-1 - 16-N (and potentially sent an acknowledgement 214-1 - 214-N, 216-1 - 216-N) but whose acknowledgment 214-1 - 214-N, 216-1 - 216-N was not published on the public register 252 do not need to store the data chunk 18, 16-1 - 16-N. As such, when a node 30-1 - 30-N, 40-1 - 40-N receives a data chunk 18, 16-1 - 16-N from first node 10, it monitors the public ledger 252 using a register monitoring module 342 to find transactions 254-1 - 254-2 that have been published by the first node 10. If one of these transactions 254-1 - 254-2 contains its acknowledgment 214-1 - 214-N, 216-1 - 216-N then it keeps the data chunk 18, 16-1 - 16-N it was sent. However, if none of these transactions contains its acknowledgment 214-1 - 214-N, 216-1 - 216-N it deletes the data chunk 18, 16-1 - 16-N it was sent.
In one example M is the redundancy factor, rf, calculated above in the section related to T. In other examples, the nodes may agree between them to have an M that is greater than the redundancy factor, rf. However, other methods of calculating M can be envisaged.
A registrar 260 or any node 10, 30-1 - 30-N, 40-1 - 40-N can monitor the public ledger 252 and if any node 10, 30-1 - 30-N, 40-1 - 40-N attempts to publish a transaction 254-1 - 254-N containing more than M acknowledgements per data chunk 18, 16-1 - 16-N then the transaction 254-1 - 254-N can be rejected. In the example where each transaction 254-1 - 254-N represents acknowledgments 214-1 - 214-N, 216-1 - 216-N related to a single data chunk 18, 16-1 - 16-N then any transaction 254-1 - 254-N containing more than M acknowledgments can indicate a node 10, 30-1 - 30-N, 40-1 - 40-N is publishing too many acknowledgments 214-1 - 214-N per data chunk 18, 16-1 - 16-N and be rejected. In contrast, if each transaction contains acknowledgments for all the data chunks 18, 16-1 - 16-N in a data block 12, then a transaction 254-1 - 254-N will likely contain more than M acknowledgments. In such a case, an agreement can be reached between the registrar 260 and nodes 10, 30-1 - 30-N, 40-1 - 40-N on the maximum number 256 of encryption keys 16-1 - 16-N that can be used for encrypting a data block 12. If a transaction 254-1 - 254-N contains more than M multiplied by this maximum number 256 of acknowledgements 214-1 - 214-N, this will indicate the node 10, 30-1 - 30-N, 40-1 - 40-N is publishing too many acknowledgments 214-1 - 214-N.
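By way of illustration only, the check applied by a registrar or monitoring node to a submitted transaction might be sketched in Python as follows. The function name transaction_allowed and its parameters are hypothetical; the limits follow the two cases described above.

```python
def transaction_allowed(ack_count: int, m: int, per_chunk: bool, max_keys: int = 1) -> bool:
    """Decide whether a transaction carries an acceptable number of acknowledgments.

    per_chunk=True : each transaction relates to a single data chunk, limit is M.
    per_chunk=False: each transaction covers a whole data block, limit is
                     M multiplied by the agreed maximum number of encryption keys.
    """
    if per_chunk:
        return ack_count <= m
    return ack_count <= m * max_keys
```

For example, with M = 3 and an agreed maximum of 4 keys, a per-block transaction with 12 acknowledgments would be accepted and one with 13 would be rejected.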
Monitoring the number of acknowledgements 214-1 - 214-N per transaction 254-1 - 254-N and then taking action in this fashion helps prevent any nodes 10, 30-1 - 30-N, 40-1 - 40-N from flooding the system.
The number of transactions 254-1 - 254-N a node 10 publishes can also be monitored. As mentioned above, when a node 10, 30-1 - 30-N, 40-1 - 40-N publishes a transaction 254-1 - 254-N it can include in the transaction 254-1 - 254-N some form of identification e.g. an identifier 35 or a public key 78. Each node 10, 30-1 - 30-N, 40-1 - 40-N can have a budget 220 representing how many data chunks 18, 16-1 - 16-N it is allowed to send to other nodes 30-1 - 30-N, 40-1 - 40-N for storage. This budget 220 can be calculated in any suitable way. For example, if the size of the data chunks 18, 16-1 - 16-N is fixed, the budget 220 can be a total amount of information each node 30-1 - 30-N, 40-1 - 40-N is allowed to store, e.g. 2 Gb, divided by the size of a data chunk 18, 16-1 - 16-N. Alternatively, the budget 220 could be a fixed number of data chunks 18, 16-1 - 16-N independent of the size of the data chunks. Alternatively, the number of data chunks 18, 16-1 - 16-N each node is allowed to store may vary dependent upon the size of data chunks 18, 16-1 - 16-N a node 10, 30-1 - 30-N, 40-1 - 40-N is trying to store. The skilled person would understand that the budget 220 varies dependent upon the form of the nodes 10, 30-1 - 30-N, 40-1 - 40-N in the system and the function of the system. For example, if the nodes 10, 30-1 - 30-N, 40-1 - 40-N are all large servers with a large memory then the budget 220 can be high. Alternatively, if the nodes 10, 30-1 - 30-N, 40-1 - 40-N are low memory devices, the budget can be low.
If any node 10, 30-1 - 30-N, 40-1 - 40-N attempts to publish transactions 254-1 - 254-N indicating it is storing more data chunks 18, 16-1 - 16-N than allowed by its budget, then the transaction 254-1 - 254-N and future transactions 254-1 - 254-N can be rejected.
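By way of illustration only, enforcement of a per-node budget might be sketched in Python as follows. The class BudgetMonitor, the function budget_from_capacity and the use of a node identifier string as the key are hypothetical; the sketch assumes a fixed chunk size and a single budget shared by all nodes.

```python
from collections import Counter


def budget_from_capacity(capacity: int, chunk_size: int) -> int:
    """Budget as total permitted storage divided by a fixed chunk size."""
    return capacity // chunk_size


class BudgetMonitor:
    """Tracks how many data chunks each identified node has published."""

    def __init__(self, budget: int):
        self.budget = budget
        self.published = Counter()  # node identifier -> chunks accepted so far
        self.blocked = set()        # nodes whose future transactions are rejected

    def submit(self, node_id: str, chunk_count: int = 1) -> bool:
        """Return True if the transaction is accepted, False if rejected."""
        if node_id in self.blocked:
            return False
        if self.published[node_id] + chunk_count > self.budget:
            self.blocked.add(node_id)  # reject this and future transactions
            return False
        self.published[node_id] += chunk_count
        return True
```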
Once again, this limiting of the number of transactions 254-1 - 254-N and taking action if a node 10, 30-1 - 30-N, 40-1 - 40-N publishes too many transactions 254-1 - 254-N can help identify if a node 10, 30-1 - 30-N, 40-1 - 40-N is trying to flood the system and prevent this from happening in future.
Figure 8 shows a first node 10 that is distributing a data chunk 310 to other nodes 330-1 - 330-3. The first node 10 of Figure 8 can be a first node 10 as described with respect to previous Figures 1 to 7. However, for simplicity Figure 8 shows only the details of the distribution of the data chunk 310. As such, details about how data chunk 310 is obtained and how to determine the nodes 330-1 - 330-3 to which that data chunk 310 should be distributed are omitted from Figure 8. The skilled person would understand that such steps can be performed as mentioned previously. While Figure 8 shows the data chunk 310 being sent to three nodes 330-1 - 330-3, the skilled person would understand that this number of nodes is merely exemplary and the data chunk 310 could be sent to more or fewer nodes.
Figure 8 shows storage request module 302 which generates storage requests 312-1 - 312-3 wherein the storage request module 302 generates a different storage request 312-1 - 312-3 for each node 330-1 - 330-3 to which data chunk 310 is being sent. As mentioned above, each data chunk 310 is sent to T nodes so T storage requests 312-1 - 312-3 should be generated. In Figure 8, T is shown to be three but T could be higher or lower than three without any significant modification of the method described below. Storage request module 302 has a token generator 320 which generates a token 322-1 - 322-3 for each storage request 312-1 - 312-3. As such, the token generator 320 generates T tokens.
The tokens 322-1 - 322-3 for the storage requests 312-1 - 312-3 differ from each other. The tokens 322-1 - 322-3 can be calculated or generated in any suitable way. For example the tokens 322-1 - 322-3 can be determined using a random or pseudo-random number generator or they can follow a pattern known by the first node 10.
Once the storage request module 302 has generated the required number of tokens 322-1 - 322-3, a combiner 324 at the storage request module 302 combines each token 322-1 - 322-3 with the public key 75 of the first node 10. While several different combinations are suitable, in one example the combiner simply appends the public key 75 to each token. In other examples the tokens 322-1 - 322-3 can be modified by the public key 75, for example by adding or subtracting the public key 75 from the tokens 322-1 - 322-3. The results of the combiner 324 are sent to a signer 326 also at the storage request module 302. The signer or signer module 326 uses a private key 360 of the first node 10 to sign the results of the combiner 324 with signature 327. This results in an unencrypted storage request 328-1 - 328-3 for each token 322-1 - 322-3/desired storage request 312-1 - 312-3.
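By way of illustration only, the generation of tokens, their combination with the public key and the signing step might be sketched in Python as follows. The function names are hypothetical. The sketch assumes random tokens and the append form of combination. The demo_sign function is a keyed-hash stand-in used only so that the sketch is self-contained; it is not a private-key signature, and a real signer would use an asymmetric signature scheme. The subsequent encryption for each recipient node is omitted.

```python
import hashlib
import hmac
import secrets
from typing import Callable


def build_unencrypted_requests(public_key: bytes, t: int,
                               sign: Callable[[bytes], bytes],
                               token_len: int = 16) -> list:
    """Create T unencrypted storage requests, one per distinct token."""
    tokens = set()
    while len(tokens) < t:  # tokens must differ from each other
        tokens.add(secrets.token_bytes(token_len))
    requests = []
    for token in tokens:
        combined = token + public_key  # combiner: append the public key
        requests.append({
            "token": token,
            "combined": combined,
            "signature": sign(combined),  # signer: sign the combiner output
        })
    return requests


def demo_sign(message: bytes) -> bytes:
    """Stand-in only: a keyed hash, NOT a real private-key signature."""
    return hmac.new(b"hypothetical-demo-secret", message, hashlib.sha256).digest()
```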
The unencrypted storage requests 328-1 - 328-3 are then encrypted by encrypter/encryption module 329 dependent upon the node 330-1 - 330-3 to which they are being sent. To this end the encrypter 329 receives as an input set of nodes 24-1 or any other set indicating to which nodes 330-1 - 330-3 the data chunk 310 should be sent. The encryption module 329, otherwise known as encrypter 329, obtains from each node in the set of nodes 24-1 a public key 332-1 - 332-3. These public keys 332-1 - 332-3 may be obtained directly from the relevant nodes 330-1 - 330-3. However, they may also be obtained from any third party, including a third party at the first node 10, who has access to the public keys 332-1 - 332-3. Encrypter 329 then encrypts the unencrypted storage requests 328-1 - 328-3 using the obtained public keys 332-1 - 332-3 to produce storage requests 312-1 - 312-3. Each unencrypted storage request 328-1 - 328-3 is encrypted using a public key 332-1 - 332-3 for one of the nodes 330-1 - 330-3. For example, unencrypted storage request 328-1 is encrypted using the public key 332-1 of node 330-1 to form storage request 312-1, unencrypted storage request 328-2 is encrypted using the public key 332-2 of node 330-2 to form storage request 312-2 and unencrypted storage request 328-3 is encrypted using the public key 332-3 of node 330-3 to form storage request 312-3.
After storage request module 302 has created storage requests 312-1 - 312-3, the storage requests 312-1 - 312-3 are passed to a node distribution module 428 such as node distribution module 28 for distribution to the nodes. The node distribution module 428 distributes the storage requests 312-1 - 312-3 along with the data chunk 310 to which the storage requests 312-1 - 312-3 relate. The data distribution module 428 distributes the storage requests 312-1 - 312-3 such that each node 330-1 - 330-3 receives the storage request 312-1 - 312-3 encrypted using its public key 332-1 - 332-3. For example, along with data chunk 310, node distribution module 428 sends: storage request 312-1 (encrypted using public key 332-1) to node 330-1, storage request 312-2 (encrypted using public key 332-2) to node 330-2 and storage request 312-3 (encrypted using public key 332-3) to node 330-3.
As mentioned above, data distribution module 428 sends both a data chunk 310 and a storage request 312-1 - 312-3 to each node 330-1 - 330-3. These are then received at each node 330-1 - 330-3 by a data receiving module 334-1 - 334-3. The data chunk 310 is at least temporarily stored at each node 330-1 - 330-3. "At least temporarily" refers to the fact that for each node 330-1 - 330-3, the storage of the data chunk 310 may become permanent. An unencrypter 336-1 - 336-3 at each node 330-1 - 330-3 decrypts the storage request 312-1 - 312-3 sent to that node 330-1 - 330-3 using a private key 338-1 - 338-3 of that node 330-1 - 330-3. The unencrypter 336-1 - 336-3, otherwise known as an unencryption module, can use the public key 75 of the first node 10 and/or the signature 327 used to sign the storage request 312-1 - 312-3 to identify the node 10 that sent the storage request 312-1 - 312-3 and data chunk 310 and verify that the storage request 312-1 - 312-3 came from the node 10.
The unencrypter 336-1 - 336-3 passes the token 322-1 - 322-3, from the storage request 312-1 - 312-3, to a token sending module 340-1 - 340-3. Token sending module 340-1 - 340-3 then returns the token 322-1 - 322-3 to the first node 10 as an acknowledgment of receipt i.e. acknowledgment of receipt 214-1 - 214-N. In one example the token sending module 340-1 - 340-3 sends the token along with an additional string 346-1 - 346-N wherein additional string 346-1 - 346-N can be a random string. When the token sending module 340-1 - 340-3 sends the token 322-1 - 322-3 and, where applicable, the additional string 346-1 - 346-N it can be considered to be sending an acknowledgment of receipt 214-1 - 214-N. In one example, these acknowledgments of receipt 214-1 - 214-N are encrypted using a public key 78 of the first node 10 before sending and, upon receiving them, the first node 10 decrypts them using a corresponding private key 74.
As mentioned above, the first node 10 then publishes these acknowledgments of receipt 214-1 - 214-N on the public register 252 (not shown in Figure 8). In one example this comprises publishing the additional strings 346-1 - 346-3 on the public register 252. In another example the tokens 322-1 - 322-3 could be published on the public register 252 either instead of or along with the additional strings 346-1 - 346-3. The first node 10 can publish the acknowledgments of receipt 214-1 - 214-N on the public register using the
method described above. Each node 330-1 - 330-3 has a register monitoring module 342-1 - 342-3 that monitors the public register 252. If a node 330-1 - 330-3 finds the additional string 346-1 - 346-3 or token 322-1 - 322-3 it returned to the first node 10 on the public register 252, it stores data chunk 310 in memory 344-1 - 344-3. In contrast, if register monitoring module 342-1 - 342-3 does not find the additional string 346-1 - 346-3 or token 322-1 - 322-3 it returned to the first node 10 on the public register 252, it deletes data chunk 310 from temporary storage.
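By way of illustration only, the store-or-delete decision made at each node 330-1 - 330-3 can be sketched as follows. The class name, the in-memory dictionaries and the representation of the public register 252 as a set of published strings are assumptions made for this sketch and are not part of the described system.

```python
class StorageNode:
    """Hypothetical model of a node 330 holding chunks pending confirmation."""

    def __init__(self):
        self.temporary = {}  # returned token or additional string -> data chunk
        self.memory = {}     # confirmed chunks (memory 344 in the description)

    def receive(self, proof, chunk):
        # The chunk is only held temporarily until its proof is published.
        self.temporary[proof] = chunk

    def check_register(self, published):
        """Keep chunks whose proof appears on the register; delete the rest.

        Assumption: one check is made once the publication window has closed.
        """
        for proof in list(self.temporary):
            chunk = self.temporary.pop(proof)
            if proof in published:
                self.memory[proof] = chunk
```

A node that returned two tokens but sees only one of them on the register would keep the corresponding chunk and discard the other.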
The above technique of sending proof of storage requests 312-1 - 312-3 containing a token 322-1 - 322-3 and receiving the token 322-1 - 322-3 in return as an acknowledgment of receipt 214-1 - 214-3 enables the first node 10 and other nodes 330-1 - 330-3 to verify each other's identities while still ensuring that no third party intercepting the communications can work out where a data chunk 310 has been stored. The above technique also provides a form of acknowledgment request 312-1 - 312-3 that is hard to fake but which a third party viewing the public register 252 cannot use to identify which nodes 330-1 - 330-3 received the data chunk 310.
The public ledger 252 can be useful for functions beyond the monitoring of nodes 10, 30-1 - 30-N, 40-1 - 40-N to prevent flooding of the system. As shown in Figure 9 each node can have a heartbeat signal generator 402-1 - 402-3, 402-10 that emits a heartbeat signal 404-1 - 404-3, 404-10 at regular time intervals. For example the heartbeat signal 404-1 - 404-3 could be generated every second, every minute, every hour or every day depending on how often nodes 10, 30-1 - 30-N, 40-1 - 40-N go on and offline. The skilled person would understand that any known or standard heartbeat signal 404-1 - 404-3, 404-10 could be used in this situation.
The heartbeat signals 404-1 - 404-3, 404-10 are received by a heartbeat receiving module 462 at the registrar 260. The heartbeat receiving module 462 of the registrar 260 monitors the received heartbeat signals 404-1 - 404-3, 404-10. If the registrar 260 does not receive an expected heartbeat signal 404-1 - 404-3, 404-10 for a defined length of time, the heartbeat receiving module 462 instructs a publication module 464 also at the registrar 260 to publish a transaction 254-1 - 254-N on the public register 252 indicating that the node 10, 30-1 - 30-N, 40-1 - 40-N for which it did not receive a heartbeat signal 404-1 - 404-3, 404-10 is permanently offline. The defined length of time may depend upon the form of the node 10, 30-1 - 30-N, 40-1 - 40-N and/or the regularity of the heartbeat signal 404-1 - 404-3, 404-10. For example, if the node 10, 30-1 - 30-N, 40-1 - 40-N is a user desktop computer it may be understood that the node 10, 30-1 - 30-N, 40-1 - 40-N will regularly be offline for several hours a day and may be offline for weeks at a time if the device is not used regularly. In contrast, if the node 10, 30-1 - 30-N, 40-1 - 40-N is a mobile phone, it may be expected that the device will be online most of the time and the node 10, 30-1 - 30-N, 40-1 - 40-N being offline for a week or even less may lead to the node being deemed permanently offline.
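As a non-limiting sketch, the check performed by the heartbeat receiving module 462 could take the following form. The function name, the use of seconds as the time unit and the per-node limits are assumptions chosen for illustration; the description does not fix any particular values.

```python
def permanently_offline(last_seen, now, limits):
    """Return the node ids whose last heartbeat is older than their limit.

    last_seen: node id -> time of the last received heartbeat (seconds)
    limits:    node id -> defined length of time allowed without a heartbeat
    """
    return sorted(node for node, seen in last_seen.items()
                  if now - seen > limits[node])


DAY = 24 * 60 * 60
# Illustrative limits only: a desktop is tolerated offline for longer than a phone.
example_limits = {"desktop": 28 * DAY, "phone": 7 * DAY}
```

With these example limits, a desktop and a phone that both fell silent ten days ago would result in only the phone being reported as permanently offline.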
As also shown in Figure 9, each node 10, 30-1 - 30-N, 40-1 - 40-N may have a registrar contacting module 406-1 - 406-3, 406-10. When a node loses all its data, the node uses its registrar contacting module 406-1 - 406-3, 406-10 to alert the registrar 260. The registrar 260 can then publish a transaction 254-1 - 254-N on the public register 252 indicating that the node 10, 30-1 - 30-N, 40-1 - 40-N it received notification from has lost all its data.
The first node 10, or any other node 30-1 - 30-N, 40-1 - 40-N that has sent data for storage, monitors the public register 252 with register monitoring module 430. Register monitoring module 430 uses transactions 254-1 - 254-N to determine whether the data it has sent to any of nodes 30-1 - 30-N, 40-1 - 40-N is now permanently unavailable. Data at an other node 30-1 - 30-N, 40-1 - 40-N can be considered permanently unavailable if either the other node 30-1 - 30-N, 40-1 - 40-N has lost its data or if the other node is permanently offline. If the first node 10 has sent data to more than T’ nodes 30-1 - 30-N, 40-1 - 40-N at which data is permanently unavailable, the register monitoring module 430 instructs the first node 10 to redistribute its data. The register monitoring module 430 can monitor the nodes 30-1 - 30-N related to each data block 12 or data chunk 18, 16-1 - 16-N and only redistribute the data block 12 or data chunk 18, 16-1 - 16-N which is now permanently unavailable at T’ or more nodes. In one example, redistributing the data block 12 comprises generating new random encryption keys 16-1 - 16-N and then repeating the entire distribution process, including the encryption, again using these new encryption keys 16-1 - 16-N. Alternatively, register monitoring module 430 can redistribute all data in response to data at more than T’ nodes being permanently unavailable. This can involve generating new random encryption keys 16-1 - 16-N and repeating the entire distribution process including the encryption.
If a node sees a data block identifier 14 on the public register 252 for which it has previously stored data, it knows to delete that stored data. Alternatively, when first node 10 has to redistribute data, acknowledgement processing module 206 may publish a transaction 254-1 - 254-N on the public ledger 252 informing any nodes still storing the data that is to be redistributed that they can delete their stored data. This prevents nodes 30-1 - 30-N, 40-1 - 40-N from unnecessarily storing out of date data.
T’ can be determined in any suitable way. In one example T’ is a fixed number agreed between the nodes beforehand. Alternatively, if M is greater than rf then T’ could be M - rf to ensure that the data is always stored at rf nodes.
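Purely as an illustration, the choice of T’ and the resulting redistribution decision might be sketched as below. The function names are hypothetical, and the use of "T’ or more" as the trigger is an assumption; the description refers to both "more than T’" and "T’ or more" unavailable nodes.

```python
def redistribution_threshold(m, rf, fixed):
    """Return T': M - rf when M is greater than rf, else an agreed fixed number."""
    if m > rf:
        return m - rf
    return fixed


def needs_redistribution(unavailable_nodes, t_prime):
    # Assumption: redistribute once T' or more nodes are permanently unavailable.
    return unavailable_nodes >= t_prime
```

For instance, with M = 10 and rf = 3 the threshold is T’ = 7, so that redistribution is triggered when only rf = 3 of the original M storing nodes remain available.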
The above description has concentrated on how data is distributed from a first node 10 to multiple nodes 30-1 - 30-N, 40-1 - 40-N for storage. Once a first node 10 has sent data to other nodes 30-1 - 30-N, 40-1 - 40-N for storage, the first node may wish to obtain this data back from storage. In examples where the storage is a backup storage, then the first node 10 may only wish to reobtain the data on rare occasions. However, when the main storage of the data occurs at the other nodes 30-1 - 30-N, 40-1 - 40-N the first node 10 may request the data from storage regularly. Either way the method of obtaining the data is the same and is discussed in more detail below.
As mentioned above, encrypted data block 18 has been sent to T other nodes for storage, wherein the nodes that the encrypted data block 18 has been sent to are defined by set 24-1. In some examples, the encrypted data block 18 is stored at M of these T other nodes. To obtain the encrypted data block 18, the first node 10 needs to obtain set 24-1 and request that the nodes 30-1 - 30-N from set 24-1 provide it (the first node 10) with the encrypted data block 18.
Figure 10 shows first node 10 reobtaining data from storage. The first node 10 shown in Figure 10 can be the first node 10 of any previously described Figure. However for simplicity first node 10 shown in Figure 10 only shows the features/modules used to reobtain data not the features/modules used to send data in the first place.
To obtain the previously stored data, node determining module 20 at first node 10 uses node determining function 21 to obtain the set of nodes to which the encrypted data block 18 and, when applicable, any encryption keys 16-1 - 16-N have been sent. The below
description will focus on how the set of T nodes to which the encrypted data block 18 was sent is obtained. However the skilled person would understand that the same method is used for all data chunks 18, 16-1 - 16-N that form an unencrypted data block 12.
The node determining module 20 takes the distribution key 22 of the first node and the encrypted data block identifier 19, and uses these to recreate set of nodes 24-1. As mentioned previously, the node determining module 20 takes as inputs the first node distribution key 22 and the encrypted data block identifier 19, and uses the node determining function 21 to calculate a set of other nodes 24. How the identifier 19 for the encrypted data block is obtained is described in more detail below. However, it is noted that in one example the data block identifier 14 can be obtained from a public register and the identifier 19 for the encrypted data block 18 can be worked out from the data block identifier 14.
The node determining function 21 creates a set of other nodes 24-1 that can be recreated provided the first node distribution key 22 and the encrypted block identifier 19 are known but which cannot be recreated by any party that does not know the first node distribution key 22.
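The property described above, namely that the set can be recreated from the distribution key and the identifier but not without the key, can be illustrated with a keyed hash. The following is only a sketch: ranking candidate nodes by HMAC-SHA256 is a stand-in chosen for this example and is not asserted to be the node determining function 21 itself; the function and parameter names are likewise hypothetical.

```python
import hashlib
import hmac


def determine_nodes(distribution_key: bytes, identifier: bytes, all_nodes, t):
    """Rank candidate nodes by a keyed hash and return the first t of them."""
    def score(node_id: str) -> bytes:
        message = identifier + b"|" + node_id.encode()
        return hmac.new(distribution_key, message, hashlib.sha256).digest()

    # The ordering depends only on the key, the identifier and the node ids,
    # so the same inputs always recreate the same set.
    return sorted(all_nodes, key=score)[:t]
```

Calling the function twice with the same key and identifier yields the same ordered set, whereas a party without the distribution key cannot compute the ranking.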
Once the node determining module 20 has recreated the set of nodes 24-1 to which the encrypted data block 18 was sent, the set of nodes 24-1 is passed to a data requesting module 502 which performs data requesting function 504. Data requesting function 504 sends a request for the encrypted data block 18 to at least one of the nodes 30-1 - 30-N from the set 24-1 of nodes 30-1 - 30-N to which the encrypted data block 18 was previously sent.
In one example data requesting function 504, at data requesting module 502, sends a request for the encrypted data block 18 to all nodes from set 24-1, or at least all T nodes from set 24-1 to which the encrypted data block 18 was originally sent. However, this can lead to the return of a large number of copies of data block 18. Therefore, in an alternative example data requesting module 502 sequentially requests the encrypted data block 18 from each node 30-1 - 30-N in set 24-1 until one of the nodes 30-1 - 30-N returns the data block 18. At this point, the data requesting module 502 does not send any further requests for the encrypted data block 18. For example, set 24-1 can include node 30-1, node 30-2, node 30-3 and node 30-N in that order. Data requesting function 504 first sends a request
for the encrypted data block 18 to node 30-1. If node 30-1 provides the encrypted data block 18 in response to the request, then data requesting function 504 does not send any further requests for that encrypted data block 18. However, if node 30-1 does not provide the encrypted data block 18 in response to the request, then data requesting function 504 sends a request for the encrypted data block to node 30-2. If node 30-2 returns the encrypted data block 18, then data requesting function 504 does not send any further requests, while if node 30-2 does not send the encrypted data block 18, data requesting function 504 requests the encrypted data block from the next node 30-3 in the set 24-1 etc.
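The sequential alternative can be sketched as a simple loop. In this illustration `fetch` is a hypothetical callable standing in for a network request to a node; it is assumed to return the data block, or `None` when the node does not provide it.

```python
def request_sequentially(node_set, fetch):
    """Ask each node in order and stop at the first one that returns the block."""
    for node in node_set:
        block = fetch(node)
        if block is not None:
            return node, block  # no further requests are sent
    return None, None           # no node in the set returned the block
```

If only node 30-3 holds the block, requests are sent to nodes 30-1, 30-2 and 30-3 in turn and no request is sent to node 30-N.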
When encryption key 16-1 or encryption keys 16-1 - 16-N were also sent to other nodes 40-1 - 40-N for storage, node determining module 20 and data requesting module 502 can also be used to obtain these encryption keys 16-1 - 16-N from storage. These data chunks 16-1 - 16-N can be obtained using the method described above with respect to encrypted data block/data chunk 18.
More specifically, node determining module 20 can take as an input the encryption key identifier 11-1 - 11-N of the encryption key 16-1 - 16-N that the first node 10 is trying to obtain and the first node distribution key 22. The node determining function 21 then determines a set 24-2 - 24-N+1 to which the desired encryption key 16-1 - 16-N was sent. As mentioned above, the node determining function 21 creates a set of nodes 24-2 - 24-N+1 that cannot be determined without the first node distribution key 22 but which can be recreated if the identifier 11-1 - 11-N and the distribution key 22 are known.
Once the node determining module 20 has determined set of nodes 24-2 - 24-N+1, it sends the set of nodes 24-2 - 24-N+1 to data requesting module 502 which uses data requesting function 504 to request the encryption key 16-1 - 16-N from the other nodes 40-1 - 40-N. As mentioned previously, the data requesting function 504 can send a request to all other nodes 40-1 - 40-N from the set 24-2 - 24-N+1. Alternatively, the data requesting function 504 can request the encryption key 16-1 - 16-N from the other nodes 40-1 - 40-N in set 24-2 - 24-N+1 sequentially, stopping when the encryption key 16-1 - 16-N has been obtained.
Node determining function 21 used to recreate sets 24-1 - 24-N+1 should be the same node determining function 21 used to create the sets 24-1 - 24-N+1 in the data distribution process described above. This ensures that the data chunks 18, 16-1 - 16-N are
requested from the same nodes 30-1 - 30-N, 40-1 - 40-N that they were originally sent to. Given the node determining function 21 used to recreate sets 24-1 - 24-N+1 is the same function used to calculate them originally, the skilled person would understand that all the properties of the node determining function 21 used to create sets 24-1 - 24-N+1 apply to node determining function 21 used to recreate the sets 24-1 - 24-N+1.
In one example of the above method, either the node determining module 20, or the data requesting module 502, can take as an input T i.e. the number of nodes to which a data chunk was originally sent. The node determining module 20 or the data requesting module 502 then edits set 24-1 - 24-N+1 so that it only contains T entries.
If the sets of nodes 24-2 - 24-N+1 to which the encryption keys 16-1 - 16-N were sent were edited to remove nodes 30-1 - 30-N, 40-1 - 40-N overlapping with any previous set 24-1 - 24-N before the encryption keys 16-1 - 16-N were distributed to other nodes 30-1 - 30-N, 40-1 - 40-N, then a similar editing takes place before the data requesting module 502 requests the encryption keys 16-1 - 16-N from other nodes 30-1 - 30-N, 40-1 - 40-N. This editing can be done by either the node determining module 20 or the data requesting module 502, and is performed before the set 24-1 - 24-N+1 in question is edited to contain only T nodes.
The request for a data chunk 18, 16-1 - 16-N can include the identifier 19, 11-1 - 11-N for the data chunk 18, 16-1 - 16-N being requested. This is one way to allow the node 30-1 - 30-N, 40-1 - 40-N receiving the request to identify which data chunk 18, 16-1 - 16-N, of the many it may have stored, it should send to the first node 10. When an alternative method of identifying the data chunks 16-1 - 16-N is used, then an alternative identifier can be used in place of the data chunk or data block identifier.
When a node 30-1 - 30-N, 40-1 - 40-N receives a request for a data chunk 18, 16-1 - 16-N from the first node 10, it can verify the identity of the first node 10 before sending any data.
In one example, where the nodes 10, 30-1 - 30-N, 40-1 - 40-N have registered with a server, TLS authentication can be used. Other mutual authentication techniques that rely on a certificate may also be appropriate.
Alternatively or in addition, verification can be performed without registration at a server. In such a case, when sending a data chunk 18, 16-1 - 16-N for storage the first node 10 needs to ensure its public key 78 is associated with the identifier 19, 11-1 - 11-N for the data chunks 18, 16-1 - 16-N it is storing. This can be done by sending its public key 78 along with the data chunk 18, 16-1 - 16-N and identifier 19, 11-1 - 11-N when the data chunk 18, 16-1 - 16-N is sent to the other nodes 30-1 - 30-N, 40-1 - 40-N for storage. Alternatively, it can be done by the first node 10 publishing its public key 78 on the public ledger 252 alongside the data chunk identifiers 19, 11-1 - 11-N. When a data block identifier 14 is used instead of data chunk identifiers 19, 11-1 - 11-N as the information sent to an other node 30-1 - 30-N, 40-1 - 40-N to provide a reference for a data chunk 18, 16-1 - 16-N, then the public key 78 of the first node 10 should be associated with this data block identifier 14. While the below concentrates on the data chunk and data block identifiers, any other suitable identifier could be used for a data chunk 18, 16-1 - 16-N without significantly altering the method.
When first node 10 requests data with a data chunk identifier 19, 11-1 - 11-N or data block identifier 14 it sends either the data chunk identifier 19, 11-1 - 11-N or data block identifier 14 to the other node 30-1 - 30-N, 40-1 - 40-N.
In return the other node 30-1 - 30-N, 40-1 - 40-N returns a first payload containing:
1) a payload comprising: a random number R, the data chunk or data block identifier and a session key;
2) an internal token comprising the payload encrypted with the public key 332-1 - 332-N of the other node 30-1 - 30-N, 40-1 - 40-N; and
3) a signature comprising a hash of the payload and the internal token signed with the private key 338-1 - 338-N of the other node 30-1 - 30-N, 40-1 - 40-N.
The first payload is encrypted using the public key 78 of the node associated with the data chunk identifiers 19, 11-1 - 11-N or data block identifier 14 i.e. the public key 78 of the first node 10.
In response to receiving this first payload, the first node 10, checks the signature from the other node 30-1 - 30-N, 40-1 - 40-N. If the signature is correct, it decrypts the first payload and generates a second payload comprising:
1) a hash of the random number, R; and
2) the internal token contained in the first payload it received.
The first node 10 encrypts this second payload using the public key 332-1 - 332-N of the other node 30-1 - 30-N, 40-1 - 40-N and returns it to the other node 30-1 - 30-N, 40-1 - 40-N.
The other node 30-1 - 30-N, 40-1 - 40-N decrypts the second payload and the internal token it received in the second payload. The other node 30-1 - 30-N, 40-1 - 40-N then takes a hash of the random number R from the decrypted internal token and compares this to the hash sent by the first node 10. If this is correct, the other node 30-1 - 30-N, 40-1 - 40-N knows the first node 10 has the same public key 78 as the one associated with the data chunk identifiers 19, 11-1 - 11-N or data block identifier 14. As such, it has verified the identity of the first node 10. The other node 30-1 - 30-N, 40-1 - 40-N then extracts the session key and the data chunk or data block identifier, and sends the data chunk to the first node encrypted using the session key.
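The comparison at the centre of this exchange can be illustrated in isolation. The sketch below deliberately omits the public-key encryption, the signature and the session key, and models only the random number R and its hash; SHA-256 and the function names are assumptions made for the example, as the description does not name a particular hash at this point.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    # Other node: random number R, carried in the payload and the internal token.
    return secrets.token_bytes(32)


def answer_challenge(r: bytes) -> bytes:
    # First node: hash of R, returned in the second payload.
    return hashlib.sha256(r).digest()


def verify_answer(r_from_internal_token: bytes, answer: bytes) -> bool:
    # Other node: re-hash the R recovered from its own internal token and compare.
    expected = hashlib.sha256(r_from_internal_token).digest()
    return hmac.compare_digest(expected, answer)
```

In the full exchange the first node can only learn R by decrypting the first payload with its private key 74, which is what ties a correct answer to the public key 78 associated with the identifier.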
As shown in Figure 10, once the encrypted data block 18 and encryption keys 16-1 - 16-N have been received by the data requesting function 504, the data requesting module 502 passes them to decryption module 506 which performs decryption function 508. Decryption function 508 is an inverse of encryption function 17 in that it uses encryption keys 16-1 - 16-N to decrypt encrypted data block 18 to produce unencrypted data block 12.
When encryption function 17 is a one-time pad encryption algorithm, then decryption function 508 is a corresponding one-time pad decryption function. As mentioned with respect to Figures 1 and 2, data block 12 may have been encrypted using a single encryption key 16-1 or multiple encryption keys 16-1 - 16-N using modular addition. In particular, data block 12 and encryption keys 16-1 - 16-N may each comprise strings 13, 160-1 - 160-N of length L. To encrypt the data block 12 modular addition may be performed between a character in string 13 forming the data block 12 and a corresponding character in string 160-1 - 160-N forming the encryption key 16-1 - 16-N. When multiple encryption keys 16-1 - 16-N are used to encrypt a data block 12, each encryption key 16-1 - 16-3 is used to perform the modular addition on the same data block 18 either consecutively or simultaneously.
Decryption function 508 performs an inverse of the encryption function 17. Therefore, when encryption function 17 involves using modular arithmetic to perform addition,
decryption function 508 involves using modular arithmetic to perform a subtraction. As with the addition, the subtraction is performed between each character of the string 13 forming the data block 12 and the corresponding character of the string 160-1 - 160-N forming the encryption key 16-1 - 16-N.
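A minimal sketch of the modular addition and its inverse is given below, assuming for illustration that each character is a byte and that the modulus is 256; neither choice is mandated by the description, and the function names are hypothetical.

```python
MODULUS = 256  # assumption: characters are bytes


def encrypt_mod(block: bytes, keys) -> bytes:
    """Add each key to the block, character by character, modulo 256."""
    out = list(block)
    for key in keys:
        if len(key) != len(block):
            raise ValueError("each key must have the same length L as the block")
        out = [(c + k) % MODULUS for c, k in zip(out, key)]
    return bytes(out)


def decrypt_mod(cipher: bytes, keys) -> bytes:
    """Subtract each key from the cipher text, character by character, modulo 256."""
    out = list(cipher)
    for key in keys:
        if len(key) != len(cipher):
            raise ValueError("each key must have the same length L as the block")
        out = [(c - k) % MODULUS for c, k in zip(out, key)]
    return bytes(out)
```

Because modular addition is commutative, the keys may be subtracted in any order, which is consistent with the keys being applied consecutively or simultaneously.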
When strings 13, 160-1 - 160-N are binary strings the decryption function 508 can use an XOR operation as the one-time pad decryption. More specifically, decryption function 508 involves performing an XOR operation between each bit of the string 13 forming the data block 12 and the corresponding bit of the string 160-1 - 160-N forming the encryption key 16-1 - 16-N.
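For binary strings the operation can be sketched as follows; the function names are illustrative only. Since XOR is its own inverse, the same routine serves for both the one-time pad encryption and the decryption.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length L")
    return bytes(x ^ y for x, y in zip(a, b))


def otp_xor(data: bytes, keys) -> bytes:
    """Apply every key in turn; applying the same keys again restores the input."""
    return reduce(xor_bytes, keys, data)
```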
As discussed above with respect to Figure 5a, it is possible to perform a pre-encryption on the data block 12 before it is encrypted using encryption keys 16-1 - 16-N. When such a pre-encryption is performed, then after encrypted data block 18 has been decrypted using encryption keys 16-1 - 16-N it needs to be further decrypted using secondary decryption module 510 which performs secondary decryption algorithm 512.
In this case, secondary decryption module 510 performs secondary decryption algorithm 512 which is an inverse of pre-encryption algorithm 72. In this context the decryption algorithm 512 being an inverse of the pre-encryption algorithm 72 means that pre-encryption algorithm 72 takes data block 12 and returns a pre-encrypted data block 76, while secondary decryption algorithm 512 takes pre-encrypted data block 76 and returns data block 12. When pre-encryption algorithm 72 is a symmetric key encryption algorithm, then secondary decryption algorithm 512 uses the same private key 74 as was used to encrypt data block 12 in order to decrypt pre-encrypted data block 76. When pre-encryption algorithm 72 is a public-private key algorithm then the secondary decryption algorithm 512 uses a private key 74 that corresponds to the public key 78 used to pre-encrypt data block 12. Example symmetric-key algorithms and public-private key algorithms were discussed above with respect to the pre-encryption of the data block 12.
Once data block 12 has been fully decrypted, node 10 is able to access the data block 12 and make use of the contents of the data block 12 for any purpose.
When a node 30-1 - 30-N, receives a request for a data chunk 18, 16-1 - 16-N from the first node 10, it will send that data chunk 18, 16-1 - 16-N to the first node 10. In order to
increase the security of this transfer, the other node 30-1 - 30-N may perform an additional encryption on this data chunk 18, 16-1 - 16-N before sending it to the first node 10. This additional encryption is performed using additional encryption module 530-1 - 530-N located at the node 30-1 - 30-N. The first node 10 has a public key 78 which is available to all nodes 30-1 - 30-N, 40-1 - 40-N. The additional encryption module 530-1 - 530-N at the other node uses this key 78 to encrypt the data chunk 18, 16-1 - 16-N it is sending to the first node 10 to form additional encrypted data chunk 532-1 - 532-N. Any suitable public-private key encryption algorithm can be used, for example ElGamal, elliptic curve cryptography, lattice-based cryptography, McEliece cryptosystem, multivariate cryptography, Paillier cryptosystem, RLCE, RSA, and Cramer-Shoup cryptosystem or any other suitable cryptographic system.
When the first node 10 receives an additional encrypted data chunk 532-1 - 532-N, an additional decryption module 540 obtains data chunk 18, 16-1 - 16-N by decrypting the additionally encrypted data chunk 532-1 - 532-N using a private key 74 that corresponds to the public key 78 used by additional encryption module 530.
This additional encryption and decryption makes it harder for nodes 30-1 - 30-N, 40-1 - 40-N or any other third party to be able to obtain the contents of the data chunks 18, 16-1 - 16-N being returned to the first node 10.
Once the encrypted data block 18 has been obtained by first node 10 and decrypted to obtain data block 12, the data block 12 can be verified. This verification performs checks on the data block 12 in an attempt to confirm that data block 12 has not been modified by the other node 30-1 - 30-N, 40-1 - 40-N at which it was stored. The skilled person would understand that it may not be possible to confirm with 100% accuracy that a data block 12 has not been modified. However, the techniques outlined below provide some mechanism for the user to determine if modification has occurred.
Figure 11 shows multiple data blocks 552-1 - 552-N of which data block 12 is one. As discussed earlier, when using a block chain for verification the data blocks 552-1 - 552-N are arranged in a hash tree or private block-chain 126 before they are distributed to nodes 30-1 - 30-N, 40-1 - 40-N. This private block-chain 126 is private in the sense that it is only accessible by the first node. Using a private block-chain 126 for verification is an optional step and one of several possible ways of performing verification.
Once data blocks 552-1 - 552-N have been received and decrypted by first node 10 they can be rearranged into the ordered list used to form the original block-chain/hash tree 126. Placing the data blocks 552-1 - 552-N into the ordered list does not necessarily involve physically moving or rearranging the data blocks 552-1 - 552-N; instead it involves establishing a relationship between the data blocks 552-1 - 552-N. As discussed previously the second and subsequent data blocks 552-2 - 552-N in the ordered list were provided with a hash 128-1 - 128-N-1 of the previous data block 552-1 - 552-N-1 in the ordered list.
In order to verify the data blocks 552-1 - 552-N, the hashes 550-1 - 550-N-1 are now retaken and compared to the original hashes 128a-1 - 128a-N that were stored at the data blocks 552-1 - 552-N before they were sent for storage. This can be done by an error determiner 556. It is noted that while hashes 128a-1 - 128a-N are referred to as the original hashes, if there has been any modification to data blocks 552-1 - 552-N during storage, the hashes 128a-1 - 128a-N may have also been modified. In one example, a hash 550-1 of the first data block 552-1 in the ordered list is taken and is compared to the hash 128a-1 of the first data block 552-1 stored in the second data block 552-2 of the ordered list, a hash 550-2 of the second data block 552-2 is taken and is compared to the hash 128a-2 of the second data block 552-2 stored in the third data block 552-3 etc. The hashes 550-1 - 550-N-1 can be taken using the same hash module 122 and hash function 124 used to take the original hashes 128-1 - 128-N-1.
If the new hashes 550-1 - 550-N agree with the original hashes 128a-1 - 128a-N they are compared to, then the data blocks 552-1 - 552-N are considered to be unmodified. However, if the new hashes 550-1 - 550-N disagree with the original hashes 128a-1 - 128a-N they are compared to, then it is considered that the data blocks 552-1 - 552-N have been modified. In such a scenario, the error determiner 556 can either cause the data requesting module 502 to request all the data blocks 552-1 - 552-N again or can try to further determine which data block 552-1 - 552-N has an error. In such a case the data requesting module 502 will request the data block from nodes that were sent the data block but that did not previously provide the data block to the first node 10.
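The retaking and comparison of hashes can be sketched as follows. The `Block` structure, the use of SHA-256 and the decision to hash a block's stored hash together with its content are assumptions made for this illustration; the description leaves the hash function 124 open.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    content: bytes
    prev_hash: bytes = b""  # hash of the previous block; empty for the first block


def block_hash(block: Block) -> bytes:
    return hashlib.sha256(block.prev_hash + block.content).digest()


def build_chain(contents):
    """Arrange contents in an ordered list, storing in each block the hash of its predecessor."""
    chain, prev = [], b""
    for content in contents:
        block = Block(content, prev)
        chain.append(block)
        prev = block_hash(block)
    return chain


def mismatches(chain):
    """Positions i where the retaken hash of block i disagrees with the hash stored in block i+1."""
    return [i for i in range(len(chain) - 1)
            if block_hash(chain[i]) != chain[i + 1].prev_hash]
```

An empty result indicates that every retaken hash agrees with the stored hash it is compared to; note that in this arrangement a modification to the final block of the list is not detected, since no later block stores its hash.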
In particular, the error determiner 556 can determine which new hash 550-1 - 550-N-1 disagrees with the hashes 128a-1 - 128a-N-1 stored in data blocks 552-2 - 552-N to
determine which data block 552-1 - 552-N has been modified. The error determiner 556 will find that the new hash 550-1 - 550-N-1 and the original hash 128a-1 - 128a-N-1 disagree if either the block the hash is being taken of, or the block the hash is being stored in, has been modified. The error determiner 556 can then cause the data requesting module 502 to request the two blocks that may have an error for a second time. In such a case, the data requesting module 502 will request the data blocks from nodes that were sent the data blocks but that did not previously provide the data blocks to the first node 10.
Alternatively, error determiner 556 can determine which data block 552-1 - 552-N of the two data blocks is most likely to have an error. When a disagreement between the new hash 550-1 - 550-N and the original hash 128-1 - 128-N is found, the error determiner looks at the previous and subsequent hashes in relation to the ordered list to see if there is also an error. If an error is also found for the previous hash, then the data block the hash is being created from is considered to have an error. In contrast, if an error is also found for the subsequent hash, then the data block the hash is being stored at is considered to have an error.
For example, if an error is found between the new hash 550-2 of the second data block 552-2 and the hash 128a-2 of that block stored in the third data block 552-3, the error determiner compares the new hash 550-1 of the first data block 552-1 to the hash 128a-1 of the first data block 552-1 stored in the second data block 552-2 and, if an error is also found here, assumes there is an error in the second data block 552-2. The error determiner 556 also compares the new hash 550-3 of the third data block 552-3 with the hash 128a-3 of the third data block stored at the fourth data block 552-4, and if there is an error here assumes the third data block 552-3 has been modified. The error determiner 556 can then cause the data requesting module 502 to only request the data block believed to have an error rather than two data blocks. Once again, the data requesting module 502 will request the data block from nodes that were sent the data block but that did not previously provide the data block to the first node 10.
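The neighbour-based reasoning of the error determiner 556 can be expressed compactly. In this sketch, which uses hypothetical names and 0-based positions in the ordered list, a mismatch at position i means that the retaken hash of block i disagrees with the hash stored in block i + 1.

```python
def suspect_blocks(mismatch_positions):
    """Return the block positions most likely to have been modified."""
    bad = set(mismatch_positions)
    suspects = set()
    for i in bad:
        if i - 1 in bad:
            suspects.add(i)              # previous comparison also failed: block i
        elif i + 1 in bad:
            suspects.add(i + 1)          # subsequent comparison also failed: block i + 1
        else:
            suspects.update((i, i + 1))  # cannot tell which; request both blocks
    return sorted(suspects)
```

For example, mismatches at positions 0 and 1 both point to the block at position 1 (the second data block), whereas an isolated mismatch at position 2 leaves both the third and fourth data blocks as candidates.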
While the above assumes the verification is performed on the unencrypted data blocks, it is noted that if the data blocks were originally arranged in a hash tree 126 after encryption, then the above procedure would be performed on the received data blocks before they were decrypted.
The above technique allows the first node 10 to provide some verification that the data blocks 552-1 - 552-N it is receiving from the nodes 30-1 - 30-N, 40-1 - 40-N have not been modified in storage. This allows it to trust the returned data more than if such a verification was not performed.
As mentioned, in addition or as an alternative, a hash of the data block 12 can be used as the block identifier 14 for the data block 12. Therefore, the data block 12 can be verified by retaking the hash and comparing it to the hash stored as the block identifier 14.
As discussed previously, in order to recreate sets 24-1 - 24-N+1, it is necessary to know the data chunk identifiers 19, 11-1 - 11-N.
In the simplest example these data chunk identifiers 19, 11-1 - 11-N are consecutive so the first node 10 simply has to work out how many data chunks it distributed to ensure it knows all the identifiers. This could be done by storing that information securely at a third party, storing the information on a public register or by counting the number of entries it made on the public register 252. Alternatively, the first node 10 could simply increment the data chunk identifier 19, 11-1 - 11-N and use the above method to request a data chunk with that identifier 19, 11-1 - 11-N. When a data chunk 18, 16-1 - 16-N is not returned with respect to an identifier 19, 11-1 - 11-N, or several data chunks 18, 16-1 - 16-N are not returned with respect to adjacent identifiers 19, 11-1 - 11-N, the first node 10 can assume it has obtained all its data.
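The incrementing approach can be sketched as below. Here `fetch` is a hypothetical callable that requests a chunk by identifier and returns `None` when nothing is returned; the starting identifier and the number of adjacent misses tolerated before stopping are assumptions chosen for illustration.

```python
def recover_all(fetch, start=0, max_misses=3):
    """Request consecutive identifiers until max_misses adjacent ones return nothing."""
    found, misses, identifier = {}, 0, start
    while misses < max_misses:
        chunk = fetch(identifier)
        if chunk is None:
            misses += 1
        else:
            found[identifier] = chunk
            misses = 0  # a hit resets the run of adjacent misses
        identifier += 1
    return found
```

Tolerating more than one adjacent miss allows for an individual chunk being temporarily unobtainable without the first node wrongly concluding that it has reached the end of its data.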
In another example, the data chunk identifiers 19, 11-1 - 11-N can be determined from the identifier 14 of the data block 12 that the chunks form. This was discussed earlier with respect to Figure 4. In summary, the data chunks 18, 16-1 - 16-N that form part of a data block 12, i.e. the encrypted data block 18 and the keys 16-1 - 16-N used to encrypt the data block 12, are each assigned a unique reference 60-1 - 60-N. Preferably these references are consecutive. Assigning a reference to each data chunk 18, 16-1 - 16-N can be considered to be placing the data chunks 18, 16-1 - 16-N in an ordered list. These references are then used as an input to an identifier module 64, which also takes as an input an identifier 14 of the data block 12. For each data chunk 18, 16-1 - 16-N, an identifier calculation 66 at the identifier module 64 combines the data chunk reference 60-1 - 60-N and the data block identifier 14, for example by appending one to the other. In some examples this is then used as the data chunk identifier 19, 11-1 - 11-N. In other examples the results of the identifier calculation 66 are sent to a hashing function 68 and the results of this hashing function 68 are used as the data chunk identifiers 19, 11-1 - 11-N. Either way, in order to be able to calculate the data chunk identifiers 19, 11-1 - 11-N, and hence recreate sets 24-1 - 24-N+1, it is important to know the data block identifier 14.
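The identifier calculation described above can be sketched as follows. The byte encoding of the reference, the choice of SHA-256 as the hashing function and the function names are assumptions made for the example; the description only requires that the reference and the block identifier are combined and, optionally, hashed.

```python
import hashlib


def chunk_identifier(block_id: bytes, reference: int, hashed: bool = True) -> bytes:
    # Combine the two inputs by appending the reference to the block identifier.
    combined = block_id + reference.to_bytes(4, "big")
    # Optionally pass the combined value through a hashing function.
    return hashlib.sha256(combined).digest() if hashed else combined


def chunk_identifiers(block_id: bytes, n_chunks: int, hashed: bool = True) -> list:
    # Consecutive references 0..n_chunks-1 place the chunks in an ordered list.
    return [chunk_identifier(block_id, ref, hashed) for ref in range(n_chunks)]
```

Because the calculation is deterministic, a node that knows the block identifier and the number of chunks can recompute every chunk identifier without having stored any of them.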
In one example, where there are multiple data blocks 12, the data block identifiers 14 are merely consecutive numbers. This allows the first node 10 to obtain these numbers without any further work. However, as discussed above, the data block identifiers 14 can also be less predictable. In such a case, the first node 10 has to obtain these identifiers before it can recreate sets 24-1 - 24-N+1.
As mentioned previously, when the first node 10 sends data chunks 18, 16-1 - 16-N for storage, it can receive acknowledgements 214-1 - 214-N, 216-1, 216-N which it publishes as a transaction 254-1 - 254-N on a public register 252. This transaction 254-1 - 254-N can also include an identifier 35 associated with the node 10 and the data block identifier 14.
Returning now to Figure 10, when a node 10 wishes to obtain the data block identifiers 14, a block identifier obtaining module 562 can search the register 252 for transactions 254-1 - 254-N containing the identifier 35 of the first node 10. The block identifier obtaining module 562 can then extract from these transactions 254-1 - 254-N the data block identifiers 14. These can then be used as an input to identifier module 64 to recalculate the data chunk identifiers 19, 11-1 - 11-N. In one example, the data block identifiers 14 are a hash of the data blocks 12. As such, the data block identifiers 14 can be compared with a new hash of the returned data block to confirm the data block has not been modified. In an alternative example, the data block identifiers 14 are not a hash of the data blocks 12 and a hash of the data blocks is published in the transaction 254-1 - 254-N along with the data block identifiers 14. In such an example, the returned data block can be verified by taking a hash of the returned data block and comparing it to this stored hash. All the hashes can be taken by the hash module 122 previously mentioned.
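A minimal sketch of this lookup is given below. The register is modelled as a list of dictionaries with hypothetical `node_id` and `block_id` fields, and SHA-256 stands in for the hash function; none of these details are specified in the description.

```python
import hashlib


def block_ids_for_node(register, node_id):
    """Extract the block identifiers from transactions published by node_id."""
    return [tx["block_id"] for tx in register if tx["node_id"] == node_id]


def block_unmodified(block: bytes, block_id: str) -> bool:
    """Valid only in the example where the block identifier is a hash of the block."""
    return hashlib.sha256(block).hexdigest() == block_id
```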
For completeness, it is noted that if data chunk identifiers 19, 11-1 - 11-N were stored on the public register 252 in place of or as well as the data block identifier 14 then the above procedure can be used to directly obtain the data chunk identifiers 19, 11-1 - 11-N and no further calculations would be needed to obtain these.
The above procedure allows the first node 10 to obtain the data chunk identifiers 19, 11-1 - 11-N even when these are not predictable and the first node 10 has lost all its data. This ensures that the first node 10 is always able to get its data back.
A method of sending one of multiple data blocks 12 from a first node 10 to other nodes 30-1 - 30-N, 40-1 - 40-N for storage is now described with respect to Figure 12. This method is a summary and can be used with any of the details described above. As such, Figure 12 represents a method of implementing the previously described features.
The method starts with step 1000 in which first node 10 can pre-encrypt the data block 12 to form pre-encrypted data block 76. This stage is voluntary and the skilled person would understand that the method can proceed without pre-encryption. Pre-encryption can be performed using encryption module 64 and can be a symmetric key encryption algorithm using private key 74 of the first node 10 or a public-private key encryption algorithm using public key 78 of the first node 10. When a public-private key encryption algorithm is used private key 74 corresponds to public key 78 and is suitable for decrypting the pre-encrypted data block 76.
At stage 1002 the data chunks are processed to allow subsequent verification. Once again this is a voluntary step. Processing the data chunks for verification can be performed using any of the methods described above. However, in one example this involves having multiple data blocks 120-1 -120-N and arranging these in a hash tree/private block chain where a hash of each data block 120-1 - 120-N-1 is stored in the subsequent data block 120-2 - 120-N in the chain. In one example a hash of the data block is used as a block identifier 14 which is saved and later accessible by the first node 10. This pre-processing can be done by hash module 122.
After the above stages, the method proceeds to encryption stage 1004. If no verification and pre-encryption is used, the method starts at encryption stage 1004. In this stage, data block 12 or pre-encrypted data block 76 (dependent on whether there is a pre-encryption stage) is encrypted using encryption key 16-1 or encryption keys 16-1 - 16-N to form encrypted data block 18. This can be done at encryption module 15. Preferably encryption key 16-1 or encryption keys 16-1 - 16-N are random strings 160-1 - 160-N and data block 12 or pre-encrypted data block 76 is also a string 13. In such a case, encrypting data block 12 or pre-encrypted data block 76 involves performing a one-time pad encryption algorithm between the encryption key 16-1 or encryption keys 16-1 - 16-N and the (pre-encrypted) data block 12/76. The encrypted data block 18 and the encryption keys 16-1 - 16-N used to encrypt the data block 12 can be considered to be data chunks.
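The one-time pad stage can be sketched as a bytewise XOR between the data block and one or more random keys of the same length. The function names are illustrative assumptions, and XOR is assumed as the one-time pad operation.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("one-time pad operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(block: bytes, n_keys: int):
    """Return the encrypted block and the random keys; together these are the data chunks."""
    keys = [secrets.token_bytes(len(block)) for _ in range(n_keys)]
    encrypted = reduce(xor_bytes, keys, block)
    return encrypted, keys


def decrypt_block(encrypted: bytes, keys) -> bytes:
    # XOR is its own inverse, so applying every key again recovers the block.
    return reduce(xor_bytes, keys, encrypted)
```

A holder of the encrypted block alone, or of any incomplete subset of the chunks, sees only random-looking bytes; all chunks are needed to recover the block.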
In stage 1006, each data chunk 18, 16-1 - 16-N is provided with a data chunk identifier 11-1 - 11-N. In one example the data chunks 18, 16-1 - 16-N are each provided with a unique reference 60-1 - 60-N where the reference is unique for that data block 12 (i.e. data chunks from another data block may have the same reference as a data chunk 18, 16-1 - 16-N in this data block 12). In this example, the data block 12 has an identifier 14 which can be a hash of the data block 12. An identifier calculation 66 at an identifier module 64 uses the unique reference 60-1 - 60-N of the data chunk and the data block identifier 14 to determine the data chunk identifiers 19, 11-1 - 11-N. These will be used later.
In stage 1008, a set of nodes 24-1 - 24-N+1 is generated for each data chunk 18, 16-1 - 16-N, where the set of nodes 24-1 - 24-N+1 reflects the nodes 30-1 - 30-N, 40-1 - 40-N to which the data chunk 18, 16-1 - 16-N is to be sent. The sets of nodes 24-1 - 24-N+1 can be generated at a node determining module 20 using a distribution key 22 of the first node 10 and the identifier 19, 11-1 - 11-N of the corresponding data chunk. The sets of nodes 24-2 - 24-N+1 for the second and subsequent data chunks 16-1 - 16-N can be edited to ensure they do not contain any of the same nodes as a previous set 24-1 - 24-N. All the sets of nodes 24-1 - 24-N can be edited to contain only T entries. T represents the number of nodes a data chunk 18, 16-1 - 16-N has to be sent to in order to have a specific probability of being returned (a) at the time requested and (b) at a later time. The specific probability is determined based on the importance of the data and can be chosen by a node administrator. A technique for calculating T is described above.
Once the sets of nodes 24-1 - 24-N+1 have been chosen, a voluntary post-encryption 1010 can be performed on each data chunk 18, 16-1 - 16-N. Post-encryption can be performed by post-encryption module 80. Post-encryption can involve obtaining the public key 85-1 - 85-N of the nodes 30-1 - 30-N in the set of nodes 24-1 - 24-N+1 corresponding to the data chunk 18, 16-1 - 16-N. The data chunk 18, 16-1 - 16-N is then copied so that there is a separate copy for each node 30-1 - 30-N from the set of nodes 24-1 - 24-N. Each copy of the data chunk 18, 16-1 - 16-N can then be encrypted with the public key 85-1 - 85-N of the node to which it will be sent. This results in post-encrypted data chunks 90-1 - 90-N. The encryption could be performed using any suitable public-private key encryption algorithm.
At stage 1012, storage requests 312-1 - 312-N can be generated. Again this stage is voluntary. A separate storage request 312-1 - 312-N is generated for each node 30-1 - 30-N, 40-1 - 40-N that will store a data chunk 18, 16-1 - 16-N, i.e. each node in sets 24-1 - 24-N+1. The storage requests can be generated by storage request module 302. The storage requests 312-1 - 312-N comprise a token 322-1 - 322-N, which is unique and can be random or pseudo-random, and data that allows verification that the request 312-1 - 312-N comes from the first node 10. This verification can take the form of the public key 78 of the first node 10 or a signature from the first node 10 or both. The storage request 312-1 - 312-N can be encrypted using the public key 85-1 - 85-N of the node to which it is to be sent.
Finally, at stage 1014 each data chunk 18, 16-1 - 16-N or, when post-encryption is performed, each post-encrypted data chunk 90-1 - 90-N is sent to the nodes in the corresponding set 24-1 - 24-N+1 along with, where appropriate, a storage request 312-1 - 312-N. When post-encryption has been performed, the post-encrypted data chunk 90-1 - 90-N is sent to the node corresponding to the public key 85-1 - 85-N used to perform the post-encryption. Similarly, if the storage requests 312-1 - 312-N are encrypted using a public key 85-1 - 85-N of one of the nodes, they are sent to the node that corresponds to that public key 85-1 - 85-N. This distribution can be done using a distribution module 28.
Figure 13 shows a method that can be used with the above method described in Figure 12 and the details in Figures 1 to 11. In particular, this method enables an other node 30-1 - 30-N, 40-1 - 40-N to decide when to store a data chunk 18, 16-1 - 16-N received from a first node 10.
In block 2001 an other node 30-1 - 30-N, 40-1 - 40-N receives a data chunk 18, 16-1 - 16-N from the first node 10. The data chunk 18, 16-1 - 16-N may be received by a data receiving module 334-1 - 334-3. The other node 30-1 - 30-N, 40-1 - 40-N may also receive a storage request 312-1 - 312-N along with the data chunk 18, 16-1 - 16-N.
At block 2002 the other node 30-1 - 30-N, 40-1 - 40-N decrypts the storage request 312-1 - 312-N to access the token 322-1 - 322-N and verify the data chunk 18, 16-1 - 16-N came from the first node 10. This can be performed by an unencryption module 336-1 - 336-3. When the data chunk 18, 16-1 - 16-N has been post-encrypted, the unencryption module 336-1 - 336-3 can also decrypt the data chunk 18, 16-1 - 16-N.
In block 2003, the token 322-1 - 322-N from the decrypted storage request 312-1 - 312-N is returned to the first node 10. In one example, the token 322-1 - 322-N is returned to the first node 10 along with an additional string 346-1 - 346-N that can be randomly generated by the other node 30-1 - 30-N, 40-1 - 40-N and, before being sent, is only known by the other node 30-1 - 30-N, 40-1 - 40-N. The token 322-1 - 322-N and, where applicable, the additional string 346-1 - 346-N can be sent by a token sending module 340-1 - 340-3. In one example, these are encrypted before being sent to the first node 10 using a public key 78 of the first node 10. The token 322-1 - 322-N and, where applicable, the additional string 346-1 - 346-N can be considered to be an acknowledgement of receipt 214-1 - 214-N.
At block 2004, the first node 10 receives the acknowledgement of receipt 214-1 - 214-N from the other node 30-1 - 30-N, 40-1 - 40-N and where necessary decrypts this. At this stage the first node 10 may also receive acknowledgments of receipt 214-1 - 214-N from several other nodes 30-1 - 30-N, 40-1 - 40-N. These acknowledgements of receipt 214-1 - 214-N can be received by an acknowledgment receiving module 204.
In block 2005, the first node 10 can publish the acknowledgments of receipt 214-1 - 214-N in a transaction 254-1 - 254-N on a public register 252 such as a public block-chain. Publishing the acknowledgments of receipt 214-1 - 214-N comprises publishing the tokens 322-1 - 322-N and/or the additional strings 346-1 - 346-N received from the other nodes 30-1 - 30-N, 40-1 - 40-N in the transaction 254-1 - 254-N. The first node 10 may publish an identifier 35 of the first node 10 and a block identifier 14 along with the acknowledgements 214-1 - 214-N. The acknowledgments of receipt 214-1 - 214-N may be split so that only the acknowledgments 214-1 - 214-N relating to a single data chunk 18, 16-1 - 16-N are published in each transaction. In one example, the first node 10 sends each data chunk 18, 16-1 - 16-N to more nodes than the number of nodes, M, it wants to store the data chunk 18, 16-1 - 16-N. If the first node 10 receives more than M acknowledgments 214-1 - 214-N for a data chunk 18, 16-1 - 16-N, it publishes only M acknowledgments 214-1 - 214-N for that data chunk 18, 16-1 - 16-N and discards the rest. The acknowledgments of receipt 214-1 - 214-N can be published by an acknowledgment processing module 206.
At block 2006 the other node 30-1 - 30-N, 40-1 - 40-N can check the public register 252 to see if the acknowledgement of receipt 214-1 - 214-N it sent to the first node 10 has been published. In particular, the other node 30-1 - 30-N, 40-1 - 40-N can monitor the transactions 254-1 - 254-N on the public register 252 and determine whether they contain the token 322-1 - 322-N or additional string 346-1 - 346-N that the other node 30-1 - 30-N, 40-1 - 40-N sent to the first node 10. This checking of the public register 252 can be performed by a register monitoring module 342.
In block 2007, the other node 30-1 - 30-N, 40-1 - 40-N either stores or deletes the data chunk 18, 16-1 - 16-N it was sent. In particular, if at block 2006 the other node 30-1 - 30-N, 40-1 - 40-N finds the acknowledgment 214-1 - 214-N it sent to the first node 10 on the public register 252 then it stores the data chunk 18, 16-1 - 16-N. On the other hand, if at block 2006 the other node 30-1 - 30-N, 40-1 - 40-N does not find the acknowledgment 214-1 - 214-N it sent to the first node 10 on the public register 252 it deletes the data chunk 18, 16-1 - 16-N. This step can be performed by the register monitoring module 342.
Having a first node 10 publish the acknowledgments 214-1 - 214-N on the public register 252 and only storing data for which an acknowledgment 214-1 - 214-N was published allows the amount of data any node 10, 30-1 - 30-N, 40-1 - 40-N is storing to be monitored. In particular, a registrar 260 or any other node 30-1 - 30-N, 40-1 - 40-N can monitor the public register 252 and if any node publishes transactions 254-1 - 254-N containing too many acknowledgments 214-1 - 214-N (i.e. more than M acknowledgments 214-1 - 214-N per data chunk 18, 16-1 - 16-N) or too many transactions 254-1 - 254-N, action can be taken to reject the transaction and prevent publication of future transactions.
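The acknowledgement flow of Figure 13 and the monitoring described above can be sketched as follows. The register is modelled as an in-memory list and every field and function name is a hypothetical choice for the example; signatures and encryption are omitted.

```python
import secrets


def make_ack(token: str) -> dict:
    """Other node: return the received token with a fresh random string."""
    return {"token": token, "extra": secrets.token_hex(16)}


def publish(register: list, node_id: str, block_id: str, chunk_id: str,
            acks: list, m: int) -> dict:
    """First node: publish at most M acknowledgements for one data chunk."""
    tx = {"node_id": node_id, "block_id": block_id,
          "chunk_id": chunk_id, "acks": acks[:m]}
    register.append(tx)
    return tx


def should_store(register: list, ack: dict) -> bool:
    """Other node: keep the chunk only if its own acknowledgement was published."""
    return any(ack in tx["acks"] for tx in register)


def over_limit(tx: dict, m: int) -> bool:
    """Registrar or any node: flag a transaction with more than M acknowledgements."""
    return len(tx["acks"]) > m
```

Because the additional random string is known only to the node that generated it, a match on the register tells that node that its own acknowledgement, and not merely the token, was selected.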
The registrar 260 and public register 252 also have other uses. For example, the registrar 260 can monitor heartbeat signals 202-1 - 202-3, 202-10 from the nodes 10, 30-1 - 30-N, 40-1 - 40-N. If it does not receive a heartbeat signal 202-1 - 202-3, 202-10 for a length of time that suggests a node is permanently offline, it can post a transaction 254-1 - 254-N on the register 252 indicating this. Similarly, if a node 10, 30-1 - 30-N, 40-1 - 40-N loses its data, it can inform the registrar 260, which will then publish a transaction 254-1 - 254-N on the public register 252 indicating this.
All nodes in the system 10, 30-1 - 30-N, 40-1 - 40-N can monitor the public register 252. If a node 10, 30-1 - 30-N, 40-1 - 40-N monitoring the register 252 finds that more than T copies of a data chunk 18, 16-1 - 16-N were stored at nodes 10, 30-1 - 30-N, 40-1 - 40-N that have either lost their data or are permanently offline, then that node can redistribute that data chunk 18, 16-1 - 16-N and potentially reform and redistribute all the data chunks 18, 16-1 - 16-N that form the data block 12 the data chunk 18, 16-1 - 16-N was part of.
Figure 14 shows a method by which a first node 10 can reobtain its data. This may be particularly beneficial when the first node 10 has lost its data. However, the first node 10 may wish to reobtain data for any other reason. This Figure can be used with the examples described above for reobtaining data at a first node 10.
The method starts at block 3001 where the first node 10 obtains its distribution key 22 and where necessary its private key 74. These keys may be obtained from a secure third party server or the first node 10 may obtain them from storage at the node that is separate from the storage which lost its data. The distribution key 22 and the private key 74 may be connected. For example, the distribution key 22 may be derivable from the private key 74. Alternatively, both the private key 74 and the distribution key 22 may be derivable from another key the first node 10 has obtained.
In box 3002, the first node 10 obtains the data chunk identifiers 19, 11-1 - 11-N. In one example, the data chunk identifiers 19, 11-1 - 11-N are obtained from the data block identifier 14 using an identifier module 64 as described previously. In this example, the data block identifiers 14 can be obtained from the public register 252. In another example the data chunk identifiers 19, 11-1 - 11-N themselves can be obtained from the public register 252. In order to obtain the data block identifiers 14 or data chunk identifiers 19, 11-1 - 11-N from the public register 252, the first node 10 searches the register 252 for transactions 254-1 - 254-N containing its node identifier 35 and extracts the data block identifiers 14 or data chunk identifiers 19, 11-1 - 11-N from these transactions 254-1 - 254-N. This can be done by a block identifier obtaining module 562.
In block 3003, the first node 10 recalculates sets 24-1 - 24-N of nodes 30-1 - 30-N, 40-1 - 40-N to which each data chunk 18, 16-1 - 16-N was sent. The sets 24-1 - 24-N are determined in the same manner described above with respect to storing data in order to ensure they are identical to the sets 24-1 - 24-N calculated above. More specifically, each set of nodes 24-1 - 24-N is calculated using the obtained distribution key 22 and the identifier for the corresponding data chunk 18, 16-1 - 16-N. In examples where the sets of nodes 24-1 - 24-N were edited to remove overlapping nodes and to contain only T entries, such editing is also done here. The sets 24-1 - 24-N can be generated using a node determining module 20.
In block 3004, the first node 10 can request the previously distributed data chunks 18, 16-1 - 16-N from the nodes in the sets 24-1 - 24-N that were recreated in block 3003. This requesting can be performed by data requesting function 504. In one example, a request for the desired data chunk 18, 16-1 - 16-N is sent sequentially to each node 30-1 - 30-N, 40-1 - 40-N from the set 24-1 - 24-N being processed. When one node 30-1 - 30-N, 40-1 - 40-N returns the data chunk 18, 16-1 - 16-N, no subsequent requests are sent for the data chunk 18, 16-1 - 16-N. The node 30-1 - 30-N, 40-1 - 40-N to which the first node 10 is sending a request may verify the identity of the first node 10 before any data is sent.
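The sequential requesting in block 3004 can be sketched as below. The `fetch` callable is a hypothetical stand-in for the network request to one node and is assumed to return `None` when the node is offline or does not hold the chunk.

```python
from typing import Callable, Iterable, Optional


def request_chunk(chunk_id: str,
                  node_set: Iterable[str],
                  fetch: Callable[[str, str], Optional[bytes]]) -> Optional[bytes]:
    """Ask each node in the recreated set in turn; stop at the first that returns the chunk."""
    for node in node_set:
        data = fetch(node, chunk_id)
        if data is not None:
            return data  # no subsequent requests are sent for this chunk
    return None  # no node in the set returned the chunk
```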
Block 3005 is a voluntary stage. When returning the requested data chunk 18, 16-1 - 16-N, the other node 30-1 - 30-N, 40-1 - 40-N that stored that data chunk may encrypt the data chunk 18, 16-1 - 16-N using a public key 78 of the first node 10. If the other node 30-1 - 30-N, 40-1 - 40-N has encrypted the data chunk 18, 16-1 - 16-N in this fashion, the first node 10 decrypts the data chunk 18, 16-1 - 16-N. This may be done using additional decryption module 540.
As mentioned above, data chunks 18, 16-1 - 16-N comprise the encrypted data block 18 and encryption keys 16-1 - 16-N. At box 3006, the first node 10 decrypts the encrypted data block 18 using the encryption keys 16-1 - 16-N. This can be done at decryption module 506. The decryption function 508 used to decrypt the encrypted data block 18 depends upon the encryption function 17 used to encrypt the data block 12. As such, when one-time pad encryption is used, a corresponding one-time pad decryption is used to decrypt encrypted data block 18.
Box 3007 is a voluntary stage. At this stage, the returned data block 12 can be verified. In one example the pre-processing to allow for verification, described in step 1002, involved having multiple data blocks 120-1 - 120-N and arranging them in a hash tree 126. In such a case, the returned data blocks 552-1 - 552-N are rearranged in a hash tree 558 at this stage. The hashes 550-1 - 550-N-1 are retaken and compared to original hashes 128a-1 - 128a-N-1 stored in the data blocks 552-1 - 552-N. If the hashes agree, it is assumed the data blocks 552-1 - 552-N are unmodified. However, if they disagree it is assumed at least one data block 552-1 - 552-N has been modified and further processing is done to establish which data block 552-1 - 552-N was modified. This process can be performed by an error determiner 556. In an example which can be used as an alternative or an addition to the above verification, a hash of the data block 12 may have been used as the block identifier 14. In such a case a hash can be taken of the returned data block 12 and compared to the block identifier 14 to confirm the data block 12 has not been modified.
Block 3008 is also a voluntary stage. If the data block 12 was pre-encrypted to form pre-encrypted data block 76, then a corresponding decryption algorithm is now performed to obtain data block 12. In one example, this involves using a private key 74 of the first node 10 to decrypt the pre-encrypted data block 76. This stage can be carried out by a secondary decryption module 510.
Now the first node 10 has verified all the data blocks 552-1 - 552-N it can recreate information 110 that was originally sent for storage. The first node 10 can then use this information 110 in any desired way.
The above provides a general overview of the invention. A more detailed description of a particular embodiment of the invention follows below. The following paragraphs are a high-level explanation of how the different system parts work to assure the required qualities, namely assurance that the stored data's integrity, confidentiality and availability cannot be compromised. In the below paragraphs, it is assumed that a centralised secure registry system (a rendezvous server) stores the addresses of the nodes. It is also assumed that nodes are publicly addressable either via public addresses or some NAT traversal mechanism.
In an example method that can be used with the features described above:
1. A node 10 generates private 74, public 78 key pairs.
2. A node will register its address and public key 78 with the Rendezvous server.
3. The Data Block Encryption key and Distribution key 22 can be deterministically derived from the private key using some secure key derivation technique. The public key 78 associated with the private key is also the node's proof of identity.
4. When a node 10 receives some data (the data could have been created by the node or received from any other source), it divides the data up into data blocks 12 of a pre-defined size. These data blocks 12 are then encrypted and stored in a private block-chain 126 that in its original form is only stored on the user's device (but it can be regenerated securely using a method that will be discussed later).
5. Data blocks 12 will be turned into data chunks 18, 16-1 - 16-N and each data chunk 18, 16-1 - 16-N will be distributed to a number of other nodes 30-1 - 30-N, 40-1 - 40-N according to a redundancy factor rf.
6. If a node 10 loses its data or it would like to restore the data on another device that belongs to the node it can revive them using the data in the public block-chain 252 and the chunks 18, 16-1 - 16-N that have been stored on other nodes.
As mentioned above, the content of the encrypted data block needs to be distributed for availability without compromising integrity or confidentiality. As such, data chunks are created using the example method below.
1. q random strings (S) of the same size as the block will be created using the following formula: $B \oplus S_1 \oplus S_2 \oplus S_3 \dots \oplus S_{q-1} = S_q$. Each $S_n$ will be stored in a data structure called a chunk 18, 16-1 - 16-N. A chunk has an ID 19, 11-1 - 11-N, a payload which is $S_n$ and a message authentication code (MAC) which is a signed hash of the payload.
2. Chunk IDs 19, 11-1 - 11-N are created deterministically from Block IDs 14: ChunkID = Hash(BlockID + Sequence). The number of chunks that are created is predetermined, so we always know that a data block creates q chunks. A chunk has the following structure:
{
  Chunk: {
    Chunk ID:,
    Owner Public Key:,
    Related Block ID:,
    Proof_of_Storage_request: {
      token:,
      owner_public_key:
    },
    MAC:
  }
}
Proof-of-storage-request token 322-1 - 322-N is a randomly generated string. How it will be used will be discussed later. A checksum of the chunk gets created at this stage for each created chunk. This checksum will be used later for chunk verification.
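The chunk structure above can be sketched as a small data class. Several details here are assumptions for the example: a `payload` field is included because the text states that a chunk carries a payload, HMAC-SHA256 with a hypothetical `mac_key` stands in for the "signed hash" (a real implementation would presumably sign with the owner's private key), and identifiers are represented as hexadecimal strings.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    owner_public_key: str
    related_block_id: str
    payload: bytes
    mac: str
    # Proof-of-storage-request token: a randomly generated string.
    posr_token: str = field(default_factory=lambda: secrets.token_hex(16))


def make_chunk(block_id: str, sequence: int, payload: bytes,
               owner_public_key: str, mac_key: bytes) -> Chunk:
    # ChunkID = Hash(BlockID + Sequence)
    chunk_id = hashlib.sha256(f"{block_id}{sequence}".encode()).hexdigest()
    # Stand-in for the signed hash of the payload (assumption: HMAC-SHA256).
    mac = hmac.new(mac_key, payload, hashlib.sha256).hexdigest()
    return Chunk(chunk_id, owner_public_key, block_id, payload, mac)


def payload_intact(chunk: Chunk, mac_key: bytes) -> bool:
    expected = hmac.new(mac_key, chunk.payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, chunk.mac)
```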
Chunks 18, 16-1 - 16-N will be distributed to rf number of nodes 30-1 - 30-N, 40-1 - 40-N. While this has been discussed above, the following paragraphs provide additional details on how this rf will be calculated and how the list of nodes that each chunk will be distributed to will be identified.
Redundancy factor (rf) can be calculated based on acceptable resilience factors related to availability and on the following probabilities at the time of data revival:
• Permanent_Unavailability_node: Probability of a node being permanently unavailable in regards to a specific data chunk if it is either malicious or has had a data loss incident and has lost the data chunk [Estimated probability].
• Temporary_Unavailability_node: Probability of a node being temporarily unavailable if it is offline at the time of a chunk data request [Estimated probability based on historical data].
• Permanent_Unavailability_chunk: Probability that all nodes that stored the chunk are permanently unavailable. This can be calculated using the equation previously provided.
• Temporary_Unavailability_chunk: Probability that all nodes that stored the chunk are unavailable but at least one node is temporarily unavailable. This can be calculated using the equation previously provided.
The following probabilities can be calculated for blocks:
• Permanent_Unavailability_block: Probability that all nodes that stored the chunk are permanently unavailable. This can be calculated using the equation previously provided.
• Temporary_Unavailability_block: Probability that all nodes that stored the chunk are unavailable but at least one node is temporarily unavailable.
Following the same pattern, we can calculate the probability of a temporary and of a permanent data loss using the probabilities of block loss. Using the acceptable resilience factors, we can then calculate rf so that the calculated probabilities are in the acceptable range. The pseudo code for the above probability calculations can be found in Appendix One.
The method also involves identifying the distribution list of nodes. Depending on the redundancy factor, rf receiving nodes will be selected using a method that is deterministic from the sender's perspective and non-deterministic from the receiver's perspective (so the nodes that receive the chunks cannot find out who else has received either the same chunk or other chunks related to the same data block). In one example, this method comprises:
(a) Calculate the number of nodes that need to be contacted to assure that with a predefined probability at least rf of them are online taking the probability of a node being offline into account - We call this number T.
(b) Choose T unique nodeIDs output from a Cryptographically Secure Pseudo Random Generator (SPRNG) with the seed: Seed(Hash(distributionKey;ChunkID)).
(c) Get the address of the nodes from the Rendezvous server.
(d) For each chunk and each node that the chunk needs to be distributed to, replace the Proof-Of-Storage-Request (POSR) content with the following construct:
$\mathrm{POSR} = \mathrm{Encrypt}_{\mathrm{ReceivingNodePublicKey}}(\mathrm{Sign}_{\mathrm{PrivateKey}}(\mathrm{token}, \mathrm{PublicKey}))$
(e) Send the chunk to all T nodes in the distribution list.
(f) Nodes that have received the chunk will send back an ack 214-1 - 214-N, 216-1, 216-N that includes the token 322-1 - 322-N and another random string.
(g) Choose rf received random strings and add them to the transaction that will be published on the public ledger 252. The node also stores the ack 214-1 - 214-N, 216-1, 216-N for those rf nodes to be used as proof of previous storage if the receiving node loses all its data (this will be discussed later).
(h) Distribute the transaction.
Nodes on the receiving end cannot run any offline attack as what they are receiving are just random bits (unless they can get hold of all S strings), but the sender can retrieve all strings and revive the data block if required in case of a data loss.
Calculating T can be beneficial because the list of nodes which store the chunk cannot be stored anywhere. At data revival time the node must therefore be able to regenerate the list of nodes that, with a high probability, have stored the chunk, by using the SPRNG with the appropriate seed and recalculating T. Equations for calculating T have been provided previously.
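Step (b) and the regeneration of the node list at revival time can be sketched as follows. Python's standard library has no seedable cryptographically secure generator, so this sketch assumes HMAC-SHA256 in counter mode as a stand-in for the SPRNG; node IDs are assumed to be integers in the range 0 to `total_nodes - 1`, and the function name is illustrative.

```python
import hashlib
import hmac


def select_nodes(distribution_key: bytes, chunk_id: bytes,
                 total_nodes: int, t: int) -> list:
    """Deterministically choose T unique node IDs for one chunk."""
    if t > total_nodes:
        raise ValueError("cannot choose more unique nodes than exist")
    # Seed(Hash(distributionKey;ChunkID))
    seed = hashlib.sha256(distribution_key + b";" + chunk_id).digest()
    chosen, seen, counter = [], set(), 0
    while len(chosen) < t:
        # Assumed stand-in for the SPRNG: HMAC-SHA256 over a running counter.
        out = hmac.new(seed, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
        # 64 bits reduced modulo total_nodes; the bias is negligible here.
        node_id = int.from_bytes(out[:8], "big") % total_nodes
        if node_id not in seen:  # keep only unique node IDs
            seen.add(node_id)
            chosen.append(node_id)
    return chosen
```

Calling `select_nodes` again with the same distribution key and chunk ID yields the same list, which is why the list itself never has to be stored. A receiver without the distribution key cannot reproduce the list for any chunk.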
An example of how to calculate T: assume we have 40000 nodes, the probability of a node being offline is 30% and our redundancy factor rf = 6. Using the above calculations, we can be sure with around 95% certainty that if we read the first threshold × rf = 4 × 6 = 24 unique outputs of a SPRNG with uniform distribution we will be able to find at least rf = 6 online nodes. Therefore on revival we know that we need to request the chunks from only these 24 nodes (output from the same SPRNG and the same seed). If we raise the threshold to 5 our certainty rises to around 98.5% and if we raise the threshold to 6 the certainty rises to around 99.5%. It means that out of 40000 possible nodes, if we only query the first 6 × 6 = 36 nodes that are returned from the SPRNG, there is a 99.5% probability that at least rf = 6 of these nodes are online. For every chunk that needs to be distributed in this scenario we need to generate at most 36 node IDs using the SPRNG, and in block revival we are sure that we only need to query at most 36 nodes for every chunk.
The pseudo-code in Appendix one illustrates how this threshold can be calculated using a recursive function.
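For orientation only, the sketch below shows one simple way to relate T, rf and the offline probability: a binomial tail that assumes every node is online independently with the same probability. This model is an assumption of the sketch and is not the recursive calculation of Appendix One, so it should not be expected to reproduce the percentages quoted above.

```python
from math import comb


def prob_at_least_online(t: int, rf: int, p_offline: float) -> float:
    """P(at least rf of t contacted nodes are online), independent nodes assumed."""
    p_online = 1.0 - p_offline
    return sum(comb(t, k) * p_online ** k * p_offline ** (t - k)
               for k in range(rf, t + 1))


def smallest_t(rf: int, p_offline: float, target: float) -> int:
    """Smallest number of nodes to contact so that the target probability is met."""
    if not 0.0 <= p_offline < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p_offline must be in [0, 1) and target in (0, 1)")
    t = rf
    while prob_at_least_online(t, rf, p_offline) < target:
        t += 1
    return t
```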
On the receiving end, when a node receives a chunk, it provisionally accepts it:
• decrypts the Proof-Of-Storage-Request and extracts the token.
• generates an ack token.
• returns the ack token alongside the Proof-Of-Storage-Request token to the requester node (ack is signed by the sender):
{
  chunk_id:,
  proof_of_storage_request:,
  ack_token:,
  signature:
}
The example method can also comprise publishing a transaction on the public ledger corresponding to created block and distributed chunks. When a node receives more than rf acks from receiving nodes, it chooses rf ack tokens and publishes a transaction on the public block-chain. The transaction will have the following form:
[Transaction listing not recoverable from the source. The legible portion indicates a "Chunks" list in which each entry holds a chunk id, an ack token and a signature.]
When the corresponding transaction is published in the block-chain the nodes that had provisionally accepted a chunk verify the following:
• If its ack token is included in the published transaction, the node checks whether the chunk's checksum corresponds to the checksum calculated using the decrypted token; if the verification succeeds, it stores the chunk permanently.
• If the node's ack token is not included in the transaction, or the verification fails, the node is free to delete the chunk.
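The provisional-acceptance rule above can be sketched as follows. This is a minimal illustration rather than the method as implemented: the class name `ProvisionalStore`, the use of SHA-256 as the checksum, and the shape of the published transaction (a dict with a `chunks` list of `chunk_id`/`ack_token` entries) are assumptions made for the example. Decryption of the Proof-Of-Storage-Request token is not modelled; the expected checksum is passed in directly.

```python
import hashlib


class ProvisionalStore:
    """Holds chunks provisionally until a published transaction confirms them."""

    def __init__(self):
        self.provisional = {}  # chunk_id -> (chunk bytes, ack token, expected checksum)
        self.permanent = {}    # chunk_id -> chunk bytes

    def accept(self, chunk_id, chunk, ack_token, expected_checksum):
        # expected_checksum stands in for the checksum carried by the decrypted token.
        self.provisional[chunk_id] = (chunk, ack_token, expected_checksum)

    def on_transaction(self, transaction):
        """Keep or drop the provisional chunks that this transaction refers to."""
        published = {(c["chunk_id"], c["ack_token"]) for c in transaction["chunks"]}
        chunk_ids = {c["chunk_id"] for c in transaction["chunks"]}
        for chunk_id in list(self.provisional):
            if chunk_id not in chunk_ids:
                continue  # this transaction concerns other chunks
            chunk, ack_token, expected = self.provisional.pop(chunk_id)
            checksum_ok = hashlib.sha256(chunk).hexdigest() == expected
            if (chunk_id, ack_token) in published and checksum_ok:
                self.permanent[chunk_id] = chunk  # store permanently
            # otherwise the chunk is dropped: ack not chosen, or checksum mismatch
```

A node whose ack token was not among the rf tokens chosen by the sender ends up with the chunk in neither dictionary, which mirrors the "free to delete" case.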
The above mechanism is required so that malicious nodes cannot flood the network storage with fake chunks or replicate their chunks more than rf times. As we do not want one node to take over the whole storage, a transaction will not be added to a block if the total size of the created blocks is greater than a predefined threshold. Total size can be calculated as TotalSize = TransactionSequence × BlockSize. Calculating and updating the threshold can be done in numerous ways and is domain specific.
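As a rough illustration of the size check, the sketch below assumes the threshold is a fixed byte quota per node; the text leaves the threshold policy domain specific, so this is only one possibility. The function names `total_size` and `may_publish` are hypothetical.

```python
def total_size(transaction_sequence: int, block_size: int) -> int:
    """TotalSize = TransactionSequence x BlockSize."""
    return transaction_sequence * block_size


def may_publish(transaction_sequence: int, block_size: int, quota: int) -> bool:
    """A transaction is refused once the total size is greater than the quota."""
    return total_size(transaction_sequence, block_size) <= quota
```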
The following paragraphs explain how the data can be revived (assuming that the node has access to its private key). First the node needs to revive its own data:
1. The node gets the public block-chain (from a number of peers).
2. The node extracts its block IDs from the transactions inside the public block-chain using its public key as the identifier. It then verifies the integrity of the transactions (using the MAC and its own private key).
3. Create chunk IDs for each block ID (ChunkID = Hash(BlockID + Sequence)).
4. Calculate T using the method mentioned above for each block (if total number of nodes has changed for that block).
5. Derive distribution key 22 from the private key 74.
6. For each Chunk ID generate T node IDs using the SPRNG with the correct seed: Seed(Hash(distributionKey; ChunkID)).
7. Get the node address for each node ID one by one and contact them until one node returns a chunk that passes the integrity check.
8. Construct Blocks from Chunks, the private block-chain from the Blocks and all data from the private block-chain.
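Steps 3, 6 and 7 above can be sketched as follows. This is an illustration under stated assumptions, not the method as implemented: SHA-256 is assumed as the hash, node IDs are modelled as integer indices in `range(total_nodes)`, and a counter-mode hash stream stands in for the SPRNG (a production system would use a vetted deterministic random bit generator). The function names and the `fetch` and `passes_integrity_check` callbacks are hypothetical.

```python
import hashlib


def chunk_id(block_id: str, sequence: int) -> str:
    """ChunkID = Hash(BlockID + Sequence), with SHA-256 assumed as the hash."""
    return hashlib.sha256(f"{block_id}{sequence}".encode()).hexdigest()


def candidate_nodes(distribution_key: bytes, cid: str, t: int, total_nodes: int) -> list:
    """Return the first t unique node indices from a seeded deterministic stream."""
    if t > total_nodes:
        raise ValueError("t cannot exceed the number of nodes")
    # Seed = Hash(distributionKey; ChunkID)
    seed = hashlib.sha256(distribution_key + b";" + cid.encode()).digest()
    nodes, counter = [], 0
    while len(nodes) < t:
        # Counter-mode hashing is a stand-in for the SPRNG (assumption of this sketch).
        block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
        # The small modulo bias is ignored here for brevity.
        node = int.from_bytes(block[:8], "big") % total_nodes
        if node not in nodes:
            nodes.append(node)
    return nodes


def revive_chunk(cid, nodes, fetch, passes_integrity_check):
    """Contact candidates one by one until a chunk passes the integrity check."""
    for node in nodes:
        chunk = fetch(node, cid)  # bytes, or None if the node is offline or lacks the chunk
        if chunk is not None and passes_integrity_check(chunk):
            return chunk
    return None
```

Because the stream depends only on the distribution key and the chunk ID, the same candidate list is produced at distribution time and at revival time, so the list itself never needs to be stored.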
When a node has revived its own blocks, it generates a transaction announcing that it had lost its data, and publishes the transaction on the general ledger. When other nodes that had previously stored chunks on the node that suffered the data loss see the transaction, they can create a new transaction for the blocks whose (redundant) chunks have been lost and repeat the chunk distribution process discussed above.
It is beneficial to have permanently offline node detection and dynamic chunk redistribution. Every time a node goes online, and thereafter at preconfigured intervals (e.g. 24 hours), it sends a heartbeat message to the registrar. An example of the heartbeat message can be seen below:
{
node_id:,
timestamp:,
IPaddress:,
signature:
}
When the registrar receives the heartbeat it updates its records. If a node has not sent a heartbeat message for a preconfigured amount of time (e.g. 10 days), the registrar will announce that node as possibly dead via the public block-chain. Nodes which have their chunks stored on more than a threshold T of permanently offline nodes can redistribute their chunks by going through the chunk distribution process.
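The registrar's bookkeeping can be sketched as below. This is a minimal illustration: the class name `Registrar`, the in-memory dictionary, and the helper `needs_redistribution` are assumptions, signature verification is omitted, and the announcement on the public block-chain is not modelled.

```python
from datetime import datetime, timedelta


class Registrar:
    """Tracks the last heartbeat per node and flags nodes that have gone silent."""

    def __init__(self, dead_after=timedelta(days=10)):
        self.dead_after = dead_after
        self.last_seen = {}  # node_id -> (timestamp, ip_address)

    def on_heartbeat(self, node_id, timestamp, ip_address):
        # Signature verification is omitted from this sketch.
        self.last_seen[node_id] = (timestamp, ip_address)

    def possibly_dead(self, now):
        """Node IDs whose last heartbeat is older than the configured limit."""
        return sorted(
            node_id
            for node_id, (seen, _) in self.last_seen.items()
            if now - seen > self.dead_after
        )


def needs_redistribution(holders, dead_nodes, threshold):
    """True when more than `threshold` of a chunk's holders are flagged as dead."""
    return len(set(holders) & set(dead_nodes)) > threshold
```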
The example method above can also comprise node storage optimisation. As nodes can redistribute chunks related to one block depending on many different factors, nodes can optimise their storage (at configurable intervals). Storage optimisation involves getting rid of provisionally accepted or permanently accepted chunks whose related ack token has not been published in the latest published transaction for the block that comprises that specific chunk, or chunks whose owners have been identified as permanently offline (dead) by the registrar. Nodes can go through the chunks they have stored, check them against the latest published blocks (events) on the public block-chain that contain the relevant block ID, and delete a chunk if its ack token is no longer associated with that block or the chunk owner has been identified as dead.
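The clean-up rule can be sketched as a single filtering pass. This is an illustration only: the function name `optimise_storage` and the dictionary layouts for stored chunks, latest published ack tokens and dead owners are assumptions made for the example.

```python
def optimise_storage(stored, latest_acks, dead_owners):
    """Return the chunks worth keeping.

    stored:      chunk_id -> {"block_id", "ack_token", "owner"}
    latest_acks: block_id -> set of ack tokens in the latest published transaction
    dead_owners: set of node IDs the registrar has announced as dead
    """
    kept = {}
    for chunk_id, meta in stored.items():
        ack_published = meta["ack_token"] in latest_acks.get(meta["block_id"], set())
        owner_alive = meta["owner"] not in dead_owners
        if ack_published and owner_alive:
            kept[chunk_id] = meta
    return kept
```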
While the above describes one way of implementing the invention, the skilled person would understand that other variations can be envisaged. For example, instead of using one-time pad cryptography, any other form of symmetric-key or public-private key cryptography could be used. In some examples a mixture of cryptographic methods may be used and only certain keys distributed to other nodes for storage. In other examples, encryption keys may be generated but not used for cryptography.
While in many examples the data returned from other nodes is verified, the skilled person would understand that such a step is not essential and can be skipped in some embodiments. Similarly, in other embodiments any pre- or post-cryptography can be skipped. The other nodes may identify the data stored in any suitable way. As such, the sending of the data block or data chunk identifiers can be considered voluntary. Similarly, alternative methods of verification can be used so the first node does not always need to distribute its public key along with data chunks or even publish its public key on the public ledger/register.
While some embodiments use a public ledger/register to store data about transactions, the skilled person would understand that the disclosure above provides details that allow some embodiments to be implemented without this public ledger. In such cases, any identifiers could be stored at a secure server or otherwise be predictable.
Furthermore, other embodiments can be envisaged that do not depart from the scope of the claims.
Figure 15 schematically illustrates an example of a computer system 5100 that can be used in accordance with the above invention. The system 5100 comprises a computer 5102. The computer 5102 could be used as the first node 10 or any other node 30-1 - 30-N, 40-1 - 40-N, or the registrar 260 of the above invention. The computer 5102 comprises: a storage medium 5104, a memory 5106, a processor 5108, an interface 5110, a user output interface 5112, a user input interface 5114 and a network interface 5116, which are all linked together over one or more communication buses 5118.
The storage medium 5104 may be any form of non-volatile data storage device such as one or more of a hard disk drive, a magnetic disc, an optical disc, a ROM, etc. The storage medium 5104 may store an operating system for the processor 5108 to execute in order for the computer 5102 to function. The storage medium 5104 may also store one or more computer programs (or software or instructions or code).
The memory 5106 may be any random access memory (storage unit or volatile storage medium) suitable for storing data and/or computer programs (or software or instructions or code).
The processor 5108 may be any data processing unit suitable for executing one or more computer programs (such as those stored on the storage medium 5104 and/or in the memory 5106), some of which may be computer programs according to embodiments of the invention or computer programs that, when executed by the processor 5108, cause the processor 5108 to carry out a method according to an embodiment of the invention and configure the system 5100 to be a system according to an embodiment of the invention. The processor 5108 may comprise a single data processing unit or multiple data processing units operating in parallel or in cooperation with each other. The processor 5108, in carrying out data processing operations for embodiments of the invention, may store data to and/or read data from the storage medium 5104 and/or the memory 5106.
The interface 5110 may be any unit for providing an interface to a device 5122 external to, or removable from, the computer 5102. The device 5122 may be a data storage device, for example, one or more of an optical disc, a magnetic disc, a solid-state-storage device, etc. The device 5122 may have processing capabilities - for example, the device may be a smart card. The interface 5110 may therefore access data from, or provide data to, or interface with, the device 5122 in accordance with one or more commands that it receives from the processor 5108.
The user input interface 5114 is arranged to receive input from a user, or operator, of the system 5100. The user may provide this input via one or more input devices of the system 5100, such as a mouse (or other pointing device) 5126 and/or a keyboard 5124, that are connected to, or in communication with, the user input interface 5114. However, it will be appreciated that the user may provide input to the computer 5102 via one or more additional or alternative input devices (such as a touch screen). The computer 5102 may store the input received from the input devices via the user input interface 5114 in the memory 5106 for the processor 5108 to subsequently access and process, or may pass it straight to the processor 5108, so that the processor 5108 can respond to the user input accordingly.
The user output interface 5112 is arranged to provide a graphical/visual and/or audio output to a user, or operator, of the system 5100. As such, the processor 5108 may be arranged to instruct the user output interface 5112 to form an image/video signal representing a desired graphical output, and to provide this signal to a monitor (or screen or display unit) 5120 of the system 5100 that is connected to the user output interface 5112. Additionally or alternatively, the processor 5108 may be arranged to instruct the user output interface 5112 to form an audio signal representing a desired audio output, and to provide this signal to one or more speakers 5121 of the system 5100 that are connected to the user output interface 5112.
Finally, the network interface 5116 provides functionality for the computer 5102 to download data from and/or upload data to one or more data communication networks.
It will be appreciated that the architecture of the system 5100 illustrated in Figure 15 and described above is merely exemplary and that other computer systems 5100 with different architectures (for example with fewer components than shown in Figure 15 or with additional and/or alternative components than shown in Figure 15) may be used in embodiments of the invention. As examples, the computer system 5100 could comprise one or more of: a personal computer; a server computer; a mobile telephone; a tablet; a laptop; a television set; a set top box; a games console; other mobile devices or consumer electronics devices; etc.
It will be appreciated that the methods described have been shown as individual steps carried out in a specific order. However, the skilled person will appreciate that these steps may be combined or carried out in a different order whilst still achieving the desired result. It will be appreciated that embodiments of the invention may be implemented using a variety of different information processing systems. In particular, although the Figures and the discussion thereof provide an exemplary computing system and methods, these are presented merely to provide a useful reference in discussing various aspects of the invention. Embodiments of the invention may be carried out on any suitable data processing device, such as a personal computer, laptop, personal digital assistant, mobile telephone, set top box, television, server computer, etc. Of course, the description of the systems and methods has been simplified for purposes of discussion, and they are just one of many different types of system and method that may be used for embodiments of the invention. It will be appreciated that the boundaries between logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or elements, or may impose an alternate decomposition of functionality upon various logic blocks or elements.
It will be appreciated that the above-mentioned functionality may be implemented as one or more corresponding modules as hardware and/or software. For example, the above-mentioned functionality may be implemented as one or more software components for execution by a processor of the system. Alternatively, the above-mentioned functionality may be implemented as hardware, such as on one or more field-programmable-gate-arrays (FPGAs), and/or one or more application-specific-integrated-circuits (ASICs), and/or one or more digital-signal-processors (DSPs), and/or other hardware arrangements. Method steps implemented in flowcharts contained herein, or as described above, may each be implemented by corresponding respective modules; multiple method steps implemented in flowcharts contained herein, or as described above, may be implemented together by a single module.
It will be appreciated that, insofar as embodiments of the invention are implemented by a computer program, then a storage medium and a transmission medium carrying the computer program form aspects of the invention. The computer program may have one or more program instructions, or program code, which, when executed by a computer carries out an embodiment of the invention. The term “program” as used herein, may be a sequence of instructions designed for execution on a computer system, and may include a subroutine, a function, a procedure, a module, an object method, an object implementation, an executable application, an applet, a servlet, source code, object code, a shared library, a dynamic linked library, and/or other sequences of instructions designed for execution on a computer system. The storage medium may be a magnetic disc (such as a hard drive or a floppy disc), an optical disc (such as a CD-ROM, a DVD-ROM or a BluRay disc), or a memory (such as a ROM, a RAM, EEPROM, EPROM, Flash memory or a portable/removable memory device), etc. The transmission medium may be a communications signal, a data broadcast, a communications link between two or more computers, etc.

Claims (61)

1. A method of sending a data block from a first node to a plurality of other nodes for storage, the first node having a first node distribution key, and the method comprising:
encrypting the data block to form an encrypted data block, the encrypted data block having an identifier;
computing a set of other nodes using the first node distribution key and the identifier of the encrypted data block; and sending the encrypted data block from the first node to T nodes from the set of other nodes for storage.
2. The method of claim 1 wherein:
encrypting the data block comprises encrypting the data block using one or more encryption keys wherein each encryption key has a corresponding encryption key identifier;
and for each of the one or more encryption keys the method further comprises: computing a set of other nodes using the first node distribution key and the encryption key identifier; and sending the encryption key from the first node to T nodes from the set of other nodes for storage.
3. The method of claim 2 wherein:
the data block comprises a string of length L;
at least one of the one or more encryption keys comprises a random or pseudorandom string of at least length L; and encrypting the data block comprises using the at least one of the one or more encryption keys as a one-time pad to encrypt the data block.
4. The method of claim 3 wherein using an encryption key as a one-time pad comprises performing modular addition between corresponding bits of the data block and the encryption key.
5. The method of claim 3 wherein using an encryption key as a one-time pad comprises performing an XOR operation between corresponding bits of the data block and the encryption key.
6. The method of any previous claim wherein a third party re-computing a set of nodes without the first node distribution key is computationally infeasible.
7. The method of any previous claim wherein:
the computing of a set of other nodes comprises using a secure pseudo-random number generator wherein a seed used by the secure pseudo-random number generator is generated using the first node distribution key and the identifier of either the encrypted data block or the one or more encryption keys.
8. The method of any of claims 2 to 7 wherein:
the data block has a data block identifier;
the encrypted data block and the one or more encryption keys are provided with a reference representing their position in an ordered list; and the identifier for each of the encrypted data block and the encryption keys is a hash function of the data block identifier and the reference representing the position of the encrypted data block or the one or more encryption keys within the ordered list.
9. The method of any previous claim wherein the first node has a pre-encryption key and the method further comprises:
pre-encrypting the data block using the pre-encryption key before encrypting the data block.
10. The method of claim 9 wherein:
the pre-encryption key is a public key; and pre-encrypting the data block comprises using a public-private key encryption algorithm to pre-encrypt the data block.
11. The method of claim 9 wherein:
the pre-encryption key is a private key; and pre-encrypting the data block comprises using a symmetric key encryption algorithm to pre-encrypt the data block.
12. The method of any previous claim further comprising:
post-encrypting the encrypted data block and/or the one or more encryption keys after encrypting the data block using a public key of the node from the set of other nodes to which that encrypted data block or the one or more encryption keys is being sent.
13. The method of any previous claim wherein T is calculated based upon an acceptable probability of the data block being permanently unavailable and an acceptable probability of the data block being temporarily unavailable.
14. The method of any previous claim further comprising pre-processing the data block in order to allow a user to verify that the data block has not been modified.
15. The method of claim 14 wherein the method is performed on multiple data blocks and the pre-processing comprises:
arranging the data blocks in a hash tree before encryption of the data blocks.
16. The method of claim 14 or claim 15 wherein the pre-processing comprises:
before encrypting the data block performing a hash function on the data block and using the results of the hash function as an identifier for the data block.
17. The method of any previous claim further comprising:
receiving from at least one of the other nodes from each of the sets of other nodes that received data in the form of either the encrypted data block or an encryption key an acknowledgement of receipt of the data; and publishing the received acknowledgements as a transaction on a public ledger.
18. The method of claim 17 wherein the public ledger is a block chain.
19. The method of claim 17 or claim 18 wherein:
the first node has a node identifier;
and the method further comprises:
publishing the first node identifier in the transaction on the public ledger along with the acknowledgments.
20. The method of any of claims 17 to 19 wherein:
the data block has a data block identifier;
and the method further comprises:
publishing the data block identifier in the transaction on the public ledger along with the acknowledgments.
21. The method of any of claims 17 to 20 wherein publishing the acknowledgements comprises:
publishing the acknowledgements received from each set of other nodes as separate transactions.
22. The method of any of claims 17 to 21 wherein:
publishing the acknowledgements on the public ledger comprises publishing up to M acknowledgements on the public ledger for each set of other nodes and discarding any other acknowledgements.
23. The method of claim 22 wherein M is a minimum number of nodes that must store data in order to achieve a desired level of availability for the data.
24. The method of claim 22 or claim 23 further comprising at each of the other nodes to which data is sent:
sending an acknowledgement of receipt of data to the first node;
searching the public ledger for the acknowledgement of receipt;
storing the data if the acknowledgement of receipt is found on the public ledger; and deleting the data if the acknowledgement of receipt is not found on the public ledger.
25. The method of any of claims 17 to 24 wherein for each encrypted data block and/or encryption key:
sending the encrypted data block or the encryption key from the first node to a set of other nodes for storage further comprises sending a proof of storage request along with the encrypted data block or the encryption key wherein the proof of storage request comprises a token; and receiving an acknowledgement of receipt from a particular node comprises receiving the token sent to that particular node.
26. The method of claim 25 wherein a different token is sent to each node.
27. The method of claim 25 or claim 26 wherein:
the proof of storage request further comprises a public key of the first node.
28. The method of any of claims 25 to 27 wherein:
receiving an acknowledgement of receipt comprises receiving an additional string along with the token; and publishing the acknowledgements on the public ledger comprises publishing the received additional strings on the public ledger.
29. The method of any of claims 25 to 28 wherein:
sending a proof of storage request comprises encrypting the proof of storage request sent to a particular node with the public key of that particular node before sending.
30. The method of any of claims 25 to 29 wherein:
sending a proof of storage request further comprises signing the proof of storage request using private key of the first node before sending the proof of storage request.
31. The method of any previous claim further comprising:
monitoring a public ledger for transactions that indicate data at a node is now permanently unavailable wherein data at a node is permanently unavailable when the node storing the data is either permanently offline or has lost its data, and if either an encrypted data block or an encryption key has been sent to more than T’ nodes at which data is permanently unavailable redistributing that encrypted data block or encryption key to a new set of other nodes using the method of any previous claim.
32. The method of claim 31 further comprising:
receiving, at a registrar for the public ledger, a heartbeat message from the first node and each of the other nodes to which data is sent at regular time intervals; and if the registrar has failed to receive the heartbeat message from a particular node for a defined length of time, publishing a transaction on the public register indicating the particular node is permanently offline and is thus permanently unavailable.
33. The method of claim 31 or claim 32 further comprising:
publishing by any node in the system that has lost data a transaction on the public register indicating that the node has lost its data.
34. The method of any previous claim further comprising:
obtaining by the first node the encrypted data block sent for storage by: re-computing the set of other nodes using the first node distribution key and the identifier of the encrypted data block;
sending a request for the encrypted data block to at least one of the nodes from the set of other nodes;
obtaining the encrypted data block from at least one of the set of other nodes; and decrypting the encrypted data block.
35. The method of claim 34 further comprising:
obtaining by the first node the one or more encryption keys sent for storage by, for each of the one or more encryption keys:
re-computing the set of other nodes using the first node distribution key and the encryption key identifier; sending a request for the encryption key to at least one of the nodes from the set of other nodes; and obtaining the data block from at least one of the set of other nodes.
36. The method of any of claim 34 or claim 35 wherein decrypting the encrypted data block comprises using at least one of the one or more encryption keys as a one-time pad to decrypt the data block.
37. The method of claim 36 wherein using an encryption key as a one-time pad to decrypt the encrypted data block comprises performing an inverse of modular addition between corresponding bits of the data block and the encryption key.
38. The method of claim 37 wherein using an encryption key as a one-time pad to decrypt the encrypted data block comprises performing an XOR operation between corresponding bits of the data block and the encryption key.
39. The method of any of claims 34 to 38 wherein sending a request for the data block or an encryption key and obtaining the data block or an encryption key comprises:
iteratively repeating steps comprising: (a) requesting the data block or encryption key from one node in the set of other nodes and (b) in response to the request failing to receive the data block or encryption key, until a node provides the data block or encryption key in response to the request; and after obtaining the data block or encryption key from a node ceasing to send any further requests for the data block or encryption key.
40. The method of any of claims 34 to 38 wherein the first node has a pre-encryption key and the method further comprises:
performing a secondary decryption of the data block after decrypting the data block.
41. The method of claim 40 wherein:
the pre-encryption key is a public key and the first node has a private key corresponding to the public key; and performing a secondary decryption comprises using the private key to decrypt the data block.
42. The method of claim 40 wherein:
the pre-encryption key is a private key; and performing a secondary decryption comprises using a symmetric key decryption algorithm to decrypt the data block.
43. The method of any of claims 34 to 42 wherein:
obtaining the data block or encryption key from at least one of the plurality of other nodes comprises receiving the data block or encryption key further encrypted using a public key of the first node; and the method further comprises:
decrypting the further encrypted data block or encryption key using a private key of the first node.
44. The method of claim 43 further comprising:
at a node that receives a request for a data block or encryption key from the first node;
encrypting the data block or encryption key using the public key of the first node; and
sending the data block to the first node.
45. The method of any of claims 34 to 44 further comprising:
at a node that receives a request for a data block or encryption key from the first node;
verifying the identity of the first node; and sending the data block to the first node after the identity of the first node is verified.
46. The method of any of claims 34 to 45 wherein the method is performed on multiple data blocks and the method further comprises:
after decrypting the data blocks rearranging the data blocks in a hash tree; and using the hash tree to verify the data blocks have not been modified during storage.
47. The method of any of claims 34 to 46 further comprising:
performing a hash function on the data block; and comparing the results of the hash function to the data block identifier to verify that the data block has not been modified during storage.
48. The method of any of claims 34 to 47 further comprising:
using the node identifier of the first node to identify relevant transactions on the public ledger;
extracting from the relevant transactions the data block identifier; and using the block identifier to determine the identifier of the encrypted data block and/or the encryption key identifiers.
49. The method of claim 48 wherein using the data block identifier to determine the identifier of the encrypted data block and/or the encryption key identifiers comprises:
providing the encrypted data block and the encryption keys to be obtained with a reference representing their position in an ordered list; and using a hash function of the data block identifier and the reference representing the position of the encrypted data block or the encryption key within the ordered list to obtain the identifier for each of the encrypted data block and the encryption keys.
50. A first node comprising:
an encryption module configured to encrypt a data block to form an encrypted data block, the encrypted data block having an identifier;
a node determining module configured to compute a set of other nodes using a first node distribution key and the identifier of the encrypted data block; and a distribution module configured to send the encrypted data block from the first node to T nodes from the set of other nodes for storage.
51. The first node of claim 50 wherein:
the encryption module is configured to encrypt the data block using one or more encryption keys wherein each encryption key has a corresponding encryption key identifier;
and for each of the one or more encryption keys:
the node determining module is further configured to compute a set of other nodes using the first node distribution key and the identifier of the encryption key; and the distribution module is further configured to send the encryption key from the first node to T nodes from the set of other nodes for storage.
52. The first node of claim 50 or claim 51 wherein:
the node determining module is configured to compute a set of other nodes using a secure pseudo-random number generator wherein a seed of the secure pseudo-random number generator is generated using the first node distribution key and the identifier of either the encrypted data block or the encryption key.
53. The first node of any of claims 50 to 52 further comprising:
a hash module configured to arrange the data blocks in a hash tree before the encryption module encrypts the data block.
54. The first node of any of claims 50 to 53 further comprising:
an acknowledgment receiving module configured to receive an acknowledgement of receipt from at least one of the other nodes from each of the sets of other nodes; and an acknowledgment processing module configured to publish the received acknowledgments on a public ledger.
55. The first node of claim 54 further comprising:
a storage request module configured to generate proof of storage requests wherein each proof of storage request comprises a token; and wherein the data distribution module is configured to send a proof of storage request along with the encrypted data block and/or the one or more encryption keys; and the acknowledgment receiving module is configured to receive the tokens sent in the proof of storage requests.
56. The first node of any of claims 50 to 55 wherein:
the node determining module is further configured to re-compute the set of other nodes using the first node distribution key and the identifier of the encrypted data block;
and the first node further comprises:
a data requesting module configured to send a request for the encrypted data block to at least one of the nodes from the set of other nodes and receive the encrypted data block from at least one of the nodes from the set of other nodes; and a decryption module configured to decrypt the encrypted data block.
57. The first node of claim 56 wherein:
for each of the one or more encryption keys:
the node determining module is further configured to re-compute the set of other nodes using the first node distribution key and the identifier of the encryption key;
the data requesting module is configured to send a request for the encryption key to at least one of the nodes from the set of other nodes and receive the encryption key from at least one of the nodes from the set of other nodes; and the decryption module is configured to decrypt the encrypted data block using the returned encryption keys.
58. A system comprising the first node of any of claims 50 to 57 and a plurality of other nodes wherein the sets of other nodes are subsets of the plurality of other nodes.
59. An apparatus arranged to carry out a method according to any one of claims 1 to 49.
60. A computer program which, when executed by a processor, causes the processor to carry out a method according to any one of claims 1 to 49.
61. A computer-readable medium storing a computer program according to claim 60.
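Claims 56 and 57 rely on the first node being able to re-compute the same set of other nodes from its distribution key and an identifier, without storing the set itself. The claims do not say which function performs this mapping. The following is a minimal sketch only, assuming a keyed rendezvous-hashing scheme built on HMAC-SHA256; the function name `select_nodes`, its parameters, and the node identifier strings are hypothetical and do not come from the patent.

```python
import hashlib
import hmac
from typing import List


def select_nodes(distribution_key: bytes, identifier: bytes,
                 node_ids: List[str], count: int) -> List[str]:
    """Deterministically pick `count` nodes for one identifier.

    Assumption (not from the claims): each candidate node is scored with
    HMAC-SHA256(distribution_key, identifier || 0x00 || node_id), and the
    nodes with the lowest scores form the set.
    """
    if count > len(node_ids):
        raise ValueError("count exceeds number of candidate nodes")

    def score(node_id: str) -> bytes:
        message = identifier + b"\x00" + node_id.encode("utf-8")
        return hmac.new(distribution_key, message, hashlib.sha256).digest()

    # Sorting by score makes the result independent of the input order.
    return sorted(node_ids, key=score)[:count]


if __name__ == "__main__":
    nodes = ["node-%d" % i for i in range(10)]
    key = b"first-node-distribution-key"  # hypothetical key material

    stored_on = select_nodes(key, b"encrypted-block-id", nodes, 3)
    # Later, at retrieval time, the same inputs yield the same set.
    recomputed = select_nodes(key, b"encrypted-block-id", nodes, 3)
    print(stored_on == recomputed)
```

Under this assumption, a party without the distribution key cannot work out which nodes hold a given block or key, while the first node can recover the set at retrieval time from the key and the identifier alone. The same call with an encryption key's identifier would give the set for that key, as in claim 57.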
GB1815423.7A 2018-09-21 2018-09-21 Distributed data storage Active GB2574076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1815423.7A GB2574076B (en) 2018-09-21 2018-09-21 Distributed data storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1815423.7A GB2574076B (en) 2018-09-21 2018-09-21 Distributed data storage

Publications (3)

Publication Number Publication Date
GB201815423D0 GB201815423D0 (en) 2018-11-07
GB2574076A GB2574076A (en) 2019-11-27
GB2574076B GB2574076B (en) 2022-07-13

Family

ID=64024300

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1815423.7A Active GB2574076B (en) 2018-09-21 2018-09-21 Distributed data storage

Country Status (1)

Country Link
GB (1) GB2574076B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220069981A1 (en) * 2020-09-03 2022-03-03 Google Llc Distribute Encryption Keys Securely and Efficiently

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN110099112B (en) * 2019-04-28 2022-03-29 平安科技(深圳)有限公司 Data storage method, device, medium and terminal equipment based on point-to-point network
CN110175819B (en) * 2019-05-29 2023-03-24 贵州电网有限责任公司 Online multi-person cooperation daily affair personalized service system and operation method
CN110968899B (en) * 2019-11-27 2022-04-01 杭州趣链科技有限公司 Data blocking confirmation method, device, equipment and medium based on block chain

Citations (4)

Publication number Priority date Publication date Assignee Title
US20060089936A1 (en) * 2004-10-25 2006-04-27 Tom Chalker System and method for a secure, scalable wide area file system
US20140365541A1 (en) * 2013-06-11 2014-12-11 Red Hat, Inc. Storing an object in a distributed storage system
US20160026684A1 (en) * 2014-07-22 2016-01-28 Oracle International Corporation Framework for volatile memory query execution in a multi node cluster
EP2988218A1 (en) * 2014-08-22 2016-02-24 Nexenta Systems, Inc. Multicast collaborative erasure encoding and distributed parity protection

Also Published As

Publication number Publication date
GB201815423D0 (en) 2018-11-07
GB2574076B (en) 2022-07-13

Similar Documents

Publication Publication Date Title
US10615985B2 (en) Achieving consensus among network nodes in a distributed system
CA3053208C (en) Performing a change of primary node in a distributed system
US10911231B2 (en) Method for restoring public key based on SM2 signature
US11818262B2 (en) Method and system for one-to-many symmetric cryptography and a network employing the same
US11082482B2 (en) Block chain encoding with fair delay for distributed network devices
US9467282B2 (en) Encryption scheme in a shared data store
Bellare et al. Message-locked encryption and secure deduplication
US9116849B2 (en) Community-based de-duplication for encrypted data
GB2574076A (en) Distributed data storage
US11595365B1 (en) Method and apparatus for third-party managed data transference and corroboration via tokenization
Mo et al. Two-party fine-grained assured deletion of outsourced data in cloud systems
TW202025666A (en) Computer implemented system and method for sharing a common secret
Yang et al. Provable Ownership of Encrypted Files in De-duplication Cloud Storage.
US20220385453A1 (en) Secure file transfer
Kanagamani et al. Zero knowledge based data deduplication using in-line Block Matching protocol for secure cloud storage
Abraham et al. Proving possession and retrievability within a cloud environment: A comparative survey
Khudaier et al. A Review of Assured Data Deletion Security Techniques in Cloud Storage