WO2022117763A1 - Methods, systems and networks for recovering distributed databases, and computer program products, data carrying media and non transitory tangible data storage media with computer programs and/or databases stored thereon useful in recovering a distributed database - Google Patents

Methods, systems and networks for recovering distributed databases, and computer program products, data carrying media and non transitory tangible data storage media with computer programs and/or databases stored thereon useful in recovering a distributed database

Info

Publication number
WO2022117763A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
transition
state
distributed database
encrypted
Application number
PCT/EP2021/084050
Other languages
French (fr)
Inventor
Victor Ermolaev
Aleksei Koren
Gamze TILLEN
Mattijs VAN DEN BOS
Original Assignee
Ing Bank N.V.
Application filed by Ing Bank N.V.
Publication of WO2022117763A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2358Change logging, detection, and notification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1471Saving, restoring, recovering or retrying involving logging of persistent data for recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1474Saving, restoring, recovering or retrying in transactions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/04Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80Database-specific techniques

Definitions

  • This invention relates to methods, systems and networks for recovering distributed databases, and in particular for recovering distributed ledgers.
  • the invention further relates to computer program products, data carrying media and non-transitory tangible data storage media with computer programs and/or databases stored thereon useful in recovering a distributed database.
  • Distributed databases are stored distributed over a number of computing devices, in contrast to a centralized database, which is stored on a central device with central control over which data is stored in the database.
  • Distributed databases and their use are as such known, as well as ways in which the data records of the distributed database can be shared, replicated and synchronized over the devices.
  • An example is distributed ledger technology (DLT), in which the database is stored in a DLT network distributed over a number of nodes of the DLT network, and the nodes execute a predetermined consensus protocol to decide which changes to the data records are accepted and to maintain the data records synchronised.
  • A malicious participant compromising the ledger, e.g. by unauthorized modification of a data record, a security failure in the privacy protection mechanism or by exploiting security flaws in the consensus protocol, not only causes a risk of leaking sensitive data but also impacts the reliability of the data stored in the ledger. This is therefore detrimental to the level of trust required to operate the DLT network.
  • The inventors further realized that in case of a breach, the integrity of the distributed database has to be restored, and therefore solutions have to be found which enable re-establishing the distributed database to an uncompromised state.
  • integrity recovery of the database in case of a security breach is problematic.
  • A first way to restore the integrity of the distributed database would be to request and obtain from all participating devices all details of the changes made by the respective devices in order to restore the data records to a previous state which has not been compromised.
  • However, this still allows a participating device to disclose incorrect information and hence conceal the tampering.
  • this requires the participating devices to disclose all information about the data records, their current and past states and the modifications, some of which may simply not be available anymore. For example, the information may have been intentionally erased to conceal the tampering by a device, or the malicious participating device may have been shut down.
  • Some of the data may also be sensitive and may be required to be kept secret; for example, in some DLTs the identity of a party involved in a modification of the data record is kept private and concealed from the other participants in the DLT network.
  • FIG.1 schematically shows an example of an embodiment of a network of systems participating in the distributed database arrangement.
  • FIG.2 schematically shows a flow chart of an example of a method for recovering a distributed database.
  • FIG.3 schematically shows a topology of an example of an embodiment of a network with a distributed database.
  • FIG.4 schematically shows a flow chart of an example of a method of operating a distributed database in a data communication network.
  • FIG. 5 schematically shows an architecture of an example of state transition software and verification software suitable for the examples of FIGs.3 and 4.
  • FIG.6 shows a block diagram of a part of state transition software and verification software suitable for the example of FIG.5.
  • a secure integrity recovery solution is provided in order to be able to recover a distributed database in case a decision is made to reinstate the distributed database to a previous state.
  • the decision can for example be made when the integrity of the database is deemed to be compromised, e.g. in response to detecting a breach of security in the distributed database system.
  • In this recovery solution, when a set of one or more modifications to the data records is made, encrypted data representing the set of modifications is stored in a secure recovery database, separate from the distributed database, with the correctness and storage of the encrypted data in the secure backup being validated. That is, the encrypted data is only accepted for storage if a verification yields that the encrypted data reflects the set of modifications and hence is a correct representation thereof.
  • Only in that case is the encrypted data accepted for storage in the recovery database.
  • the encrypted data is therefore only stored when this has been verified to correspond to the set of modifications, and to contain a correct representation thereof.
  • the set of modifications is only accepted by the participating devices in the distributed database system if proof of storage of the encrypted data in the recovery database is provided. Said differently, the modifications to the data records are only implemented if proof of storage is provided.
  • the distributed database can be reconstructed to a past valid state, e.g. the latest valid state or a state preceding the latest valid state. Because the encrypted data in the recovery database is used, the distributed database can be recovered to a past state without the nodes or devices that operate the distributed database being required to resend the original data, or without even revealing the original data to the other nodes or devices, as is explained below in more detail. More specifically, the modifications, also referred to as transitions, of the distributed database can be reconstructed by decrypting on a set-by-set basis the encrypted data.
  • the modifications can be traced back from the current state to the past state by decrypting set-by-set the modification data corresponding to the transitions from the current state back to the past state, and reconstructing the modification from the decrypted data.
  • The transitions of the distributed database can be reconstructed from a past state known to be uncompromised, e.g. the initial state, up to the latest uncompromised state, for example.
  • The validation of the correctness of the encrypted data prior to storing it in the recovery database ensures the integrity of the recovery database, i.e. that it contains the correct encrypted data. This obviates the risk that the distributed database cannot be recovered, e.g. because incorrect or incomplete encrypted data was stored.
  • the recovering of the data from the recovery database can e.g. be performed by a device not storing or modifying the distributed database itself and without revealing sensitive data to the participating devices.
  • the corresponding encrypted data may differ in encryption characteristics between sets, e.g. the decryption key required to decrypt the data of a given set may differ from the key required to decrypt the data of another set. (This may additionally apply to the encryption key used).
  • Recovery can only be performed on a set-by-set basis. This limits the impact of a breach of security of the recovery database and, in case an intruder procures a decryption key, prevents the intruder from thereby gaining access to all data of the complete distributed database via the recovery database.
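  (Illustrative sketch, not part of the patent text: a minimal Python mock of the store-then-accept rule described above, in which a transition is only accepted once the recovery store has verified and stored the encrypted modification set and issued a receipt. The class and field names, and the use of a hash commitment and an HMAC receipt as stand-ins for the proof of encryption and the signed storage confirmation, are assumptions of this sketch.)

      # Toy mock of the flow: the transacting side submits the encrypted set with a
      # commitment, the recovery store verifies and stores it and returns a receipt,
      # and the consensus side only accepts the transition when the receipt checks out.
      # The commitment/receipt constructions are stand-ins, not the patent's proofs.
      import hashlib
      import hmac
      import os
      from typing import Optional

      class RecoveryStore:
          def __init__(self) -> None:
              self._key = os.urandom(32)             # key used to authenticate storage receipts
              self._store: dict[str, bytes] = {}     # tx_id -> encrypted modification set

          def store(self, tx_id: str, encrypted_set: bytes, commitment: str) -> Optional[bytes]:
              # Stand-in for the correctness check: the commitment sent along must
              # match a hash of the ciphertext actually received.
              if hashlib.sha256(encrypted_set).hexdigest() != commitment:
                  return None                        # verification failed: storage refused, no receipt
              self._store[tx_id] = encrypted_set
              return hmac.new(self._key, (tx_id + commitment).encode(), "sha256").digest()

          def receipt_valid(self, tx_id: str, commitment: str, receipt: bytes) -> bool:
              expected = hmac.new(self._key, (tx_id + commitment).encode(), "sha256").digest()
              return hmac.compare_digest(expected, receipt)

      def accept_transition(store: RecoveryStore, tx_id: str, commitment: str,
                            receipt: Optional[bytes]) -> bool:
          # Consensus-side rule: a transition is only accepted with evidence of storage.
          return receipt is not None and store.receipt_valid(tx_id, commitment, receipt)

      # Example: the ciphertext itself is mocked as an opaque byte string.
      store = RecoveryStore()
      encrypted_set = b"ciphertext-of-modification-set"
      commitment = hashlib.sha256(encrypted_set).hexdigest()
      receipt = store.store("tx-001", encrypted_set, commitment)
      assert accept_transition(store, "tx-001", commitment, receipt)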
  • FIG. 1 schematically illustrates an example of a method of operating a distributed database.
  • The method is separated in (A) a method of generating recovery data of a distributed database in a distributed database arrangement, (B) a method of storing recovery data of the distributed database, and (C) performing operations on the distributed database.
  • (A) and (B) form a method of building a recovery database.
  • (A)-(C) form the operating of the distributed database, where data records of the distributed database are read and modified while in parallel a recovery database is maintained which allows a past state of the distributed database to be reconstructed.
  • the distributed database arrangement comprises a number of systems 2-5 which run software on respective integrated circuit processors to perform the methods illustrated in FIGs. 1 and/or 2.
  • the distributed database arrangement is a distributed ledger technology network 1, such as explained in more detail with reference to FIGs. 3-5, but it will be apparent that the method is generally suitable for other types of distributed database arrangements, and in particular those with a decentralized control of the modifications of the data records of the distributed database.
  • the systems comprise participating systems 2 (represented with "Transactor Node” in FIG. 1) which store the data of the distributed database, and which may modify the data records of the distributed database.
  • a respective participating system 2 may e.g. initiate a transaction, and submit that to the other participating systems 2 or another approval system for approval and acceptance.
  • the arrangement may further comprise non-participating systems which can read the stored data, without being capable of modifying the data records and/or without storing the data records, as well as indirectly participating systems which do not store the data records but which can cause other, directly participating, systems storing the data records to perform a modification to a data record, e.g. by providing a corresponding message which contains approval data which triggers the modification for example.
  • The consensus protocol is a set of predetermined rules and procedures which allows the systems to determine whether or not there is consensus, that is, agreement that a proposed database transition is validated by the systems executing the protocol.
  • the consensus protocol may e.g. be performed by a separate layer, similar to the OSI abstraction layers, separate from the database layer which handles the transactions on the distributed database.
  • the consensus protocol can be a deterministic consensus protocol, i.e. the transition is either agreed and executed by all the systems, or by none of the systems.
  • the consensus protocol may be non-deterministic, that is the transition can be agreed and executed by some, but not all, of the systems, and not be accepted by the other systems. In case of a DLT, this can thus be a system with forks.
  • the DLT network may be implemented such that forks are excluded, e.g. by excluding from the DLT network the systems not implementing the proposed database transition and thus having a non-deterministic but forkless consensus.
  • the participating systems 2 storing the data of the distributed database implement the outcome of the consensus protocol, and thus modify the database records in a consistent manner.
  • the systems run software to operate and use a distributed ledger in the sense of the definition of this term in ISO 22739:2020 (en) point 3.22
  • the distributed database is a distributed ledger, with transactions to the distributed ledger being validated through the execution of the consensus protocol.
  • the distributed database may be another type of distributed database, and e.g. partly or fully mutable, for instance.
  • the distributed database may for example be one or more of: a distributed ledger, a private ledger, a permissioned ledger, a consortium ledger, segregated ledger.
  • the consensus software may e.g. be executed by the participating system or, for instance a separate system such as a validating node 3 as explained with reference to FIG. 3.
  • the generation of recovery data is performed in this example by a first system 2 (represented with "Transactor Node” in FIG. 1) participating in the distributed database arrangement.
  • this first system 2 is a node in a network 1 (as illustrated in FIG. 3) of nodes 2-5 participating in the distributed database arrangement.
  • the storing is performed in this example by one or more other systems 4, represented in FIG. 1 by "LI RS Storage Service", and hereinafter referred to as "recovery data storage system".
  • the recovery data storage system may be a centralized storage or a distributed storage, for instance.
  • these recovery data storage systems are systems not participating in the distributed database arrangement, in the sense that this/these recovery data storage system(s) 4 do(es) not store or modify the data records of the distributed database itself.
  • the recovery database can be stored off-ledger on a system not storing the ledger.
  • The recovery data storage system 4 may however perform a role in operating the distributed database arrangement and can for example be a system which also runs software executing a part of the consensus protocol, such as verification software to verify intended modifications of the data records by the first systems against a predetermined set of one or more verification rules, and which authorizes the modifications in case the modification requirements are met, such as a "notary service" in a Corda network (as described in WO2017182788A1, the contents of which are incorporated herein by reference).
  • the recovery data storage system 4 may alternatively be an independent system not involved in operating the distributed database arrangement, and just store the recovery data in accordance with the method described below.
  • the distributed database arrangement used to perform the methods illustrated may for example be implemented as a network in which transition data is only shared with nodes which store a data record which is modified or which, in accordance with the consensus protocol, have to authorize a transition in order to be accepted.
  • the encrypted data in the recovery database can be used to recover the distributed database without requiring sharing the data with other nodes.
  • A transition, once authorized by the required nodes, may require validation by a separate node which verifies whether the current state of the data records is allowed to be modified or not.
  • The database arrangement may be implemented such that this separate node does not receive the content of the current state or the, intended, new state of the data record, but does receive information allowing it to verify the validity of the transition.
  • The encrypted data in the recovery database then further allows the distributed database to be recovered without requiring the content to be provided to this separate node.
  • The network may be a network as described in European patent application 19203108.6, titled "methods of operating a distributed database, network, nodes, computer program product and data carrier suitable for such methods", filed 14 October 2019 in the name of the current applicant, the content of which is incorporated herein by reference.
  • the devices may be implemented as nodes described in the referenced document. The additions described hereinafter may then be applied thereto: the transmitting first node initiating a transition is arranged to also perform a method of generating recovery data as explained below.
  • The rules and constraints against which the transition is checked by what is called in the referred document the "second node" include a rule that the transmitting first node has to provide evidence that the encrypted data has been stored in the secure recovery database.
  • the network may be provided with a recovering data storing node 4 which performs the storing of the recovery data of the distributed database.
  • the network and nodes used to perform the methods illustrated may for example be implemented as a network in which two or more nodes can run a subnetwork with a distributed sub-database of which some or all of the content is not visible to the nodes outside the subnetwork.
  • An example of such a distributed database arrangement is known as "Hyperledger Fabric" in which the subnetwork is referred to as a "channel".
  • the subnetwork may then provide only a hash of the transaction to the other systems in the network, e.g. to be stored in the ledger in case of a DLT, for example.
  • the encrypted data in the recovery database can be used to recover the distributed database without requiring sharing the data in the sub-database with the nodes outside the subnetwork.
  • the network and nodes used to perform the methods illustrated may for example be implemented as a DLT network in which the consensus protocol requires that the transmitting node obtains approval of only a specified subset of the participating devices before the transition is accepted, and participating devices only accept the transition if the required approvals are obtained.
  • the transaction may then be executed when the transmitting node transmits a message with evidence of the obtained approvals to the participating devices storing the respective data records, e.g. by a broadcasting messaging service.
  • the nodes not storing the data records or being in the subset may not get the information about the content of the transaction.
  • approval of a node storing the back-up may be used as additional requirement before the participating nodes accept the transition, or the storing device can be included in the specified subset and provide its approval only after verifying the encrypted data as corresponding to the transition.
  • The network and nodes used to perform the methods illustrated may for example be implemented as a public, non-permissioned or permissioned, DLT with a separate distributed data storage, such as a distributed data storage distributed as open source software under the name Swarm (as described in Hartman, John H., Ian Murdock, and Tammo Spalink, "The Swarm scalable storage system", Proceedings, 19th IEEE International Conference on Distributed Computing Systems (Cat. No. 99CB37003), IEEE, 1999) or Filecoin (as described in Benet, J., and N. Greco, "Filecoin: A decentralized storage network", Protoc. Labs (2018): 1-36).
  • The encrypted data can e.g. first be sent to the distributed data storage service, and a storage receipt be received that evidences that content with a certain hash was stored.
  • the nodes participating in the transaction may then create a public transaction but without storing the content of the transaction in the distributed database.
  • the public transaction contains proof of validity of the transaction represented by the stored encrypted data, the storage receipt and proof that encryption of the data (or if the stored data contains more than just the encrypted data and the hash is for the complete content, the data) hashes to the hash value in the storage receipt from the distributed data storage service.
  • The encrypted data is stored off-ledger, while the DLT ledger itself does not contain a plain text version of the encrypted data, rendering the content of the transaction private to the nodes participating in the transaction.
  • This allows computationally intensive operations to be offloaded to the nodes participating in the transaction, making the network more scalable and shrinking the overall size of the public blockchain, while the encrypted data in the recovery database still allows the distributed database to be recovered.
  • the public transaction may in addition to the data above, contain e.g. metadata or other data.
  • the building of a recovery database for a distributed database arrangement with a distributed database may comprise generating recovery data at a transacting system 2.
  • the transacting system 2 may for example comprise a memory 23 for storing data of at least a part, or all, of the data records of the distributed database, and the recovery data may be generated when the system 2 initiates a transition of one or more data records of the distributed database from a first state to a second state.
  • A database transaction is used, i.e. a set of modifications that is completed as a unit.
  • The transaction can e.g. be a single atomic operation on a database, that is, the smallest set of modifications possible, or for example a series of atomic operations coupled together to form a single transaction.
  • A first transacting system 2 in the network of systems participating in the distributed database arrangement may initiate the transition {tx}, e.g. from a state A to a next state B (as illustrated in FIG. 2 (A)).
  • the transition may be subject to passing, in accordance with the procedures of the consensus protocol, a set of predetermined checks and verifications, which is jointly executed by the systems 2,3 in the arrangement to decide which changes to the data records are accepted, and to maintain the data records stored on the systems 2 synchronised.
  • The generating of recovery data may comprise generating encrypted transition data {Etx} representing a ciphertext version of transition data containing information about the transition.
  • the plain text data may be encrypted with a conventional encryption scheme, such as AES or RSA.
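  (Illustrative sketch, not from the patent: producing {Etx} by serializing the transition data and encrypting it with AES, here AES-GCM via the Python cryptography package; the field names and key handling are assumptions of this sketch.)

      # Minimal sketch: serialize the transition data {tx} and encrypt it into {Etx}
      # with AES-GCM. Field names and key handling are illustrative assumptions only.
      import json
      import os
      from cryptography.hazmat.primitives.ciphers.aead import AESGCM

      tx = {"record_id": "X", "old_owner": "A", "new_owner": "B"}   # plain-text transition data {tx}
      plaintext = json.dumps(tx, sort_keys=True).encode()

      key = AESGCM.generate_key(bit_length=256)   # in the scheme described here, this key would
      nonce = os.urandom(12)                      # not be held by the recovery data storage system
      E_tx = AESGCM(key).encrypt(nonce, plaintext, None)

      # {Etx} plus the nonce is what would be sent to the recovery data storage system;
      # decrypting it requires the key: AESGCM(key).decrypt(nonce, E_tx, None)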
  • The encrypted transition data {Etx} is generated per transaction {tx}, but alternatively the encrypted transition data {Etx} may represent a number of transactions {txi} ... {txj} together.
  • A transacting node 2 may for example initiate a set of consecutive transactions {txi} ... {txj} modifying a single data record, or modifying data records stored at the same location, making the transactions inseparable.
  • Encrypting data for the transactions {txi} ... {txj} together may provide more efficient storage and reduce the number of operations required.
  • The method may also comprise the transacting system 2 generating a proof {π} evidencing that the encrypted transition data {Etx} encrypts the transition to be performed, also referred to as Proof of Encryption or PoE, and transmitting a message containing the proof {π} to the recovery data storage system 4.
  • The recovery data storage system 4 may in response verify with the received proof {π} whether the encrypted transition data {Etx} encrypts the transition to be performed.
  • The encrypted transition data {Etx} is sent to the recovery data storage system 4 to be stored in a recovery data memory, separate from the distributed database.
  • The cryptographic proof of encryption {π} that the encrypted transition data {Etx} represents the initiated transition(s) is sent to the recovery data storage system 4 as well.
  • Additionally, an identifier TxID for the transition may be sent, such that the encrypted transition data {Etx} is stored coupled to the identifier and, in case of a recovery, can be linked to a specific transition to be recovered.
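  (Illustrative sketch, not from the patent: a possible layout of the message sent to the recovery data storage system, with TxID derived from the transition data by a one-way function; the proof of encryption {π} is left as an opaque placeholder, as this sketch does not implement it.)

      # Sketch of the message a transacting node could send to the recovery data
      # storage system: the ciphertext {Etx}, an identifier TxID derived from the
      # transition data with a one-way function, and a proof field. The proof is a
      # placeholder here; the patent describes it as a proof of encryption, which
      # is not implemented in this sketch.
      import hashlib

      def build_recovery_message(tx_plaintext: bytes, E_tx: bytes, proof: bytes) -> dict:
          tx_id = hashlib.sha256(tx_plaintext).hexdigest()   # {Htx}, used here as TxID
          return {
              "tx_id": tx_id,          # lets the stored {Etx} be linked to the transition
              "E_tx": E_tx.hex(),      # encrypted transition data
              "proof": proof.hex(),    # placeholder for the proof of encryption {π}
          }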
  • the recovery data storage system 4 verifies whether the encrypted transition data encrypts the initiated transition to be performed on the data record. Preferably, the verification is performed without decrypting the encrypted transition data, and more preferably the recovery data storage system 4 is not capable of doing so, e.g. because it does not possess the decryption key.
  • For example, the encrypted transition data {Etx} may be verified with the received proof of encryption {π}, or another check may be performed.
  • For instance, the recovery data storage system 4 may request the encrypted transition data from other participating systems 2, e.g. the systems involved in the transition, and compare the received data against a comparison criterion.
  • The comparison criterion can be that the encrypted transition data has no differences, e.g. is the same, as can be determined by calculating the hash value thereof and comparing the hash values of the encrypted transition data {Etx} received from the different participating systems.
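  (Illustrative sketch, not from the patent: the hash-based comparison criterion, assuming SHA-256 as the hash function.)

      # Sketch of the comparison criterion: hash the encrypted transition data {Etx}
      # received from the different participating systems and accept storage only if
      # all hashes are identical.
      import hashlib

      def etx_consistent(received_versions: list[bytes]) -> bool:
          digests = {hashlib.sha256(v).hexdigest() for v in received_versions}
          return len(digests) == 1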
  • If the verification fails, the storage is refused, and e.g. a corresponding storage refusal message can be sent to the participating nodes or to the nodes running the consensus protocol.
  • the transition may then be stopped and the distributed database remain in its current state (until another transition is initiated, of course).
  • the recovery data storage system 4 stores the recovery data, separate from the distributed database.
  • the recovery data is stored off-ledger, separate from the world state data and separate from the ledger.
  • the recovery data storage system 4 may e.g. have a recovery data memory 44, as illustrated in FIG.3.
  • The recovery data storage system 4 may thereafter output to the transacting system 2 a storage confirmation message SC confirming that the encrypted data {Etx} was stored in the recovery data memory 44.
  • The participating system(s) 2 also generate(s) validation data {S} representing a validation of the transition {tx} by systems in the network.
  • The validation data {S} may for example comprise one or more digital cryptographic signatures SA, SB, ... from nodes A, B, ... (or users of those nodes) linked to one or more data record fields modified by the transition, or comprise data from other data records or other data required by the nodes executing the consensus protocol to determine whether or not to validate a transition.
  • A ciphertext version of the validation data {S}, i.e. the encrypted validation data {ES}, may be stored as well, and the stored recovery data may thus further comprise the encrypted validation data {ES}.
  • The transacting system 2 may for instance generate the encrypted validation data {ES} by encrypting the validation data {S}, e.g. with the same key and protocol as the encrypted transition data {Etx}, and transmit the encrypted validation data {ES} and the proof {π} to the recovery data storage system 4.
  • The recovery data storage system 4 may, after receiving the encrypted validation data {ES}, verify whether the encrypted validation data {ES} encrypts the validation {SP} of the transition {tx} (and is not an encryption of other, e.g. fake or dummy, data, e.g. used to conceal the absence of a validation).
  • The recovery data storage system 4 stores the recovery data in the recovery data memory 44 if the verification yields that the encrypted validation data {ES} does correspond to that validation {S}.
  • the verification is performed without decrypting the encrypted transition data, and more preferably the recovery data storage system 4 is not capable of doing so, e.g. because it does not possess the decryption key.
  • The transacting system 2 may generate a, e.g. cryptographic, proof {πS} that the encrypted validation data {ES} encrypts the validation {SP} of the transition {tx} to be performed, and transmit the proof {πS} to the recovery data storage system 4.
  • The recovery data storage system 4 may then verify with the received proof {πS} whether the encrypted validation data encrypts the validation of the transition to be performed, without decrypting the encrypted validation data.
  • The transacting system 2 may, after generating the encrypted transition data {Etx}, transmit a message to one, or more than one, other system 2,3 involved in the transition to obtain a verification of the encrypted transition data {Etx} by the one, or more than one, other system.
  • The verification may be given in the form of a cryptographic signature {SE} by the other system on the encrypted transition data {Etx}.
  • The other system(s) may for example verify, with the information about the transition received from the transacting system 2 in order to validate the transaction {tx} itself, whether or not the encrypted transition data {Etx} encrypts the transition {tx}.
  • The other system may return to the transacting system 2 a message with a cryptographic signature {SE} on the encrypted transition data {Etx}, for example.
  • the transacting system 2 may in response generate encrypted verification data.
  • The transacting system 2 may concatenate the cryptographic signatures {SE} from the other systems and subsequently encrypt the concatenated signatures into collective encrypted verification data.
  • The transacting system 2 may send the encrypted validation data {ES} and the encrypted verification data as separate data. Alternatively, the transacting system 2 may e.g. generate encrypted collective verification data.
  • The collective verification data may comprise linked data representing the validation {SP} and data representing the verification {SE}, such as cryptographic signatures thereof, with the transacting system 2 encrypting the linked data to obtain the encrypted collective verification data {ES}.
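  (Illustrative sketch, not from the patent: concatenating the signatures {SE} returned by the other systems and encrypting them into collective encrypted verification data; Ed25519 signatures, AES-GCM and the JSON layout are assumptions of this sketch.)

      # Sketch: other systems return signatures {SE} over the encrypted transition
      # data {Etx}; the transacting system links them (here via a JSON list) and
      # encrypts the result into collective encrypted verification data. The patent
      # only requires cryptographic signatures and a conventional encryption scheme;
      # the concrete primitives below are assumptions.
      import json
      import os
      from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
      from cryptography.hazmat.primitives.ciphers.aead import AESGCM

      E_tx = b"...ciphertext of the transition data..."

      # Each co-involved system signs {Etx} and returns its signature {SE}.
      signers = [Ed25519PrivateKey.generate() for _ in range(2)]
      signatures = [s.sign(E_tx) for s in signers]

      # Link / concatenate the signatures and encrypt them.
      linked = json.dumps([sig.hex() for sig in signatures]).encode()
      key, nonce = AESGCM.generate_key(bit_length=256), os.urandom(12)
      E_S = AESGCM(key).encrypt(nonce, linked, None)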
  • One or more of the proofs above may comprise calculating a cryptographic proof of knowledge π.
  • The cryptographic proof of knowledge π may be calculated by performing an interactive proof of knowledge protocol or a non-interactive proof of knowledge protocol by the transacting system 2, preferably a zero-knowledge proof protocol ZkPoK(w,x), w representing a secret input and x a public input.
  • The calculation may use as secret input {w} one or more, preferably all, of the group consisting of: the encrypted transition data {Etx}, the encrypted validation data {ES}, and a decryption key {sk}.
  • the decryption key can be a decoding key, different from the encoding key pk, such as the private key sk of a public-private key pair pk-sk.
  • As public input {x}, for example one or more, preferably all, of the group consisting of the following may be used: a value {Htx} calculated with a one-way function from the transition data {tx}, a value {HES} calculated with a one-way function from the encrypted validation data {ES} or from the encrypted collective validation data {ES}, and a value {HEtx} calculated with a one-way function from the encrypted transition data {Etx}.
  • One, some or all of the one-way functions may for example be hash functions or other one-way functions.
  • The value {Htx} forms a unique identifier {txID} for the transition data {tx}.
  • The proof {π} may evidence one or more, preferably all, of the following: a decryption of the encrypted transition data {Etx} with a predetermined private key {sk} results in the transition data {tx}; a value calculated with a predetermined one-way function from the transition data {tx} corresponds to the identifier {txID} of the transition to be performed; the transition data {tx} and/or encrypted transition data {Etx} have been verified by one, or more than one, other system maintaining the data record, e.g. that the encrypted validation data or encrypted collective verification data is given for those systems (this may e.g. be a proof that the transition data {tx} and/or encrypted transition data {Etx} have been signed with the cryptographic signatures SA, SB, ... of those systems); and the transition meets a set of predetermined transition rules.
  • The proof of knowledge ZkPoK(w,x) may use as witness {w} the combination (Etx, ES, sk), and as instance {x} the combination (txID, HES, HEtx).
  • This allows all of the above to be evidenced with a single proof, while at the same time allowing the recovery data storage system to verify those statements without access to plain text transition data.
  • With the same proof it can thus be verified that the transition is a valid one, as well as that the encrypted data contains a ciphertext version of, i.e. encrypts, the transition data.
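  (Illustrative sketch, not from the patent: the relation underlying such a proof of knowledge, written as a plain check over the witness {w} = (Etx, ES, sk) and the instance {x} = (txID, HES, HEtx). A real deployment would prove this relation in zero knowledge, e.g. with a zk-SNARK or a sigma protocol, which this sketch does not do; the decrypt() callable and SHA-256 are assumptions.)

      # Sketch of the relation the proof of knowledge is about: a prover holding the
      # witness {w} = (E_tx, E_S, sk) convinces a verifier holding the instance
      # {x} = (tx_id, H_ES, H_Etx) that the statements below hold, without revealing
      # the witness. This sketch only checks the relation in the clear.
      import hashlib
      from typing import Callable

      def relation_holds(E_tx: bytes, E_S: bytes, sk: bytes,
                         tx_id: str, H_ES: str, H_Etx: str,
                         decrypt: Callable[[bytes, bytes], bytes]) -> bool:
          tx = decrypt(sk, E_tx)                                    # dec(sk, E_tx) = tx
          return (hashlib.sha256(tx).hexdigest() == tx_id           # H(tx)  = tx_id
                  and hashlib.sha256(E_S).hexdigest() == H_ES       # H(E_S) = H_ES
                  and hashlib.sha256(E_tx).hexdigest() == H_Etx)    # H(E_tx) = H_Etx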
  • the generation of recovery data (A) may for example comprise performing one or more, or all operations described with the following pseudo-code.
  • One or more of the operations in the above pseudo-code may be a procedure as described with the following pseudo-code:

        procedure create_tx
            build tx
            return tx

        procedure encrypt(tx)
            Etx ← enc(pk, tx)
            return Etx

        procedure collect_sigs(tx, Etx)
            SP ← sign(kSP, tx_data)
            S
  • the recovery data storage system 4 can be kept agnostic about the content of the transition to be performed, and does not need to receive a plain text version of the transition data.
  • The recovery data storage system 4 does not possess a key required to decrypt the encrypted transition data {Etx}.
  • the key may e.g. be provided to systems other than the recovery data storage system and the transacting system.
  • the key may be a private key sk of a decrypting system.
  • The encrypted data may in such a case be encrypted with a public key pk assigned to the transacting system 2, the public key being part of the public-private key pair pk-sk, while the transacting system 2 does not possess the private key sk of the public-private key pair pk-sk.
  • The decryption key may be a threshold decryption key distributed over a number n of systems in the network other than the transacting system 2, n being a positive integer and preferably at least 2, and preferably the decryption key is not distributed to the recovery data storage system 4. This avoids a single point of failure, e.g. a single decrypting system being compromised resulting in unauthorised access to the data.
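  (Illustrative sketch, not from the patent: distributing a decryption key over n systems with an m-of-n threshold using Shamir secret sharing, so that no single system, and in particular not the recovery data storage system, holds the full key. The concrete scheme is an assumption of this sketch, not the threshold decryption prescribed by the patent.)

      # Shamir secret sharing over a prime field: split a key into n shares such
      # that any m of them recover it, while fewer than m reveal nothing about it.
      import secrets

      P = 2**521 - 1   # a Mersenne prime, larger than any 256-bit key

      def split(secret: int, n: int, m: int) -> list[tuple[int, int]]:
          coeffs = [secret] + [secrets.randbelow(P) for _ in range(m - 1)]
          def f(x: int) -> int:
              return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
          return [(x, f(x)) for x in range(1, n + 1)]

      def combine(shares: list[tuple[int, int]]) -> int:
          # Lagrange interpolation at x = 0 over GF(P)
          secret = 0
          for i, (xi, yi) in enumerate(shares):
              num, den = 1, 1
              for j, (xj, _) in enumerate(shares):
                  if i != j:
                      num = num * (-xj) % P
                      den = den * (xi - xj) % P
              secret = (secret + yi * num * pow(den, -1, P)) % P
          return secret

      key_int = int.from_bytes(secrets.token_bytes(32), "big")   # e.g. a decryption key
      shares = split(key_int, n=5, m=3)
      assert combine(shares[:3]) == key_int                      # any 3 of 5 shares recover the key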
  • the generating and/or storing of the recovery data may be performed in parallel to operating the distributed database.
  • the operations represented by blocks 102,120,121 may be performed together, by performing a method as illustrated in FIG. 4.
  • Transitions may be conditioned on storing the encrypted data in the recovery database, and a transition may only be accepted as valid if evidence of storing the encrypted data is provided, such as a cryptographically signed message acknowledging storage, issued by the recovery data storage system 4.
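  (Illustrative sketch, not from the patent: checking a signed storage confirmation SC before accepting a transition; Ed25519 and the message layout are assumptions of this sketch.)

      # Sketch: participating systems accept a transition only if it comes with a
      # storage confirmation SC signed by the recovery data storage system.
      from cryptography.exceptions import InvalidSignature
      from cryptography.hazmat.primitives.asymmetric.ed25519 import (
          Ed25519PrivateKey, Ed25519PublicKey)

      storage_key = Ed25519PrivateKey.generate()   # held by the recovery data storage system
      storage_pub = storage_key.public_key()       # known to the participating systems

      def issue_confirmation(tx_id: str) -> bytes:           # recovery-storage side
          return storage_key.sign(tx_id.encode())

      def transition_acceptable(tx_id: str, sc: bytes, pub: Ed25519PublicKey) -> bool:
          try:                                               # participating-system side
              pub.verify(sc, tx_id.encode())
              return True
          except InvalidSignature:
              return False

      sc = issue_confirmation("tx-001")
      assert transition_acceptable("tx-001", sc, storage_pub)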
  • the recovery data storage system may receive messages informing of a transition and determine whether or not corresponding encrypted data has been stored. In case the recovery data storage system determines there is no corresponding encrypted data, the system may then e.g. output a warning message to other systems in the database arrangement, for example.
  • The recovery data storage system 4 may transmit an attesting message, Storage Confirmation (SC), to the transacting system 2 attesting to the storage of the recovery data.
  • The attesting message may for example comprise a cryptographic signature SE on the stored data, issued by the recovery data storage system 4.
  • The transacting node 2 may for example use the storage confirmation message SC from the recovery data storage system 4 as evidence of storage and, in response to receiving the storage confirmation message SC, proceed with the transition {tx} by transmitting a transaction message which evidences the storage confirmation message SC to the systems executing the consensus protocol, as illustrated by the arrow from block 102 to block 120.
  • the transaction is verified against the rules of the consensus protocol and if this verification yields that the transaction complies with the rules and requirements of the consensus protocol the distributed database is updated, as illustrated with block 121.
  • the transacting node 2 may have transmitted a transaction prior to, or together with sending the encrypted data.
  • the operations illustrated with blocks 101 and 102 may be performed prior to sending the messages, and the messages be transmitted to a system which performs both the storing of the encrypted data and the verification of the transaction.
  • the operations illustrated with blocks 110 and 120 of (B) and (C) may for example be performed by the same system.
  • the transacting system 2 creates a transition and transmits a message containing information about the transition.
  • the receiving system 4 judges the validity of a transition based on a predetermined set of criteria and the information.
  • "VERIFY" may be a cryptographic verification function b ← verify(VK, x, π), where the outcome b is either true, i.e. the proof is correct, or false, i.e. the proof is not correct.
  • the procedures “Validate” and the operation “Store” may be performed separately, and e.g. the other participating systems accept the transition only after receiving the outcome of the procedure "Store” and the outcome of the procedure "Validate”.
  • These outcomes may e.g. be sent directly by the systems executing the procedures or be sent first to the transacting system 2, which then transmits the outcomes to the other systems.
  • the operating of the distributed database may e.g. be performed as described below with reference to FIGs. 4-6.
  • at least a part of the consensus protocol is executed by validating systems 3 not participating in the transition of the transaction, and more specifically once a transition has been authorized by the other systems involved in the transition, a message is sent to the validating system 3 to validate the transition.
  • If the validating system validates the transition, a message is sent and the participating nodes storing the data records modified by the intended transition update the records according to the approved transition.
  • This validating system may be kept agnostic about the content of the transition to be performed, and may perform a verification method in accordance with a zero-knowledge protocol to obtain a mathematically sound proof whether or not the intended transition meets the rules and constraints of the consensus protocol.
  • the distributed database may e.g. be operated in another manner.
  • another consensus protocol may be executed.
  • participating devices may e.g. verify whether the transmitting node has obtained approval of a specified (sub)set of the participating devices before the transition is accepted, and the participating devices only accept the transition if i) the required approvals are obtained and ii) the transmitting node transmits a message with evidence of the obtained approvals to the participating devices storing the respective data records, e.g. by a broadcasting messaging service.
  • The participating devices in the specified set may receive the data about the transaction {tx}, whereas the other participating devices are kept agnostic about the content of the transaction {tx}, or alternatively of the entire transaction {tx}.
  • the devices in the specified set may run a private and/or confidential subnetwork.
  • An example of such a distributed database arrangement is known as "Hyperledger Fabric", in which two or more network members can run a private sub-database, referred to as a "channel" in Hyperledger Fabric terminology.
  • approval of a node storing the back-up may be used as additional requirement before the participating nodes accept the transition, or the storing device can be included in the specified set and provide its approval only after verifying the encrypted data as corresponding to the transition.
  • the network and nodes used to perform the methods illustrated may for example be implemented as a public, non-permissioned or permissioned, DLT with a separate distributed data storage, such a distributed data storage distributed as open source software under the name Swarm (as described in "Hartman, John H., Ian Murdock, and Tammo Spalink. "The Swarm scalable storage system” Proceedings. 19th IEEE International Conference on Distributed Computing Systems (Cat. No. 99CB37003). IEEE, 1999) or Filecoin (as described in Benet, J., and N. Greco. "Filecoin: A decentralized storage network” Protoc. Labs (2016): 1-36).
  • the encrypted data is first sent to the distributed data storage service and a storage receipt is received that evidences that content with a certain hash was stored.
  • the nodes participating in the transaction then create a public transaction that contains proof of validity of the transaction represented by the stored encrypted data, the storage receipt and proof that encryption of the data hashes to the hash value in the storage receipt from the distributed data storage service.
  • the public transaction may in addition contain e.g. metadata. This allows to offload computationally intensive operations to the participating nodes, making the network more scalable and shrinking the overall size of the public blockchain.
  • FIG. 2 schematically illustrates an example of a method of recovering records of a database.
  • The execution of the method may be triggered by determining that the integrity of the distributed database may be compromised. For example, this may be determined as illustrated in FIG. 2 (A).
  • a single data record X is changed, and the change comprises or consists of a modification of the node assigned as owner of the record.
  • With each transition, labelled "Move" in FIG.2(A), the data record (and hence the distributed database) transitions from state SA to state SB, from state SB to state SC, etc.
  • A transition, e.g. a transaction, may be more complex and e.g. be a transition of a single record, in which case the encrypted transition data is stored in the recovery database per transition of a single record, or a transition may be of multiple records.
  • A transition may modify values of fields of different records, create new or delete existing records and/or fields, etc.
  • One or more of the nodes A ... F may send a request for recovery of the distributed database. If this request is accepted, a method of recovering one or more records of the distributed database is started and performed, as illustrated in FIG. 2. The request may also be sent to the other participating nodes, in order to inform the participating nodes of the breach and/or to obtain approval for the recovery. Prior to starting the recovery method, the request may be verified to meet one or more predetermined requirements, such as being sent by a predetermined number of participant nodes 2 or being sent by a transaction validating node. This allows the risk of a denial of service attack by a node repeatedly sending such requests to be reduced.
  • a participating node 2 sends a request to the, one or more, decrypting nodes 5, which are communicatively connected to the recovery data storage system 4 with the recovery data memory 44 in which the recovery data of the distributed database is stored.
  • The recovery data is stored in a recovery database which is separate from the distributed database, and which preferably is on a different physical device or devices.
  • This recovery data may e.g. be stored there by performing a method as illustrated in FIG. 1, and comprises encrypted transition data {Etx} representing a ciphertext version of transition data {tx} containing information about a transition of one, or more than one, data record of the distributed database from a first state to a second state.
  • the second state can be a current state of the data record or a preceding state preceding the current state.
  • Encrypted transition data {Etx} is stored for a branched or unbranched chain of transitions from a starting state of the chain up to an end state of the chain, e.g. from an initial state of the distributed database up to the current state.
  • The chain may be unbroken or broken, with for example an unbroken chain part from the initial state up to the most current uncompromised state, a chain part from a first compromised state to the current state, and a missing link between the most current uncompromised state and the first compromised state.
  • One or more intermediate states and transitions may be missing, such as, in the example of FIG. 2(A), the transition from state SD to state SE and the state SE itself.
  • states of the distributed database for which no encrypted data is stored in the recovery database are deemed compromised states.
  • State SE will be deemed compromised, since node D will not have sent the corresponding encrypted transition data {Etx} for the transition from state SD to state SE to the recovery data storage system 4.
  • As illustrated in FIG. 2 (B) with block 610, if the decrypting nodes 5 decide to accept the request, a method of recovering one, or more than one, record of the distributed database from a current state to a past state is performed. If a malicious party has somehow managed to compromise the distributed database, e.g. by generating fake signatures on transactions or otherwise, this can be revealed by performing the recovery.
  • Once the faulty transaction(s) is/are detected, the transactions following thereafter can be discarded, and the distributed database can be maintained in the state resulting from the last valid transaction, for example.
  • The nodes involved in the discarded transactions can repeat them (with the exclusion of the faulty transaction(s) of course), and record them again in the recovery database. This reduces the risk that the integrity of the restored database remains compromised due to transactions being derived from a malicious transaction. Also, countermeasures can be taken to prevent further breaches, such as excluding some or all of the nodes involved in the faulty transactions from the network.
  • the decrypting node 5 starts with a first state Sn to be recovered by decrypting, as illustrated with block 612, the corresponding encrypted data, verifies the state as illustrated with blocks 613-614, and repeats recovery state-by-state until a compromised state is found.
  • The decrypting nodes then recover a state preceding the compromised state, e.g. the directly preceding state, and the data records of the distributed database can be reinstated to that state.
  • the method may be used to restore in a reverse direction (from the current state Sn backwards, towards the initial state S1 of the distributed dataset) all records from the current state Sn to a last uncompromised state Sn-p.
  • In the example, the last uncompromised state is state SD and the current state is state SF.
  • the transitions back to the last uncompromised state may be recovered, or alternatively all transitions from the current state back to the initial state, or another state preceding the last uncompromised state, e.g. in order to perform a security analysis to detect any further breaches.
  • all transitions from the current state back to the initial state may be recovered, and the distributed database be reconstructed from the initial state on to a more current state, such as up to the last uncompromised state.
  • the recovery may comprise, as illustrated with blocks 611,620, retrieving recovery data by the decrypting node(s) from the recovery data memory 44.
  • The decrypting node 5 may send a message requesting encrypted transition data {Etx}, {ES} corresponding to the transition to be recovered from the recovery data memory.
  • the requested encrypted data may be sent by the recovery data storage system 4.
  • the system 4 may e.g. be protected against unauthorized retrieval or unauthorized decrypting.
  • The decrypting key sk may be a distributed decryption key, distributed over a number n, n being an integer equal to or larger than 2, of decrypting nodes, with the decryption only being possible if a threshold number m of the decrypting nodes has requested or collaborates in the decryption, for instance.
  • The recovery data {Etx}, {ES} requested and/or retrieved may comprise encrypted transition data {Etx}, and optionally other data, such as encrypted validation data {ES}, and optionally further data, such as a transaction identifier or other data stored coupled to the recovery data.
  • The encrypted validation data {ES} may be decrypted to obtain a decrypted version of the validation data.
  • The decrypted validation data may then be used to determine whether or not the transition {tx} was a valid transition, e.g. whether the transition {tx} was authorized by the required participating nodes, for example as evidenced by cryptographic signatures {S} thereof in the decrypted validation data.
  • The decrypting nodes 5 receive encrypted signature data {ES} and decrypt the encrypted signature data {ES}.
  • The decrypting nodes or another node may then verify from the decrypted encrypted signature data, i.e. the recovered signature data {S}, whether the transition was at the time validated in accordance with the rules of the consensus protocol. The validity of the transition can thus be determined.
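  (Illustrative sketch, not from the patent: checking the recovered signature data {S} against the signers required by the consensus rules for this transition; Ed25519 and the JSON layout are assumptions of this sketch.)

      # Sketch: after decrypting the encrypted validation data {ES}, the recovered
      # signatures {S} are checked against the signers the consensus rules required
      # for this transition.
      import json
      from cryptography.exceptions import InvalidSignature
      from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

      def transition_was_validated(decrypted_S: bytes, tx_data: bytes,
                                   required_signers: dict[str, Ed25519PublicKey]) -> bool:
          # decrypted_S: JSON mapping signer name -> hex signature, recovered from {ES}
          signatures = json.loads(decrypted_S)
          for name, pub in required_signers.items():
              sig_hex = signatures.get(name)
              if sig_hex is None:
                  return False                 # a required validation is missing
              try:
                  pub.verify(bytes.fromhex(sig_hex), tx_data)
              except InvalidSignature:
                  return False                 # signature does not match the transition
          return True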
  • The decrypting nodes may decrypt the encrypted transition data {Etx}.
  • The decrypted transition data {tx} may then be used to determine the content of the transition and restore the database in the reverse direction, i.e. from the current state backwards towards the initial state.
  • The decrypted transition data may be verified to be compromised or not.
  • The encrypted transition data {Etx} is decrypted if the decrypted encrypted validation data {ES} has been verified and the transition {tx} has been determined to be a valid transition.
  • The method may comprise determining from the recovery data a faulty transition in one, or more than one, record of the distributed database. For example, after decrypting the data, a test can be performed to determine whether or not the transition {tx} is faulty and/or the state Sn is compromised, i.e. whether or not they are valid and, for example, whether they met the requirements of the consensus protocol.
  • If not, the state Sn in which the transition Sn-1 -> Sn resulted is deemed a compromised state, and the transition {txn} a faulty transition.
  • The state Sn-1 directly preceding the compromised transition can then be deemed a last uncompromised state, and the distributed database can be restored to this state; accordingly, the last valid transition has been the directly preceding transition {txn-1}.
  • The method described may be performed on the directly preceding transition {txn-1} to determine whether or not this was a valid transition and accordingly whether the state Sn-1 is an uncompromised state.
  • The method is then repeated with the directly preceding transition {txn-1} only in case the transition is determined to be valid, i.e. not faulty, as illustrated with the arrow.
  • The distributed database can then be restored from the current state backwards to the uncompromised state. This allows the computational load to be reduced, since the encrypted data has to be retrieved for only a part of the transactions.
  • the method may be repeated for each transition up to the initial state and the distributed database be restored from the initial state on, i.e. in the forward direction.
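  (Illustrative sketch, not from the patent: the backward recovery walk described above, stopping at the first transition whose recovery data is missing or faulty; the retrieve, decrypt and is_valid callables are assumed stand-ins for the mechanisms described in the text.)

      # Sketch of the backward recovery walk: starting from the current state, fetch
      # and check the recovery data per transition; stop at the first transition that
      # is missing or faulty, and report the directly preceding state as the state to
      # which the distributed database is to be reinstated.
      from typing import Callable, Optional

      def last_uncompromised_state(current: int,
                                   retrieve: Callable[[int], Optional[bytes]],
                                   decrypt: Callable[[bytes], bytes],
                                   is_valid: Callable[[bytes], bool]) -> int:
          # States are numbered 1 (initial) .. current; retrieve(n) returns the stored
          # encrypted recovery data for the transition S[n-1] -> S[n], or None if missing.
          n = current
          while n > 1:
              encrypted = retrieve(n)
              if encrypted is None or not is_valid(decrypt(encrypted)):
                  # Missing or faulty recovery data: S[n] and every later state are
                  # deemed compromised; restore to the directly preceding state.
                  return n - 1
              n -= 1
          return current   # the whole chain back to the initial state checked out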
  • the method may further comprise determining from the recovery data a system from which the faulty transition originated and excluding the system from the network.
  • The chain of recovered transitions will be broken, since there is no transition SD-SE stored in the recovery data memory. Hence, the chain of recovered transitions will be interrupted there.
  • The decrypting nodes may determine that the sub-chain from the current state SF back to the point of interruption (state SE) is invalid, and that the database has to be reinstated at state SD.
  • Since the state SE is compromised, the node E may be deemed to be the origin and be excluded from the network.
  • Referring to FIG. 3, an example of a network 1 operating a distributed database is shown therein.
  • the database can e.g. be a relational database.
  • the database can e.g. represent an unused state transition output data model where each record of the database represents a single current, also referred to as un-used, state of an object only.
  • The network 1 comprises first nodes 2,2’,2”, second nodes 3, a recovery data storage system 4 and a decrypting node 5 (one in this example, but as explained above there may be two or more).
  • the nodes 2-5 are connected to each other via suitable network connections 6.
  • the network connections can e.g. use publicly available network infrastructure, such as the internet, or private network infrastructure, such as a private IP network.
  • Each of the nodes comprises an active electronic device that is attached (or attachable) to the network connection, and when attached is capable of creating, receiving, or transmitting data over a communications channel, not shown, of the network 1.
  • Each of the nodes 2-5 comprises an integrated circuit processor 21,31,41,51 and a memory unit 23,34,44,54, as well as a network interface 20,30,40,50.
  • the recovery data storage system 4 further comprises a memory in which the recovery data 42 is stored. In the memory units 23,34,44,54 respective software is stored, which the integrated circuit processor 21,31,41,51 can execute.
  • in a database memory 22 of the first nodes 2, 2', 2", data records of the distributed database are stored, which are maintained and synchronized between the nodes 2, 2', 2".
  • the first nodes are communicatively connected to exchange messages about state transitions to the data records of the distributed database to synchronize corresponding data records between them.
  • each first node may have a set of data records.
  • the sets of the different first nodes may be the same or may be different, and a data record may be present in multiple versions, i.e. with different nodes each having a version, or may be present in a single version only.
  • each first node only stores a subset of the database in the database memory.
  • the network 1 may be preconfigured such that each first node 2, 2', 2" only stores a predetermined subset of the set of data records.
  • each first node has the part of the contents of the database corresponding to the subset. For instance, for each of the first nodes 2, 2', 2" the subset may be smaller than the set, in which case no database memory contains the full database.
  • the predetermined subset of records may be the records for which a first node is pre-set, or requested, to perform operations, or, phrased more colloquially: each first node stores in its database memory a subset of database records on a "need to know" basis, where the "need to know" is defined as being preconfigured or requested to perform an operation involving a record. More specifically, in this example, each first node stores in the database memory only records for which the first node:
  • is preconfigured or requested to perform an operation involving the record, or is listed as an approver for state transitions initiated by another first node; a minimal sketch of such a "need to know" selection follows below.
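  • purely by way of illustration, the sketch below shows such a "need to know" selection; the record layout and the field names "operators" and "approvers" are assumptions of the sketch and are not taken from the description above.

```python
# Illustrative sketch only: selecting the subset of database records a first
# node keeps in its database memory on a "need to know" basis.

def records_for_node(node_id, all_records):
    """Keep a record if this node operates on it or approves transitions on it."""
    subset = {}
    for record_id, record in all_records.items():
        operates = node_id in record.get("operators", ())
        approves = node_id in record.get("approvers", ())
        if operates or approves:
            subset[record_id] = record
    return subset


# Example: node "A" operates on r1 and is listed as an approver for r2 only.
records = {
    "r1": {"operators": ["A"], "approvers": ["B"], "fields": {"status": "new"}},
    "r2": {"operators": ["B"], "approvers": ["A", "C"], "fields": {"status": "done"}},
    "r3": {"operators": ["C"], "approvers": ["B"], "fields": {"status": "new"}},
}
print(sorted(records_for_node("A", records)))   # ['r1', 'r2']
```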
  • the network is configured such that each modification to a record of the database has to be sent to and approved by the second node 3. If a malicious party somehow manages to generate fake proofs and convince the second node 3 of the validity of a modification, e.g. of a data record that does not belong to that party, this can be revealed by performing the recovery as described above with reference to FIG. 2. Once the faulty transaction is detected, the subsequent transactions are discarded, and the database can be maintained in the state resulting from the last valid transaction.
  • the second node 3 comprises a register memory 32 in which data is stored which identifies whether or not a current state of a data record is allowed to be modified.
  • the network 1 has multiple first nodes but the invention is not limited to such a network 1, and likewise applies to a network 1 with a single first node 2.
  • a node 2-5 may have a network address, such as an IP address, which allows other nodes in the network to address the node and send data to that address.
  • the network 1 is a private network and each node has a private address, which within the network is unique and, preferably, is not visible to nodes outside the network.
  • the network 1 may use public addresses and the nodes may e.g. be connected to each other via public data communication network infrastructure, such as the Internet, and network membership and communication rights may be defined at a higher OSI level than the networking layer or higher than the transport layer of the infrastructure.
  • the network can e.g. be a peer-to-peer network.
  • a node can only communicate with other nodes in the network when authorized to join the network by an access control node, also referred to as a "gatekeeper" node or a "doorman” node.
  • the access control node may e.g. sign a public key certificate, and a node can only communicate with other nodes in the network with a signed certificate obtained from the gatekeeper node. In such a case, access to the network is limited to the nodes meeting the requirements set by the access controlling node for issuing the certificate.
  • the network may for instance comprise a, not shown in this example, central registry in which certificates and identities are stored.
  • a node can then be added to the network by sending a request for a certificate to the central registry and receiving a response from the central registry with a signed certificate assigned to the node.
  • the central registry may store digital certificates and identities of the nodes in the network.
  • the central registry may be part of a node which, in response to receiving a request from a joining node and upon reception of a public key, assigns a certificate to the node permissioned to the network, and the nodes may use the certificate.
  • the network may be a private IP-based network and the access control node assign a private IP address to the first node, just to give an example.
  • the first node 2 shown in FIG. 3 comprises a network interface 20 which connects the node to the network.
  • An integrated circuit processor 21 is arranged to run state transition software to manipulate data records of the distributed database.
  • the integrated circuit processor may e.g. be a general-purpose processor. However, alternatively, the integrated circuit processor 21 may be an application specific processor, or a set of several processors.
  • the processor may e.g. be one or more general-purpose processors as commercially available from Intel Corporation of Santa Clara, California, United States of America under the name "Intel® Xeon® Scalable".
  • the node 2 further comprises a software memory 23 connected to the integrated circuit processor 21 in which the software is stored as instructions executable by the integrated circuit processor.
  • the software memory 23 may for example be a non-transitory, tangible memory such as a non-volatile memory, e.g. a hard-disk or a read-only memory, or a volatile memory, such as random-access memory or processor cache memory, and it will be apparent that the software memory 23 may comprise different memories (e.g. hard-drive, RAM and cache) used by the processor in the execution of the software.
  • the first node 2 further comprises a filter 24 connected to the database memory 22, which when operating, filters predetermined content data out of a data record of the database, and provides filtered data identifying a current state of the filtered data record.
  • a set of one or more, predetermined and pre-set, state machines 25 is also present in the first node 2.
  • the state machine 25 has one or more state inputs 250 and one or more state outputs 251.
  • the state machine generates, when the state inputs 250 are filled with input values, e.g. with the values of the fields of a data record, output values presented at state outputs 251 which represent a new state for a data record, if and when the rules and constraints of the state machine are met and as defined by the input-output transfer function of the state machine.
  • the state machine may for example require additional data to generate a new state, for example one or more digital cryptographic signatures from nodes, or users of nodes, linked to one or more fields of the data record, or data from other data records, for example.
  • the state machine may for example be implemented as program code defining a set of instructions executable by the integrated circuit processor 21.
  • the state machine 25 may be any suitable type of state machine, and for example be formed by a code executable by the integrated circuit processor in which the rules and constraints are expressed, for example as a finite state machine such as a virtual arithmetic circuit or TinyRAM instructions.
  • the term "arithmetic circuit" refers to code defining in software a virtual circuit of gates and connections between the gates, where the gates, instead of Boolean operations, perform addition or multiplication operations on the inputs of the gates; a minimal sketch of such a circuit follows below.
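  • a minimal, purely illustrative sketch of such a virtual arithmetic circuit, written directly as software rather than produced by a compiler, is given below; the gate encoding is an assumption made for the sketch.

```python
# Illustrative sketch only: an arithmetic circuit as a list of gates that,
# instead of Boolean operations, perform addition or multiplication on their
# inputs. Names and structure are illustrative.

def evaluate_circuit(gates, inputs):
    """`gates` is a list of (output_wire, op, in_wire_a, in_wire_b) tuples,
    evaluated in order; `inputs` maps input wire names to values."""
    wires = dict(inputs)
    for out, op, a, b in gates:
        if op == "add":
            wires[out] = wires[a] + wires[b]
        elif op == "mul":
            wires[out] = wires[a] * wires[b]
        else:
            raise ValueError(f"unknown gate {op!r}")
    return wires


# Circuit computing (x + y) * z, standing in for a compiled constraint.
gates = [("t", "add", "x", "y"), ("out", "mul", "t", "z")]
print(evaluate_circuit(gates, {"x": 2, "y": 3, "z": 4})["out"])   # 20
```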
  • the state machines, and more preferably the set of state machines, are predetermined and preconfigured and are not modifiable once the network has been put in operation.
  • the state machine 25 may define the rules and constraints of a simple state transition, such as a simple operation involving a single current state, which transitions the object from the current state to a single new state.
  • the operation may be more complex, and e.g. also use other current states, e.g. transition other objects from a current state to a new state.
  • multiple current states may transition into a single new state.
  • a state transition may create new states that may be descendants of used states (e.g. copies with changed access rights to modify the record of the used state) or they may be unrelated new states, e.g. newly created records created by the state transition.
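  • the following sketch is an illustration only of a state machine of the kind described above, with hard-coded example rules and constraints (an item status field and a required approval); the class and field names are hypothetical and not taken from the description.

```python
# Illustrative sketch only: a state machine that, given input values taken
# from a data record plus required additional data (here an approval),
# produces a new state only if its rules and constraints are met.

class TransferStateMachine:
    """Moves an item record from 'in_production' to 'shipped'."""

    def generate_new_state(self, current_state, approvals):
        # Rule: only items currently 'in_production' may be shipped.
        if current_state["status"] != "in_production":
            raise ValueError("constraint violated: item is not in production")
        # Constraint: the listed approver must have approved the transition.
        if current_state["approver"] not in approvals:
            raise ValueError("constraint violated: approval missing")
        # Output values presented at the state outputs: the new state.
        new_state = dict(current_state)
        new_state["status"] = "shipped"
        return new_state


machine = TransferStateMachine()
record = {"status": "in_production", "approver": "node_B", "location": "plant_1"}
print(machine.generate_new_state(record, approvals={"node_B"}))
```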
  • the integrated circuit processor 21 is connected to the database memory 22, to (connections not shown in FIG. 3) all the state machines 25 of the set, and to the software memory 23 to retrieve and execute the instructions therein, as is explained below with reference to FIGs. 4-6.
  • each computing device stores a complete copy of all records, or only a part of the records, or even only a partial version of some or all of the records.
  • FIG. 3 further shows a second node 3.
  • the network is illustrated with only a single second node, however it will be apparent that the network may comprise two or more second nodes.
  • each or several second nodes may perform a verification process in response to receiving a verification request message from a transmitting first node 2 and the second nodes 3 may be provided with software which, when executed by the integrated circuit processors thereon, causes the second nodes to synchronize the verification.
  • the verification request message may be sent to one selected second node selected by the transmitting first node 2, e.g. selected out of the multiple second nodes based on the predetermined rules coupled to the data record to be changed or coupled to the selected state machine.
  • the second node 3 shown in FIG. 3 comprises a network interface 30 which connects the second node 3 to the network 1.
  • the second node 3 comprises an integrated circuit processor 31 arranged to run verification software to verify the intended state transition against a predetermined set of one or more verification rules defined by a verification state machine.
  • the integrated circuit processor may e.g. be a general-purpose processor. However, alternatively, the integrated circuit processor 31 may be an application specific processor, or a set of several processors.
  • the processor may e.g. be one or more general-purpose processors as commercially available from Intel Corporation of Santa Clara, California, United States of America under the name Intel® Xeon® Scalable.
  • the second node 3 further comprises a software memory 34 in which the software is stored as instructions executable by the integrated circuit processor.
  • the software memory 34 may for example be a non-transitory, tangible memory such as a non-volatile memory, e.g. a hard-disk or a read-only memory, or a volatile memory, such as random-access memory or processor cache memory, and it will be apparent that the software memory 34 may comprise different memories (e.g. hard-drive, RAM and cache) used by the processor in the execution of the software.
  • the second node 3 has a register memory 32 in which a register is stored with data identifying whether or not current states of the data records are allowed to be modified.
  • the register memory 32 can, like the software memory 34, be any suitable type of memory. More specifically in this example, the second node 3 is arranged to verify whether or not the current state has been modified already. To that end, for instance, in the register memory 32 a database may be stored with a transition history, such as at least identifiers for at least each of the used states directly preceding the current states. In such a case, the second node 3 can verify whether an intended state transition pertains to a current state, i.e. to a record that is not marked as used in the register memory 32, or not.
  • the register memory 32 may contain at least identifiers for each of the current states.
  • the second node may check whether an intended state transition pertains to a current state on the list, and if not, reject the intended state transition.
  • the register memory 32 may contain identifiers for used states further back.
  • the register memory 32 may contain identifiers of all past states preceding the current state.
  • the second node can verify that the intended state transition does not pertain to a state with an identifier in the register memory.
  • the second node 3 can verify whether or not a current state is part of an uninterrupted chain of state transitions from the state in the initial database to the current state.
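  • by way of illustration only, the sketch below shows a register of the kind kept in the register memory 32, tracking current and used state identifiers and rejecting an intended state transition that consumes a state which is not current; the class and method names are hypothetical.

```python
# Illustrative sketch only: a register of current and used (consumed) state
# identifiers against which an intended state transition is checked.

class StateRegister:
    def __init__(self, initial_state_ids):
        self.current = set(initial_state_ids)   # states that may still be modified
        self.used = set()                       # states already consumed

    def verify_and_consume(self, consumed_ids, produced_ids):
        """Accept the transition only if every consumed state is current."""
        if any(s in self.used for s in consumed_ids):
            return False                        # state already modified/used
        if not all(s in self.current for s in consumed_ids):
            return False                        # not part of the known chain
        self.current.difference_update(consumed_ids)
        self.used.update(consumed_ids)
        self.current.update(produced_ids)
        return True


reg = StateRegister({"S0"})
print(reg.verify_and_consume({"S0"}, {"S1"}))   # True: S0 was current
print(reg.verify_and_consume({"S0"}, {"S2"}))   # False: S0 already used
```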
  • the second node 3 further comprises a set of verification state machines 33.
  • the verification state machines 33 may be any suitable type of state machines such as described above for the state machines 25 of the first node.
  • Each state machine 25 of the first node 2 has an equal in a verification state machine 33 of the second node 3.
  • the set of verification state machines thus comprises, or consists of, equals for all the state machines of the set of the first node.
  • the term "equal” refers to the state machine generating the same output state in response to the same input state and having the same rules and constraints. Equal state machines may for example be generated by compiling the same human readable code into a suitable object code.
  • the set of verification state machines of the second node may comprise an equal verification state machine for each of the state machines of all the first nodes.
  • the integrated circuit processor 31 is connected to the register memory 32 and to (connections not shown in FIG. 3) all the state machines 33 of the set and is further connected to the software memory 34 to retrieve and execute the instructions of the verification software therein, as is explained below with reference to FIG. 4 and 6.
  • the integrated circuit processor 41 of the recovery data storage system 4 is arranged to run storage software which e.g. can perform the method illustrated in FIG. 1(B) and the method illustrated in FIG.2 with block 620.
  • corresponding software is stored as instructions executable by the integrated circuit processor 41.
  • the integrated circuit processor 51 of the decryption node 5 is arranged to run software which e.g. can perform the method illustrated in FIG. 2 with blocks 610-616.
  • corresponding software is stored as instructions executable by the integrated circuit processor 51.
  • the example shown in FIG. 3 can be used in a method of operating a distributed database, such as illustrated with the flowchart in FIG. 4, which can be used to perform the operations represented by blocks 100, 102, 120 and 121 in FIG. 1.
  • in FIG. 4 the generation and storage of the recovery data is omitted, and it is assumed that, after verification by the second node 3, the first node 2 performs the storage operation represented by block 101 and, in block 102, presents the proof of storage from the recovery data storage system 4 together with the verification by the second node 3 to the Consensus Service.
  • the method of operating a distributed database illustrated therein is performed with a data communication network 1 comprising one or more, transmitting, first nodes 2 and one or more second nodes 3, e.g. as implemented in accordance with the example of FIG. 3.
  • the method comprises the integrated circuit processor 21 of the first node 2 running the state transition software.
  • the running 40 comprises, as illustrated with block 401, inputting a selected data record into a selected state machine selected out of the set of state machines and thereby as illustrated with block 402, generating an intended state transition of the selected data record from a current state of the selected data record to a new state of the selected data record in accordance with rules and constraints of the selected state machine.
  • the integrated circuit processor of the first node 2 passes the data record in the current state through the filter 24 to obtain the state identification data representing an identifier unique for the current state of the selected data record, with a content of the fields of the selected data record, at least partially or completely, filtered out of the identifier by the filter.
  • the output of the filter 24 may be contentless data from the record, such as e.g. identifiers of the current state(s), an identifier of the intended state transition and/or a hash value obtained from the content, just to name some examples.
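  • for illustration only, the following sketch shows a filter of this kind producing contentless state identification data as a hash of the serialised record; the serialisation and the exact composition of the identifier are assumptions of the sketch.

```python
# Illustrative sketch only: the content of the record is filtered out and
# replaced by an identifier, here a cryptographic hash of the serialised
# current state together with the record identifier.

import hashlib
import json


def filter_record(record_id, current_state):
    serialised = json.dumps(current_state, sort_keys=True).encode("utf-8")
    state_hash = hashlib.sha256(serialised).hexdigest()
    # Only contentless data leaves the node: an identifier and a digest.
    return {"record_id": record_id, "state_id": state_hash}


record = {"owner": "party_A", "amount": 100, "status": "open"}
print(filter_record("r1", record))
```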
  • the integrated circuit processor 21 of the first node 2 further calculates verification data with a zero-knowledge verification function of which the input variables comprise the current state and the new state.
  • the input variables comprise the full intended state transition, i.e. current states, new states, operands, etc.
  • the zero-knowledge verification function generates with these input variables one or more output values which are used, or at least usable, by the second node to verify whether or not the intended state transition meets the rules and constraints of the selected state machine, without the first node conveying any information about the intended state transition.
  • the transmitting first node 2 then transmits to the second node a verification request message without the input variables.
  • the verification request message contains: the state identification data, state machine identification data representing an identification of the selected state machine, and the verification data.
  • the verification request message does not contain the content of the intended state transition, but only data allowing verification of the validity of the intended state transition, such as the data listed above.
  • the verification request message may contain various other data required for the verification by the second node but does not contain the content of the current state or the new state.
  • the verification request message may contain a cryptographic signature of the first node, e.g. a digital certificate signed with the private key of the transmitting node for example, to enable the second node to verify authenticity of the verification request message for instance.
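  • the sketch below is purely illustrative of assembling such a verification request message; the field names are hypothetical and the verification data is a placeholder for the output of the zero-knowledge verification function discussed further below.

```python
# Illustrative sketch only: a verification request message carrying state
# identification data, state machine identification data and verification
# data, but none of the content of the current or new state.

def build_verification_request(state_id, state_machine_id, verification_data,
                               sender_signature=None):
    message = {
        "state_identification_data": state_id,            # e.g. output of the filter
        "state_machine_identification": state_machine_id,
        "verification_data": verification_data,           # e.g. proof Pr(x, W) and x
        # Optionally a cryptographic signature of the transmitting first node so
        # the second node can verify authenticity of the message itself.
        "signature": sender_signature,
    }
    # Deliberately absent: the current state, the new state and any operands
    # of the intended state transition.
    return message


request = build_verification_request(
    state_id={"record_id": "r1", "state_id": "state-id-demo"},
    state_machine_id="TransferStateMachine/v1",
    verification_data={"public_input_x": "root-hash-demo", "proof": "Pr(x,W)-demo"},
)
print(sorted(request))
```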
  • the verification request message is received at the second node 3.
  • the information about the state machine involved is shared between the second node and the transmitting first node, in the sense that the second node receives information identifying the verification state machine. Accordingly, the type of transition is known but the second node does not receive the content of the current state or the, intended, new state of the data record.
  • the term "security of the second node” refers to the protection of data in the register memory or other memories of the second node against unauthorized access.
  • the protection of the contents of the database is less dependent on the protection and strength of encryption of data on the second node 3.
  • the second node may be, and preferably is, secured and protected against unauthorized access, e.g. to avoid tampering with the verification software running on the second node.
  • the integrated circuit processor in the second node will run the verification software, as illustrated with block 50, to verify the intended state transition for compliance with the rules and constraints of the selected state machine.
  • the running comprises, as illustrated with block 501, determining the verification state machine equal to the selected state machine using the state machine identification data.
  • the integrated circuit processor further verifies with the verification data whether or not the intended state transition meets rules and constraints of the verification state machine.
  • the integrated circuit processor may, as is explained below in more detail, use the verification data as public input to a verification method in accordance with a zero-knowledge protocol to obtain a mathematically sound proof whether or not the intended state transition meets the rules and constraints of the verification state machine.
  • a zero-knowledge protocol can e.g. be a non-interactive protocol in which except for the verification request message, the first node and the second node do not exchange data for the verification.
  • the running may further comprise verifying that the current states of the intended state transition have not been used already, i.e. that the current states identified in the verification request message are actually current states and not past or used states. To that end, the state identification data in the verification request message may be compared with the data in the register memory 32.
  • if the verification is successful, the integrated circuit processor approves the intended state transition and, as illustrated with block 503, sends a confirmation message to the transmitting first node 2.
  • otherwise, the integrated circuit processor 31 may output, via the interface 30, a reject message to the transmitting first node 2 (and optionally to other nodes in the network 1), as is illustrated with blocks 504 and 505.
  • the second node may do nothing, and the first nodes e.g. be implemented to cancel the state transition process after a predetermined time-out period has expired from the transmission of the verification request message.
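  • the following sketch illustrates, under assumptions, the shape of such a verification flow at the second node; the stub register and stub verification machine merely stand in for the register memory 32 and for a real zero-knowledge verifier, and all names are hypothetical.

```python
# Illustrative sketch only of the verification flow at the second node:
# select the equal verification state machine, check the referenced current
# states against the register, verify the verification data, then confirm or
# reject. The stubs below are stand-ins, not real components.

def handle_verification_request(request, verification_machines, state_register):
    machine = verification_machines.get(request["state_machine_identification"])
    if machine is None:
        return {"type": "reject", "reason": "unknown state machine"}
    ids = request["state_identification_data"]
    if not state_register.verify_and_consume(ids["consumed_state_ids"],
                                             ids["produced_state_ids"]):
        return {"type": "reject", "reason": "state already used or unknown"}
    if not machine.verify(request["verification_data"]):
        return {"type": "reject", "reason": "rules and constraints not met"}
    return {"type": "confirm"}


class StubVerificationMachine:
    def verify(self, verification_data):
        # Stand-in for the zero-knowledge verifier V(x, Pr) described below.
        return verification_data.get("proof") is not None


class StubRegister:
    def __init__(self):
        self.used = set()

    def verify_and_consume(self, consumed, produced):
        if self.used.intersection(consumed):
            return False
        self.used.update(consumed)
        return True


request = {
    "state_machine_identification": "TransferStateMachine/v1",
    "state_identification_data": {"consumed_state_ids": ["S0"],
                                  "produced_state_ids": ["S1"]},
    "verification_data": {"public_input_x": "root-hash-demo", "proof": "Pr(x,W)-demo"},
}
machines = {"TransferStateMachine/v1": StubVerificationMachine()}
print(handle_verification_request(request, machines, StubRegister()))   # confirm
```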
  • the method further comprises reception by the first node 2 of the confirmation message.
  • the first node 2 accepts the intended state transition in response to the reception and adjusts the data record according to the intended state transition.
  • the first node may cancel the state transition process if the confirmation message is not received, e.g. when the reject message is received instead.
  • a method as illustrated in FIG. 4 may be performed by execution of state transition software by one or more first nodes 2 and of verification software by one or more second nodes 3.
  • in FIG. 5, an example of an architecture of the state transition software and of the verification software is illustrated.
  • the state transition software comprises instructions 200-203 executable by the integrated circuit processor 21.
  • Input instructions 200 cause, when executed, inputting of a selected data record into a selected state machine 25 and thereby generate an intended state transition of the selected data record from a current state of the selected data record to a new state of the selected data record in accordance with rules and constraints of the selected state machine 25.
  • execution of the input instructions 200 may be triggered by inputting of data at a user interface by a human operator of the node.
  • the input instructions 200 are shown to receive input from the user interface UI.
  • the execution may be triggered by receiving e.g. at the interface 20 data triggering the execution, such as data generated by a non-human operator, e.g. a machine.
  • a change to a linked data record may trigger execution.
  • a machine may send data indicating that a manufacturing step has been completed and that the semi-finished product is handed over to another machine, and accordingly that the status and location field have to be updated, or a machine may send barcode scan data acknowledging that a semi-finished product has been received and thus that a location has to be updated.
  • the operator may for example provide input, such as an identification of the database record, the type of transition and values of at least some fields of the record in the desired new state.
  • the integrated circuit processor 21 executes the input instructions 200 in response to the operator input.
  • the instructions 200 may for example be to fetch the values of the corresponding data record and input them into the selected state machine.
  • the input in the selected state machine generates an intended state transition of the selected data record from a current state of the selected data record to a new state of the selected data record in accordance with rules and constraints of the selected state machine.
  • the integrated circuit processor 21 may e.g. execute instructions to obtain the missing input.
  • the input instructions 200 may then cause output of the intended state transition to other instructions of the software with which the input instructions 200 interface.
  • Input instructions 200 may comprise instruction to perform operations as described with references to FIG. 1 for blocks 100, 101 and 102.
  • Filter instructions 201 cause, when executed, to pass the data record in the current state through the filter 24 to obtain state identification data representing an identifier of the current state of the selected data record. Execution of the filter instructions 201 may e.g. be triggered by the same event as execution of the input instructions 200, and may be performed, before, in parallel with, or after execution of the input instructions 200.
  • Verification data calculating instructions 202 may interface with the input instructions 200 to receive the intended state transition and cause, when executed, to calculate verification data with the zero knowledge verification function described above with reference to block 403 in FIG. 4. An example of such a calculation is explained below in more detail with reference to FIG. 6.
  • Transmission instructions 203 may interface with input instructions 200, the filter instructions 201 and the verification data calculating instructions 202, to receive the output of those instructions as input. Transmission instructions 203 cause, when executed, transmission to the second node 3 of the verification request message and/or transmission of recovery data (as indicated with arrow ETx,PoE,TxID in FIG. 1) to the recovery data storage system 4 to be stored in the recovery data memory 44. Execution of the transmission instructions 203 may e.g. be triggered by the completion of input instructions 200, filter instructions 201 and verification data calculating instructions 202. The transmission instructions 203 may e.g. cause the opening of a data channel with the second node or the recovery data storage system 4, and the transmission of a message.
  • the transmission instructions 203 may further comprise a subset of instructions which causes assembling of the verification request message with the state identification data, the verification data and the state machine identification data, and, optionally, any other data not providing information about the contents of the intended state transition.
  • Acceptance instructions 204 are connected to the network interface 20 and cause, when executed, in response to receiving a confirmation message from the second node at the network interface 20, acceptance of the intended state transition and adjustment in the memory of the data record according to the accepted state transition, or else rejection of the intended state transition, the data record being maintained unchanged.
  • the verification software comprises instructions executable by the integrated circuit processor. These instructions comprise initiating instructions 301 coupled, as shown, to the network interface 30 to run the verification software in response to receiving at the second node a verification request message.
  • the initiating instructions may further, for example, determine the uniqueness of the current state(s) subject of the intended state transition, as was explained above with reference to block 501 of FIG. 4. To that end, initiating instructions 301 may, as shown, interface with the register memory 32.
  • State machine selection instructions 302 interface the initiating instructions 301 and cause, when executed, a selection of a verification state machine equal to the selected state machine using the state machine identification data. Execution of the selection instructions 302 may e.g. be triggered by a call from the initiating instructions 301.
  • Verification instructions 303 interface the selection instructions 302 and cause, when executed, verification with the verification data that the intended state transition meets rules and constraints of the verification state machine, for example as explained below.
  • Confirmation instructions 304 interface the verification instructions and the network interface 30.
  • confirmation instructions 304 cause, if the state transition meets the rules of the equal verification state machine, output at the interface of a confirmation message to the first node. Execution can e.g. be triggered by the verification instructions 303 outputting an accept or a reject message.
  • the transmitting first node 2 may prior to sending the verification request message to the second node 3, interact with other first nodes.
  • the transmitting first node may transmit an intended transition message to a recipient first node 2' when the rules and constraints of the selected state machine impose approval of the recipient first node.
  • the network inhibits broadcasting messages and the intended transition message is only transmitted to selected recipient first nodes, selected by the transmitting first node based on the intended state transition.
  • the network shown therein comprises for example two first nodes. In the interaction, a first node is the transmitting first node 2 described above, while another first node is a recipient first node 2'.
  • the transmitting first node 2 and the recipient first node 2' may for example have corresponding data records stored in their respective database memory, i.e. the records are duplicated over the transmitting first node 2 and the recipient first node 2'. Additionally or alternatively, they may have complementary records, such as where for the intended state transition of a first record on a node 2 data from another, not corresponding, record on another node 2' is required.
  • the rules and constraints of the selected state machine may require an approval of the recipient first node 2'.
  • the state machine may e.g. have an input at which a signature value of the recipient first node 2' is to be inputted, such as a signature provided with a public-private key signing process, and without which the state machine does not generate a new state.
  • the recipient first node 2' may then be implemented similar to the transmitting node described above, with the following addition.
  • the software in the software memory 23 may include instructions to verify the intended state transition, intended by the transmitting first node 2, against a set of predetermined rules and constraints defined by a selected state machine selected from the set.
  • the method may then for example comprise a verification by the recipient first node of the intended state transition prior to the transmitting first node sending the verification request message at least to the second node 3.
  • the recipient first node 2' may receive an intended transition message from the transmitting first node 2.
  • the intended transition message informs the recipient node 2' of an intended state transition to a selected data record and indicates a selected state machine to be used.
  • the integrated circuit processor of the recipient node 2' runs the software, i.e. executes the instructions, and verifies whether the intended state transition meets the rules and constraints of the selected state machine. If the state transition meets the rules and constraints: the integrated circuit processor 21 outputs an approval message to the transmitting first node and/or the second node.
  • in response to receiving the approval message from the recipient node 2', the nodes (e.g. the transmitting first node 2 and the second node 3 in this example) continue the state transition and proceed with the method described above. For example, after receiving the approval message, the transmitting first node 2 may output the verification request message. Else, i.e. when the state transition does not meet the rules and constraints, the integrated circuit processor 21 outputs an error message to the transmitting first node and/or the second node. In response to the error message, the nodes receiving the error message stop the state transition and maintain the data record in its current state. Like the transmitting first node, the recipient node 2' may receive a confirmation message from the second node 3.
  • the integrated circuit processor of the recipient first node 2' may then execute instructions to execute the intended state transition and adjust in the memory the data record according to the accepted state transition, provided that also proof of storage is received by the recipient node 2' - e.g. from the transmitting first node 2 or from the recovery data storage system 4.
  • the recipient first node 2' may reject the intended state transition and maintain the data record unchanged.
  • the verification may generally be performed in any manner suitable for the second node 3 to verify, using the verification data, whether or not the intended state transition meets the rules and constraints of the equal verification state machine.
  • the verification data can comprise a transition identification value (hash in FIG. 6) unique for the state transition.
  • the transition identification value can for example be unique for the current state, and e.g. be a cryptographic hash value calculated from the current state. This allows verification that the second node verifies the correct state transition.
  • the transition identification value can for example be a vector commitment, such as a Merkle tree root hash value calculated from a Merkle tree of which one or more leaves are values of the current state. This additionally allows the second node to verify that the state transition is of the correct current state, and not of another current state.
  • the transition identification value can for example be calculated with more input parameters, such as the new state.
  • the Merkle tree may have more leaves, for example at least three leaves, the leaves comprising the current state, the new state and the transition. In the latter case, for example, a string value derived from the code defining the transition may be used as input for the leaf.
  • the one-way function can generally be any suitable one-way function, preferably with collision resistance, such as a trapdoor one-way function, a hash function or otherwise.
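  • purely as an illustration, the sketch below computes such a Merkle tree root hash value over three leaves (current state, new state and a string derived from the transition); SHA-256 is used as a stand-in for whichever collision-resistant one-way function an implementation actually selects.

```python
# Illustrative sketch only: a transition identification value computed as a
# Merkle tree root hash over three leaves.

import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves) -> bytes:
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


current_state = b'{"owner": "party_A", "amount": 100}'
new_state = b'{"owner": "party_B", "amount": 100}'
transition = b"TransferStateMachine/v1"
root = merkle_root([current_state, new_state, transition])
print(root.hex())   # usable as public input x; the leaves stay at the first node
```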
  • the verification data may comprise any suitable values, provided that these are verifiable by an equal verification state machine.
  • the verification data may for example comprise a zero-knowledge proof of a statement, as determined in accordance with a predetermined zero-knowledge protocol, such as a non-interactive protocol, for example a succinct non-interactive argument of knowledge protocol.
  • the zero knowledge verification function generates a proof which satisfies the requirement of completeness, that is if the statement is true, the honest verifier (that is, one following the protocol properly) will be convinced of this fact by the proof.
  • the proof may further satisfy the requirement of soundness, that is if the statement is false, no cheating prover can convince the honest verifier that it is true, except with some small probability.
  • the verification request message provides zero-knowledge. That is, if the statement is true, no verifier learns from the verification request message anything else about the input variables than that the statement is true.
  • Suitable zero-knowledge protocols are for example (the cited documents herein incorporated by reference): Groth16 zkSNARKS - J. Groth. On the size of pairing-based non-interactive arguments.
  • the zero knowledge verification function may be one as is included in the C++ library named "libsnark" as currently available from https://github.com/scipr-lab.
  • the state machine 25,33 is defined in human readable and legible program code, such as in C or C++ code, which is subsequently compiled into an arithmetic circuit or other computer executable code which can be used with such a library as the libsnark library, to obtain a proofing machine 26 at the first node 2 and a verification machine 35 at the second node 3, which perform the proof generation and the verification respectively, as illustrated in FIG. 6.
  • the verification data may for example comprise a zero-knowledge proof parameter Pr(x,W) (in which x represents a parameter shared in the verification request message with the second node, and W a secret parameter not shared with the second node) of a zero-knowledge protocol ZK that the state transition satisfies the rules and constraints of the selected state machine.
  • the verification software may in such a case comprise instructions to verify the proof with the verification state machine in accordance with verification rules of the zero- knowledge protocol.
  • the zero-knowledge protocol may use the intended state transition as secret parameter W and the proof Pr(x,W) be that the intended state transition satisfies the rules and constraints of the selected state machine 25.
  • the verification data may then include an identification of the state machine, such that the second node can determine the equal verification state machine and use this in the verification.
  • the identification of the state machine may be the public input parameter x of the zero knowledge protocol.
  • the proof may be that, using the selected state machine to generate the proof, an intended state transition is obtained from which the root hash value can be calculated, and the verification use the equal verification state machine to verify the correctness of this statement.
  • the parameters x,W may be single values, e.g. be scalar values, but alternatively be composed of multiple values x1,x2,..., xn ; W1,W2,..., Wn and e.g. be vector or matrix values.
  • the verification request message can for example comprise the transition identification value as public input parameter x to the verifier part V(x,Pr) of the zero-knowledge protocol, and the second node use the transition identification value as public input parameter in the verification.
  • the zero-knowledge proof may be proof Pr(x, W) that the Merkle tree root hash value is calculated from the intended state transition.
  • the root hash value can be used as public input parameter x for the protocol, and as secret witness parameter W the intended state transition.
  • the proof allows verification that the transmitting first node knows a transition that is identified by this Merkle root hash. This ensures that the second node is actually verifying the state transition it is instructed to verify.
  • the verification may verify other aspects as well.
  • the verification data may e.g. comprise one or more digital signatures, each signed with a private key of a public-private key combination, as required by the rules and constraints of the selected state machine for the intended state transition.
  • the second node may then be provided with public keys corresponding to the private keys of the public-private key combination used in the digital signatures.
  • the second node determines from the verification state machine the digital signatures required for the intended state transition, decrypts the digital signature using the public key and determines whether or not all digital signatures decrypt to the same digital message. This allows the second node to verify whether or not the transition that matches the public input x is actually signed by the required signers.
  • the digital message can for example be derived from the root hash of the Merkle tree, and for example be a hash calculated from the root hash.
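  • as an illustration only, the sketch below checks the required signatures over a digest derived from the Merkle root hash; Ed25519 and the Python `cryptography` package are assumptions of the sketch chosen only to make it runnable, and signature verification is used in place of the "decrypt and compare" wording above.

```python
# Illustrative sketch only: every required signer signs the same digest
# derived from the Merkle root hash, and the verifier checks each signature
# against the corresponding public key.

import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def signatures_valid(root_hash: bytes, signed) -> bool:
    """`signed` is a list of (public_key, signature) pairs, one per required signer."""
    digest = hashlib.sha256(root_hash).digest()   # the digital message being signed
    for public_key, signature in signed:
        try:
            public_key.verify(signature, digest)
        except InvalidSignature:
            return False
    return True


# Example: two required signers sign the same digest derived from the root hash.
root = hashlib.sha256(b"example transition").digest()
digest = hashlib.sha256(root).digest()
keys = [Ed25519PrivateKey.generate() for _ in range(2)]
signed = [(k.public_key(), k.sign(digest)) for k in keys]
print(signatures_valid(root, signed))   # True
```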
  • the invention may also be implemented in a computer program for running on a computer system, at least including code portions for performing some or all steps of a method according to the invention when run on a programmable apparatus, such as a computer system, or enabling a programmable apparatus to perform functions of a device or system according to the invention.
  • a computer program is a list of instructions intended to be executed by a programmable apparatus such as a particular application program and/or an operating system.
  • the computer program may for instance include one or more of: a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
  • the computer program or software may be stored internally on computer readable storage medium or transmitted to the computer system via a computer readable transmission medium. All or some of the computer program may be provided on computer readable media permanently, removably or remotely coupled to an information processing system.
  • the computer readable media may include, for example and without limitation tangible, non-transitory data carriers and data transmission media.
  • the tangible, non-transitory data carriers can for example be any number of the following: magnetic storage media including disk and tape storage media; optical storage media such as compact disk media (e.g., CD-ROM, CD-R, etc.) and digital video disk storage media; nonvolatile memory storage media including semiconductor-based memory units such as FLASH memory, EEPROM, EPROM, ROM; ferromagnetic digital memories; MRAM; volatile storage media including registers, buffers or caches, main memory, RAM, etc.
  • the data transmission media may e.g. include data-communication networks, point-to-point telecommunication equipment, and carrier wave transmission media, just to name a few.
  • the distributed database may be structured as a database with current values of the data records, e.g. a “world state” database in case of DLT, as well as logging data tracking the transitions of the data records from a past state to the current values.
  • Such logging data may be stored on the participating devices in a file separate from the database with current values.
  • This data may e.g. be maintained by each of the individual participating devices based on the transitions visible to the individual participating devices, i.e. each individual participating device maintaining its own logging data, or e.g. be maintained in a shared manner where the individual participating devices store a copy of the logging data and synchronise their logging data by an appropriate exchange of data, such as e.g. known from DLTs like Bitcoin.
  • the distributed database can be implemented as a set of state machines, with a state machine on each participating device, of which the state represents the current values of the records of the database and logging data representing the state transitions of the respective state machines.
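  • for illustration only, the following sketch shows a "world state" structure next to an append-only log of transitions, and how the current values can be rebuilt by replaying that log from the initial state; the record structure is an assumption of the sketch.

```python
# Illustrative sketch only: current values ("world state") plus logging data
# tracking the transitions, with replay from the initial state.

def apply_transition(state, transition):
    record_id, new_value = transition
    new_state = dict(state)
    new_state[record_id] = new_value
    return new_state


def replay(initial_state, log):
    state = dict(initial_state)
    for transition in log:
        state = apply_transition(state, transition)
    return state


initial = {"r1": {"status": "new"}}
log = [("r1", {"status": "in_production"}),
       ("r2", {"status": "new"}),
       ("r1", {"status": "shipped"})]
world_state = replay(initial, log)
print(world_state)   # {'r1': {'status': 'shipped'}, 'r2': {'status': 'new'}}
```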
  • the distributed database arrangement can e.g. be permission-less, meaning that any device can join the distributed database arrangement, or be permissioned, meaning that the access to the distributed database arrangement is limited to selected devices, of which the selection can be static or dynamic.
  • the permissioned distributed database arrangement can be public permissioned, meaning that any device can read data from the database but only selected devices can write data and/or participate in the consensus protocol.
  • the distributed database arrangement may alternatively be a private permissioned one, meaning that only selected devices can read and/or write and/or participate in the execution of the consensus protocol. In case of a permissioned distributed database arrangement with a static selection, the distributed database arrangement is fully private.
  • the distributed database arrangement can be open, meaning that the data records and changes therein are transparent to the participating devices, or be partially or fully confidential meaning that some or all of the data records and/or changes therein are only visible to a (selected) subset of the participating devices. For example, only the devices that store the data records and/or are involved in a change in the data record.
  • a participating device modifying a data record may broadcast to the other participating device a broadcast message informing of the changes made.
  • the broadcast message may e.g. not contain the content of the changes but only identify the data record.
  • a participating device modifying a data record may only exchange messages with the other participating devices involved in the modification, and optionally send a message to another device that determines whether or not the modification is a valid, compliant one.
  • each node in the network is a separate physical node.
  • the nodes in the network may be any suitable data processing device, such as mainframes, minicomputers, servers, workstations, personal computers, mobile phones and various other wired or wireless data processing devices.
  • Such a device may for instance include one or more integrated circuit processor, associated memory and a number of input/output (I/O) devices.
  • I/O input/output
  • the computer system processes information according to the instructions of the computer program and produces resultant output information via I/O devices.
  • the term "messages" refers to electronic messages exchanged between computer systems, and does not include messages directly exchanged between human beings. However, an electronic message may be sent by a computer system under the supervision of a human operator.
  • the keys used in the encryption, verification, and signing operations described may be managed and generated in any manner suitable for the specific implementation.
  • a scheme meeting the cryptographic strength desired for the specific implementation may be selected for the generation, and the keys may be stored in secure, tamper protected modules, such as hardware security modules or in a secure element.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word ‘comprising’ does not exclude the presence of other elements or steps than those listed in a claim.
  • the terms “a” or “an,” as used herein, are defined as one or more than one.

Abstract

Methods of operating a distributed database arrangement, and of building a recovery database are described. The recovery data is generated at a first system in a network of systems participating in the distributed database arrangement. This comprises generating encrypted transition data representing a cyphertext version of transition data containing information about a transition of at least one data record of the data records from a first state to a second state. The encrypted transition data is stored in a recovery data memory, separate from the distributed database at a second system in the network. The second system verifies whether the encrypted transition data encrypts a transition to be performed on the data record and if so, stores the recovery data in the recovery data memory, separate from the distributed database. With the recovery data, the distributed database can be restored to a correct past state.

Description

Methods, systems and networks for recovering distributed databases, and computer program products, data carrying media and non-transitory tangible data storage media with computer programs and/or databases stored thereon useful in recovering a distributed database.
Description
Field of the invention
This invention relates to methods, systems and networks for recovering distributed databases, and in particular for recovering distributed ledgers. The invention further relates to computer program products, data carrying media and non-transitory tangible data storage media with computer programs and/or databases stored thereon useful in recovering a distributed database.
Background of the invention
This background is merely to provide context to the present invention and shall not be held as a statement by the applicant or the inventors that anything described herein is prior art to the present invention, unless explicitly identified as belonging to the prior art by terms "known" or "prior art" or the like.
Distributed databases are stored distributed over a number of computing devices, contrary to a centralized database in which the database is stored in a central device and with a central control over which data is stored in the database. Distributed databases and their use are as such known, as well as ways in which the data records of the distributed database can be shared, replicated and synchronized over the devices. For example, distributed ledger technologies (DLT) are known, in which the database is stored in a DLT network distributed over a number of nodes of the DLT networks, and the nodes execute a predetermined consensus protocol to decide which changes to the data records are accepted, and to maintain the data records synchronised.
However, despite existing mechanisms aiming to make this difficult, the integrity of the distributed database may be compromised due to weaknesses in the security mechanisms. Examples of weaknesses are known from Putz B., Pernul G. (2019) "Trust Factors and Insider Threats in Permissioned Distributed Ledgers", pages 25-50 of Hameurlain A., Wagner R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLII, Lecture Notes in Computer Science, vol 11860, Springer, Berlin, Heidelberg. For example, one or more of the devices participating in the distributed database system may succeed in injecting unauthorized changes to the data records which become noticed only after the unauthorized changes have been implemented in the entire distributed database system. This especially, but not exclusively, presents risks to permissioned and/or private DLTs. A malicious participant compromising the ledger, e.g. by unauthorized modification of a data record, a security failure in the privacy protection mechanism or by exploiting security flaws in the consensus protocol, not only causes a risk of leaking sensitive data but also impacts the reliability of the data stored in the ledger. This is therefore detrimental to the level of trust required to operate the DLT network.
Weaknesses may be mitigated by appropriate security measures inhibiting a breach of the security measures. Until now, focus has been on identifying weaknesses and finding solutions thereto, as for example in section 6 of the aforementioned paper by Putz et al., thus seeking to prevent breaches. The present inventors realized that, though useful, mitigating weaknesses and flaws will not be sufficient and that security breaches can never be completely excluded.
The inventors further realized that in case of a breach, the integrity of the distributed database has to be restored and therefore solutions have to be found which enable to re-establish the distributed database to an uncompromised state. However, for distributed database networks, integrity recovery of the database in case of a security breach is problematic.
A first way to restore the integrity of the distributed database would be to request and obtain from all participating devices all details of the changes made by the respective devices in order to restore the data records to a previous state which has not been compromised. However, this still allows a participating device to disclose incorrect information and hence conceal the tampering. In addition, this requires the participating devices to disclose all information about the data records, their current and past states and the modifications, some of which may simply not be available anymore. For example, the information may have been intentionally erased to conceal the tampering by a device, or the malicious participating device may have been shut down. Also, some of the data may be sensitive and may be required to be kept secret, for example in some DLTs the identity of a party involved in a modification of the data record is kept private and concealed for the other participants in the DLT network.
Recovery is especially, but not exclusively, problematic for permissioned DLTs and/or closed DLTs because the contents of the data records and of the transitions of the data records may be sensitive, private data. In the last years, several privacy-preserving techniques have been adopted to assure the confidentiality in DLT transactions, such as the use of zero-knowledge proofs (ZKP), the use of protected, private regions of memory (a feature of special integrated circuit processors sold by Intel Corporation of Santa Clara, California, United States of America, referred to as "Software Guard Extensions", SGX, enclaves) which can only be changed by software running within the protected, private region, and the use of cryptographic hashes. In such DLTs, to request all nodes to disclose their transitions and authorizations in reaction to detecting that the distributed ledger is compromised is contrary to the privacy-preserving nature of the DLT.
Summary of the invention
The present invention provides methods, systems and networks as described in the accompanying claims. The invention further provides a computer program product, a computer readable medium, and a non-transitory, tangible data carrier as described in the accompanying claims. Specific embodiments of the invention are set forth in the dependent claims. These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.
Brief description of the drawings
Further details, aspects and embodiments of the invention will be described, by way of example only, with reference to the drawings. In the drawings, like reference numbers are used to identify like or functionally similar elements. Elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale.
FIG. 1 schematically shows an example of an embodiment of a network of systems participating in the distributed database arrangement.
FIG. 2 schematically shows a flow chart of an example of a method for recovering a distributed database.
FIG. 3 schematically shows a topology of an example of an embodiment of a network with a distributed database.
FIG. 4 schematically shows a flow chart of an example of a method of operating a distributed database in a data communication network.
FIG. 5 schematically shows an architecture of an example of state transition software and verification software suitable for the examples of FIGs. 3 and 4.
FIG. 6 shows a block diagram of a part of state transition software and verification software suitable for the example of FIG. 5.
Detailed description of the preferred embodiments
Because the illustrated embodiments of the present invention may, for the most part, be implemented using components and computer programs known to those skilled in the art, details will not be explained in any greater extent than that considered necessary as illustrated for the understanding and appreciation of the underlying concepts of the present invention and in order not to obfuscate or distract from the teachings of the present invention.
In summary, in the examples, a secure integrity recovery solution is provided in order to be able to recover a distributed database in case a decision is made to reinstate the distributed database to a previous state. The decision can for example be made when the integrity of the database is deemed to be compromised, e.g. in response to detecting a breach of security in the distributed database system. In this recovery solution, when a set of one or more modifications to the data records is made, encrypted data representing the set of modifications is stored in a secure recovery database, separate from the distributed database, with both the correctness of the encryption and its storage in the secure backup being validated. That is, the encrypted data is only accepted for storage if a verification yields that the encrypted data reflects the set of modifications and hence is correct. More specifically, the encrypted data is only accepted for storage in the recovery database if the encrypted data meets a set of predetermined verification criteria. The encrypted data is therefore only stored when it has been verified to correspond to the set of modifications, and to contain a correct representation thereof. In addition, the set of modifications is only accepted by the participating devices in the distributed database system if proof of storage of the encrypted data in the recovery database is provided. Said differently, the modifications to the data records are only implemented if proof of storage is provided.
When the verification has another outcome, for example, no further action may be taken, or the storage of the recovery data in the recovery database may be rejected, for example by outputting a rejection message to the initiating node and/or to some or all of the participating nodes.
Using the encrypted data in the secure recovery database, if the distributed database is detected to be compromised, the distributed database can be reconstructed to a past valid state, e.g. the latest valid state or a state preceding the latest valid state. Because the encrypted data in the recovery database is used, the distributed database can be recovered to a past state without the nodes or devices that operate the distributed database being required to resend the original data, or even without revealing the original data to the other nodes or devices, as is explained below in more detail. More specifically, the modifications, also referred to as transitions, of the distributed database can be reconstructed by decrypting the encrypted data on a set-by-set basis. For example, the modifications can be traced back from the current state to the past state by decrypting set-by-set the modification data corresponding to the transitions from the current state back to the past state, and reconstructing the modification from the decrypted data. Or, for example, the transitions of the distributed database can be reconstructed from a past state, e.g. the initial state, known to be un-compromised, up to the latest uncompromised state, for example. Furthermore, the validation of the correctness of the encrypted data prior to storing it in the recovery database ensures the integrity of the recovery database, i.e. that it contains the correct encrypted data. This not only obviates the risk that the distributed database cannot be recovered, e.g. because data needed for the recovery is not available or a malicious node is involved in the recovery which provides data concealing the security breach, but also avoids the need for the nodes to disclose private or confidential data. In this respect, the recovering of the data from the recovery database can e.g. be performed by a device not storing or modifying the distributed database itself and without revealing sensitive data to the participating devices. Also, the corresponding encrypted data may differ in encryption characteristics between sets, e.g. the decryption key required to decrypt the data of a given set may differ from the key required to decrypt the data of another set. (This may additionally apply to the encryption key used.) Thereby, recovery can only be performed on a set-by-set basis. This limits the impact of a breach of security of the recovery database, and, in case an intruder procures a decryption key, prevents the intruder from thereby having access to all data of the complete distributed database via the recovery database.
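To make the set-by-set aspect concrete, the following is a minimal Python sketch, assuming the Fernet symmetric scheme from the cryptography package merely stands in for whatever encryption scheme an implementation would actually use; the function names encrypt_modification_set and recover_modification_set are illustrative only and not part of the method:

  from cryptography.fernet import Fernet

  def encrypt_modification_set(modification_set: bytes) -> tuple[bytes, bytes]:
      # Each set of modifications is encrypted under its own key, so the
      # encryption characteristics differ between sets.
      key = Fernet.generate_key()
      ciphertext = Fernet(key).encrypt(modification_set)
      return key, ciphertext

  def recover_modification_set(key: bytes, ciphertext: bytes) -> bytes:
      # Recovery is only possible on a set-by-set basis: a given key
      # decrypts exactly one stored set of modifications.
      return Fernet(key).decrypt(ciphertext)

Because each stored set has its own key, an intruder procuring one decryption key only gains access to the corresponding set, not to the complete distributed database.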
Referring now to the drawings, FIG. 1 schematically illustrates an example of a method of operating a distributed database. In this example the method is separated in (A) a method of generating recovery data of a distributed database in a distributed database arrangement, (B) a method of storing recovery data of the distributed database, and (C) performing operations on the distributed database. Together (A) and (B) form a method of building a recovery database, whereas (A)-(C) form the operating of the distributed database, where data records of the distributed database are read and modified while in parallel a recovery database is maintained which allows to reconstruct a past state of the distributed database.
The distributed database arrangement comprises a number of systems 2-5 which run software on respective integrated circuit processors to perform the methods illustrated in FIGs. 1 and/or 2. In this example the distributed database arrangement is a distributed ledger technology network 1, such as explained in more detail with reference to FIGs. 3-5, but it will be apparent that the method is generally suitable for other types of distributed database arrangements, and in particular those with a decentralized control of the modifications of the data records of the distributed database.
The systems comprise participating systems 2 (represented with "Transactor Node" in FIG. 1) which store the data of the distributed database, and which may modify the data records of the distributed database. A respective participating system 2 may e.g. initiate a transaction, and submit that to the other participating systems 2 or another approval system for approval and acceptance. The arrangement may further comprise non-participating systems which can read the stored data, without being capable of modifying the data records and/or without storing the data records, as well as indirectly participating systems which do not store the data records but which can cause other, directly participating, systems storing the data records to perform a modification to a data record, e.g. by providing a corresponding message which contains approval data which triggers the modification.
On the systems in the arrangement, software executing a consensus protocol is running, represented in FIG. 1 with "Consensus Service", which in the example of FIGs. 3-5 is performed by node 3. The consensus protocol is a set of, predetermined, rules and procedures which allows the systems to determine whether or not there is consensus, that is agreement that a proposed database transition is validated by the systems executing the protocol. The consensus protocol may e.g. be performed by a separate layer, similar to the OSI abstraction layers, separate from the database layer which handles the transactions on the distributed database. In this respect, the consensus protocol can be a deterministic consensus protocol, i.e. the transition is either agreed and executed by all the systems, or by none of the systems. In case of a DLT, this is thus a forkless system. Alternatively, the consensus protocol may be non-deterministic, that is the transition can be agreed and executed by some, but not all, of the systems, and not be accepted by the other systems. In case of a DLT, this can thus be a system with forks. In order to ensure a deterministic recovery, in such a case though, the DLT network may be implemented such that forks are excluded, e.g. by excluding from the DLT network the systems not implementing the proposed database transition and thus having a non-deterministic but forkless consensus. The participating systems 2 storing the data of the distributed database implement the outcome of the consensus protocol, and thus modify the database records in a consistent manner.
More specifically, in this example the systems run software to operate and use a distributed ledger in the sense of the definition of this term in ISO 22739:2020 (en) point 3.22, and the distributed database is a distributed ledger, with transactions to the distributed ledger being validated through the execution of the consensus protocol. However, the distributed database may be another type of distributed database, and e.g. partly or fully mutable, for instance. The distributed database may for example be one or more of: a distributed ledger, a private ledger, a permissioned ledger, a consortium ledger, a segregated ledger. The consensus software may e.g. be executed by the participating system or, for instance, by a separate system such as a validating node 3 as explained with reference to FIG. 3.
The generation of recovery data is performed in this example by a first system 2 (represented with "Transactor Node" in FIG. 1) participating in the distributed database arrangement. In FIG. 1 this first system 2 is a node in a network 1 (as illustrated in FIG. 3) of nodes 2-5 participating in the distributed database arrangement. The storing is performed in this example by one or more other systems 4, represented in FIG. 1 by "LI RS Storage Service", and hereinafter referred to as "recovery data storage system". The recovery data storage system may be a centralized storage or a distributed storage, for instance. In this example, these recovery data storage systems are systems not participating in the distributed database arrangement, in the sense that this/these recovery data storage system(s) 4 do(es) not store or modify the data records of the distributed database itself. In case of a DLT for instance, the recovery database can be stored off-ledger on a system not storing the ledger. The recovery data storage system 4 may however perform a role in operating the distributed database arrangement and can for example be a system which also runs software executing a part of the consensus protocol, such as verification software to verify intended modifications of the data records by the first systems against a predetermined set of one or more verification rules, and which authorizes the modifications in case the modification requirements are met, such as a "notary service" in a Corda network (as described in WO2017182788A1, the contents of which are incorporated herein by reference). However, the recovery data storage system 4 may alternatively be an independent system not involved in operating the distributed database arrangement, and just store the recovery data in accordance with the method described below.
As described in more detail below with reference to FIGs. 3 and further, the distributed database arrangement used to perform the methods illustrated may for example be implemented as a network in which transition data is only shared with nodes which store a data record which is modified or which, in accordance with the consensus protocol, have to authorize a transition in order to be accepted. In such a network, the encrypted data in the recovery database can be used to recover the distributed database without requiring sharing the data with other nodes. In such a network, for example, a transition, once authorized by the required nodes, may require validation by a separate node which verifies whether the current state of the data records is allowed to be modified or not. The database arrangement may be implemented such that this separate node does not receive the content of the current state or the, intended, new state of the data record but does receive information allowing to verify the validity of the transition. The encrypted data in the recovery database then further allows to recover the distributed database without requiring the content to be provided to this separate node.
For instance, the network may be a network as described in European patent application 19203108.6, titled "methods of operating a distributed database, network, nodes, computer program product and data carrier suitable for such methods", filed 14 October 2019 in the name of the current applicant, the content of which is incorporated herein by reference. Likewise, the devices may be implemented as nodes described in the referenced document. The additions described hereinafter may then be applied thereto: the transmitting first node initiating a transition is arranged to also perform a method of generating recovery data as explained below. In addition, the rules and constraints against which the transition is checked by what is called in the referred document the "second node" (and which may also be referred to as a transaction validating node) include a rule that the transmitting first node has to provide evidence that the encrypted data has been stored in the secure recovery database. Furthermore, the network may be provided with a recovery data storing node 4 which performs the storing of the recovery data of the distributed database.
In another example, the network and nodes used to perform the methods illustrated may for example be implemented as a network in which two or more nodes can run a subnetwork with a distributed sub-database of which some or all of the content is not visible to the nodes outside the subnetwork. An example of such a distributed database arrangement is known as "Hyperledger Fabric" in which the subnetwork is referred to as a "channel". The subnetwork may then provide only a hash of the transaction to the other systems in the network, e.g. to be stored in the ledger in case of a DLT, for example. In such an example, the encrypted data in the recovery database can be used to recover the distributed database without requiring sharing the data in the sub-database with the nodes outside the subnetwork.
In another example, the network and nodes used to perform the methods illustrated may for example be implemented as a DLT network in which the consensus protocol requires that the transmitting node obtains approval of only a specified subset of the participating devices before the transition is accepted, and participating devices only accept the transition if the required approvals are obtained. The transaction may then be executed when the transmitting node transmits a message with evidence of the obtained approvals to the participating devices storing the respective data records, e.g. by a broadcasting messaging service. In such a case, the nodes not storing the data records or not being in the subset may not get the information about the content of the transaction. In such a case, for example, approval of a node storing the back-up may be used as a separate, additional requirement before the participating nodes accept the transition, or the storing device can be included in the specified subset and provide its approval only after verifying the encrypted data as corresponding to the transition.
In another example, the network and nodes used to perform the methods illustrated may for example be implemented as a public, non-permissioned or permissioned, DLT with a separate distributed data storage, such as a distributed data storage distributed as open source software under the name Swarm (as described in Hartman, John H., Ian Murdock, and Tammo Spalink, "The Swarm scalable storage system", Proceedings 19th IEEE International Conference on Distributed Computing Systems (Cat. No. 99CB37003), IEEE, 1999) or Filecoin (as described in Benet, J., and N. Greco, "Filecoin: A decentralized storage network", Protoc. Labs (2018): 1-36). In such a case, the encrypted data can e.g. first be sent to the distributed data storage service for storage, and a storage receipt received from the distributed data storage service that evidences that data (containing the encrypted data) with a certain hash was stored. The nodes participating in the transaction may then create a public transaction but without storing the content of the transaction in the distributed database. The public transaction contains proof of validity of the transaction represented by the stored encrypted data, the storage receipt and proof that the encrypted data (or, if the stored data contains more than just the encrypted data and the hash is for the complete content, that stored data) hashes to the hash value in the storage receipt from the distributed data storage service. Said differently, in such a case the encrypted data is stored off-ledger, while the DLT ledger itself does not contain a plain text version of the encrypted data, rendering the content of the transaction private to the nodes participating in the transaction. This allows to offload computationally intensive operations to the nodes participating in the transaction, making the network more scalable and shrinking the overall size of the public blockchain, while on the other hand the encrypted data in the recovery database still allows the distributed database to be recovered. The public transaction may, in addition to the data above, contain e.g. metadata or other data.
Referring now in detail to FIG. 1, with respect to (A) in FIG. 1, the building of a recovery database for a distributed database arrangement with a distributed database may comprise generating recovery data at a transacting system 2. As explained in more detail with reference to FIGs. 3-5, the transacting system 2 may for example comprise a memory 23 for storing data of at least a part, or all, of the data records of the distributed database, and the recovery data may be generated when the system 2 initiates a transition of one or more data records of the distributed database from a first state to a second state. In the following, as an example of such a transition, a database transaction is used, i.e. a set of modifications that is completed as a unit: either all modifications of the set are performed, and the unit is completed, or the modifications are not performed and the unit is not completed, such as a DLT transaction in the sense of point 3.77 of ISO 22739:2020 (en). The transaction can e.g. be a single atomic operation on a database, that is the smallest set of modifications possible, or for example be a series of atomic operations coupled together to form a single transaction.
As illustrated with block 100, a, first, transacting system 2 in the network of systems participating in the distributed database arrangement may initiate the transition {tx}, e.g. from a state A to a next state B (as illustrated in FIG. 2 (A)). The transition may be subject to passing, in accordance with the procedures of the consensus protocol, a set of predetermined checks and verifications, which is jointly executed by the systems 2,3 in the arrangement to decide which changes to the data records are accepted, and to maintain the data records stored on the systems 2 synchronised. In this respect, starting from an initial, common state of the distributed database, the execution of the consensus protocol, and the messages exchanged therein, ensure that the systems 2 accept the same changes to the values of the data records, and hence that the distributed database remains synchronized over the systems, assuming that there is no breach of security or a malicious system that compromises the database.
As illustrated with block 101, the generating of recovery data may comprise generating encrypted transition data {Etx} representing a cyphertext version of transition data containing information about the transition. For example, the plain text data may be encrypted with a conventional encryption scheme, such as AES or RSA. In these examples, the encrypted transition data {Etx} is generated per transaction {tx}, but alternatively the encrypted transition data {Etx} may represent a number of transactions {txi}...{txj} together. For example, a transacting node 2 may initiate a set of consecutive transactions {txi}...{txj} modifying a single data record, or modifying data records stored at the same location, making the transactions inseparable. In such a case encrypting data for the transactions {txi}...{txj} together may provide a more efficient storing and reduce the number of operations required.
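As a purely illustrative sketch of such an encryption step, the Python fragment below encrypts the, possibly batched, transition data under a freshly generated symmetric key and wraps that key with a public key pk, so that the transacting system itself never needs the private key sk (as discussed further below). The hybrid AES/RSA construction, the use of the cryptography package and the name encrypt_transitions are assumptions of this sketch, not features prescribed by the method:

  import json
  from cryptography.fernet import Fernet
  from cryptography.hazmat.primitives import hashes
  from cryptography.hazmat.primitives.asymmetric import padding

  def encrypt_transitions(transitions: list[dict], recovery_public_key) -> dict:
      # One or more transactions {txi}...{txj} are batched into one plaintext.
      plaintext = json.dumps(transitions).encode()
      # Hybrid encryption: the data under a fresh symmetric key,
      # the symmetric key under the public key pk.
      data_key = Fernet.generate_key()
      e_tx = Fernet(data_key).encrypt(plaintext)
      wrapped_key = recovery_public_key.encrypt(
          data_key,
          padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                       algorithm=hashes.SHA256(), label=None))
      # The pair below plays the role of the encrypted transition data {Etx}.
      return {"Etx": e_tx, "wrapped_key": wrapped_key}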
As illustrated in block 101, the method may also comprise the transacting system 2 generating a proof {π} evidencing that the encrypted transition data {Etx} encrypts the transition to be performed, also referred to as Proof of Encryption or PoE, and transmitting a message containing the proof {π} to the recovery data storage system 4. The recovery data storage system 4 may in response verify with the received proof {π} whether the encrypted transition data {Etx} encrypts the transition to be performed.
As indicated with the arrow Etx, PoE, TxID in FIG. 1, after generating, the encrypted transition data {Etx} is sent to the recovery data storage system 4 to be stored in a recovery data memory, separate from the distributed database. In the shown example, indicated with PoE, in addition the cryptographic proof of encryption {π} that the encrypted transition data {Etx} represents the initiated transition(s) is sent to the recovery data storage system 4 as well. Furthermore, an identifier TxID for the transition may be sent, such that the encrypted transition data {Etx} is stored coupled to the identifier, and in case of a recovery can be linked to a specific transition to be recovered.
As illustrated with block 110, in response to receiving the encrypted transition data {Etx}, the recovery data storage system 4 verifies whether the encrypted transition data encrypts the initiated transition to be performed on the data record. Preferably, the verification is performed without decrypting the encrypted transition data, and more preferably the recovery data storage system 4 is not capable of doing so, e.g. because it does not possess the decryption key. For example, the encrypted transition data {Etx} may be verified with the received proof of encryption {π}, or another check may be performed. For instance, the recovery data storage system 4 may send to other participating systems 2, e.g. by broadcasting, a message requesting encrypted transition data {Etx} from the other participating systems for the specific transition and compare the received encrypted transition data {Etx} to determine whether or not the differences meet a predetermined comparison criterion and correspond sufficiently to the encrypted transition data {Etx}. For example, the comparison criterion can be that the encrypted transition data has no differences, e.g. is the same, such as can be determined by calculating the hash value thereof and comparing the hash values of the encrypted transition data {Etx} received from the different participating systems.
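The hash-comparison check mentioned at the end of the preceding paragraph could, for instance, look as follows in Python; this is only a sketch, and the function name encrypted_data_consistent and the choice of SHA-256 as the hash function are assumptions of the example:

  import hashlib

  def encrypted_data_consistent(received_etx: bytes, peer_copies: list[bytes]) -> bool:
      # Comparison criterion used here: all copies of {Etx} received from the
      # participating systems must hash to the same value.
      reference = hashlib.sha256(received_etx).hexdigest()
      return all(hashlib.sha256(copy).hexdigest() == reference
                 for copy in peer_copies)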
If the verification yields that the encrypted transition data does not encrypt the transition to be performed, the storage is refused, and e.g. a corresponding storage refusal message can be sent to the participating nodes or to the nodes running the consensus protocol. The transition may then be stopped and the distributed database remain in its current state (until another transition is initiated, of course).
If the verification yields that the encrypted transition data does encrypt the transition to be performed (and is not an encryption of other, e.g. fake or dummy, data), the recovery data storage system 4 stores the recovery data, separate from the distributed database. For example, in case of a DLT the recovery data is stored off-ledger, separate from the world state data and separate from the ledger. To that end, the recovery data storage system 4 may e.g. have a recovery data memory 44, as illustrated in FIG.3. As indicated with the arrow from block 110 to block 102 in FIG. 1, the recovery data storage system 4 may thereafter output to the transacting system 2 a storage confirmation message Sc confirming that the encrypted data {Etx} was stored in the recovery data memory 44.
In an example, the participating system(s) 2 also generate(s) validation data {S} representing a validation of the transition {tx} by systems in the network. The validation data {S} may for example comprise one or more digital cryptographic signatures SA, SB, ... from nodes A, B, ... (or users of those nodes) linked to one or more data record fields modified by the transition, or comprise data from other data records or other data required by the nodes executing the consensus protocol to determine whether or not to validate a transition. In such a case, a cyphertext version of the validation data {S}, i.e. encrypted validation data {Es}, may be stored in the recovery data storage system 4 as well, and the stored recovery data may thus further comprise the encrypted validation data {Es}. The transacting system 2 may for instance generate the encrypted validation data {Es} by encrypting the validation data {S}, e.g. with the same key and protocol as the encrypted transition data {Etx}, and transmit the encrypted validation data {Es} and the proof {π} to the recovery data storage system 4.
The recovery data storage system 4 may, after receiving the encrypted validation data {Es}, verify whether the encrypted validation data {Es} encrypts the validation {S} of the transition {tx} (and is not an encryption of other, e.g. fake or dummy, data, e.g. used to conceal absence of a validation). The recovery data storage system 4 stores the recovery data in the recovery data memory 44 if the verification yields that the encrypted validation data {Es} does correspond to that validation {S}. Preferably, the verification is performed without decrypting the encrypted validation data, and more preferably the recovery data storage system 4 is not capable of doing so, e.g. because it does not possess the decryption key. This allows to preserve the confidentiality of the stored data, while on the other hand ensuring that the stored data allows to restore the distributed database. For example, the transacting system 2 may generate a, e.g. cryptographic, proof {πs} that the encrypted validation data {Es} encrypts the validation {S} of the transition {tx} to be performed and transmit the proof {πs} to the recovery data storage system 4. The recovery data storage system 4 may then verify with the received proof {πs} whether the encrypted validation data encrypts the validation of the transition to be performed, without decrypting the encrypted validation data.
For example, the transacting system 2 may, after generating the encrypted transition data {Etx}, transmit a message to one, or more than one, other system 2,3 involved in the transition to obtain a verification of the encrypted transition data {Etx} by the one, or more than one, other system. The verification may be given in the form of a cryptographic signature {SE} by the other system on the encrypted transition data {Etx}. The other system(s) may for example verify, with the information about the transition received from the transacting system 2 in order to validate the transaction {tx} itself, whether or not the encrypted transition data {Etx} encrypts the transition {tx}. If it does, the other system may return to the transacting system 2 a message with a cryptographic signature {SE} on the encrypted transition data {Etx}, for example. The transacting system 2 may in response generate encrypted verification data. For example, the transacting system 2 may concatenate the cryptographic signatures {SE} from the other systems and subsequently encrypt the concatenated signatures into collective encrypted verification data.
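A minimal Python sketch of this collect-and-encrypt step is given below. It assumes Ed25519 signatures and the Fernet scheme from the cryptography package purely for illustration, and it collapses the message exchange with the other systems into a local loop; in the arrangement described here each other system would of course sign with its own key on its own machine:

  from cryptography.fernet import Fernet
  from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

  def collect_and_encrypt_verifications(e_tx: bytes,
                                        signer_keys: list[Ed25519PrivateKey],
                                        encryption_key: bytes) -> bytes:
      # Each involved system provides a signature {SE} on the encrypted
      # transition data {Etx} ...
      signatures = [key.sign(e_tx) for key in signer_keys]
      # ... the signatures are concatenated ...
      concatenated = b"".join(signatures)
      # ... and the concatenation is encrypted into collective encrypted
      # verification data.
      return Fernet(encryption_key).encrypt(concatenated)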
The transacting system 2 may send the encrypted validation data {Es} and the encrypted verification data as separate data. Alternatively, the transacting system 2 may e.g. generate encrypted collective verification data. The collective verification data may comprise linked data representing the validation {S} and data representing the verification {SE}, such as cryptographic signatures thereof, and the transacting system 2 may encrypt the linked data to obtain the encrypted collective verification data {Es}. One or more of the proofs above may comprise calculating a cryptographic proof of knowledge π. The cryptographic proof of knowledge π may be calculated by performing an interactive proof of knowledge protocol or a non-interactive proof of knowledge protocol by the transacting system 2, preferably a zero-knowledge proof protocol ZkPoK(w,x), w representing a secret input and x a public input.
The calculation may use as secret input {w} one or more, preferably all, of the group consisting of: the encrypted transition data {Etx}, the encrypted validation data {Es}, and a decryption key {sk}. The decryption key can be a decoding key, different from the encoding key pk, such as the private key sk of a public-private key pair pk-sk. As public input {x} for example one or more, preferably all, may be used of the group consisting of: a value {Htx} calculated with a one-way function from the transition data {tx}, a value {HEtx} calculated with a one-way function from the encrypted transition data {Etx}, and a value {HEs} calculated with a one-way function from the encrypted validation data {Es} or from the encrypted collective verification data {Es}. One, some or all of the one-way functions may for example be hash functions or other one-way functions. In this respect, the value {Htx} forms a unique identifier {txID} for the transition data {tx}.
The proof π may evidence one or more, preferably all, of the following: a decryption of the encrypted transition data {Etx} with a predetermined private key {sk} results in the transition data {tx}; a value calculated with a predetermined one-way function from the transition data {tx} corresponds to the identifier {txID} of the transition to be performed; the transition data {tx} and/or encrypted transition data {Etx} have been verified by one, or more than one, other system maintaining the data record, e.g. that the encrypted validation data or the encrypted collective verification data covers those verifications (this may e.g. be a proof that the transition data {tx} and/or encrypted transition data {Etx} have been signed with the cryptographic signatures SA, SB, ... of those systems); the transition meets a set of predetermined transition rules.
Specifically, the proof of knowledge ZkPoK(w,x) may use as witness {w} the combination of (Etx, Es, sk), and as instance {x} the combination of (txID, HEs, HEtx). This allows to evidence with a single proof all of the above, while at the same time allowing the recovery data storage system to verify those statements without access to plain text transition data. In addition, this allows with the same proof to verify that the transition is a valid one, as well as that the encrypted data contains a cypher text version of, i.e. encrypts for, the transition data.
In the example of FIG. 1, the generation of recovery data (A) may for example comprise performing one or more, or all, operations described with the following pseudo-code. In this pseudo-code, the right-to-left arrow indicates that the parameter at the left is the outcome of the function at the right-hand side:

start
  encryption key = pk
  {
    tx ← create_tx()
    Etx ← encrypt(pk, tx)
    S ← collect_signatures(tx, Etx)
    check_tx(tx)
    ES ← encrypt(pk, S)
    πtx ← generate_proof(tx, Etx, S)
  }
end

One or more of the operations in the above pseudo-code may be a procedure as described with the following pseudo-code:

procedure create_tx()
  Build tx
  return tx

procedure encrypt(pk, tx)
  Etx ← enc(pk, tx)
  return Etx

procedure collect_signatures(tx, Etx)
  SP ← sign(ksP, txdata)
  SEP ← sign(ksP, Etx)
  S ← SP | SEP
  return S

procedure check_tx(tx)
  Check transaction rules
  return true

procedure generate_proof(tx, Etx, S)
  ES ← enc(pk, S)
  txid ← Hash(tx)
  HEtx ← Hash(Etx)
  HES ← Hash(ES)
  πtx ← PROVE(PK, (txid, HEtx, HES), (Etx, ES, sk))
  return πtx

In this, PROVE may be a cryptographic proof function π ← prove(PK, x, w), where a prover generates a cryptographic proof on a public instance x and a private (secret) witness w, with a proving key PK. In the example of FIG. 1, (A) may for example comprise performing one or more, or all, operations described with the following pseudo-code:

π = ZkPoK { (Witness: (Etx, ES, sk), Instance: (txid, HEtx, HES)) :
  predicate:
    tx ← decrypt(sk, Etx)
    txid == Hash(tx)
    HEtx == Hash(Etx)
    HES == Hash(ES)
    S ← dec(sk, ES)
    ∀ Si ∈ S, verify_sign(kiv, Si),
    validate_contract_rules()
}

As illustrated with the example of FIG. 1, the recovery data storage system 4 can be kept agnostic about the content of the transition to be performed, and does not need to receive a plain text version of the transition data. Preferably, the recovery data storage system 4 does not dispose over a key required to decrypt the encrypted transition data {Etx}. The key may e.g. be provided to systems other than the recovery data storage system and the transacting system. For example, the key may be a private key sk of a decrypting system. The encrypted data may in such a case be encrypted with a public key pk assigned to the transacting system 2, the public key being part of the public-private key pair pk-sk, while the transacting system 2 does not dispose over the private key sk of the public-private key pair pk-sk. This ensures that the transacting system 2 itself cannot compromise a recovery, e.g. by decrypting the encrypted data and manipulating the decryption. The decrypting key may be a threshold decryption key distributed over a number n of systems in the network other than the transacting system 2, n being a positive integer and preferably at least 2, and preferably the decryption key may not be distributed to the recovery data storage system 4. This avoids a single point of failure, e.g. a single decrypting system being compromised resulting in unauthorised access to the data.
As illustrated with (C) in FIG. 1, the generating and/or storing of the recovery data may be performed in parallel to operating the distributed database. For example, the operations represented by blocks 102, 120, 121 may be performed together, by performing a method as illustrated in FIG. 4. For example, transitions may be conditioned on storing the encrypted data in the recovery database, and a transition only be accepted as valid if evidence of storing the encrypted data is provided, such as a cryptographically signed message acknowledging storage, issued by the recovery data storage system 4. Alternatively or additionally, for example, the recovery data storage system may receive messages informing of a transition and determine whether or not corresponding encrypted data has been stored. In case the recovery data storage system determines there is no corresponding encrypted data, the system may then e.g. output a warning message to other systems in the database arrangement, for example.
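The condition that a transition is only accepted when evidence of storage accompanies it could be sketched in Python as below; the Ed25519 signature, the function name accept_transition and the idea of the storage confirmation being a signature over {Etx} itself are assumptions of this sketch:

  from cryptography.exceptions import InvalidSignature
  from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

  def accept_transition(e_tx: bytes, storage_confirmation: bytes,
                        recovery_storage_public_key: Ed25519PublicKey) -> bool:
      # The transition is only treated as valid when the recovery data storage
      # system has signed, i.e. attested to, the stored encrypted data {Etx}.
      try:
          recovery_storage_public_key.verify(storage_confirmation, e_tx)
      except InvalidSignature:
          return False
      return True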
As illustrated with the arrow from block 110 to block 102, in this example, after storing the recovery data the recovery data storage system 4 may transmit an attesting message Storage confirmation (SC) to the transacting system 2 attesting of storage of the recovery data. The attesting message may for example comprise a cryptographic signature SE of the data stored issued by the recovery data storage system 4.
In FIG. 1, for instance, as illustrated with block 102, the transacting node 2 may for example use the storage confirmation message SC from the recovery data storage system 4 as evidence of storage and, in response to receiving the storage confirmation message SC, proceed with the transition {tx} by transmitting a transaction message which evidences the storage confirmation message SC to the systems executing the consensus protocol, as illustrated by the arrow from block 102 to block 120. In response to the transaction message, as illustrated with block 120, the transaction is verified against the rules of the consensus protocol and, if this verification yields that the transaction complies with the rules and requirements of the consensus protocol, the distributed database is updated, as illustrated with block 121.
Alternatively, for example, the transacting node 2 may have transmitted a transaction prior to, or together with, sending the encrypted data. In the example of FIG. 1, for example, the operations illustrated with blocks 101 and 102 may be performed prior to sending the messages, and the messages be transmitted to a system which performs both the storing of the encrypted data and the verification of the transaction. The operations illustrated with blocks 110 and 120 of (B) and (C) may for example be performed by the same system. In such an example, the transacting system 2 creates a transition and transmits a message containing information about the transition. The receiving system 4 then judges the validity of a transition based on a predetermined set of criteria and the information. For example, this combined recovery data storage and transaction validating system may perform one or more, or all, operations described with the following pseudo-code:

{
  validate(tx)
  if result = SV then
    Store(Etx)
  else
    error
}

In the above pseudo-code, the commands "validate" and "Store" may be procedures as described with the following pseudo-code:

procedure Validate(tx)
  validator_check(tx)
  if tx = valid then
    SV ← sign(tx)
    Result = SV
  else
    Result = error
  return Result

procedure Store(Etx)
  VERIFY_proof(VK, (txid, HEtx, HES), πtx)
  if proof(πtx) = valid then
    store(πtx, Etx, ES)
    SS ← sign(πtx | Etx | ES)
    Result = SS
  else
    Result = error
  return Result

In this code, "VERIFY" may be a cryptographic verification function b ← verify(VK, x, π), where a verifier verifies the proof π using the verification key(s) VK complementary to the proving key PK and by executing a cryptographic verification function b complementary to the proving function ZkPoK used by the transacting node 2, with the instance {x} as input. The outcome b is either true, i.e. the proof is correct, or false, i.e. the proof is not correct. Alternatively, for example, the procedure "Validate" and the operation "Store" may be performed separately, and e.g. the other participating systems accept the transition only after receiving the outcome of the procedure "Store" and the outcome of the procedure "Validate". These outcomes may e.g. be sent directly by the systems executing the procedures or be sent first to the transacting system 2, which then transmits the outcomes to the other systems.
The operating of the distributed database may e.g. be performed as described below with reference to FIGs. 4-6. In the example of FIGs. 4-6, at least a part of the consensus protocol is executed by validating systems 3 not participating in the transition of the transaction, and more specifically, once a transition has been authorized by the other systems involved in the transition, a message is sent to the validating system 3 to validate the transition. In case the validating system validates the transition, a message is sent and the participating nodes storing the data records modified by the intended transition update the records according to the approved transition. Like the recovery data storage system, this validating system may be kept agnostic about the content of the transition to be performed, and may perform a verification method in accordance with a zero-knowledge protocol to obtain a mathematically sound proof whether or not the intended transition meets the rules and constraints of the consensus protocol.
However, it will be apparent that the distributed database may e.g. be operated in another manner. For example, another consensus protocol may be executed. In the consensus protocol, participating devices may e.g. verify whether the transmitting node has obtained approval of a specified (sub)set of the participating devices before the transition is accepted, and the participating devices only accept the transition if i) the required approvals are obtained and ii) the transmitting node transmits a message with evidence of the obtained approvals to the participating devices storing the respective data records, e.g. by a broadcasting messaging service. In such a case, the participating devices in the specified set may receive the data about the transaction {tx}, whereas the other participating devices are kept agnostic about the content of the transaction {tx}, or alternatively of the entire transaction {tx}. Said differently, the devices in the specified set may run a private and/or confidential subnetwork. An example of such a distributed database arrangement is known as "Hyperledger Fabric", in which two or more network members can run a private sub-database, referred to as "channel" in Hyperledger Fabric terminology. In such a case, for example, approval of a node storing the back-up may be used as a separate, additional requirement before the participating nodes accept the transition, or the storing device can be included in the specified set and provide its approval only after verifying the encrypted data as corresponding to the transition.
In another example, the network and nodes used to perform the methods illustrated may for example be implemented as a public, non-permissioned or permissioned, DLT with a separate distributed data storage, such as a distributed data storage distributed as open source software under the name Swarm (as described in Hartman, John H., Ian Murdock, and Tammo Spalink, "The Swarm scalable storage system", Proceedings 19th IEEE International Conference on Distributed Computing Systems (Cat. No. 99CB37003), IEEE, 1999) or Filecoin (as described in Benet, J., and N. Greco, "Filecoin: A decentralized storage network", Protoc. Labs (2018): 1-36). In such a case, the encrypted data is first sent to the distributed data storage service and a storage receipt is received that evidences that content with a certain hash was stored. The nodes participating in the transaction then create a public transaction that contains proof of validity of the transaction represented by the stored encrypted data, the storage receipt and proof that the encrypted data hashes to the hash value in the storage receipt from the distributed data storage service. The public transaction may in addition contain e.g. metadata. This allows to offload computationally intensive operations to the participating nodes, making the network more scalable and shrinking the overall size of the public blockchain.
FIG. 2 schematically illustrates an example of a method of recovering records of a database. The execution of the method may be triggered by determining that the integrity of the distributed database may be compromised. For example, this may be determined as illustrated in FIG. 2(A). In this example a single data record X is changed, and the change comprises or consists of a modification of the node assigned as owner of the record. As shown with the blocks "Move" in FIG. 2(A), the data record (and hence the distributed database) transitions from state SA to state SB, from state SB to state SC, etc. The index A, B, ... indicates here which node "owns" the record and is authorized to initiate a modification of the record in the respective state (which, if, and after, passing the consensus protocol, will then be adjusted accordingly by the nodes storing the record). As illustrated, the data records transition from the initial state A via an uninterrupted chain of state-transitions up to a last uncompromised state SD. A transition, e.g. a transaction, may be more or less complex and may e.g. be a transition of a single record, in which case the encrypted transition data is stored in the recovery database per transition of a single record, or a transition may be of multiple records. Also, a transition may modify values of fields of different records, create new or delete records and/or fields, etc.
After the last uncompromised state SD the chain is broken in FIG. 2(A). As illustrated with "Move SE SF", the record is at some point moved by node E to node F, without the record being transferred by node D to the node E, e.g. because the node E has breached security and provides the data required to be deemed owner when the consensus protocol is executed. Said differently, there is no link "Move" between state SD and state SE. When node D then initiates a transaction to move the record from state SD to state SG, this transition will be refused because nodes D and F cannot both own the data record. The integrity of the database is then compromised, since without further information the other nodes do not know whether the record should be in state SD or state SF, and hence which of node D and node F had breached the security. Said differently, the state of the record is ambiguous.
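The ambiguity described above can be detected by checking whether the current state is reachable from the initial state through the recorded chain of "Move" transitions. The following Python sketch illustrates this; the dictionary representation of the transitions and the function name find_break are assumptions of the example:

  def find_break(transitions: dict[str, str], initial_state: str, current_state: str):
      # transitions maps each source state to the state it was moved to,
      # e.g. {"SA": "SB", "SB": "SC", "SC": "SD", "SE": "SF"}.
      state = initial_state
      reachable = {state}
      while state in transitions:
          state = transitions[state]
          reachable.add(state)
      if current_state not in reachable:
          # The chain is broken: the current state (SF in FIG. 2(A)) cannot be
          # reached, and the last reachable state (SD) is the candidate last
          # uncompromised state.
          return state
      return None

For the example of FIG. 2(A), find_break({"SA": "SB", "SB": "SC", "SC": "SD", "SE": "SF"}, "SA", "SF") returns "SD".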
Upon detecting this ambiguity, one or more of the nodes A ... F may send a request for recovery of the distributed database. If this request is accepted, a method of recovering one or more records of the distributed database is started and performed, as illustrated in FIG. 2. The request may also be sent to the other participating nodes, in order to inform the participating nodes of the breach and/or to obtain approval for the recovery. Prior to starting the recovery method, the request may be verified to meet one or more predetermined requirements, such as being sent by a predetermined number of participant nodes 2 or being sent by a transaction validating node. This allows to reduce the risk of a denial of service attack by a node repeatedly sending such requests.
In the shown example, as illustrated with block 600, a participating node 2 sends a request to the, one or more, decrypting nodes 5, which are communicatively connected to the recovery data storage system 4 with the recovery data memory 44 in which the recovery data of the distributed database is stored. As mentioned, the recovery data is stored in a recovery database which is separate from the distributed database, and which preferably is on a different physical device or devices. This recovery data may e.g. be stored there by performing a method as illustrated in FIG. 1, and comprises encrypted transition data {Etx} representing a cyphertext version of transition data {tx} containing information about a transition of one, or more than one, data record of the distributed database from a first state to a second state. The second state can be a current state of the data record or a preceding state preceding the current state.
Preferably, in the recovery data memory 44 encrypted transition data {Etx} is stored for a, branched or unbranched, chain of transitions from a starting state of the chain up to an end state of the chain, e.g. from an initial state of the distributed database up to the current state. The chain may be unbroken or broken, with for example an unbroken chain part from the initial state up to the most current, uncompromised state, a chain part from a first compromised state to the current state, and a missing link between the most current, uncompromised state and the first compromised state. In the missing link, one or more intermediate states and transitions may be missing, such as, in the example of FIG. 2(A), the transition from state SD to state SE and the state SE.
In this example, states of the distributed database for which no encrypted data is stored in the recovery database are deemed compromised states. E.g. in the example in FIG. 2(A), state SE will be deemed compromised, since the node D will not have sent the corresponding encrypted transition data {Etx} for the transition from state SD to state SE to the recovery data storage system 4.

Referring to FIG. 2(B), as illustrated with block 610, if the decrypting nodes 5 decide to accept the request, a method of recovering one, or more than one, record of the distributed database from a current state to a past state is performed. If a malicious party somehow achieved to compromise the distributed database, e.g. by generating fake signatures on transactions or otherwise, this can be revealed by performing the recovery. Once the faulty transaction(s) is/are detected, for example, the transactions following thereafter can be discarded, and the distributed database can be maintained in the state resulting from the last valid transaction, for example. In such a case, starting from the last valid transaction, the nodes involved in the discarded transactions can repeat them (with the exclusion of the faulty transaction(s) of course), and record them again in the recovery database. This reduces the risk that the integrity of the restored database remains compromised due to transactions being derived from a malicious transaction. Also, counter measures can be taken to prevent further breaches, such as excluding some or all of the nodes involved in the faulty transactions from the network.

In this example, as illustrated with block 611, the decrypting node 5 starts with a first state Sn to be recovered by decrypting, as illustrated with block 612, the corresponding encrypted data, verifies the state as illustrated with blocks 613-614, and repeats the recovery state-by-state until a compromised state is found. The decrypting nodes then recover a state preceding, e.g. the directly preceding state, the compromised state and the data records of the distributed database can be reinstated to that state. Depending on the specific implementation, the method may be used to restore in a reverse direction (from the current state Sn backwards, towards the initial state S1 of the distributed dataset) all records from the current state Sn to a last uncompromised state Sn-p. E.g. in the example of FIG. 2(A) the last uncompromised state is state D, whereas the current state is state F. In this respect, only the transitions back to the last uncompromised state may be recovered, or alternatively all transitions from the current state back to the initial state, or another state preceding the last uncompromised state, e.g. in order to perform a security analysis to detect any further breaches. For example, all transitions from the current state back to the initial state may be recovered, and the distributed database be reconstructed from the initial state on to a more current state, such as up to the last uncompromised state.

The recovery may comprise, as illustrated with blocks 611, 620, retrieving recovery data by the decrypting node(s) from the recovery data memory 44. As illustrated, the decrypting node 5 may send a message requesting encrypted transition data {Etx}, {Es} corresponding to the transition to be recovered from the recovery data memory. In response thereto, the requested encrypted data may be sent by the recovery data storage system 4. In this respect, the system 4 may e.g. be protected against unauthorized retrieval or unauthorized decrypting. For example, the decrypting key sk may be a distributed decryption key, distributed over a number n, n being an integer equal to or larger than 2, of decrypting nodes, the decryption only being possible if a threshold number m of the decrypting nodes has requested or collaborates in the decryption, for instance. The recovery data {Etx}, {Es} requested and/or retrieved may comprise encrypted transition data {Etx}, and optionally other data, such as encrypted validation data {Es}, and optionally further data, such as a transaction identifier or other data stored coupled to the recovery data.

As illustrated with block 612, the encrypted validation data {Es} may be decrypted to obtain a decrypted version of the validation data. The decrypted validation data may then be used to determine whether or not the transition {tx} was a valid transition, e.g. whether the transition {tx} was authorized by the required participating nodes, for example as evidenced by cryptographic signatures {S} thereof in the decrypted validation data. In this example, the decrypting nodes 5 receive encrypted signature data {Es} and decrypt the encrypted signature data {Es}. As illustrated with block 612, the decrypting nodes or another node may then verify from the decrypted signature data, i.e. the recovered signature data {S}, whether the transition was at the time validated in accordance with the rules of the consensus protocol. The validity of the transition can thus be determined. As further shown in FIG. 2(B) with block 614, after retrieving the encrypted transition data {Etx}, the decrypting nodes may decrypt the encrypted transition data {Etx}. The decrypted transition data {tx} may then be used to determine the content of the transition and restore the database in the reverse direction, i.e. to the state from which the transition started. In addition, the decrypted transition data may be verified to be compromised or not. In the shown example, the encrypted transition data {Etx} is decrypted if the decrypted encrypted validation data {Es} has been verified and the transition {tx} has been determined to be a valid transition.

As shown with block 615, the method may comprise determining from the recovery data a faulty transition in one, or more than one, record of the distributed database. For example, after decrypting the data, a test can be performed to determine whether or not the transition {tx} is faulty and/or the state Sn is compromised, i.e. whether or not they are valid and, for example, whether they met the requirements of the consensus protocol. In case the transition is determined to be faulty, e.g. because some signatures {S} are missing or invalid, the state Sn in which the transition Sn-1 -> Sn resulted is deemed a compromised state, and the transition {txn} a faulty transition. In case a state is determined to be compromised, the state Sn-1 directly preceding the compromised state can then be deemed a last uncompromised state, and the distributed database be restored to this state, and accordingly the last valid transition has been the directly preceding transition {txn-1}. Alternatively, for example, the method described may be performed on the directly preceding transition {txn-1} to determine whether or not this was a valid transition and accordingly whether the state Sn-1 is an uncompromised state.
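The state-by-state recovery loop of FIG. 2(B) can be summarised by the Python sketch below. The functions decrypt and is_valid stand in for the threshold decryption by the decrypting nodes 5 and for the check of the recovered signatures against the consensus rules, respectively; the list representation of the recovery records and all names are assumptions of this sketch:

  def recover_to_last_uncompromised(recovery_records: list[dict], decrypt, is_valid):
      # recovery_records is ordered from the first transition up to the most
      # recent one; each entry holds the stored {Etx} and {Es} of one transition.
      for index in range(len(recovery_records) - 1, -1, -1):
          record = recovery_records[index]
          signatures = decrypt(record["Es"])    # recover the validation data {S}
          transition = decrypt(record["Etx"])   # recover the transition data {tx}
          if not is_valid(transition, signatures):
              # The state this transition produced is compromised; the state
              # preceding it is deemed the last uncompromised state, so the
              # database is to be reinstated to the state before this record.
              return index
      return None  # no faulty transition found; nothing to roll back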
In this example, the method is repeated with the directly preceding transition {txn-1} only in case the transition is determined to be valid, i.e. not faulty, as illustrated with the arrow. The distributed database can then be restored from the current state backwards to the uncompromised state. This allows to reduce the computational load, since the encrypted data has to be retrieved for only a part of the transactions. However, alternatively, the method may be repeated for each transition up to the initial state and the distributed database be restored from the initial state on, i.e. in the forward direction. This allows to reduce the risk that the restored database is compromised as well, since each transition between the initial state and the state to which the distributed database is restored can be verified for breaches and all faulty transitions be detected. The method may further comprise determining from the recovery data a system from which the faulty transition originated and excluding the system from the network. In the shown example, for instance, assuming that the transition SE-SF was stored in the recovery database, when performing the method of FIG. 2(B), the chain of recovered transactions will be broken, since there is no transition SD-SE stored in the recovery data memory. Hence, the chain of recovered transitions will be interrupted there. Accordingly, the decrypting nodes may determine that the sub-chain from the current state SF back to the point of interruption (state SE) is invalid, and that the database has to be reinstated at state SD. In addition, since the state SE is compromised, the node E may be deemed to be the origin and be excluded from the network.

Referring to FIG. 3, an example of a network 1 operating a distributed database is shown therein. The database can e.g. be a relational database. The database can e.g. represent an unused state transition output data model where each record of the database represents a single current, also referred to as un-used, state of an object only. When the current state is used by a state transition, a new state of the object is generated as determined by predetermined rules and constraints which define the state transition. The current state then becomes a used, also referred to as past or historic, state. Hence, after a state transition the current state, which was used as input for the transition, is not available anymore for use as input for another state transition.

The network 1 comprises first nodes 2, 2', 2", second nodes 3, a recovery data storage system 4 and, in this example, one decrypting node 5, although there may, as explained above, be two or more. The nodes 2-5 are connected to each other via suitable network connections 6. The network connections can e.g. use publicly available network infrastructure, such as the internet, or private network infrastructure, such as a private IP network. Each of the nodes comprises an active electronic device that is attached (or attachable) to the network connection, and when attached is capable of creating, receiving, or transmitting data over a communications channel, not shown, of the network 1. As shown, each of the nodes 2-5 comprises an integrated circuit processor 21, 31, 41, 51 and a memory unit 23, 34, 44, 54, as well as a network interface 20, 30, 40, 50. The recovery data storage system 4 further comprises a memory in which the recovery data 42 is stored. In the memory units 23, 34, 44, 54 respective software is stored, which the integrated circuit processor 21, 31, 41, 51 can execute.
In a database memory 22 of the first nodes 2, 2', 2", data records of the distributed database are stored, which are maintained and synchronized between the nodes 2, 2', 2". In this example, the first nodes are communicatively connected to exchange messages about state transitions to the data records of the distributed database to synchronize corresponding data records between them.
In this respect, each first node may have a set of data records. The sets of the different first nodes may be the same or may be different, and a data record may be present in multiple versions, i.e. with different nodes each having a version, or may be present in a single version only. In this example, each first node only stores a subset of the database in the database memory. For example, the network 1 may be preconfigured such that each first node 2, 2', 2" only stores a predetermined subset of the set of data records. As a result, each first node has the part of the contents of the database corresponding to the subset. For instance, for each of the first nodes 2, 2', 2" the subset may be smaller than the set, in which case no database memory contains the full database.
For example, the predetermined subset of records may be the records for which a first node is pre-set, or requested, to perform operations or, phrased more colloquially: each first node stores in its database memory a subset of database records on a "need to know" basis, where the "need to know" is defined as being preconfigured or requested to perform an operation involving a record (a short illustrative sketch of this rule follows the list below). More specifically, in this example, each first node stores in the database memory only records the first node is:
■ authorized to modify, e.g. to apply a state transition to; and/or
■ listed as an approver for state transitions initiated by another first node.
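As a purely illustrative sketch of this "need to know" rule (in Python), the record fields "modifiers" and "approvers" below are assumed names and not prescribed by this description.

```python
from typing import Iterable, List, Mapping


def needs_to_store(node_id: str, record: Mapping) -> bool:
    """'Need to know' rule: keep a record only if this node is authorized to
    modify it or is listed as an approver for transitions initiated by another
    first node. The field names 'modifiers' and 'approvers' are illustrative."""
    return node_id in record.get("modifiers", ()) or node_id in record.get("approvers", ())


def local_subset(node_id: str, all_records: Iterable[Mapping]) -> List[Mapping]:
    """The subset of the database a first node keeps in its database memory."""
    return [record for record in all_records if needs_to_store(node_id, record)]
```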
In this example, the network is configured such that each modification to a record of the database has to be sent to and approved by the second node 3. If a malicious party somehow manages to generate fake proofs and convince the second node 3 of the validity of a modification, e.g. on a data record that does not belong to that party, this can be revealed by performing the recovery as described above with reference to FIG. 2. Once the faulty transaction is detected, the subsequent transactions are discarded, and the database can be maintained in the state resulting from the last valid transaction. The second node 3 comprises a register memory 32 in which data is stored which identifies whether or not a current state of a data record is allowed to be modified.
In this example, the network 1 has multiple first nodes but the invention is not limited to such a network 1, and likewise applies to a network 1 with a single first node 2.
A node 2-5 may have a network address, such as an IP address, which allows other nodes in the network to address the node and send data to that address. In this example, the network 1 is a private network and each node has a private address, which within the network is unique and, preferably, is not visible to nodes outside the network. However, alternatively, the network 1 may use public addresses and e.g. the nodes be connected via public data communication network infrastructure, such as the Internet, to each other, and network membership and communication rights be defined at a higher OSI level than the networking layer or higher than the transport layer of the infrastructure. The network can e.g. be a peer-to-peer network.
Preferably but not necessarily, a node can only communicate with other nodes in the network when authorized to join the network by an access control node, also referred to as a "gatekeeper" node or a "doorman" node. The access control node may e.g. sign a public key certificate, and a node can only communicate with other nodes in the network with a signed certificate obtained from the gatekeeper node. In such a case, access to the network is limited to the nodes meeting the requirements set by the access controlling node for issuing the certificate. The network may for instance comprise a, not shown in this example, central registry in which certificates and identities are stored. A node can then be added to the network by sending a request for a certificate to the central registry and receiving a response from the central registry with a signed certificate assigned to the node. For example, the central registry may store digital certificates and identities of the nodes in the network. The central registry may be part of a node which, in response to receiving a request from a joining node and upon reception of a public key, assigns a certificate to the node permissioned to the network, and the nodes may use the certificate. However, other techniques for controlling access to a network may be used instead. For instance, the network may be a private IP-based network and the access control node assign a private IP address to the first node, just to give an example.
The first node 2 shown in FIG. 3 comprises a network interface 20 which connects the node to the network. An integrated circuit processor 21 is arranged to run state transition software to manipulate data records of the distributed database. The integrated circuit processor may e.g. be a general-purpose processor. However, alternatively, the integrated circuit processor 21 may be an application specific processor, or a set of several processors. The processor may e.g. be one or more general-purpose processors as commercially available from Intel Corporation of Santa Clara, California, United States of America under the name "Intel® Xeon® Scalable".
At least some records of the database are stored in a database memory 22 connected to the integrated circuit processor 21. The node 2 further comprises a software memory 23 connected to the integrated circuit processor 21 in which the software is stored as instructions executable by the integrated circuit processor. The software memory 23 may for example be a non-transitory, tangible memory such as a non-volatile memory, e.g. a hard-disk or a read-only memory, or a volatile memory, such as random-access memory or processor cache memory, and it will be apparent that the software memory 23 may comprise different memories (e.g. hard-drive, RAM and cache) used by the processor in the execution of the software.
The first node 2 further comprises a filter 24 connected to the database memory 22, which when operating, filters predetermined content data out of a data record of the database, and provides filtered data identifying a current state of the filtered data record.
A set of one or more, predetermined and pre-set, state machines 25 is also present in the first node 2. The state machine 25 has one or more state inputs 250 and one or more state outputs 251. The state machine generates, when the state inputs 250 are filled with input values, e.g. with the values of the fields of a data record, output values presented at state outputs 251 which represent a new state for a data record, if and when the rules and constraints of the state machine are met and as defined by the input-output transfer function of the state machine. The state machine may for example require additional data to generate a new state, for example one or more digital cryptographic signatures from nodes, or users of nodes, linked to one or more fields of the data record, or data from other data records, for example.
The state machine may for example be implemented as program code defining a set of instructions executable by the integrated circuit processor 21. The state machine 25 may be any suitable type of state machine, and for example be formed by code executable by the integrated circuit processor in which the rules and constraints are expressed, for example as a finite state machine such as a virtual arithmetic circuit or TinyRAM instructions. In this respect the term "arithmetic circuit" refers to code defining in software a virtual circuit of gates and connections between the gates where the gates, instead of Boolean operations, perform addition or multiplication operations on the input of the gates. Reference is made to Ben-Sasson, Eli, et al. "TinyRAM architecture specification, v0.991." (2013), incorporated herein by reference, for a description of TinyRAM instructions. Preferably, the state machines, and more preferably the set of state machines, are predetermined and preconfigured and are not modifiable once the network has been put in operation. The state machine 25 may define the rules and constraints of a simple state transition, such as a simple operation involving a single current state, which transitions the object from the current state to a single new state. However, the operation may be more complex, and e.g. also use other current states, e.g. transition other objects from a current state to a new state. For example, multiple current states may transition into a single new state. Likewise, a state transition may create new states that may be descendants of used states (e.g. copies with changed access rights to modify the record of the used state) or they may be unrelated new states, e.g. newly created records created by the state transition.
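For illustration, the following Python sketch shows a pre-set state machine of the kind described above, expressed as ordinary program code rather than as an arithmetic circuit or TinyRAM instructions; the record fields and the simplified signature check are illustrative assumptions only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RecordState:
    record_id: str
    owner: str
    value: int
    used: bool = False          # an unused ("current") state can be consumed once


class TransferStateMachine:
    """Illustrative state machine: transfer ownership of an unused record.
    Its rules and constraints are fixed at configuration time and not modifiable."""

    def transition(self, current: RecordState, new_owner: str, signatures: set):
        # Constraint 1: only a current (unused) state may be consumed.
        if current.used:
            raise ValueError("state already used")
        # Constraint 2: the present owner must have signed the transition
        # (signature handling is simplified to a membership test here).
        if current.owner not in signatures:
            raise ValueError("missing required signature")
        # Output: the consumed (historic) state and the new current state
        # (kept under the same record_id for simplicity).
        consumed = replace(current, used=True)
        new_state = replace(current, owner=new_owner, used=False)
        return consumed, new_state
```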
The integrated circuit processor 21 is connected to the database memory 22, to (connections not shown in FIG. 3) all the state machines 25 of the set, and to the software memory 23 to retrieve and execute the instructions therein, as is explained below with reference to FIGs. 4-6. Depending on the specific implementation, in a distributed database system each computing device stores a complete copy of all records, or only a part of the records, or even only a partial version of some or all of the records.
FIG. 3 further shows a second node 3. In the shown example, for sake of simplicity, the network is illustrated with only a single second node, however it will be apparent that the network may comprise two or more second nodes. In such a case, each or several second nodes may perform a verification process in response to receiving a verification request message from a transmitting first node 2 and the second nodes 3 may be provided with software which, when executed by the integrated circuit processors thereon, causes the second nodes to synchronize the verification. Alternatively, the verification request message may be sent to one selected second node selected by the transmitting first node 2, e.g. selected out of the multiple second nodes based on the predetermined rules coupled to the data record to be changed or coupled to the selected state machine.
The second node 3 shown in FIG. 3 comprises a network interface 30 which connects the second node 3 to the network 1. The second node 3 comprises an integrated circuit processor 31 arranged to run verification software to verify the intended state transition against a predetermined set of one or more verification rules defined by a verification state machine. The integrated circuit processor may e.g. be a general-purpose processor. However, alternatively, the integrated circuit processor 31 may be an application specific processor, or a set of several processors. The processor may e.g. be one or more general-purpose processors as commercially available from Intel Corporation of Santa Clara, California, United States of America under the name Intel® Xeon® Scalable. The second node 3 further comprises a software memory 34 in which the software is stored as instructions executable by the integrated circuit processor. The software memory 34 may for example be a non-transitory, tangible memory such as a non-volatile memory, e.g. a hard-disk or a read-only memory, or a volatile memory, such as random-access memory or processor cache memory, and it will be apparent that the software memory 34 may comprise different memories (e.g. hard-drive, RAM and cache) used by the processor in the execution of the software.
The second node 3 has a register memory 32 in which a register is stored with data identifying whether or not current states of the data records are allowed to be modified or not. The register memory 32 can, like the software memory 34, be any suitable type of memory. More specifically in this example, the second node 3 is arranged to verify whether or not the current state has been modified already. To that end, for instance, in the register memory 32 a database may be stored with a transition history, such as at least identifiers for at least each of the used states directly preceding the current states. In such a case, the second node 3 can verify whether an intended state transition pertains to a current state, i.e. to a record that is not marked as used in the register memory 32, or not.
However, various other alternative types of such data are possible as well. For example, the register memory 32 may contain at least identifiers for each of the current states. In such a case, the second node may check whether an intended state transition pertains to a current state on the list, and if not, reject the intended state transition.
Also, for example, the register memory 32 may contain identifiers for used states further back. For example, the register memory 32 may contain identifiers of all past states preceding the current state. In such a case, the second node can verify that the intended state transition does not pertain to a state with an identifier in the register memory. In addition, in such a case the second node 3 can verify whether or not a current state is part of an uninterrupted chain of state transitions from the state in the initial database to the current state.
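As a purely illustrative Python sketch of the register check described above, assuming the register memory 32 stores identifiers of used states; the class and method names are illustrative.

```python
class StateRegister:
    """Register memory of the second node: tracks which state identifiers
    have already been consumed by an approved state transition."""

    def __init__(self):
        self._used_state_ids = set()

    def is_unused(self, state_id: str) -> bool:
        return state_id not in self._used_state_ids

    def mark_used(self, state_id: str) -> None:
        self._used_state_ids.add(state_id)


def check_uniqueness(register: StateRegister, input_state_ids: list) -> bool:
    """Reject any intended transition whose input states are already used,
    i.e. a conflicting, non-unique transition."""
    return all(register.is_unused(s) for s in input_state_ids)
```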
The second node 3 further comprises a set of verification state machines 33. The verification state machines 33 may be any suitable type of state machines such as described above for the state machines 25 of the first node.
Each state machine 25 of the first node 2 has an equal in a verification state machine 33 of the second node 3. The set of verification state machines thus comprises, or consists of, equals for all the state machines of the set of the first node. In this respect, the term "equal" refers to the state machine generating the same output state in response to the same input state and having the same rules and constraints. Equal state machines may for example be generated by compiling the same human readable code into a suitable object code. In case the network comprises two or more first nodes, the set of verification state machines of second node may comprise an equal verification state machine for each of the state machines of all the first nodes.
The integrated circuit processor 31 is connected to the register memory 32 and to (connections not shown in FIG. 3) all the state machines 33 of the set and is further connected to the software memory 34 to retrieve and execute the instructions of the verification software therein, as is explained below with reference to FIG. 4 and 6.
The integrated circuit processor 41 of the recovery data storage system 4 is arranged to run storage software which e.g. can perform the method illustrated in FIG. 1(B) and the method illustrated in FIG.2 with block 620. In the memory 44 of the recovery data storage system 4 corresponding software is stored as instructions executable by the integrated circuit processor 41.
The integrated circuit processor 51 of the decryption node 5 is arranged to run software which e.g. can perform the method illustrated in FIG. 2 with blocks 610-616. In the memory 54 of the decryption node 5 corresponding software is stored as instructions executable by the integrated circuit processor 51.
The example shown in FIG. 3 can be used in a method of operating a distributed database, such as illustrated with the flowchart in FIG. 4, which can be used to perform the operations represented by blocks 100, 102, 120 and 121 in FIG. 1. For simplicity, in FIG. 4 the generation and storage of the recovery data is omitted, and it is assumed that after verification by the second node 3, the first node 2 performs the storage operation represented by block 101 and presents, together with the verification by the second node 3, the proof of storage from the recovery data storing node 4 in block 102 to the Consensus Service. The method of operating a distributed database illustrated therein is performed with a data communication network 1 comprising one or more, transmitting, first nodes 2 and one or more second nodes 3, e.g. as implemented in accordance with the example of FIG. 3. As illustrated with block 40 in FIG. 4, the method comprises the integrated circuit processor 21 of the first node 2 running the state transition software.
The running 40 comprises, as illustrated with block 401, inputting a selected data record into a selected state machine selected out of the set of state machines and thereby as illustrated with block 402, generating an intended state transition of the selected data record from a current state of the selected data record to a new state of the selected data record in accordance with rules and constraints of the selected state machine.
As illustrated with block 403, the integrated circuit processor of the first node 2 passes the data record in the current state through the filter 24 to obtain the state identification data representing an identifier unique for the current state of the selected data record, with a content of the fields of the selected data record, at least partially or completely, filtered out of the identifier by the filter. The output of the filter 24 may be contentless data from the record, such as e.g. identifiers of the current state(s), an identifier of the intended state transition and/or a hash value obtained from the content, just to name some examples.
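For illustration, a minimal Python sketch of such a filter, assuming the state identifier is a cryptographic hash over the record with its content fields removed; the field names are illustrative assumptions.

```python
import hashlib
import json


def filter_record(record: dict, content_fields=("amount", "terms", "parties")) -> str:
    """Return contentless state identification data for a record: the listed
    content fields are filtered out and a hash over the remainder serves as the
    identifier of the current state (field names are illustrative only)."""
    contentless = {k: v for k, v in record.items() if k not in content_fields}
    canonical = json.dumps(contentless, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()
```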
As illustrated with block 404, the integrated circuit processor 21 of the first node 2 further calculates verification data with a zero-knowledge verification function of which the input variables comprise the current state and the new state. In this example, for instance, the input variables comprise the full intended state transition, i.e. current states, new states, operands, etc. The zero-knowledge verification function generates with these input variables one or more output values which are used, or at least usable, by the second node to verify whether or not the intended state transition meets the rules and constraints of the selected state machine, without the first node conveying any information about the intended state transition.
As illustrated with block 405, the transmitting first node 2 then transmits to the second node a verification request message without the input variables. In this example, the verification request message contains: the state identification data, state machine identification data representing an identification of the selected state machine, and the verification data. Said differently, the verification request message does not contain the content of the intended state transition, but only data allowing verification of validity of the intended state transition, such as:
1. compliance of the state transition with the rules and constraints of the selected state machine; and
2. uniqueness of the intended state transition, i.e. that the current state identified in the verification request message has not already been changed by another, hence conflicting, verification request message.
The verification request message may contain various other data required for the verification by the second node but does not contain the content of the current state or the new state. For example, the verification request message may contain a cryptographic signature of the first node, e.g. a digital certificate signed with the private key of the transmitting node for example, to enable the second node to verify authenticity of the verification request message for instance.
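By way of illustration only, a minimal Python sketch of assembling such a verification request message; the field names and the signing helper are illustrative assumptions, and the message deliberately carries no content of the current or new state.

```python
from typing import Callable


def build_verification_request(state_id: str,
                               state_machine_id: str,
                               verification_data: bytes,
                               sign: Callable[[bytes], bytes]) -> dict:
    """Assemble the verification request message sent to the second node.
    'sign' is an illustrative signing helper using the first node's private key."""
    message = {
        "state_identification_data": state_id,          # contentless identifier (block 403)
        "state_machine_identification": state_machine_id,
        "verification_data": verification_data.hex(),   # e.g. the zero-knowledge proof (block 404)
    }
    payload = repr(sorted(message.items())).encode()
    message["signature"] = sign(payload).hex()
    return message
```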
As illustrated with block 50, in this example, the verification request message is received at the second node 3. Thus, the information about the state machine involved is shared between the second node and the transmitting first node, in the sense that the second node receives information identifying the verification state machine. Accordingly, the type of transition is known but the second node does not receive the content of the current state or the, intended, new state of the data record. In case of a breach of security of the second node, e.g. by unauthorized external access thereto, the actual contents of the distributed database can therefore not be obtained, and the impact of the breach can be reduced. In this respect, the term "security of the second node" refers to the protection of data in the register memory or other memories of the second node against unauthorized access.
Thus, the protection of the contents of the database is less dependent on the protection and strength of encryption of data on the second node 3. It will be apparent though that the second node may be, and preferably is, secured and protected against unauthorized access, e.g. to avoid tampering with the verification software running on the second node.
In response to receiving the verification request message, the integrated circuit processor in the second node will run the verification software, as illustrated with block 50, to verify the intended state transition for compliance with the rules and constraints of the selected state machine.
In this example, the running comprises, as illustrated with block 501, determining the verification state machine equal to the selected state machine using the state machine identification data. The integrated circuit processor further verifies with the verification data whether or not the intended state transition meets rules and constraints of the verification state machine. For example, the integrated circuit processor may, as is explained below in more detail, use the verification data as public input to a verification method in accordance with a zero-knowledge protocol to obtain a mathematically sound proof whether or not the intended state transition meets the rules and constraints of the verification state machine. Such a zero-knowledge protocol can e.g. be a non-interactive protocol in which, except for the verification request message, the first node and the second node do not exchange data for the verification. The running may further comprise verifying that the current states of the intended state transition have not been used already, i.e. that the current states identified in the verification request message are actually current states and not past or used states. To that end, the state identification data in the verification request message may be compared with the data in the register memory 32.
As illustrated with block 502, if the state transition meets the rules and constraints of the equal verification state machine, the integrated circuit processor approves the intended state transition and, as illustrated with block 503, sends a confirmation message to the transmitting first node 2. Depending on the specific implementation, if the state transition does not meet the rules and constraints, the integrated circuit processor 31 may output, via the interface 30, a reject message to the transmitting first node 2 (and optionally to other nodes in the network 1), as is illustrated with blocks 504 and 505. Alternatively, the second node may do nothing, and the first nodes e.g. be implemented to cancel the state transition process after a predetermined time-out period has expired from the transmission of the verification request message. If and when the confirmation message is outputted, the method further comprises reception by the first node 2 of the confirmation message. As is illustrated with block 406, the first node 2 accepts the intended state transition in response to the reception and adjusts the data record according to the intended state transition. As illustrated with block 407, the first node may cancel the state transition process if the confirmation message is not received, e.g. when the reject message is received instead.
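For illustration, a minimal Python sketch of the second node's handling of a verification request along the lines of blocks 501 to 505; the verifier call, the register object and the messaging helper are illustrative placeholders, not a prescribed implementation.

```python
from typing import Callable, Dict


def handle_verification_request(message: dict,
                                verification_state_machines: Dict[str, object],
                                register,                 # register memory 32 (used-state check)
                                verify_proof: Callable,   # verifier V(x, Pr) of the zero-knowledge protocol
                                send: Callable[[dict], None]) -> None:
    """Select the equal verification state machine, verify the proof against it,
    check uniqueness of the current state, then confirm or reject."""
    machine = verification_state_machines[message["state_machine_identification"]]
    state_id = message["state_identification_data"]
    unique = register.is_unused(state_id)
    valid = verify_proof(machine, public_input=state_id,
                         proof=bytes.fromhex(message["verification_data"]))
    if unique and valid:
        register.mark_used(state_id)                     # record the consumed state
        send({"type": "confirm", "state": state_id})     # cf. block 503
    else:
        send({"type": "reject", "state": state_id})      # cf. block 505
```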
A method as illustrated in FIG. 4 may be performed by execution of state transition software by one or more first nodes 2 and of verification software by one or more second nodes 3. Referring to FIG. 5, an example of an architecture of the state transition software and of the verification software is illustrated.
As shown, the state transition software comprises instructions 200-203 executable by the integrated circuit processor 21.
Input instructions 200 cause, when executed, inputting of a selected data record into a selected state machine 25 and thereby generate an intended state transition of the selected data record from a current state of the selected data record to a new state of the selected data record in accordance with rules and constraints of the selected state machine 25.
For example, execution of the input instructions 200 may be triggered by input of data at a user interface by a human operator of the node. In FIG. 5, for example, the input instructions 200 are shown to receive input from the user interface UI. Alternatively, or additionally, the execution may be triggered by receiving, e.g. at the interface 20, data triggering the execution, such as data generated by a non-human operator, e.g. a machine. For example, in case data records are linked and several nodes are authorized to change data records, a change to a linked data record may trigger execution. Likewise, as another example, in case the distributed database is used to track a product during manufacturing, a machine may send data indicating that a manufacturing step has been completed and that the semi-finished product is handed over to another machine, and accordingly that the status and location fields have to be updated, or a machine may send barcode scan data acknowledging that a semi-finished product has been received and thus that a location has to be updated.
The operator may for example provide input, such as an identification of the database record, the type of transition and values of at least some fields of the record in the desired new state. The integrated circuit processor 21 executes the input instructions 200 in response to the operator input. The instructions 200 may for example be to fetch the values of the corresponding data record and input them into the selected state machine. The input in the selected state machine generates an intended state transition of the selected data record from a current state of the selected data record to a new state of the selected data record in accordance with rules and constraints of the selected state machine. In case the input is not sufficient to meet those constraints, the integrated circuit processor 21 may e.g. execute instructions to obtain the missing input, e.g. such as electronic signatures of nodes set by the rules and constraints as required approvers of the state transition, or additional data from other data records, just to name some examples. In response to receiving the additional input, and once all the input required for the selected state machine is received, the input instructions 200 may then cause output of the intended state transition to other instructions of the software with which the input instructions 200 interface.
Input instructions 200 may comprise instructions to perform operations as described with reference to FIG. 1 for blocks 100, 101 and 102.
Filter instructions 201 cause, when executed, to pass the data record in the current state through the filter 24 to obtain state identification data representing an identifier of the current state of the selected data record. Execution of the filter instructions 201 may e.g. be triggered by the same event as execution of the input instructions 200, and may be performed, before, in parallel with, or after execution of the input instructions 200.
Verification data calculating instructions 202 may interface with the input instructions 200 to receive the intended state transition and cause, when executed, to calculate verification data with the zero-knowledge verification function described above with reference to block 404 in FIG. 4. An example of such a calculation is explained below in more detail with reference to FIG. 6.
Transmission instructions 203 may interface with input instructions 200, the filter instructions 201 and the verification data calculating instructions 202, to receive the output of those instructions as input. Transmission instructions 203 cause, when executed, transmission to the second node 3 of the verification request message and/or transmission of recovery data (as indicated with arrow ETx,PoE,TxlD in FIG. 1) to the recovery data storage system 4 to be stored in the recovery data memory 44. Execution of the transmission instructions 203 may e.g. be triggered by the completion of input instructions 200, filter instructions 201 and verification data calculating instructions 202. The transmission instructions 203 may cause e.g. the opening of a data channel with the second node or the recovery data storage system 4, and transmitting a message, e.g. in accordance with a pre-established protocol such as the Advanced Message Queuing Protocol defined in ISO/IEC 19464:2014. The transmission instructions 203 may further comprise a subset of instructions which causes assembling of the verification request message with the state identification data, the verification data and the state machine identification data, and, optionally, any other data not providing information about the contents of the intended state transition. Acceptance instructions 204 are connected to the network interface 20 and cause, when executed, in response to receiving a confirmation message from the second node at the network interface 20, acceptance of the intended state transition and adjusting in the memory the data record according to the accepted state transition, or else rejection of the intended state transition and maintain the data record unchanged.
The verification software comprises instructions executable by the integrated circuit processor. These instructions comprise initiating instructions 301 coupled, as shown, to the network interface 30 to run the verification software in response to receiving at the second node a verification request message.
The initiating instructions may further, for example, determine the uniqueness of the current state(s) that are the subject of the intended state transition, as was explained above with reference to block 501 of FIG. 4. To that end, initiating instructions 301 may, as shown, interface with the register memory 32.
State machine selection instructions 302 interface the initiating instructions 301 and cause, when executed, a selection of a verification state machine equal to the selected state machine using the state machine identification data. Execution of the selection instructions 302 may e.g. be triggered by a call from the initiating instructions 301.
Verification instructions 303 interface the selection instructions 302 and cause, when executed, verification with the verification data that the intended state transition meets rules and constraints of the verification state machine, for example as explained below.
Confirmation instructions 304 interface the verification instructions and the network interface 30. When executed, confirmation instructions 304 cause, if the state transition meets the rules of the equal verification state machine, output at the interface of a confirmation message to the first node. Execution can e.g. be triggered by the verification instructions 303 outputting an accept or a reject message.
In the example of a method described above, the transmitting first node 2 may prior to sending the verification request message to the second node 3, interact with other first nodes. For example, the transmitting first node may transmit an intended transition message to a recipient first node 2' when the rules and constraints of the selected state machine impose approval of the recipient first node. In the shown example, the network inhibits broadcasting messages and the intended transition message is only transmitted to selected recipient first nodes, selected by the transmitting first node based on the intended state transition. Referring to the example of FIG. 3, the network shown therein comprises for example two first nodes. In the interaction, a first node is the transmitting first node 2 described above, while another first node is a recipient first node 2'. The transmitting first node 2 and the recipient first node 2' may for example have corresponding data records stored in their respective database memory, i.e. the records are duplicated over the transmitting first node 2 and the recipient first node 2'. Additionally or alternatively, they may have complementary records, such as where for the intended state transition of a first record on a node 2 data from another, not corresponding, record on another node 2'is required.
For example, the rules and constraints of the selected state machine may require an approval of the recipient first node 2'. The state machine may e.g. have an input at which a signature value of the recipient first node 2' is to be inputted, such as a signature provided with a public-private key signing process, and without which the state machine does not generate a new state. The recipient first node 2' may then be implemented similarly to the transmitting node described above, with the following addition. The software in the software memory 23 may include instructions to verify the intended state transition, intended by the transmitting first node 2, against a set of predetermined rules and constraints defined by a selected state machine selected from the set.
The method may then for example comprise a verification by the recipient first node of the intended state transition prior to the transmitting first node sending the verification request message at least to the second node 3. The recipient first node 2' may receive an intended transition message from the transmitting first node 2. The intended transition message informs the recipient node 2' of an intended state transition to a selected data record and indicates a selected state machine to be used. In response to receiving the intended transition message, the integrated circuit processor of the recipient node 2' runs the software, i.e. executes the instructions, and verifies whether the intended state transition meets the rules and constraints of the selected state machine. If the state transition meets the rules and constraints, the integrated circuit processor 21 outputs an approval message to the transmitting first node and/or the second node.
In response to receiving the approval message from the recipient node 2', the nodes (e.g. the transmitting first node 2 and the second node 3 in this example) continue the state transition and proceed with the method described above. For example, after receiving the approval message, the transmitting first node 2 may output the verification request message. Else, i.e. when the state transition does not meet the rules and constraints, the integrated circuit processor 21 outputs an error message to the transmitting first node and/or the second node. In response to the error message, the nodes receiving the error message stop the state transition and maintain the data record in its current state. Like the transmitting first node, the recipient node 2' may receive a confirmation message from the second node 3. In response to receiving the confirmation message, the integrated circuit processor of the recipient first node 2' may then execute instructions to execute the intended state transition and adjust in the memory the data record according to the accepted state transition, provided that also proof of storage is received by the recipient node 2' - e.g. from the transmitting first node 2 or from the recovery data storage system 4. When the confirmation message is not received, e.g. when a time-out period has expired or when the reject message is received from the second node, the recipient first node 2' may reject the intended state transition and maintain the data record unchanged.
With reference to FIG. 6, the verification may generally be performed in any manner suitable for the second node 3 to verify, using the verification data, whether or not the intended state transition meets the rules and constraints of the equal verification state machine.
As illustrated in FIG. 6, the verification data can comprise a transition identification value (hash in FIG. 6) unique for the state transition. The transition identification value can for example be unique for the current state, and e.g. be a cryptographic hash value calculated from the current state. This allows it to be verified that the second node verifies the correct state transition. The transition identification value can for example be a vector commitment, such as a Merkle tree root hash value calculated from a Merkle tree of which one or more leaves are values of the current state. This additionally allows the second node to verify that the state transition is of the correct current state, and not of another current state. To enhance the correctness, the transition identification value can for example be calculated with more input parameters, such as the new state. For example, the Merkle tree may have more leaves, for example at least three leaves, the leaves comprising the current state, the new state and the transition. In the latter case, for example, a string value derived from the code defining the transition may be used as input for the leaf.
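By way of illustration only, a minimal Python sketch of computing such a transition identification value as a Merkle tree root over three leaves (current state, new state, transition code string), using SHA-256 as the one-way function; the leaf encoding is an illustrative assumption.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> bytes:
    """Merkle root over the given leaves (byte strings); an odd node at any
    level is carried up unchanged. The result serves here as the transition
    identification value."""
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [_h(p[0] + p[1]) if len(p) == 2 else p[0] for p in pairs]
    return level[0]


# Illustrative use: leaves are the serialized current state, the new state and
# a string derived from the code defining the transition.
tx_id = merkle_root([b"current-state-bytes", b"new-state-bytes", b"transfer-v1"])
```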
The one-way function can generally be any suitable one-way function, preferably with collision resistance, such as a trapdoor one-way function, a hash function or otherwise. Accordingly, the verification data may comprise any suitable values, provided that these are verifiable by an equal verification state machine. The verification data may for example comprise a zero-knowledge proof of a statement, as determined in accordance with a predetermined zero-knowledge protocol, such as a non-interactive protocol, for example a succinct non-interactive argument of knowledge protocol. Preferably, the zero-knowledge verification function generates a proof which satisfies the requirement of completeness, that is, if the statement is true, the honest verifier (that is, one following the protocol properly) will be convinced of this fact by the proof. The proof may further satisfy the requirement of soundness, that is, if the statement is false, no cheating prover can convince the honest verifier that it is true, except with some small probability. Preferably, with the verification data therein, the verification request message provides zero-knowledge. That is, if the statement is true, no verifier learns from the verification request message anything else about the input variables than that the statement is true. Suitable zero-knowledge protocols are for example (the cited documents herein incorporated by reference):
■ Groth16 zkSNARKs - J. Groth. On the size of pairing-based non-interactive arguments. In EUROCRYPT, pages 305–326, 2016;
■ Bulletproofs - Bulletproofs: Short Proofs for Confidential Transactions and More. Published in: 2018 IEEE Symposium on Security and Privacy (SP);
■ STARKs - E. Ben-Sasson, I. Bentov, Y. Horesh, and M. Riabzev. Scalable, transparent, and post-quantum secure computational integrity. Cryptology ePrint Archive, Report 2018/046, 2018. https://eprint.iacr.org/2018/046;
■ SONIC - Mary Maller, Sean Bowe, Markulf Kohlweiss, and Sarah Meiklejohn. 2019. Sonic: Zero-Knowledge SNARKs from Linear-Size Universal and Updatable Structured Reference Strings. In Proceedings of CCS’19, London, UK, November 2019, 20 pages.
In particular, the zero-knowledge verification function may be one as is included in the C++ library named “libsnark” as currently available from https://github.com/scipr-lab. In a practical example, the state machine 25,33 is defined in human readable and legible program code, such as in C or C++ code, which is subsequently compiled into an arithmetic circuit or other computer executable code which can be used with the libsnark library, to obtain a proofing machine 26 at the first node 2 and a verification machine 35 at the second node 3, which perform the proof generation and the verification respectively. As illustrated in FIG. 6, the verification data may for example comprise a zero-knowledge proof parameter Pr(x,W) (in which x represents a parameter shared in the verification request message with the second node, and W a secret parameter not shared with the second node) of a zero-knowledge protocol ZK that the state transition satisfies the rules and constraints of the selected state machine. The verification software may in such a case comprise instructions to verify the proof with the verification state machine in accordance with verification rules of the zero-knowledge protocol.
For example, the zero-knowledge protocol may use the intended state transition as secret parameter W and the proof Pr(x,W) be that the intended state transition satisfies the rules and constraints of the selected state machine 25. The verification data may then include an identification of the state machine, such that the second node can determine the equal verification state machine and use this in the verification. In such a case, e.g. the identification of the state machine may be the public input parameter x of the zero-knowledge protocol. Alternatively, the proof may be that, using the selected state machine to generate the proof, an intended state transition is obtained from which the root hash value can be calculated, and the verification use the equal verification state machine to verify the correctness of this statement. In this respect, the parameters x, W may be single values, e.g. be scalar values, but alternatively be composed of multiple values x1, x2, …, xn; W1, W2, …, Wn and e.g. be vector or matrix values. The verification request message can for example comprise the transition identification value as public input parameter x to the verifier part V(x,Pr) of the zero-knowledge protocol, and the second node use the transition identification value as public input parameter in the verification. For example, the zero-knowledge proof may be proof Pr(x, W) that the Merkle tree root hash value is calculated from the intended state transition. In such a case, for example the root hash value can be used as public input parameter x for the protocol, and as secret witness parameter W the intended state transition. In such a case, the proof allows verification that the transmitting first node knows a transition that is identified by this Merkle root hash. This ensures that the second node is actually verifying the state transition it is instructed to verify. The verification may verify other aspects as well. For example, the verification data may e.g. comprise one or more digital signatures, each signed with a private key of a public-private key combination, as required by the rules and constraints of the selected state machine for the intended state transition. The second node may then be provided with public keys corresponding to the private keys of the public-private key combination used in the digital signatures. When verifying the intended state transition, the second node determines from the verification state machine the digital signatures required for the intended state transition, decrypts the digital signatures using the public keys and determines whether or not all digital signatures decrypt to the same digital message. This allows the second node to verify whether or not the transition that matches the public input x is actually signed by the required signers. Thus, the risk of a denial-of-state attack can be reduced because, although an attacker could theoretically create a valid zero-knowledge proof for this transition that would satisfy the contract rules, the required signatures would not be present. The digital message can for example be derived from the root hash of the Merkle tree, and for example be a hash calculated from the root hash.
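As a conceptual Python sketch only, the following illustrates the division between the public input x and the secret witness W described above; zk_prove and zk_verify are placeholders for the prover and verifier of an actual zero-knowledge back end and are not a cryptographic implementation.

```python
# Conceptual sketch only: 'zk_prove' and 'zk_verify' stand in for the prover
# and verifier of a real zero-knowledge protocol and carry no cryptographic
# value of their own.

def zk_prove(state_machine, public_x: bytes, secret_witness: dict) -> bytes:
    """First node (prover): produce Pr(x, W) evidencing that the intended state
    transition W satisfies the rules and constraints of the selected state
    machine and is identified by the public value x (e.g. the Merkle root)."""
    raise NotImplementedError("stand-in for a SNARK/Bulletproofs/STARK prover")


def zk_verify(verification_state_machine, public_x: bytes, proof: bytes) -> bool:
    """Second node (verifier): check Pr(x, W) using only the public input x and
    the equal verification state machine; the witness W is never disclosed."""
    raise NotImplementedError("stand-in for the corresponding verifier")


# Intended use: x is shared in the verification request message, W is not.
# x = merkle_root([current_state_bytes, new_state_bytes, transition_code])
# proof = zk_prove(selected_state_machine, x, {"transition": ...})
# ok = zk_verify(equal_verification_state_machine, x, proof)
```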
The invention may also be implemented in a computer program for running on a computer system, at least including code portions for performing some or all steps of a method according to the invention when run on a programmable apparatus, such as a computer system, or enabling a programmable apparatus to perform functions of a device or system according to the invention. A computer program is a list of instructions, such as a particular application program and/or an operating system, intended to be executed by a programmable apparatus. The computer program may for instance include one or more of: a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system. The computer program or software may be stored internally on a computer readable storage medium or transmitted to the computer system via a computer readable transmission medium. All or some of the computer program may be provided on computer readable media permanently, removably or remotely coupled to an information processing system. The computer readable media may include, for example and without limitation, tangible, non-transitory data carriers and data transmission media. The tangible, non-transitory data carriers can for example be any number of the following: magnetic storage media including disk and tape storage media; optical storage media such as compact disk media (e.g., CD-ROM, CD-R, etc.) and digital video disk storage media; nonvolatile memory storage media including semiconductor-based memory units such as FLASH memory, EEPROM, EPROM, ROM; ferromagnetic digital memories; MRAM; volatile storage media including registers, buffers or caches, main memory, RAM, etc. The data transmission media may e.g. include data-communication networks, point-to-point telecommunication equipment, and carrier wave transmission media, just to name a few. In the foregoing specification, specific examples of the invention have been illustrated. It will, however, be evident that various modifications and changes may be made therein without departing from the broader scope of the invention as set forth in the appended claims. In particular, even though in the preceding a preference may have been expressed, less preferred variants may be used as well, and the claims are not limited to the preferred examples. For example, the distributed database may be structured as a database with current values of the data records, e.g. a “world state” database in case of DLT, as well as logging data tracking the transitions of the data records from a past state to the current values. Such logging data may be stored on the participating devices in a file separate from the database with current values, e.g. in the form of a blockchain or other file with an immutable structure. This data may e.g. be maintained by each of the individual participating devices based on the transitions visible to the individual participating devices, i.e. each individual participating device maintaining its own logging data, or e.g. be maintained in a shared manner where the individual participating devices store a copy of the logging data and synchronise their logging data by an appropriate exchange of data, such as e.g. known from DLTs like Bitcoin.
For instance, the distributed database can be implemented as a set of state machines, with a state machine on each participating device, of which the state represents the current values of the records of the database and logging data representing the state transitions of the respective state machines. Also, the distributed database arrangement can e.g. be permission-less, meaning that any device can join the distributed database arrangement, or be permissioned, meaning that the access to the distributed database arrangement is limited to selected devices, of which the selection can be static or dynamic. The permissioned distributed database arrangement can be public permissioned, meaning that any device can read data from the database but only selected devices can write data and/or participate in the consensus protocol. The distributed database arrangement may alternatively be a private permissioned one, meaning that only selected devices can read and/or write and/or participate in the execution of the consensus protocol. In case of a permissioned distributed database arrangement with a static selection, the distributed database arrangement is fully private. In addition, the distributed database arrangement can be open, meaning that the data records and changes therein are transparent to the participating devices, or be partially or fully confidential, meaning that some or all of the data records and/or changes therein are only visible to a (selected) subset of the participating devices, for example only the devices that store the data records and/or are involved in a change in the data record. In an open distributed database arrangement, for example, a participating device modifying a data record may broadcast to the other participating devices a broadcast message informing of the changes made. In a partially closed distributed database arrangement, for example, the broadcast message may e.g. not contain the content of the changes but only identify the data record. In a fully closed distributed database arrangement, for example, a participating device modifying a data record may only exchange messages with the other participating devices involved in the modification, and optionally send a message to another device that determines whether or not the modification is a valid, compliant one. For example, in the example of FIG. 1, each node in the network is a separate physical node. However, it is alternatively possible to implement multiple nodes on the same physical device, e.g. by storing multiple certificates issued by the central registry based on different private-public key pairs. Also, the nodes in the network may be any suitable data processing device, such as mainframes, minicomputers, servers, workstations, personal computers, mobile phones and various other wired or wireless data processing devices. Such a device may for instance include one or more integrated circuit processors, associated memory and a number of input/output (I/O) devices. When executing the computer program, the computer system processes information according to the instructions of the computer program and produces resultant output information via I/O devices. Also, the term “message” as used herein refers to electronic messages exchanged between computer systems, and does not include messages directly exchanged between human beings. However, an electronic message may be sent by a computer system under the supervision of a human operator.
Furthermore, the keys used in the encryption, verification, and signing operations described may be managed and generated in any manner suitable for the specific implementation. For example, a scheme meeting the cryptographic strength desired for the specific implementation may be selected for the generation, and the keys may be stored in secure, tamper protected modules, such as hardware security modules or in a secure element. It will be apparent that in the preceding “not capable of decrypting” refers to being, from a practical viewpoint, not capable of doing so within an acceptable period of time, and does not refer to theoretically being capable of decrypting, e.g. by unauthorized access to decryption keys or by hard number crunching. Said differently, this refers to being designed to not be able to decrypt encrypted data. However, other modifications, variations and alternatives are also possible. The specifications and drawings are, accordingly, to be regarded in an illustrative rather than in a restrictive sense. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word ‘comprising’ does not exclude the presence of other elements or steps than those listed in a claim. Furthermore, the terms “a” or “an,” as used herein, are defined as one or more than one. Also, the use of introductory phrases such as “at least one” and “one or more” in the claims should not be construed to imply that the introduction of another claim element by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an." The same holds true for the use of definite articles. Unless stated otherwise, terms such as “first” and “second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements. The mere fact that certain measures are recited in mutually different claims does not indicate that a combination of these measures cannot be used to advantage.

Claims

1. A method of generating recovery data of a distributed database in a distributed database arrangement, performed by a first system participating in the distributed database arrangement, the first system comprising a memory for storing data of at least a part, or all, of the data records of the distributed database, the method comprising: generating encrypted transition data representing a cyphertext version of transition data representing a transition of at least one data record of said data records from a first state to a second state, the second state being a current state of the data record or a preceding state preceding the current state; and transmitting the encrypted transition data to be stored in a memory, separate from the distributed database.
2. A method of storing recovery data of a distributed database in a distributed database arrangement performed by a second system participating in the distributed database arrangement, the method comprising: receiving from a first system participating in the distributed database arrangement recovery data for a transition of at least one data record of the distributed database from a first state to a second state, the second state being a current state of the data record or a preceding state preceding the current state; verifying whether the encrypted transition data represents a cyphertext version of transition data containing information about the transition and only storing the recovery data, separate from the distributed database, in a recovery data memory when this is the case.
3. A method of building a recovery database for a distributed database arrangement with a distributed database, comprising: generating recovery data at a first system in a network of systems participating in the distributed database arrangement, the first system comprising a memory for storing data of at least a part, or all, of the data records of the distributed database, the generating of recovery data comprising: generating encrypted transition data representing a cyphertext version of transition data containing information about a transition of at least one data record of said data records from a first state to a second state, the second state being a current state of the data record or a preceding state preceding the current state; transmitting the encrypted transition data to be stored in a recovery data memory, separate from the distributed database; at a second system in the network: receiving the encrypted transition data; verifying whether the encrypted transition data represents the cyphertext version of the transition data and only storing the recovery data in the recovery data memory, separate from the distributed database when this is the case.
4. The method of one or more of the preceding claims, comprising: the first system generating proof evidencing that the encrypted transition data represents the cyphertext version of the transition data, and transmitting the proof to the second system; and the second system verifying with the received proof whether the encrypted transition data represents the cyphertext version of the transition data.
5. The method of one or more of the preceding claims, wherein the recovery data comprises encrypted validation data, and comprising: the first system generating encrypted validation data representing a cyphertext version of a validation of the transition by systems in the network, and transmitting the encrypted validation data and the proof to the second system; the second system, after receiving the encrypted validation data, verifying whether the encrypted validation data represents the cyphertext version of the validation of the transition and only storing the recovery data in the memory when this is the case.
6. The method of claim 5, comprising: the first system generating proof that the encrypted validation data represents the cyphertext version of the validation of the transition to be performed, and transmitting the proof to the second system; and the second system verifying with the received proof whether the encrypted validation data represents the cyphertext version of the validation of the transition.
7. The method of one or more of the preceding claims, comprising the first system transmitting, after generating the encrypted transition data, a message to at least one other system involved in the transition to obtain a verification of the encrypted transition data by the at least one other system.
8. The method of claims 6 and 7, comprising the first system generating encrypted collective verification data comprising linked data representing the validation and data representing the verification, such as cryptographic signatures thereof, and encrypting the linked data to obtain the encrypted collective verification data.
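As an illustration of claim 8, a short sketch that links signatures representing the validation and the verification and encrypts the linked structure; the JSON layout and the Fernet key are assumptions, not part of the claim.

```python
import json
from typing import Iterable
from cryptography.fernet import Fernet  # pip install cryptography

def encrypted_collective_verification_data(validation_sigs: Iterable[bytes],
                                            verification_sigs: Iterable[bytes],
                                            key: bytes) -> bytes:
    """Link data representing the validation and the verification, then encrypt
    the linked data to obtain the encrypted collective verification data."""
    linked = json.dumps({
        "validation":   [s.hex() for s in validation_sigs],    # e.g. validators' signatures
        "verification": [s.hex() for s in verification_sigs],  # e.g. claim-7 verifications
    }, sort_keys=True).encode()
    return Fernet(key).encrypt(linked)
```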
9. The method of one or more of the preceding claims, as far as referring to one or more of claims 4-8, wherein: the proof comprises calculating a cryptographic proof of knowledge with, as secret input, one or more of the group consisting of: the encrypted transition data, the encrypted validation data, a decryption key; and, as public input, one or more of the group consisting of: a value calculated with a one-way function from the transition data, a value calculated with a one-way function from the encrypted validation data, a value calculated with a one-way function from the encrypted collective verification data.
10. The method of claim 9, wherein the cryptographic proof of knowledge is calculated by performing a non-interactive protocol by the first system.
11. The method of claim 9 or 10, wherein the cryptographic proof of knowledge is a zero-knowledge proof.
12. The method of one or more of claims 4-11, wherein the proof evidences at least one of: a decryption of the encrypted transition data with a predetermined private key resulting in the transition data; a value calculated with a predetermined one-way function from the transition data corresponding to a public identifier of the transition to be performed; the transition data and/or encrypted transition data having been verified by at least one other system maintaining the data record; the transition meeting a set of predetermined transition rules.
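To make claim 12 concrete, the relation below is an example of what such a proof could evidence: that the cyphertext decrypts to transition data whose one-way-function value matches a public identifier of the transition. The check is written with the plaintext in hand, which only the prover has; in the claimed setting it would be established in zero knowledge (claims 11 and 13), e.g. with a pairing-based SNARK, rather than evaluated directly by the second system. The decrypt callable and the choice of SHA-256 are assumptions.

```python
import hashlib
from typing import Callable

def public_identifier(transition_data: bytes) -> str:
    """One-way function deriving the public identifier of the transition."""
    return hashlib.sha256(transition_data).hexdigest()

def statement_holds(transition_data: bytes,
                    encrypted_transition: bytes,
                    decrypt: Callable[[bytes], bytes],
                    expected_identifier: str) -> bool:
    """The relation a proof per claim 12 would evidence:
    (1) decrypting the cyphertext with the predetermined key yields the transition data;
    (2) the one-way function of that data matches the public identifier."""
    return (decrypt(encrypted_transition) == transition_data
            and public_identifier(transition_data) == expected_identifier)
```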
13. The method of one or more of the preceding claims, wherein the second system is kept agnostic about the content of the transition to be performed.
14. The method of one or more of the preceding claims, wherein the second system is not authorized to dispose over a key required to decrypt the encrypted transition data, and preferably does not dispose over the key.
15. The method of claim 14, wherein the key required is a threshold decryption key distributed over a number n of systems in the network other than the first system, n being a positive integer and preferably at least 2, and wherein preferably the decryption key is not distributed to the second system.
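A simplified illustration of distributing a decryption key over n systems with a threshold, per claim 15, using Shamir secret sharing over a prime field. This is a stand-in only: a genuine threshold decryption scheme would let the systems decrypt jointly without ever reconstructing the key in one place. The field prime and parameters are arbitrary choices.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime, large enough for a 16-byte key treated as an integer

def split_secret(secret: int, n: int, t: int):
    """Split 'secret' into n shares; any t of them suffice to reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field (Python 3.8+ for pow(-1))."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

# usage: distribute a decryption key over n = 5 systems, any 3 suffice;
# per claim 15 the second system simply receives no share.
key = random.randrange(PRIME)
shares = split_secret(key, n=5, t=3)
assert reconstruct(shares[:3]) == key
```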
16. The method of one or more of the preceding claims, wherein: the encrypted data is encrypted with a public key assigned to the first system, the public key being part of a public-private key pair, and the first system does not dispose over the private key of the public-private key pair.
17. The method of one or more of the preceding claims, wherein the second system, after storing the recovery data, transmits an attesting message to the first system attesting to storage of the recovery data.
18. The method of claim 17, wherein the attesting message comprises: a cryptographic signature of the second system over the data stored.
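For claims 17-18, a minimal sketch of an attesting message carrying the second system's signature over a digest of the stored data; Ed25519 and the message layout are illustrative choices, not part of the claims.

```python
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# hypothetical long-term signing key of the second system
signing_key = Ed25519PrivateKey.generate()
verify_key = signing_key.public_key()   # assumed known to the first system

def make_attesting_message(stored_blob: bytes) -> dict:
    """Second system: attest that 'stored_blob' has been stored."""
    digest = hashlib.sha256(stored_blob).digest()
    return {"stored_digest": digest, "signature": signing_key.sign(digest)}

def check_attestation(stored_blob: bytes, message: dict) -> bool:
    """First system: verify the attesting message against its own copy of the blob."""
    digest = hashlib.sha256(stored_blob).digest()
    if digest != message["stored_digest"]:
        return False
    try:
        verify_key.verify(message["signature"], digest)  # raises if invalid
    except InvalidSignature:
        return False
    return True
```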
19. The method of one or more of the preceding claims, wherein the first system creates a transition and transmits to the second system a message containing information about the transition, and wherein the second system judges the validity of a transition based on a predetermined set of criteria and the information.
20. The method of one or more of the preceding claims, wherein the distributed database is one or more of: a distributed ledger, a private ledger, a permissioned ledger, a consortium ledger, a segregated ledger.
21. A method of operating a distributed database arrangement with a distributed database, comprising: initiating by a first system in a network of systems participating in the distributed database arrangement of a transition of at least one data record from a preceding state to a current state, the first system comprising a memory for storing data of at least a part, or all, of the data records of the distributed database; generating transition data by the first system representing the transition; and performing a method as claimed in one or more of claims 1-20 to store recovery data for the transition to be performed.
22. A method of recovering at least one record of a distributed database of a distributed database arrangement from a current state to a past state, comprising:
retrieving recovery data comprising encrypted transition data from a recovery data memory, separate from the distributed database, the encrypted transition data representing a cyphertext version of a transition of the at least one record from a first state to a second state, the first state preceding the second state and the second state being the current state or a preceding state preceding the current state;
retrieving a decryption key;
decrypting with the decryption key the retrieved encrypted transition data;
determining, using the decrypted transition data, a correct past state of the data record; and
restoring the data record in the correct past state.
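A compact sketch of the recovery path of claim 22, assuming the stored blobs decrypt to JSON transitions carrying a new_state field (the same hypothetical format as the earlier sketches); the up_to parameter allows restoring a preceding state rather than the latest one.

```python
import json
from typing import Iterable, Optional
from cryptography.fernet import Fernet  # pip install cryptography

def recover_record(encrypted_blobs: Iterable[bytes],
                   decryption_key: bytes,
                   up_to: Optional[int] = None):
    """Decrypt the retrieved encrypted transition data and replay the transitions,
    in order, to determine and return a correct past state of the data record."""
    f = Fernet(decryption_key)
    transitions = [json.loads(f.decrypt(blob)) for blob in encrypted_blobs]
    state = None
    for t in transitions[:up_to]:
        state = t["new_state"]   # hypothetical format: each transition records the
                                 # full state of the record after the transition
    return state                 # the record is then restored in this state
```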
23. The method of claim 22, wherein the recovery data comprises encrypted validation data, and comprising decrypting the encrypted validation data to obtain a decrypted version of the validation data and using the decrypted validation data to determine the correct past state.
24. The method of claim 22 or 23, comprising determining a validity of the transition from the decrypted transition data.
25. The method of one or more of claims 22-24, wherein the recovery data stored in the recovery data memory is stored with a method as claimed in one or more of claims 1-20.
26. The method of one or more of claims 22-25, wherein execution of the method is triggered by determining that integrity of the distributed database is compromised.
27. The method of claim 26, comprising: determining from the recovery data a faulty transition in at least one record of the distributed database, and restoring the record in the state directly preceding the faulty transition.
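Finally, a sketch of the roll-back behaviour of claims 27-28: walk the decrypted transitions, stop at the first one that breaks the predetermined transition rules, and return the state directly preceding it together with the offending entry, from which the originating system could be read. The is_valid predicate and the origin field are assumptions.

```python
def restore_before_fault(transitions, is_valid):
    """Return (last_good_state, faulty_transition_or_None).

    'transitions' is the ordered list of decrypted transitions for one record;
    'is_valid(state, transition)' encodes the predetermined transition rules."""
    state = None
    for t in transitions:
        if not is_valid(state, t):
            return state, t          # state directly preceding the faulty transition;
                                     # t.get("origin") would name the originating system
        state = t["new_state"]
    return state, None               # no faulty transition found
```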
28. The method of claim 26 or 27, comprising: determining from the recovery data an originating system from which the faulty transition originated, and excluding the originating system from the network.
29. A system for participating in a distributed database arrangement with a distributed database, the system comprising:
a memory in which at least some data records of the distributed database are stored;
an integrated circuit processor arranged to run recovery data generation software to generate recovery data for the data records of the distributed database stored on the system, which integrated circuit processor is connected to the memory, the recovery data generation software comprising instructions executable by the integrated circuit processor to perform parts or whole of a method as claimed in claim 1 or 3, and optionally one or more of claims 4-20; and
an output for transmitting the recovery data.
30. A system for storing recovery data for a distributed database of a distributed database arrangement, the system comprising:
a memory in which recovery data for the data records of the distributed database are stored;
an integrated circuit processor arranged to run recovery data storage software to store received recovery data, which integrated circuit processor is connected to the memory, the recovery data storage software comprising instructions executable by the integrated circuit processor to perform parts or whole of a method as claimed in claim 2, and optionally one or more of claims 4-20; and
an input for receiving recovery data.
31. A network comprising at least one system as claimed in claim 29 and at least one system as claimed in claim 30, the systems being communicatively connected to each other.
32. A computer program product comprising code portions for performing some or all steps of the method of one or more of claims 1-28, when run on a programmable apparatus.
33. A medium readable by a programmable apparatus, with data stored thereon representing recovery data obtained with, or obtainable by, the method of one or more of claims 1-20 or a recovered data record obtained with a method as claimed in one or more of claims 22-28.
34. A non-transitory, tangible data carrier readable by a programmable apparatus, with data stored thereon representing code portions for performing some or all steps of the method of one or more of claims 1-28 when executed by a programmable apparatus.
35. A non-transitory, tangible data carrier readable by a programmable apparatus, with data stored thereon representing recovery data obtained with, or obtainable by, the method of one or more of claims 1-20 or a recovered data record obtained with a method as claimed in one or more of claims 22-28.
PCT/EP2021/084050 2020-12-04 2021-12-02 Methods, systems and networks for recovering distributed databases, and computer program products, data carrying media and non transitory tangible data storage media with computer programs and/or databases stored thereon useful in recovering a distributed database WO2022117763A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NL2027048 2020-12-04
NL2027048A NL2027048B1 (en) 2020-12-04 2020-12-04 Methods, systems and networks for recovering distributed databases, and computer program products, data carrying media and non-transitory tangible data storage media with computer programs and/or databases stored thereon useful in recovering a distributed database.

Publications (1)

Publication Number Publication Date
WO2022117763A1 (en) 2022-06-09

Family

ID=74592663

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/084050 WO2022117763A1 (en) 2020-12-04 2021-12-02 Methods, systems and networks for recovering distributed databases, and computer program products, data carrying media and non transitory tangible data storage media with computer programs and/or databases stored thereon useful in recovering a distributed database

Country Status (2)

Country Link
NL (1) NL2027048B1 (en)
WO (1) WO2022117763A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130042156A1 (en) * 2011-08-12 2013-02-14 Oracle International Corporation Utilizing multiple storage devices to reduce write latency for database logging
US20180150536A1 (en) * 2015-08-20 2018-05-31 Beijing Baidu Netcom Science And Technology Co., Ltd. Instance-based distributed data recovery method and apparatus
US20200117825A1 (en) * 2018-10-16 2020-04-16 Microsoft Technology Licensing, Llc Database management
US20200328890A1 (en) * 2019-04-15 2020-10-15 Eygs Llp Methods and systems for tracking and recovering assets stolen on distributed ledger-based networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10529042B2 (en) 2016-04-18 2020-01-07 Rs Ltd. System and method for managing transactions in dynamic digital documents

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
2018 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP)
BENET, J., N. GRECO: "Filecoin: A decentralized storage network", PROTOC. LABS, 2018, pages 1 - 36
E. BEN-SASSON, I. BENTOV, Y. HORESH, M. RIABZEV: "Scalable, transparent, and post-quantum secure computational integrity", CRYPTOLOGY EPRINT ARCHIVE, 2018
HARTMAN, JOHN H., IAN MURDOCK, TAMMO SPALINK: "Proceedings. 19th IEEE International Conference on Distributed Computing Systems", 1999, IEEE, article "The Swarm scalable storage system"
J. GROTH: "On the size of pairing-based non-interactive arguments", in EUROCRYPT, 2016, pages 305 - 326, XP047351584, DOI: 10.1007/978-3-662-49896-5_11
MARY MALLER, SEAN BOWE, MARKULF KOHLWEISS, SARAH MEIKLEJOHN: "Sonic: Zero-Knowledge SNARKs from Linear-Size Universal and Updatable Structured Reference Strings", PROCEEDINGS OF CCS'19, LONDON, UK, November 2019 (2019-11-01), pages 20
PUTZ, B., PERNUL, G.: "Transactions on Large-Scale Data- and Knowledge-Centered Systems XLII. Lecture Notes in Computer Science", vol. 11860, 2019, SPRINGER, article "Trust Factors and Insider Threats in Permissioned Distributed Ledgers", pages: 25 - 50

Also Published As

Publication number Publication date
NL2027048B1 (en) 2022-07-07

Similar Documents

Publication Publication Date Title
Bera et al. Designing blockchain-based access control protocol in IoT-enabled smart-grid system
CN107911216B (en) Block chain transaction privacy protection method and system
US10296248B2 (en) Turn-control rewritable blockchain
AU2017269734B2 (en) Cryptologic rewritable blockchain
Paulson Inductive analysis of the internet protocol TLS
US7958362B2 (en) User authentication based on asymmetric cryptography utilizing RSA with personalized secret
US10846372B1 (en) Systems and methods for trustless proof of possession and transmission of secured data
JP2017139811A (en) Method and device for ensuring safety of key in unsecured computer environment, applied to virtualization and securing and managing of cloud computing
EP3738271A1 (en) Computer-implemented method for managing user-submitted reviews using anonymous reputation system
JPH05216410A (en) Method and computer apparatus for reproducing cryptographic function
US11676111B1 (en) Apparatuses and methods for determining and processing dormant user data in a job resume immutable sequential listing
US20230319103A1 (en) Identifying denial-of-service attacks
Brandão et al. NIST roadmap toward criteria for threshold schemes for cryptographic primitives
Wang et al. Detect triangle attack on blockchain by trace analysis
Jurcut et al. Establishing and Fixing Security Protocols Weaknesses Using a Logic-based Verification Tool.
NL2027048B1 (en) Methods, systems and networks for recovering distributed databases, and computer program products, data carrying media and non-transitory tangible data storage media with computer programs and/or databases stored thereon useful in recovering a distributed database.
Wang et al. Secure decision tree classification with decentralized authorization and access control
US20230131250A1 (en) Methods of operating a distributed database, network, nodes, computer program product and data carrier suitable for such methods
Kuznetsov et al. Cryptographic Transformations in a Decentralized Blockchain Environment
Muhammad et al. Designing authentication protocols: Trends and Issues
Niu et al. Tamper-Proof Storage of User Movement Logs in Smart Parks
US20240089132A1 (en) Method and apparatus for editing block chain
Qiao Group Signatures for Preserving Anonymity in Blockchain Supply Chain Transactions
Thazhath et al. Harpocrates: Privacy-Preserving and Immutable Audit Log for Sensitive Data Operations
Hussain et al. Blockchain Architecture, Components and Considerations

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21824536

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21824536

Country of ref document: EP

Kind code of ref document: A1