WO2008014002A2 - Super-efficient verification of dynamic outsourced databases - Google Patents

Super-efficient verification of dynamic outsourced databases

Info

Publication number
WO2008014002A2
Authority
WO
WIPO (PCT)
Prior art keywords
query
hash
answer
predetermined
proof
Prior art date
Application number
PCT/US2007/017042
Other languages
French (fr)
Other versions
WO2008014002A3 (en)
Inventor
Michael T. Goodrich
Roberto Tamassia
Nikolaos Triandopoulos
Original Assignee
Brown University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Brown University
Publication of WO2008014002A2
Publication of WO2008014002A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/64: Protecting data integrity, e.g. using checksums, certificates or signatures

Definitions

  • the teachings in accordance with the exemplary embodiments of this invention relate generally to databases and, more specifically, relate to verification for outsourced databases.
  • Databases are increasingly being hosted or mirrored at untrusted third parties (i.e., outsourced), so as to support queries from users.
  • users cannot trust the answers that come from queries to these outsourced databases.
  • an important component of an outsourced database solution is the security and complexity of its answer-verification process.
  • a method includes: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree; and returning the answer and the proof.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree; and returning the answer and the proof.
  • an electronic device includes: a memory configured to store a data set, zero or more first hash values and at least one predetermined hash value; and a data processor configured to receive a query comprising one of a range query or an aggregate query, to determine an answer corresponding to the query, to determine a proof corresponding to the query and the answer, and to return the answer and the proof, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree.
  • a method includes: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first signed hash value for a corresponding hash tree and zero or more second hash values; hashing, based on the answer and the zero or more second hash values, along at least one third hash value of at least one node of the corresponding hash tree to obtain at least one predetermined fourth hash value; comparing the obtained at least one predetermined fourth hash value to the at least one first signed hash value to determine a correspondence; and verifying at least one signature of the at least one first signed hash value by verifying that the at least one first signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first signed hash value for a corresponding hash tree and zero or more second hash values; hashing, based on the answer and the zero or more second hash values, along at least one third hash value of at least one node of the corresponding hash tree to obtain at least one predetermined fourth hash value; comparing the obtained at least one predetermined fourth hash value to the at least one first signed hash value to determine a correspondence; and verifying at least one signature of the at least one first signed hash value by verifying that the at least one first signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree.
  • a method includes: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values; and returning the answer and the proof.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values; and returning the answer and the proof.
  • an electronic device, in another exemplary aspect of the invention, includes: a memory configured to store a data set, zero or more first hash values and at least one predetermined hash value; and a data processor configured to receive a query on the data set comprising one of a range query or an aggregate query, to determine an answer corresponding to the query, to determine a proof corresponding to the query and the answer, and to return the answer and the proof, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: a memory configured to store a data set, zero or more first hash values and at least one predetermined hash value; and a data processor configured to receive a query on the data set comprising one of a range query or an aggregate query, to determine an answer corresponding to the query, to determine a proof corresponding to the query and the answer, and to return the answer and the proof, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set
  • a method includes: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first hash value for a corresponding hash tree and at least one membership witness for the at least one first hash value; hashing, based on the answer, along at least one second hash value of at least one node of the corresponding hash tree to obtain at least one predetermined third hash value; comparing the obtained at least one predetermined third hash value to the at least one first hash value to determine a correspondence; and verifying the proof by utilizing the at least one first hash value and the at least one membership witness to verify that the at least one first hash value was utilized to obtain a predetermined accumulation value and verifying a signature on the predetermined accumulation value, wherein the predetermined accumulation value corresponds to a value obtained by accumulating a set of predetermined third hash values.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first hash value for a corresponding hash tree and at least one membership witness for the at least one first hash value; hashing, based on the answer, along at least one second hash value of at least one node of the corresponding hash tree to obtain at least one predetermined third hash value; comparing the obtained at least one predetermined third hash value to the at least one first hash value to determine a correspondence; and verifying the proof by utilizing the at least one first hash value and the at least one membership witness to verify that the at least one first hash value was utilized to obtain a predetermined accumulation value and verifying a signature on the predetermined accumulation value, wherein the predetermined accumulation value corresponds to a value obtained by accumulating
  • a method includes: maintaining, by a data source, an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; maintaining, by a query source, a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and invoking, by the query source, an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: maintaining, by a data source, an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; maintaining, by a query source, a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and invoking, by the query source, an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries.
  • a system in another exemplary aspect of the invention, includes: a data source configured to maintain an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; a query source configured to maintain a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and a responder, wherein the query source is further configured to invoke an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries.
  • FIG. 1 shows a system in which the exemplary embodiments of the invention may be employed
  • FIG. 2 shows an exemplary new authentication structure
  • FIG. 3 depicts an exemplary system for detection and elimination of replay attacks
  • FIG. 4 depicts a flowchart illustrating one non-limiting example of a method for practicing the exemplary embodiments of this invention
  • FIG. 5 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention.
  • FIG. 6 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention.
  • FIG. 7 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention.
  • FIG. 8 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention.
  • Exemplary methods, computer programs, devices and networks are described herein that implement new algorithmic and cryptographic techniques for authenticating the results of queries over databases that are outsourced to untrusted parties.
  • the techniques depart from previous approaches by considering super-efficient answer verification. For example, answers to queries are validated in time asymptotically less than the time spent to produce them and using lightweight cryptographic operations. This property is achieved by adopting the decoupling of query answering and answer verification in a way that can be used, for example, for range or aggregate queries.
  • efficient techniques are provided for updating the database over time.
  • exemplary techniques are provided that are safe from replay attacks from the outsourcer.
  • One such exemplary technique involves the use of an external auditor, for example, who simply keeps a hashed digest of the sequence of updates and queries, yet is able to audit the outsourcer to determine if a replay attack has occurred since the last audit.
  • the scheme is static (that is, it doesn't allow for database updates) and involves a fairly complicated verification protocol; in particular, data is hashed over a binary tree in two different ways.
  • a different approach augments B-trees using hashes and signatures at tree nodes to authenticate range queries; completeness is subsequently considered. Essentially every node in the tree is signed, incurring a relatively high storage cost. The verification cost is O(t) but involves expensive operations (O(t) signatures are verified).
  • Still another approach authenticates range queries using signature aggregation; completeness is subsequently achieved. The approach is able to achieve super-efficiency, but not coupled with both efficient updates and replay-attack safety.
  • Another approach provides a hash-based, B-tree-based authenticated indexing technique, focusing on experimental performance and the importance of range searching in database queries.
  • One exemplary technique involves a recursive construction that divides a hash tree in a recursive fashion so that it has O(log* n) "super" levels (that is, a number proportional to the inverse of the tower-of-twos function).
  • the source need only sign the hash values of nodes on super levels in this scheme, which significantly speeds up data updates while also simplifying the means to achieve super-efficiency.
  • the exemplary embodiments of the invention are super-efficient, dynamic and replay safe.
  • One exemplary solution involves the use of an RSA accumulator to allow clients to verify a single aggregation to prove that the signed responses to a query are still valid even if the signatures on those particular items are quite old.
  • a source-responder work trade-off is used to perform updates in O(√n) time with this approach, which is efficient for moderately large values of n.
  • Other exemplary embodiments use an external auditor to detect, and thereby deter, replay attacks through periodic audits of the query responders. The key contribution here is that the auditor need only store a constant-sized digest for each responder, so that auditing is also a super-efficient computation. More importantly, it is shown below that a responder cannot employ a replay attack without being caught by the auditor.
  • Data authentication is examined in a context common to today's Internet setting, where a database becomes available for queries at an intermediate entity that is distinct from the data owner (creator or source) and is untrusted by the end user. That is, the creator (or owner) of the data set is not the same entity as the one answering queries about the set and, in particular, the data owner does not control the corresponding data structure that is used to answer a query.
  • an intermediate, untrusted party answers the queries about the data set that are issued by an end-user.
  • a data source S creates (and owns) a dynamic data set D, which may evolve through update operations, and maintains an authentication structure for D — appropriate for a specific type of queries (e.g., range queries or aggregate queries).
  • the data set is stored by a responder R who maintains the same authentication structure for D and answers queries issued by a user U.
  • R provides U with a cryptographic proof p that is computed using the authentication structure of D. The proof p is then used by a verification process to check the validity of the answer subject to a given query.
  • D and the authentication structure are appropriately updated by S and R.
  • the system 100 includes a source (S) 102 communicating with a responder (R) 104 and a user (U) 106 communicating with R 104.
  • S 102, R 104 and U 106 comprise the additional components (e.g., the dynamic data set) and are enabled to perform the functions (e.g., storage, maintenance, querying, answering, updating) as described immediately above.
  • U 106 comprises an electronic device capable of communication with R 104.
  • U 106 may comprise at least one data processor, at least one memory, a transceiver, and a user interface comprising a user input and a display device.
  • U 106 and/or R 104 include one or more components capable of implementing the exemplary embodiments of the invention (e.g., a data processor).
  • an encryption component may be employed.
  • the encryption component may be a separate entity (e.g., an integrated circuit, an Application Specific Integrated Circuit or ASIC) or may be integrated with other components (e.g., a program run by a data processor, functionality enabled by a data processor).
  • This section describes a new, exemplary authentication structure for super-efficient answer verification, for example, for the problem of one-dimensional range searching.
  • the properties of this exemplary authentication structure are also considered.
  • This exemplary structure may also be utilized in conjunction with the new, exemplary authentication schemes presented in the next two sections.
  • Let D be a set of n key-value pairs (k, v), where each key k is a distinct element of a totally ordered universe K.
  • the size n of D and the size t of A_q may be referred to as the input size and output size (or answer size) of query q, respectively.
  • the search data structure is decoupled from the authentication data structure.
  • and t = |A_q|.
  • the design of the authentication structure for range searching queries is based on verifying a collection of certain simple relations defined over the set D , regardless of the search technique employed.
  • the successor relation σ(X) over a totally ordered set X with n elements consists of all ordered pairs of consecutive elements of X, augmented with pairs (-∞, x_1) and (x_n, +∞), where x_1 and x_n are the smallest and largest elements of X, respectively.
  • σ({1, 5, 2, 8}) = {(-∞, 1), (1, 2), (2, 5), (5, 8), (8, +∞)}.
  • σ(X) has size n + 1 (i.e., n + 1 pairs).
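  • The definition of the successor relation can be illustrated with a minimal sketch. The function name `successor_relation` and the use of floating-point infinities as the two sentinel elements are assumptions made for illustration only; they are not taken from the text.

```python
import math


def successor_relation(xs):
    """Return the successor pairs of a finite totally ordered set.

    The smallest element is paired with -infinity and the largest
    with +infinity, so a set of n elements yields n + 1 pairs.
    """
    ordered = sorted(set(xs))
    bounded = [-math.inf] + ordered + [math.inf]
    # Pair every element with the element that follows it.
    return list(zip(bounded, bounded[1:]))


pairs = successor_relation({1, 5, 2, 8})
```

  • For the four-element example set, the sketch produces five pairs, matching the size n + 1 stated above.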
  • the successor relation of the keys of a set of key-value pairs D may comprise the essential information for verifying the answer to a range searching query on D , as summarized in the following lemma.
  • the first condition guarantees that the answer consists of t consecutive key-value pairs of data set D, whereas the second guarantees that the query range is exactly covered by the answer range.
  • answer correctness for range searching captures both inclusiveness (all the returned pairs are in the query range) and completeness (all the pairs in the query range are returned), while some previous approaches considered only inclusiveness.
  • A_q = {(k_1, v_1), ..., (k_t, v_t)}, k_1 < ... < k_t
  • A_q can be authenticated by verifying t pairs of the key-value relation, namely, that (k_i, v_i) ∈ D, 1 ≤ i ≤ t, and t + 1 pairs of the successor relation on the keys, namely that (k_i, …
  • Γ(q) = {(k_1, v_1), ..., (k_t, v_t)} ∪ {(k_0, k_1), ..., (k_t, k_{t+1})} (1)
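  • As a rough illustration of how an answer reduces to t key-value pairs plus t + 1 successor pairs, the sketch below checks a claimed answer against two sets of relations. All names (`relations_for_answer`, `check_answer`, `data`, `successors`) are hypothetical. The sketch assumes the two relation sets are already trusted, whereas in the scheme they are authenticated through the hash tree, and it assumes that "exactly covered" means the boundary keys k_0 and k_{t+1} lie just outside the query range [a, b].

```python
import math


def relations_for_answer(k_prev, answer, k_next):
    """Build the key-value pairs and the successor pairs tied to an answer."""
    keys = [k_prev] + [k for k, _ in answer] + [k_next]
    return set(answer), set(zip(keys, keys[1:]))


def check_answer(a, b, k_prev, answer, k_next, data, successors):
    """Accept only if every relation is present and [a, b] is covered."""
    kv_pairs, succ_pairs = relations_for_answer(k_prev, answer, k_next)
    if not kv_pairs <= data or not succ_pairs <= successors:
        return False
    # Inclusiveness: returned keys are inside the range.
    # Completeness: the neighbouring keys fall outside the range.
    inside = all(a <= k <= b for k, _ in answer)
    return inside and k_prev < a and b < k_next


data = {(1, "a"), (2, "b"), (5, "c"), (8, "d")}
keys = [-math.inf] + sorted(k for k, _ in data) + [math.inf]
successors = set(zip(keys, keys[1:]))

ok = check_answer(2, 6, 1, [(2, "b"), (5, "c")], 8, data, successors)
dropped = check_answer(2, 6, 1, [(2, "b")], 8, data, successors)
```

  • In this sketch, omitting the pair with key 5 forces the responder to claim the successor pair (2, 8), which is not in the successor relation, so the incomplete answer is rejected.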
  • the authentication structure will reside both at S, for computing and signing the authentication strings, and at R, for producing the answer proof that will allow U to verify the answer.
  • security is proved based on Lemma 1 and the security properties of the utilized cryptographic primitives: using standard reductions, one can show that any successful attack launched from a computationally bounded R corresponds to a successful attack against the security properties of our primitives (e.g., collision-resistant hashing, signature schemes, one-way accumulators).
  • the authentication structure for range search queries on D uses a hash tree built over D, which essentially encodes the relations σ(K_D) and D.
  • Let h be a collision-resistant hash function.
  • a balanced hash tree of depth d is built, storing at the leaves from left to right the hash values h_1, ..., h_n defined as follows, where
  • h_1 = h(h(-∞) ‖ h(k_1) ‖ …
  • h_n = h(h(k_n) ‖ h(v_n) ‖ …
  • the hash values at the leaves encode information about various pairs: for 2 ≤ i ≤ n - 1, h_i is the digest of the key-value pair (k_i, v_i) and successor pair (k_i, k_{i+1}),
  • h_1 is the digest of pairs (k_1, v_1), (-∞, k_1) and (k_1, k_2)
  • h_n is the digest of pairs (k_n, v_n), (k_n, +∞).
  • internal nodes in the hash tree store the hash of the concatenation of the hash values stored at their children. So, any node v in the hash tree stores a hash value h_v that encodes information about key-value pairs of D and successor pairs that are associated with the leaves in the subtree rooted at v. For instance, a hash value stored at the parent node of two sibling leaf nodes j and j + 1 is the digest of pairs
  • ⁇ q) ⁇ (k lt , v (
  • the set Γ(q) is partially encoded in all hash values stored at nodes that belong to the paths from the leaves in L_q up to the tree root r.
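  • The hash tree described above can be sketched as follows. This is an illustration under stated assumptions, not the construction of the text: SHA-256 stands in for the collision-resistant hash function, `enc` is a hypothetical byte encoding, the exact layout of each leaf digest is assumed, the data set is non-empty, and the number of leaves is a power of two.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Stand-in collision-resistant hash (assumption: SHA-256)."""
    return hashlib.sha256(data).digest()


def enc(x) -> bytes:
    """Hypothetical byte encoding of a key, a value or a sentinel."""
    return repr(x).encode()


def leaf_digests(pairs):
    """Leaf i commits to (k_i, v_i) and to the successor pair (k_i, k_{i+1})."""
    keys = [k for k, _ in pairs]
    following = keys[1:] + ["+inf"]
    leaves = [h(h(enc(k)) + h(enc(v)) + h(enc(n)))
              for (k, v), n in zip(pairs, following)]
    # The first leaf additionally commits to the pair (-inf, k_1).
    leaves[0] = h(h(enc("-inf")) + leaves[0])
    return leaves


def build_tree(leaves):
    """Return all levels, leaves first and root last."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def auth_path(levels, index):
    """Sibling hashes on the path from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def root_from_path(leaf, path):
    """Recompute the root from one leaf digest and its sibling hashes."""
    node = leaf
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node


pairs = [(1, "a"), (2, "b"), (5, "c"), (8, "d")]
leaves = leaf_digests(pairs)
levels = build_tree(leaves)
root = levels[-1][0]
```

  • Hashing a leaf upward along its path reproduces the root only if the leaf and all sibling hashes are unaltered, which is the sense in which the hash values on these paths encode the relations.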
  • hash values to contain the hashes h?,...,hl , m 2 m x , at level
  • one defines the set S_i of additional special hash values, stopping before the log* n step of the recursion, effectively at level 2 of the tree (or at some other small constant level of the tree), and sets S = h_r ∪ S_1 ∪ S_2 ∪ ... as the final set of special hash values, which is of O(n) size. In actuality, |S| ≤ n - 1; thus, S has size smaller than the trivial solution of setting as special every hash value in the tree.
  • the set S of special hash values in the hash tree is defined recursively and consists of O(n) values residing at log* n levels: h_r at level log n, {h^1_1, ...} at level log log n, {h^2_1, ...} at level log log log n, etc.
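  • To make the recursive choice of levels concrete, the following sketch lists the levels log n, log log n, and so on. It assumes base-2 logarithms, levels counted upward from the leaves, and a cut-off at level 2; the function names `log_star` and `special_levels` are hypothetical.

```python
import math


def log_star(n: float) -> int:
    """Iterated logarithm: how often log2 is applied until the value is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


def special_levels(n: int, floor_level: int = 2):
    """Levels log n, log log n, ... down to a small constant level."""
    levels = []
    level = math.log2(n)
    while level >= floor_level:
        levels.append(math.ceil(level))
        level = math.log2(level)
    return levels


count = log_star(65536)
levels = special_levels(65536)
```

  • For n = 65536 the root sits at level 16 and the further special levels are 4 and 2, so only a handful of levels carry special hash values even for large n.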
  • the verification cost of an answer of size t is O(log t) hashing cost, where O(1) special hash values need be authenticated, essentially as being members of the set of special hash values S.
  • Replay attacks may be eliminated, for example, by using time-stamps — such as a standard solution known in the literature — to check the freshness of a valid signature.
  • In hash-based authentication, i.e., in the most practical and widely used setting where only cryptographic hashing is used to produce the authentication strings, the exemplary authentication structure achieves optimal performance with respect to both the verification and the update costs.
  • the following result summarizes the performance of the new, exemplary structure and signature-based authentication scheme (proof in Appendix).
  • the answer proof has size O(log t) and consists of two signatures, two keys, and O(log t) hash values;
  • This authentication scheme is secure with respect to data authentication, safe with respect to replay attacks, and optimal with respect to super-efficient verification in the hash-based data authentication model.
  • the new authentication structure is now described.
  • the main idea is to use a dynamic accumulator for authenticating set membership queries for the set of special hash values S. This is performed as follows: the set S of special hash values is accumulated to accumulation value a and a is signed by the source. Then, verifying that a special hash value belongs in S is performed in two steps, and still in optimal fashion (O(1) verification cost): first, the hash value together with at least one membership witness are used to verify that the hash value was used by the accumulator in producing a and, second, the signature on a is verified. For security reasons, only the source knows the trapdoor information of the accumulator; the responder does not know this trapdoor. It follows that the verification is (as in the construction of the previous section) super-efficient.
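  • The first verification step can be illustrated with a toy RSA-style accumulator. This is a sketch under loud assumptions: the modulus `N` and base `G` are tiny illustrative values, the accumulated elements are small primes standing in for special hash values (how hash values are mapped to suitable exponents is not covered here), and the second step, checking the source's signature on the accumulation value, is omitted. All function names are hypothetical.

```python
from math import prod

# Toy parameters for illustration only; a real modulus is thousands of bits
# and its factorisation (the trapdoor) is known only to the source.
N = 61 * 53
G = 2


def accumulate(elements):
    """Accumulation value: G raised to the product of all elements, mod N."""
    return pow(G, prod(elements), N)


def witness(elements, x):
    """Membership witness of x: accumulate every element except x."""
    others = [e for e in elements if e != x]
    return pow(G, prod(others), N)


def verify_membership(x, w, acc):
    """One modular exponentiation, independent of the size of the set."""
    return pow(w, x, N) == acc


special = [3, 5, 7, 11]          # stand-ins for special hash values
acc = accumulate(special)
w7 = witness(special, 7)
member_ok = verify_membership(7, w7, acc)
```

  • The check costs a single exponentiation regardless of how many values were accumulated, which is what keeps this verification step at constant cost.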
  • Inserting and deleting elements in an accumulator involves some computational cost for updating the new accumulation and for updating the set-membership witnesses of all the elements (e.g., with one or at least one set-membership witness per element).
  • the witnesses of the O(n) accumulated special hash values are explicitly maintained in the source and the responder.
  • updates can be of cost O(n): the reason is that after any update all n membership witnesses must be updated.
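  • The linear update cost can be seen in a toy RSA-style accumulator, sketched below for insertions only and without use of the trapdoor. The parameters `N` and `G`, the dictionary of witnesses and the function name `insert` are illustrative assumptions, and the inserted elements are small primes standing in for special hash values.

```python
# Toy parameters for illustration only.
N = 61 * 53
G = 2


def insert(acc, witnesses, x_new):
    """Add x_new to the accumulator and refresh every stored witness.

    One exponentiation is spent on the accumulation value and one on
    each existing witness, so the work grows linearly with the set size.
    """
    refreshed = {x: pow(w, x_new, N) for x, w in witnesses.items()}
    # The previous accumulation value is the witness of the new element.
    refreshed[x_new] = acc
    return pow(acc, x_new, N), refreshed


acc, witnesses = G, {}
for element in [3, 5, 7, 11]:
    acc, witnesses = insert(acc, witnesses, element)

all_valid = all(pow(w, x, N) == acc for x, w in witnesses.items())
```

  • Every insertion touches all stored witnesses, which is the behaviour the text identifies as the source of the O(n) update cost when witnesses are maintained explicitly.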
  • the problem of the high update cost becomes more challenging for deletions, especially under the necessary assumption that the responder cannot use the trapdoor information.
  • using the RSA accumulator and certain algorithmic techniques one can achieve reasonable update and query costs. The following result summarizes the performance of this new, exemplary authentication scheme (proof in the Appendix, below).
  • the answer proof has size O(log t) and consists of one signature, two field elements, two keys and O(log t) hash values;
  • Theorem 2 states that if the RSA accumulator is additionally used, the update cost can be reduced, but now this cost is incurred at both the source S and the responder R. Both schemes preserve the super-efficient verification and replay-attack safety requirements. It is interesting to examine if one can further improve the update costs and design an authentication scheme that achieves different trade-offs.
  • Auditing Mechanism Model. In the exemplary auditing mechanism, the delayed consistency checking is performed by the user U in collaboration with the source S, without, however, any direct interaction between the two.
  • the auditing mechanism corresponds to securely, compactly and efficiently encoding a series of transactions with the responder R, i.e., updates and queries over data set D, at S and U, respectively.
  • S maintains an update audit state that encodes the history of updates, through information reported after update transactions with R: for any update performed on the data set D, an update trail T_u is provided to S by R that is used to update the update audit state through operation 'upd_u_state'.
  • U maintains a query audit state that encodes the history of queries, through information reported after query transactions with R: for any query q issued on D and returned answer-proof pair, a query trail T_q is provided to U by R that is used to update the query audit state through operation 'upd_q_state'. These trails correspond to "receipts" that the auditing mechanism collects. This series of updates of the two audit states corresponds to the computation phase of the auditing mechanism.
  • Verification of the consistency of the two transaction series (update and query) and, consequently, replay attack detection are performed by U in the audit phase.
  • U can invoke a request for checking the consistency of the reported transactions with the current set D that resides at R. This is performed at U through operation 'audit', which receives as input the current query audit state of U and the current update audit state of S, appropriately updated given the current data set D (provided to S by R), and accepts or rejects its input, accordingly verifying the consistency of transactions.
  • the audit state remains unchanged and a new computation phase begins.
  • An auditing scheme (upd_u_state, upd_q_state, audit) is secure if it satisfies the following property: operation audit accepts its input if and only if no malicious action has been performed by R, that is, all query-answer pairs verified by U are consistent with the update history of the data set D and its current states computed using operations upd_u_state, upd_q_state.
  • Auditing scheme (upd_u_state, upd_q_state, audit), in particular, is secure if the following requirements (for a computationally bounded responder R) are satisfied: (i) completeness, dictating that all valid update and query transactions yield, through operations upd_u_state and upd_q_state, audit states that, when checked by audit with a valid (not corrupted by R) data set D, always result in accepting; and (ii) soundness, dictating that when audit accepts its inputs, then the audit states correspond to transactions of valid update and query operations subject to the current state of the data set.
  • An Efficient Secure Auditing Scheme Next described is how to construct a secure auditing scheme.
  • a simple cryptographic solution is used that is inspired from efficient and secure cryptographic mechanisms that provide off-line memory checking.
  • a trusted checker checks the correctness (or consistency) of an untrusted memory, where data is written in and read from the memory through operations 'load' and 'store'.
  • the checker maintains some constant-size state information and augments the data that is written into the untrusted memory, for example, with time-stamps, such that at any point in time, a check can be performed on the memory correctness.
  • the idea is to use a cryptographic primitive A for generating and updating this state information, as a short description of the memory history.
  • This primitive can produce short digests of large sets in an incremental fashion (that is, where elements can be inserted in the set and the new digest can be accordingly updated in O(1) time without recomputing from scratch) and is used as follows.
  • a special encoding of the operation is created and securely enclosed in the state information through A.
  • two separate digests are maintained over two sets: a first set encodes the "load" history of the memory (i.e., reading operations); the second set encodes the "store" history (i.e., writing operations) of the memory.
  • Any operation results in updating both sets; e.g., a load(i) operation will add the read item d i to the "load" history and the written item d i (but with a new time-stamp) to the "store" history.
  • the crucial observation is that if the memory is correct, the encodings of the two sets are such that the produced digests are the same when the check is performed.
  • when the cryptographic primitive A is collision-resistant, meaning that it is computationally infeasible to find distinct sets that produce the same digest, the memory checking problem is reduced to an equality testing problem (subject to an appropriate encoding for the operations in the memory).
  • Such primitives A for incrementally computing collision-resistant digests of sets exist; e.g., ε-biased hash functions.
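The load/store bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction used in the exemplary embodiments: the class name `OfflineChecker`, the helper `encode` and the additive SHA-256 digest are hypothetical stand-ins for the primitive A, and the untrusted memory is modeled as a plain dictionary.

```python
import hashlib

MOD = 2 ** 256  # digests live in the integers modulo 2^256 (illustrative choice)


def encode(addr, value, ts):
    """Map one (location, item, time-stamp) triple to an integer."""
    data = f"{addr}|{value}|{ts}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


class OfflineChecker:
    """Trusted checker keeping O(1) state over an untrusted memory (a dict)."""

    def __init__(self, memory, size):
        self.mem = memory        # untrusted: addr -> (value, time-stamp)
        self.size = size
        self.clock = 0           # source of fresh time-stamps
        self.load_digest = 0     # digest of the "load" history
        self.store_digest = 0    # digest of the "store" history
        self.ok = True
        for addr in range(size):
            self._put(addr, 0)   # initialise every location to 0

    def _put(self, addr, value):
        # Write the item with a fresh time-stamp and record it as stored.
        self.clock += 1
        self.mem[addr] = (value, self.clock)
        self.store_digest = (self.store_digest + encode(addr, value, self.clock)) % MOD

    def _take(self, addr):
        # Read the item, record it as loaded, reject time-stamps from the future.
        value, ts = self.mem[addr]
        if ts > self.clock:
            self.ok = False
        self.load_digest = (self.load_digest + encode(addr, value, ts)) % MOD
        return value

    def load(self, addr):
        value = self._take(addr)
        self._put(addr, value)   # written back with a new time-stamp
        return value

    def store(self, addr, value):
        self._take(addr)
        self._put(addr, value)

    def check(self):
        # Read every location once more; the two histories must then coincide.
        digest = self.load_digest
        for addr in range(self.size):
            value, ts = self.mem[addr]
            if ts > self.clock:
                return False
            digest = (digest + encode(addr, value, ts)) % MOD
        return self.ok and digest == self.store_digest
```

With an honest memory, `check()` accepts after any sequence of `load` and `store` calls. If the memory hands back an old (value, time-stamp) pair, that pair enters the load history twice while the overwritten item is never read, so the two digests differ unless a digest collision occurs.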
  • the RSA accumulator is used as a collision-resistant primitive A for incrementally computing digests over sets and A(S) is used to denote the digest of set S.
  • A is used to define the audit states ⁇ u and ⁇ q stored by S and U, respectively. The main idea is as follows.
  • the set S of special values defined over the exemplary super-efficient authentication structure of Section 3 may be viewed as an untrusted memory: with memory locations corresponding to the unique identifiers of the tree nodes (according to a fixed ordering, e.g., in-order tree traversal) and memory items corresponding to the special hash values and their signatures.
  • Every transaction (update or query) uniquely defines a subset of special hash values in the tree: for updates, the hashes in the O(log* n) special tree levels in the corresponding leaf-to-root path; for queries, the two hashes of the lowest special tree level that exactly covers the answer.
  • These two subsets of special hashes respectively define the update trail T u and the query trail T q that are returned by R.
  • the tuple ( id v , h v , ⁇ v , t v ) is included in the corresponding trail.
  • id v is the identifier of v, h v the hash value, ⁇ v the corresponding signature and t v the associated timestamp.
  • e(·) is a function for computing prime representative values (as in the proof of Theorem 2)
  • N is the RSA modulus
  • * ' v is the encoding that corresponds to ⁇ V but with a fresh time-stamp (e.g., monotonically increasing, synchronized for all parties) and a new identifier, hash value and signature (update case only).
  • the audit phase is as follows. First R forwards the request for the audit to
  • FIG. 3 depicts an exemplary system for detection and elimination of replay attacks.
  • the auditor A keeps audit state ⁇ of size O( ⁇ ) about the database DB, which is incrementally updated after any updates or queries on the database occur, using respectively the update trails T u and query trails T q provided by the responder R and user U (computation phase). At certain points in time, the auditor checks the consistency of its local audit state ⁇ with the current database DB residing in R, performing an off-line correctness check on the history of transactions on the database (audit phase). Replay attacks are detected, since old data, although verifiable at U, correspond to invalid transactions. Replay attacks are effectively eliminated, since they are detected and expose possible malicious actions by R.
  • FIG. 3 illustrates the use of a third party auditor A.
  • functions of the third party auditor A may be fulfilled by the user U and/or source S (e.g., via the responder R).
  • Theorem 3 There exists a hash-based, dynamic, super-efficient and audited authentication structure for range search queries over a set of size n with the following performance, where t denotes the number of data items returned by a query:
  • the answer proof has size O(log t) and consists of two signatures, two keys and O(log t) hash values;
  • the auditing scheme stores O(1) audit state information, performs
  • replay attacks performed by the responder are always detectable by the auditor (e.g., the user or a third party auditor) at the audit phase.
  • Section 3 which is an exemplary authentication structure for range search queries.
  • many other types of queries are related to range searching or consist of more complex search problems that eventually boil down (e.g., may be reduced) to range searching.
  • the canonical members of this class are aggregate queries, such as SUM, MAX, and AVG, as non-limiting examples.
  • a hashing scheme appropriate for these queries could be constructed such that it encodes the information (relations) about ranges, corresponding aggregation values and neighboring data records.
  • the hash tree node v defining subtree T v stores a hash value that encodes information about the aggregation value a v computed over the records that correspond to the leaves of T v , the left-most and rightmost records in T v and, also, their predecessor and successor records (not in T v ), respectively.
  • these queries can be authenticated by considering the corresponding allocation nodes in the query range; and again, any query range has at most two allocation nodes in some special level of the tree.
  • the exemplary hashing scheme of Section 3 and, accordingly, all of the exemplary authentication schemes can be extended to these classes of queries (e.g., aggregation queries and path property queries), as non-limiting examples.
  • Hashing operations are particularly lightweight (block-cipher type of computations).
  • Hash tree: An authentication tree, based on the construction due to Merkle, is used, which hierarchically defines a collection of hash values (stored at internal nodes) computed over a data set (stored at leaves).
  • a hash tree is a balanced binary tree, where each node stores a hash value computed using a collision-resistant hash function: leaves store the hash of the corresponding element and internal nodes store the hash of the concatenation of the hash values of their children.
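As a concrete illustration of the hash tree just described, the following Python sketch builds the tree level by level and verifies a leaf against the root. It is a minimal sketch under assumptions: SHA-256 stands in for the collision-resistant hash function, the number of elements is assumed to be a power of two, and the function names (`build_levels`, `prove`, `verify`) are hypothetical.

```python
import hashlib


def H(data: bytes) -> bytes:
    """Collision-resistant hash (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def build_levels(elements):
    """Return all tree levels, leaves first and the root level last."""
    if not elements or len(elements) & (len(elements) - 1):
        raise ValueError("this sketch assumes a power-of-two number of elements")
    level = [H(e) for e in elements]           # leaves: hash of each element
    levels = [level]
    while len(level) > 1:
        # internal node: hash of the concatenation of its two children
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect the sibling hashes on the leaf-to-root path of leaf `index`."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])          # sibling at this level
        index //= 2
    return path


def verify(root, element, index, path):
    """Recompute the root from one element and its sibling path."""
    h = H(element)
    for sibling in path:
        h = H(h + sibling) if index % 2 == 0 else H(sibling + h)
        index //= 2
    return h == root
```

For n elements the path returned by `prove` holds log2(n) hashes, which is the per-element cost that the special signed hash values of the exemplary structures are designed to avoid paying on every query.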
  • Signatures Any signature scheme secure against adaptive chosen-message attack may be used. Typically, signing and verifying a signature involves more expensive operations (e.g., modular exponentiations).
  • RSA-based dynamic accumulators are used in conjunction with a dynamization scheme for optimally verifying set membership. These cryptographic primitives produce an efficiently computed accumulation of a set, along with short and efficiently verifiable witnesses for all accumulated items.
  • Set-membership verification takes O(1) time and is one-way: under the strong RSA assumption, it is computationally infeasible to find items that are not accumulated in the set, together with fake witnesses, that pass the verification test.
  • the underlying computations involve modular exponentiations and multiplications.
  • the verification cost is O(log t) hashing cost and at most two signature verifications.
  • the exemplary authentication structures can achieve super-efficient verification based on the use of O(n) special digests defined hierarchically over the data set. It is shown that this design is optimal for hash-based authentication, i.e., when only cryptographic hashing is used to produce the digests. The proof is based on a result from previous work, stating that for hash-based authentication of set-membership queries, super-efficient verification can be achieved only at an "exponential" growth of the signature cost. See R. Tamassia and N. Triandopoulos. Computational bounds on hierarchical data processing with applications to information security. In Proc. Int.
  • the update cost includes: O(log t) hashing cost, O(log* n) signature cost and O(n) signature renewal cost, thus, O(n) signature cost in total.
  • this technique may be optimal for hash-based data authentication resilient to replay attacks.
  • the problem can be formulated as follows. One wishes to design a mechanism that allows a user to validate the freshness of a verified signature received by the responder, even when the responder is allowed to cache old signed hash values.
  • the set S of special hash values is fixed over time (only values of key- value pairs change over time). Then the problem of the verification of signature freshness is equivalent to a particular data authentication problem.
  • update operation 'insertType(τ, x)' inserts an element x of type τ ∈ {τ 1 , ..., τ m } in the data structure (there are m types in total), and query operation 'last(τ)' returns the element x of type τ that was most recently inserted in the data structure.
  • verifying the signature freshness corresponds to verifying the answer of a last(·) query and vice-versa.
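The reduction above can be made concrete with a small, unauthenticated model of the data structure. In this sketch each type plays the role of one fixed special tree node and each element the role of a signed hash value for that node; the class name `LastByType` and the helper `is_fresh` are hypothetical, and authenticating the answers is deliberately left out.

```python
class LastByType:
    """Keeps, for each of m fixed types, the most recently inserted element."""

    def __init__(self, types):
        self._last = {t: None for t in types}   # the set of types is fixed up front

    def insert_type(self, t, x):
        if t not in self._last:
            raise KeyError(f"unknown type: {t!r}")
        self._last[t] = x                       # overwrites the older element

    def last(self, t):
        return self._last[t]


def is_fresh(store, t, x):
    """A value of type t is fresh exactly when it is the answer to last(t)."""
    return store.last(t) == x
```

A cached value that was valid before a later `insert_type` call for the same type fails `is_fresh`, which mirrors a replayed signed hash value being rejected.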
  • the complexity due to authentication holds because of the use of the accumulator.
  • the accumulation function is modular exponentiation, where the RSA modulus is used.
  • the witness w of the membership of an item x in S is the value A(S − {x}), and it can be efficiently verified by checking that w^{e(x)} mod N = A(S). Accumulation A(S) is the unique authentication string that is signed by the source. Accordingly, answer verification is still super-efficient (as in the proof of Theorem 1): only now the two special hash values that authenticate the query are first authenticated to be members of A(S), which is in turn authenticated by verifying its signature. Also, using time-stamps when signing A(S) provides security against replay attacks.
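The accumulate, witness and verify steps of an RSA accumulator can be sketched as follows. This is a toy illustration under loud assumptions: the modulus below is the product of two publicly known Mersenne primes and offers no security, the base `G = 3` is arbitrary, and `prime_rep` is a simplistic stand-in for the prime-representative function e(·), not the one used in the proof of Theorem 2.

```python
import hashlib

# Toy modulus (NOT secure): product of the Mersenne primes 2^31 - 1 and 2^61 - 1.
N = (2 ** 31 - 1) * (2 ** 61 - 1)
G = 3  # assumed public base


def is_prime(n):
    """Trial division; adequate for the 32-bit values used in this sketch."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def prime_rep(item: bytes) -> int:
    """Hypothetical e(.): first odd prime at or above a 32-bit hash of the item."""
    n = int.from_bytes(hashlib.sha256(item).digest()[:4], "big") | 1
    while not is_prime(n):
        n += 2
    return n


def accumulate(items, start=G):
    """A(S): raise the running value to the prime representative of each item."""
    acc = start
    for item in items:
        acc = pow(acc, prime_rep(item), N)
    return acc


def witness(items, member):
    """Witness for `member`: the accumulation of every other item."""
    return accumulate([x for x in items if x != member])


def verify(acc, member, w):
    """Membership test: one modular exponentiation."""
    return pow(w, prime_rep(member), N) == acc
```

Because exponentiation commutes, a new item can be folded into an existing accumulation with a single call such as `accumulate([new_item], start=acc)`, which is the incremental behaviour the audit states rely on. Keeping witnesses current after an update is a separate cost that this sketch does not address.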
  • exemplary data authentication structures have been considered in a setting where critical information is queried (e.g., in high rates) from a dynamic outsourced database that resides in an untrusted site.
  • New approaches have been presented for query authentication, where, by decoupling the answer-generation and answer-verification procedures, one moves towards super-efficient answer verification, an important property for data authentication, given that many real-life applications involve the querying of critical data (e.g., financial) by computationally limited devices, for example.
  • Exemplary authentication schemes for range search queries are described that achieve super-efficient answer verification, allow for efficient updates on the database and eliminate replay attacks from the database outsourcer.
  • any answer of size t is verified in time O(t), using only O(1) modular exponentiations.
  • exemplary authentication protocols are discussed that implement exemplary efficient auditing mechanisms that can perform an off-line check on the consistency of an outsourced database that reliably reports any malicious action from the outsourcer.
  • the exemplary schemes may be extended to more general queries.
  • a method includes: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree; and returning the answer and the proof. See FIG. 4.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree; and returning the answer and the proof.
  • an electronic device includes: a memory configured to store a data set, zero or more first hash values and at least one predetermined hash value; and a data processor configured to receive a query comprising one of a range query or an aggregate query, to determine an answer corresponding to the query, to determine a proof corresponding to the query and the answer, and to return the answer and the proof, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree.
  • the electronic device as above, embodied as a responder in a network.
  • the electronic device as above and further including one or more of further improvements described herein.
  • a method includes: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first signed hash value for a corresponding hash tree and zero or more second hash values; hashing, based on the answer and the zero or more second hash values, along at least one third hash value of at least one node of the corresponding hash tree to obtain at least one predetermined fourth hash value; comparing the obtained at least one predetermined fourth hash value to the at least one first signed hash value to determine a correspondence; and verifying at least one signature of the at least one first signed hash value by verifying that the at least one first signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree.
  • the method is implemented by a computer program. The method as above and further including one or more of further improvements described herein. See FIG. 5.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first signed hash value for a corresponding hash tree and zero or more second hash values; hashing, based on the answer and the zero or more second hash values, along at least one third hash value of at least one node of the corresponding hash tree to obtain at least one predetermined fourth hash value; comparing the obtained at least one predetermined fourth hash value to the at least one first signed hash value to determine a correspondence; and verifying at least one signature of the at least one first signed hash value by verifying that the at least one first signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree.
  • the computer program product as above and further including one or more of further improvements described herein.
  • a method includes: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values; and returning the answer and the proof. See FIG. 6.
  • n corresponds to a number of data elements
  • t corresponds to a size of an answer returned for a query, where a query is answered in O(log n + t) time
  • an answer proof has a size O(log t) and the answer proof consists of one signature, two field elements, two keys and O(log t) hash values
  • an answer to a query is validated by performing O(t) arithmetic computations, O(t) hash operations, O(1) modular exponentiations and O(1) signature verifications, where an update results in O(log n) hash operations, O(√n log* n) modular operations and O(1) signature generations.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values; and returning the answer and the proof.
  • a computer program product as in any above and further including one or more of further improvements described herein.
  • an electronic device includes: a memory configured to store a data set, zero or more first hash values and at least one predetermined hash value; and a data processor configured to receive a query on the data set comprising one of a range query or an aggregate query, to determine an answer corresponding to the query, to determine a proof corresponding to the query and the answer, and to return the answer and the proof, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values.
  • a method includes: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first hash value for a corresponding hash tree and at least one membership witness for the at least one first hash value; hashing, based on the answer, along at least one second hash value of at least one node of the corresponding hash tree to obtain at least one predetermined third hash value; comparing the obtained at least one predetermined third hash value to the at least one first hash value to determine a correspondence; and verifying the proof by utilizing the at least one first hash value and the at least one membership witness to verify that the at least one first hash value was utilized to obtain a predetermined accumulation value and verifying a signature on the predetermined accumulation value, wherein the predetermined accumulation value corresponds to a value obtained by accumulating a set of predetermined third hash values.
  • each predetermined third hash value of the set of predetermined third hash values is unsigned.
  • the method is implemented by a computer program.
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first hash value for a corresponding hash tree and at least one membership witness for the at least one first hash value; hashing, based on the answer, along at least one second hash value of at least one node of the corresponding hash tree to obtain at least one predetermined third hash value; comparing the obtained at least one predetermined third hash value to the at least one first hash value to determine a correspondence; and verifying the proof by utilizing the at least one first hash value and the at least one membership witness to verify that the at least one first hash value was utilized to obtain a predetermined accumulation value and verifying a signature on the predetermined accumulation value, wherein the predetermined accumulation value corresponds to a value obtained by accumul
  • a method includes: maintaining, by a data source, an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; maintaining, by a query source, a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and invoking, by the query source, an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries.
  • a method as in any above, wherein the method is implemented by a computer program.
  • n corresponds to a number of data items and t corresponds to a number of data items returned for a query
  • a query is answered in O(log n + t) time
  • an answer proof has a size O(log t) and the answer proof consists of two signatures, two keys and O(log t) hash values
  • an answer to a query is validated by performing O(t) hash operations and O(1) signature verifications, where an update results in O(log n) hash operations and O(log* n) signature generations
  • an auditing scheme stores O(1) audit states
  • the auditing scheme performs O(log n) work per update at the data source and O(1) work per query at the query source during a computational phase
  • the auditing scheme performs O(n) work at the data source and O(1) work at the query source during an audit phase, wherein replay attacks performed by the responder are always detectable by the query source at the
  • a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: maintaining, by a data source, an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; maintaining, by a query source, a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and invoking, by the query source, an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries.
  • In another non-limiting, exemplary embodiment, a system includes: a data source configured to maintain an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; a query source configured to maintain a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and a responder, wherein the query source is further configured to invoke an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries.

Abstract

One non-limiting, exemplary method includes: receiving a query including one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof has zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further includes at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree; and returning the answer and the proof.

Description

SUPER-EFFICIENT VERIFICATION OF DYNAMIC OUTSOURCED DATABASES
TECHNICAL FIELD:
[0001] The teachings in accordance with the exemplary embodiments of this invention relate generally to databases and, more specifically, relate to verification for outsourced databases.
BACKGROUND:
[0002] Databases are increasingly being hosted or mirrored at untrusted third parties (i.e., outsourced), so as to support queries from users. Of course, without some kind of verification process, users cannot trust the answers that come from queries to these outsourced databases. Thus, an important component of an outsourced database solution is the security and complexity of its answer-verification process. Consider the numerous computational settings where clients reside in computationally less powerful machines, such as sensors in wireless networks, mobile phones, and vehicles. In this context, the cryptographic protocols for trustworthy answer verification should preferably incur small communication and computational overheads.
[0003] For instance, many database queries boil down to one-dimensional range search queries— asking to report those records having values of a certain field within a given interval— and most techniques for authenticating such queries have O(log n + t) communication and computational costs, where n is the total number of records in the database and t is the number of returned records. Note that if the database size, n , is very large relative to the answer size, t , then the verification time is potentially much higher than it may need to be.
SUMMARY:
[0004] In an exemplary aspect of the invention, a method includes: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree; and returning the answer and the proof.
[0005] In another exemplary aspect of the invention, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree; and returning the answer and the proof.
[0006] In another exemplary aspect of the invention, an electronic device includes: a memory configured to store a data set, zero or more first hash values and at least one predetermined hash value; and a data processor configured to receive a query comprising one of a range query or an aggregate query, to determine an answer corresponding to the query, to determine a proof corresponding to the query and the answer, and to return the answer and the proof, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree.
[0007] In another exemplary aspect of the invention, a method includes: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first signed hash value for a corresponding hash tree and zero or more second hash values; hashing, based on the answer and the zero or more second hash values, along at least one third hash value of at least one node of the corresponding hash tree to obtain at least one predetermined fourth hash value; comparing the obtained at least one predetermined fourth hash value to the at least one first signed hash value to determine a correspondence; and verifying at least one signature of the at least one first signed hash value by verifying that the at least one first signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree.
[0008] In another exemplary aspect of the invention, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first signed hash value for a corresponding hash tree and zero or more second hash values; hashing, based on the answer and the zero or more second hash values, along at least one third hash value of at least one node of the corresponding hash tree to obtain at least one predetermined fourth hash value; comparing the obtained at least one predetermined fourth hash value to the at least one first signed hash value to determine a correspondence; and verifying at least one signature of the at least one first signed hash value by verifying that the at least one first signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree.
[0009] In another exemplary aspect of the invention, a method includes: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values; and returning the answer and the proof.
[0010] In another exemplary aspect of the invention, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values; and returning the answer and the proof.
[0011] In another exemplary aspect of the invention, an electronic device includes: a memory configured to store a data set, zero or more first hash values and at least one predetermined hash value; and a data processor configured to receive a query on the data set comprising one of a range query or an aggregate query, to determine an answer corresponding to the query, to determine a proof corresponding to the query and the answer, and to return the answer and the proof, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values.
[0012] In another exemplary aspect of the invention, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: a memory configured to store a data set, zero or more first hash values and at least one predetermined hash value; and a data processor configured to receive a query on the data set comprising one of a range query or an aggregate query, to determine an answer corresponding to the query, to determine a proof corresponding to the query and the answer, and to return the answer and the proof, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values.
[0013] In another exemplary aspect of the invention, a method includes: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first hash value for a corresponding hash tree and at least one membership witness for the at least one first hash value; hashing, based on the answer, along at least one second hash value of at least one node of the corresponding hash tree to obtain at least one predetermined third hash value; comparing the obtained at least one predetermined third hash value to the at least one first hash value to determine a correspondence; and verifying the proof by utilizing the at least one first hash value and the at least one membership witness to verify that the at least one first hash value was utilized to obtain a predetermined accumulation value and verifying a signature on the predetermined accumulation value, wherein the predetermined accumulation value corresponds to a value obtained by accumulating a set of predetermined third hash values.
[0014] In another exemplary aspect of the invention, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first hash value for a corresponding hash tree and at least one membership witness for the at least one first hash value; hashing, based on the answer, along at least one second hash value of at least one node of the corresponding hash tree to obtain at least one predetermined third hash value; comparing the obtained at least one predetermined third hash value to the at least one first hash value to determine a correspondence; and verifying the proof by utilizing the at least one first hash value and the at least one membership witness to verify that the at least one first hash value was utilized to obtain a predetermined accumulation value and verifying a signature on the predetermined accumulation value, wherein the predetermined accumulation value corresponds to a value obtained by accumulating a set of predetermined third hash values. 
[0015] In another exemplary aspect of the invention, a method includes: maintaining, by a data source, an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; maintaining, by a query source, a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and invoking, by the query source, an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries.
[0016] In another exemplary aspect of the invention, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: maintaining, by a data source, an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; maintaining, by a query source, a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and invoking, by the query source, an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries.
[0017] In another exemplary aspect of the invention, a system includes: a data source configured to maintain an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; a query source configured to maintain a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and a responder, wherein the query source is further configured to invoke an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries.
BRIEF DESCRIPTION OF THE DRAWINGS:
[0018] The foregoing and other aspects of embodiments of this invention are made more evident in the following Detailed Description, when read in conjunction with the attached Drawing Figures, wherein:
[0019] FIG. 1 shows a system in which the exemplary embodiments of the invention may be employed;
[0020] FIG. 2 shows an exemplary new authentication structure;
[0021] FIG. 3 depicts an exemplary system for detection and elimination of replay attacks;
[0022] FIG. 4 depicts a flowchart illustrating one non-limiting example of a method for practicing the exemplary embodiments of this invention;
[0023] FIG. 5 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention;
[0024] FIG. 6 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention;
[0025] FIG. 7 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention; and
[0026] FIG. 8 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention.
DETAILED DESCRIPTION:
[0027] Instead, it is desirable to design cryptographic techniques that allow authentication of range search queries with O(t) communication and computation costs, even if t is small. It is desirable for verification to be proportional to the answer size, that is, that verification be super-efficient. Furthermore, it is desirable for the protocols to involve lightweight cryptographic operations with, ideally, only O(1) modular exponentiations.
[0028] Additionally, authentication solutions should be sought that are efficiently dynamized, that is, they perform well even if the database evolves frequently over time. The challenge in this context is that a malicious outsourcer may replay old verifiable (e.g., signed) information from the data owner to a client. This is where super-efficiency may prove detrimental, since one wants to avoid a verification that requires more than O(t) work on the part of the client, and one desires to avoid requiring the data owner to process (e.g., re-sign) all the records of the database with each update. Ideally, it would be preferable to have a dynamic system that is super-efficient for the client, safe from replay attacks, and can process updates efficiently for the data owner and the outsourcer.
[0029] Super-efficient verification is an important property in database security, since it extends the functionality of database management systems to dynamic and highly distributed data dissemination models, where small mobile computing devices may continuously and rapidly query data from databases that are outsourced to untrusted and geographically dispersed proxy machines.
[0030] For instance, consider vehicles querying navigation information from geographic information systems or mobile phones retrieving summary results about the financial trajectory of the day: data validity checking should be very efficient because of the limited resources these devices have and the high rate at which queries are issued. Moreover, human-centered architectures in sensor networks will increasingly involve mobile devices (e.g., cell phones, personal digital assistants, vehicles) continuously sensing information and accordingly receiving critical data from other surrounding computing devices. These new data dissemination settings, where data is verified in computationally limited devices, are anticipated to grow largely in the near future in order to realize sensing-based information systems (e.g., context-aware publish-subscribe systems). Similarly, in pervasive computing, the next-generation paradigm where computation is integrated into the environment, people are envisioned to interact with small data-processing devices located everywhere at all times; these devices will be constantly receiving critical data from databases hosted in untrusted entities.
[0031] Finally, answer verification in dynamic databases is utilized to capture the data authentication needs of any realistic application. For example, traffic information is something that tends to evolve over the course of a day or week, and financial information changes on an hourly basis. In such cases, constructing query authentication protocols that provide both efficient update of records at the database owner and outsourcer and super-efficient answer verification at the client is a challenging task because of conflicting design goals and the additional threat of replay attacks. Note that for the static case, separately signing every record provides a trivial, although not practical, super-efficient verification. In many scenarios, there may be an incentive for an untrusted database responder to provide stale information to some users (e.g., either to save resources required to perform updates or because the old data has more economic value to the outsourcer). Replay attacks are a serious threat in outsourced database systems, one that has also recently been observed by the database community.
[0032] Exemplary methods, computer programs, devices and networks are described herein that implement new algorithmic and cryptographic techniques for authenticating the results of queries over databases that are outsourced to untrusted parties. The techniques depart from previous approaches by considering super-efficient answer verification. For example, answers to queries are validated in time asymptotically less than the time spent to produce them and using lightweight cryptographic operations. This property is achieved by adopting the decoupling of query answering and answer verification in a way that can be used, for example, for range or aggregate queries. Unlike previous schemes, efficient techniques are provided for updating the database over time. More importantly, exemplary techniques are provided that are safe from replay attacks from the outsourcer. One such exemplary technique involves the use of an external auditor, for example, who simply keeps a hashed digest of the sequence of updates and queries, yet is able to audit the outsourcer to determine if a replay attack has occurred since the last audit.
1.1 Previous and Related Work
[0033] Related work exists in the computer security literature on authenticated data structures, which model the security problem of data querying in untrusted or adversarial environments. The general approach is to augment a data structure such that, along with an answer to a query, a cryptographic proof is provided that can be used to verify the answer authenticity. Research initially focused on authenticating membership queries (mostly in the context of the certificate revocation problem), and various authenticated dictionaries based on extensions of the hash tree (introduced by Merkle) have been studied. Alternatively, other approaches show how to use dynamic accumulators to realize a dynamic authenticated dictionary. More general queries, beyond membership queries, have been studied as well, where extensions of hash trees are used to authenticate various kinds of queries, including basic operations (e.g., select, join) on databases, pattern matching in tries and orthogonal range searching, path queries and connectivity queries on graphs and queries on geometric objects (e.g., point location queries and segment intersection queries) and queries on XML documents. Many of these queries, including path queries in graphs (which typically would be the primary query type in a traffic routing system), basically boil down to one-dimensional range searching queries.
[0034] Progress has also been made in the design of generic authentication techniques for classes of queries or general patterns that can instantiate the authentication of concrete queries. First, one approach has shown how, by hashing over the search structure of a specific class of data structures (that search in a DAG), one can achieve authentication of the corresponding query types (e.g., orthogonal range search) in the static case. In another approach, it was shown how extensions of hash trees can be used to authenticate properties of data organized as paths (e.g., sequences of data items), where the properties are decomposable, i.e., use an associative operator over the properties of subpaths (e.g., aggregation queries). Also the authentication of the general fractional cascading data-structuring technique has been considered, allowing the authentication of any search query problem that involves iterative searches over catalogs (e.g., point location). Both techniques involve answer proofs and verification times that asymptotically equal the complexity of answering queries; they are not super-efficient. Finally, in another approach, a theoretical framework was presented for hash-based authentication of set-membership queries, which links query answering, answer verification, and authentication, but range searching queries are not addressed. It was shown that for authenticated dictionaries of size n, all costs related to authentication are at least logarithmic in n in the worst case.
[0035] There has recently also been a growing body of work on authenticating queries in outsourced databases. The model is essentially the one of authenticated data structures but now the data sets are relational databases residing in external memory (e.g., hard drives) and the type of queries are SQL-based queries. Again, these queries are founded on one-dimensional range queries. In one approach, range queries are supported by hashing over range trees with communication, query and verification costs proportional to O(log n + t). Similarly, range queries can be authenticated with O(log n + t) costs. Another approach uses cryptographic hashing and accumulators to introduce a hash-based super-efficient verification scheme, where the communication cost is O(log t) and the verification cost is O(t), which is super-efficient. Nevertheless, the scheme is static (that is, it doesn't allow for database updates) and involves a fairly complicated verification protocol; in particular, data is hashed over a binary tree in two different ways. A different approach augments B-trees using hashes and signatures at tree nodes to authenticate range queries; completeness is subsequently considered. Essentially every node in the tree is signed, incurring a relatively high storage cost. The verification cost is O(t) but involves expensive operations (O(t) signatures are verified). Still another approach authenticates range queries using signature aggregation; completeness is subsequently achieved. The approach is able to achieve super-efficiency, but not coupled with both efficient updates and replay-attack safety. Another approach provides a hash-based, B-tree-based authenticated indexing technique, focusing on experimental performance and the importance of range searching in database queries.
1.2 Contributions
[0036] Dynamic super-efficient authentication techniques are provided for queries (e.g., one-dimensional range searching and other types of queries based on them) that are both dynamic and replay safe. That is, one can support fast query times on untrusted responders, super-efficient verification for clients, and fast update time for the data source. One exemplary technique involves a recursive construction that divides a hash tree in a recursive fashion so that it has O(log* n) "super" levels (that is, a number proportional to the inverse of the tower-of-twos function). The source need only sign the hash values of nodes on super levels in this scheme, which significantly speeds up data updates while also simplifying the means to achieve super-efficiency. Indeed, for all practical applications (e.g., where n is smaller than the number of atoms in the universe), there are a constant number of super levels in this scheme. In comparison to the aforementioned previous and related work, the exemplary embodiments of the invention are super-efficient, dynamic and replay safe.
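As a rough illustration of why the number of super levels is effectively constant, the following sketch computes the iterated logarithm log* n (the inverse of the tower-of-twos function). The function name log_star and the choice of base 2 are assumptions made for this example; they are not taken from the described embodiments.

```python
import math

def log_star(n):
    """Iterated base-2 logarithm: how many times log2 must be applied
    before the value drops to 1 or below."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

# Even for n around the commonly quoted number of atoms in the universe
# (about 10**80), the iterated logarithm stays very small.
for size in (16, 65536, 10**80):
    print(size, log_star(size))
```

The loop applies log2 at most a handful of times for any input that fits in practice, which is the sense in which a bound of O(log* n) behaves like a constant.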
[0037] To avoid the possibility of replay attacks, several possible solutions are provided. One exemplary solution involves the use of an RSA accumulator to allow clients to verify a single aggregation to prove that the signed responses to a query are still valid even if the signatures on those particular items are quite old. A source-responder work trade-off is used to perform updates in O(√n) time with this approach, which is efficient for moderately large values of n. Other exemplary embodiments use an external auditor to detect, and thereby deter, replay attacks through periodic audits of the query responders. The key contribution here is that the auditor need only store a constant-sized digest for each responder, so that auditing is also a super-efficient computation. More importantly, it is shown below that a responder cannot employ a replay attack without being caught by the auditor.
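The general idea of an RSA accumulator with membership witnesses can be sketched as follows. This is a toy illustration under stated assumptions, not the scheme of the exemplary embodiments: the modulus N, the base G, the function names, and the use of small primes as set elements are all hypothetical, and the parameters are far too small to be secure. In practice, hash values would first be mapped to suitable prime representatives and the accumulation value would be signed by the source.

```python
from math import prod

# Toy public parameters (assumed for illustration only; insecure sizes).
N = 3233  # RSA modulus 61 * 53
G = 2     # public base, coprime to N

def accumulate(elements):
    """Accumulation value: G raised to the product of all elements, mod N."""
    return pow(G, prod(elements), N)

def make_witness(elements, x):
    """Membership witness for x: accumulate every element except x."""
    return pow(G, prod(e for e in elements if e != x), N)

def verify_membership(acc, x, wit):
    """Accept iff raising the witness to x reproduces the accumulation value."""
    return pow(wit, x, N) == acc

elements = [3, 5, 7, 11]          # prime representatives of the set members
acc = accumulate(elements)        # the value the source would sign
wit = make_witness(elements, 7)
print(verify_membership(acc, 7, wit))   # membership of 7 checks out
print(verify_membership(acc, 13, wit))  # 13 is not accepted with this witness
```

A client holding only the signed accumulation value can thus check one element with a single modular exponentiation, independent of the size of the set.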
2 Preliminaries
[0038] In this section, an exemplary data querying and authentication model is briefly described. Data authentication is examined in a context common to today's Internet setting, where a database becomes available for queries at an intermediate entity that is distinct from the data owner (creator or source) and is untrusted by the end user. That is, the creator (or owner) of the data set is not the same entity as the one answering queries about the set and, in particular, the data owner does not control the corresponding data structure that is used to answer a query. In this setting, an intermediate, untrusted party answers the queries about the data set that are issued by an end-user.
[0039] In particular, consider data authentication in the following three-party data querying model. A data source S creates (and owns) a dynamic data set D, which may evolve through update operations, and maintains an authentication structure for D that is appropriate for a specific type of query (e.g., range queries or aggregate queries). The data set is stored by a responder R who maintains the same authentication structure for D and answers queries issued by a user U. Along with an answer a to a query q, R provides U with a cryptographic proof p that is computed using the authentication structure of D. The proof p is then used by a verification process to check the validity of the answer subject to a given query. On any update for D issued by the source, D and the authentication structure are appropriately updated by S and R.
[0040] Referring to FIG. 1, a system 100 is shown in which the exemplary embodiments of the invention may be employed. The system 100 includes a source (S) 102 communicating with a responder (R) 104 and a user (U) 106 communicating with R 104. S 102, R 104 and U 106 comprise the additional components (e.g., the dynamic data set) and are enabled to perform the functions (e.g., storage, maintenance, querying, answering, updating) as described immediately above. U 106 comprises an electronic device capable of communication with R 104. As a non-limiting example, U 106 may comprise at least one data processor, at least one memory, a transceiver, and a user interface comprising a user input and a display device. U 106 and/or R 104 include one or more components capable of implementing the exemplary embodiments of the invention (e.g., a data processor). As a non-limiting example, an encryption component may be employed. As further non-limiting examples, the encryption component may be a separate entity (e.g., an integrated circuit, an Application Specific Integrated Circuit or ASIC) or may be integrated with other components (e.g., a program run by a data processor, functionality enabled by a data processor).
[0041] With respect to data authentication, the goal is to design an authentication structure that allows trustworthy answer verification, thus checking that the answer is as accurate as it would have been had the answer come directly from S. To achieve this, the following general authentication technique is utilized. Using the public-key cryptographic model, assume that U knows the public key of S (and so does R). The corresponding secret key is used by S in combination with some cryptographic primitives to produce one (or more) authentication strings (or digests) for the data set D, which constitute short description(s) of D that capture structural information that is related to the type of queries of interest. Given any query q, R uses its authentication structure to produce a proof p for the answer a of q. On input of a query-answer pair (q, a), a proof p, and the public key of S, U runs a verification algorithm that either accepts a as valid or rejects it as invalid. The proof of an answer securely relates the answer to (some of) the authentication string(s), which the source authenticates using a signature scheme. The set of authentication algorithms, communication protocol and verification process are called an authentication scheme.
[0042] Now described is the security requirement that any authentication scheme must satisfy. Security is captured as two individual requirements, modeling the desired property: for all queries the verification process should be trustworthy, accepting an answer-proof pair if and only if the returned answer is the correct answer to the query. First, completeness is desired, which ensures that for any query the authentication structure generates a correct corresponding answer-proof pair that the verification algorithm accepts. Second, soundness is desired, which ensures that if, given a query q, an answer-proof pair (a, p) is accepted by the verification algorithm, then a is the correct answer to q.
With respect to this requirement, assume the following threat model. Recall that in the data authentication model, the user U trusts only the source S, not the responder R; R is then modeled as an entity that is controlled by an adversary. Denial of service (DoS) attacks are not considered, but assume that R always participates in the communication protocol and interacts with S and U. However, R can now maliciously try to cheat, by not providing the correct answer to a query and forging a false proof for this answer. Accordingly, the soundness requirement dictates that given any query issued by U, no polynomial-time responder R, having oracle access to the authentication algorithm that the source runs to generate the authentication strings (i.e., R observes the authentication strings of D over time or even selectively queries authentication strings of specially chosen D), can reply with an answer-proof pair such that both the answer is not correct and the verification algorithm accepts the answer as authentic. This definition implies safety against replay attacks.
[0043] In this work, one interest is in secure authentication schemes for range search queries or aggregate queries that additionally achieve the best possible degree of efficiency, in particular, schemes that introduce low computational and communication overhead to the involved parties. In particular, it is desirable to seek authentication schemes that primarily incur low verification time, called verification cost. Other important cost parameters are the update cost (for updating the authentication structure at S and R after updates) and the query cost (for producing the answer-proof pairs at R after queries), as well as the proof size. In the interest of super-efficient verification, one may wish to design authentication techniques that allow very fast answer verification, in time asymptotically less than the time needed for answer generation and tolerate — when needed — reasonable trade-offs in the update and query costs. However, unlike some previous work, the exemplary techniques of the invention do not trade security for efficiency.
[0044] In the described exemplary authentication schemes, standard cryptographic tools are utilized, such as collision-resistant hash functions, Merkle's tree, digital signatures and dynamic accumulators, as non-limiting examples. A brief description of these is presented in the Appendix.
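To make the hash-tree tool concrete, the following is a minimal sketch of a Merkle tree with a membership proof that is checked by hashing along the path from a leaf to the root. It assumes SHA-256 as the collision-resistant hash function and a number of leaves that is a power of two; the function names are hypothetical, and the signature that the source would place on the root hash is omitted.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Collision-resistant hash (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return all tree levels, from hashed leaves up to the root.
    The number of leaves is assumed to be a power of two."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Collect the sibling hash at each level on the path from a leaf to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return path

def verify_path(leaf, path, root):
    """Recompute the root hash from the leaf and its sibling path."""
    digest = h(leaf)
    for sibling, sibling_is_left in path:
        digest = h(sibling + digest) if sibling_is_left else h(digest + sibling)
    return digest == root

levels = build_levels([b"a", b"b", b"c", b"d"])
root = levels[-1][0]            # the value the source would sign
proof = prove(levels, 2)        # proof for leaf b"c"
print(verify_path(b"c", proof, root))
```

The proof contains one hash per tree level, so its size and the verification work are logarithmic in the number of leaves.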
3 An Exemplary Super-Efficient Authentication Structure
[0045] This section describes a new, exemplary authentication structure for super-efficient answer verification, for example, for the problem of one-dimensional range searching. The properties of this exemplary authentication structure are also considered. This exemplary structure may also be utilized in conjunction with the new, exemplary authentication schemes presented in the next two sections.
[0046] Let D be a set of n key-value pairs (k, v), where each key k is a distinct element of a totally ordered universe K. A one-dimensional range searching query q = [q_L, q_R] on D is for the query interval [q_L, q_R], with q_L, q_R ∈ K ∪ {−∞, +∞}, which returns as an answer the subset A_q of D consisting of all pairs whose keys are in [q_L, q_R], i.e., A_q = {(k, v) ∈ D : q_L ≤ k ≤ q_R}. The size n of D and the size t of A_q may be referred to as the input size and output size (or answer size) of query q, respectively.
[0047] In the approach for designing exemplary authentication structures in accordance with the exemplary embodiments of the invention, the search data structure is decoupled from the authentication data structure. Assume that the answer Aq to a range searching query q is computed by the responder R in O(logn + 0 time using some optimal technique (e.g., searching in a balanced range tree), where n = |Z>| and / = \Aη\ .
The design of the authentication structure for range searching queries is based on verifying a collection of certain simple relations defined over the set D , regardless of the search technique employed.
[0048] The successor relation σ(X) over a totally ordered set X with n elements consists of all ordered pairs of consecutive elements of X, augmented with pairs (−∞, x_1) and (x_n, +∞), where x_1 and x_n are the smallest and largest elements of X, respectively. For instance, σ({1, 5, 2, 8}) = {(−∞, 1), (1, 2), (2, 5), (5, 8), (8, +∞)}. Thus, σ(X) has size n + 1 (i.e., n + 1 pairs). The successor relation of the keys of a set of key-value pairs D may comprise the essential information for verifying the answer to a range searching query on D, as summarized in the following lemma.
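As a non-limiting illustration of the successor relation just defined, the following minimal Python sketch computes σ(X) for a finite set of keys. The function name successor_relation and the use of floating-point infinities for the fictitious keys −∞ and +∞ are assumptions made for this example only and do not appear in the original description.

```python
import math

def successor_relation(keys):
    """Return sigma(X): consecutive pairs of the sorted keys,
    padded with the fictitious keys -inf and +inf (illustrative encoding)."""
    ordered = sorted(set(keys))
    chain = [-math.inf, *ordered, math.inf]
    # Pair each element of the padded chain with its immediate successor.
    return list(zip(chain, chain[1:]))

pairs = successor_relation({1, 5, 2, 8})
print(pairs)
```

For a set of n keys the returned list has n + 1 pairs, matching the size stated above.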
[0049] Lemma 1. Let q = [q_L, q_R] be a range search query on a set D of key-value pairs and K_D be the set of keys in D. Let A = {(k_1, v_1), ..., (k_t, v_t)} be a set of key-value pairs, with k_1 < ... < k_t. Then A is the correct answer A_q to query q if and only if there exist keys k_0, k_{t+1} ∈ K_D ∪ {−∞, +∞} such that: (1) A ⊆ D and {(k_0, k_1), (k_1, k_2), ..., (k_{t−1}, k_t), (k_t, k_{t+1})} ⊆ σ(K_D); (2) k_0 < q_L ≤ k_1 and k_t ≤ q_R < k_{t+1}.
[0050] Indeed, keys k_0, k_{t+1} correspond to the boundaries of the range interval, each one possibly coinciding with fictitious keys −∞ or +∞, with (k_0, k_{t+1}) ∈ σ(K_D) if A_q = ∅. Also, the first condition guarantees that the answer A consists of t consecutive key-value pairs of data set D, whereas the second guarantees that the query range is exactly covered by the answer range. Thus, in this formulation, answer correctness for range searching captures both inclusiveness (all the returned pairs are in the query range) and completeness (all the pairs in the query range are returned), while some previous approaches considered only inclusiveness.
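The two conditions of Lemma 1 can be sketched, in an information-theoretic sense only (no cryptography), as the following Python check. The names is_correct_answer and successor_pairs, the dictionary representation of D, and the handling of the empty answer are illustrative assumptions rather than part of the described embodiments.

```python
import math

def successor_pairs(keys):
    """sigma(K_D) as a set of (key, next_key) pairs, with -inf/+inf sentinels."""
    chain = [-math.inf, *sorted(keys), math.inf]
    return set(zip(chain, chain[1:]))

def is_correct_answer(D, q_lo, q_hi, answer, k0, k_next):
    """D: dict key -> value; answer: list of (key, value) sorted by key;
    k0, k_next: the claimed boundary keys k_0 and k_{t+1}."""
    sigma = successor_pairs(D.keys())
    keys = [k for k, _ in answer]
    # Condition (1), first part: every returned pair is a pair of D.
    if any(k not in D or D[k] != v for k, v in answer):
        return False
    # Condition (1), second part: k_0, k_1, ..., k_t, k_{t+1} are consecutive keys.
    chain = [k0, *keys, k_next]
    if any(pair not in sigma for pair in zip(chain, chain[1:])):
        return False
    # Condition (2): the query interval is exactly covered by the answer.
    first = keys[0] if keys else k_next
    last = keys[-1] if keys else k0
    return k0 < q_lo <= first and last <= q_hi < k_next

D = {1: "a", 2: "b", 5: "c", 8: "d"}
print(is_correct_answer(D, 2, 6, [(2, "b"), (5, "c")], 1, 8))
```

In this sketch an incomplete answer fails because no pair of boundary keys can satisfy both conditions at once, which is the completeness property noted above.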
[0051] It follows that, if A_q = {(k_1, v_1), ..., (k_t, v_t)}, k_1 < ... < k_t, is the correct answer to query q, A_q can be authenticated by verifying t pairs of the key-value relation, namely, that (k_i, v_i) ∈ D, 1 ≤ i ≤ t, and t + 1 pairs of the successor relation on the keys, namely that (k_i, k_{i+1}) ∈ σ(K_D), 0 ≤ i ≤ t, and, finally, by checking t + 4 inequalities (i.e., the ordering of these pairs and that k_0 < q_L ≤ k_1 and k_t ≤ q_R < k_{t+1}). Assuming uniquely defined representations for the key-value and successor relations, denote with θ(q) the resulting set of 2t + 1 pairs to be verified:
θ(q) = {(k_1, v_1), ..., (k_t, v_t)} ∪ {(k_0, k_1), ..., (k_t, k_{t+1})} (1)
[0052] So far, verification has been discussed in an absolute, information-theoretic sense. That is, the knowledge of certain relations over D allows the construction of a proof for verifying the correctness of an answer. In the presented data authentication model, one is interested in having answer verification hold in a computational sense, where a computationally bounded responder R (e.g., controlled by an adversary) is essentially incapable of providing incorrect answers that are verifiable by the user U. As Lemma 1 implies, the problem of authenticating range searching queries on a set D of size n is reduced to the problem of authenticating (membership in) two types of binary relations of size O(n) defined over D: for example, the key-value and successor relations. This is used to separate the answer verification from the process of answer generation.
[0053] Next is considered an exemplary design for an authentication structure that for any query q provides super-efficient verification of the corresponding special relations θ(q) over D, independently of how the answer A_q is generated and with only O(1) authentication costs. This structure is designed such that it securely and compactly encodes and authenticates these special relations: collision-resistant hashing and accumulators are used to associate, in a cryptographically sound manner, the answer A_q to q, a corresponding proof p and, overall, the relations in θ(q), with one or more authentication strings (e.g., hash values or set accumulations) that will be signed by the source. The authentication structure will reside both at S, for computing and signing the authentication strings, and at R, for producing the answer proof that will allow U to verify the answer. In this setting, security is proved based on Lemma 1 and the security properties of the utilized cryptographic primitives: using standard reductions, one can show that any successful attack launched from a computationally bounded R corresponds to a successful attack against the security properties of the primitives (e.g., collision-resistant hashing, signature schemes, one-way accumulators).
3.1 A New Exemplary Authentication Structure
[0054] Below are described some new, exemplary authentication structures incorporating aspects of the exemplary embodiments of the invention. The described authentication structures are non-limiting examples of suitable structures that may be utilized.
[0055] Let D = {(k_1, v_1), ..., (k_n, v_n)}, k_1 < ... < k_n, be a set of n key-value pairs, where for simplicity and without loss of generality, it is assumed that n = 2^d. The authentication structure for range search queries on D uses a hash tree built over D, which essentially encodes the relations σ(K_D) and D. In particular, let h be a collision-resistant hash function. A balanced hash tree of depth d is built, storing at the leaves from left to right the hash values h_1, ..., h_n defined as follows, where || denotes string concatenation:
h_1 = h(h(−∞) || h(k_1) || h(v_1) || h(k_2)), (2)
h_i = h(h(k_i) || h(v_i) || h(k_{i+1})), i = 2, ..., n − 1, (3)
h_n = h(h(k_n) || h(v_n) || h(+∞)). (4)
[0056] Thus, the hash values at the leaves encode information about various pairs: for 2 ≤ i ≤ n − 1, h_i is the digest of the key-value pair (k_i, v_i) and successor pair (k_i, k_{i+1}), h_1 is the digest of pairs (k_1, v_1), (−∞, k_1) and (k_1, k_2), and h_n is the digest of pairs (k_n, v_n), (k_n, +∞). Also recall that internal nodes in the hash tree store the hash of the concatenation of the hash values stored at their children. So, any node v in the hash tree stores a hash value h_v that encodes information about key-value pairs of D and successor pairs that are associated with the leaves in the subtree rooted at v. For instance, a hash value stored at the parent node of two sibling leaf nodes j and j + 1 is the digest of pairs (k_j, v_j), (k_{j+1}, v_{j+1}), (k_j, k_{j+1}) and (k_{j+1}, k_{j+2}), whereas the hash value h_r of the root of the tree is the digest of all pairs in σ(K_D) and key-value pairs in D.
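The leaf encoding of equations (2)-(4) and the bottom-up hashing of internal nodes may be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 stands in for the collision-resistant function h, keys and values are serialized with Python's repr, and the strings "-inf" and "+inf" stand in for the fictitious keys; none of these choices, nor the helper names enc, leaf_hashes and root_hash, come from the original description.

```python
import hashlib

NEG_INF, POS_INF = "-inf", "+inf"   # illustrative stand-ins for the fictitious keys

def h(data: bytes) -> bytes:
    """Assumed collision-resistant hash (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).digest()

def enc(x) -> bytes:
    """Toy serialization of a key or value (an assumption of this sketch)."""
    return repr(x).encode()

def leaf_hashes(pairs):
    """pairs: list of (key, value) sorted by key; returns h_1, ..., h_n."""
    n = len(pairs)
    leaves = []
    for i, (k, v) in enumerate(pairs):
        nxt = pairs[i + 1][0] if i + 1 < n else POS_INF
        parts = [h(enc(k)), h(enc(v)), h(enc(nxt))]
        if i == 0:
            # The first leaf additionally encodes the successor pair (-inf, k_1).
            parts.insert(0, h(enc(NEG_INF)))
        leaves.append(h(b"".join(parts)))
    return leaves

def root_hash(level):
    """Hash pairwise up to the root; assumes len(level) is a power of two."""
    while len(level) > 1:
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
    return level[0]

pairs = [(1, "a"), (2, "b"), (5, "c"), (8, "d")]
leaves = leaf_hashes(pairs)
print(root_hash(leaves).hex())
```

Changing any value or key changes the affected leaf hash and therefore every hash on the path from that leaf to the root.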
[0057] From Lemma 1 and in order to verify answer A_q = {(k_{i_1}, v_{i_1}), ..., (k_{i_t}, v_{i_t})}, k_{i_1} < ... < k_{i_t}, it suffices to verify set θ(q) = {(k_{i_1}, v_{i_1}), ..., (k_{i_t}, v_{i_t})} ∪ {(k_{i_0}, k_{i_1}), ..., (k_{i_t}, k_{i_{t+1}})}, where k_{i_0} = −∞ if k_{i_1} = k_1, or k_{i_0} = k_{i_1−1} otherwise, and, similarly, k_{i_{t+1}} = +∞ if k_{i_t} = k_n, or k_{i_{t+1}} = k_{i_t+1} otherwise.
[0058] Observe that this hash tree, by construction, encodes all the information related to set θ(q) in the hash values stored at t + 1 (or t, if k_{i_1} = k_1) leaves L_q that are uniquely defined by query q: namely, the leaves l_{i_1}, ..., l_{i_t} corresponding to the key-value pairs in A_q and, if k_{i_1} ≠ k_1, additionally the leaf l_{i_0} corresponding to key k_{i_1−1}. Moreover, set θ(q) is partially encoded in all hash values stored at nodes that belong in the paths from leaves in L_q up to the tree root r. Note that this encoding is one-way, since h is collision-resistant. Consequently, to authenticate A_q it suffices to authenticate any set S_q = {h_{s_1}, ..., h_{s_j}} of special hash values stored at tree nodes that define subtrees that strictly cover the leaves in L_q, and to associate in the hash tree the answer A_q with set S_q (here, S_q is not uniquely defined). Indeed, the hashes in S_q are digests that securely describe a set of pairs that strictly includes set θ(q). If, given the answer A_q and a collection of hash values (which serves as a proof), one can recompute the hashes in S_q, then, assuming that these values are authentic and based on collision-resistant hashing, one can be assured, in the bounded computational model, that the answer is correct, simply by checking answer validity as described in Lemma 1. In essence, recomputing the special digests in S_q by hashing over the answer A_q (or information related to leaves in L_q) along the hash tree and verifying that these digests are authentic is equivalent to verifying that all key-value pairs and successor pairs in θ(q) are authentic, and, thus, the answer is successfully verified. This way, authenticating range search queries is reduced to authenticating (membership in) the set S_q. Hashes in S_q are the authentication strings that S will authenticate. To describe this exemplary authentication structure, assume that S authenticates S_q by separately signing each special hash value.
With respect to the verification cost of this technique, one has that answer A_q can be verified at hashing cost proportional to the size of the subtrees defined by the nodes storing the hash values in S_q, and signature cost proportional to the size of S_q. Here, set S_q is appropriately defined only for a specific query q; in general, one needs to authenticate a collection of special hash values S that can be used to efficiently verify the answer to any query.
[0059] Super-efficient Verification. An efficient approach is to set S = {h_r}, i.e., use as special hash value for all queries the hash h_r stored at the tree root r. Then, the answer A_q, |A_q| = t, to any query q can be efficiently associated with h_r, by considering as proof the O(log t) subtrees of total size O(t) that exactly cover the leaves in L_q and the paths connecting these subtrees to r through O(log n) other tree nodes. Note that these O(log t) subtrees are uniquely defined for any query q and they are rooted at the so-called allocation nodes for L_q. The verification cost is O(log n + t) for hashing through these subtrees up to r plus one signature verification for h_r. But this generally does not provide super-efficient verification: whenever t = o(log n), e.g., t = O(log log n) or t is constant, the technique is suboptimal. An exemplary technique for further improving the verification cost is described below.
[0060] Suppose that one only queries answers of size t < log n (also see FIG. 2). Then one may define the set S_1 of special hash values to contain the hashes h^1_1, ..., h^1_{m_1}, where m_1 = n/log n, at level l_1 = log log n of the hash tree. In this case, it is easy to see that any answer of size t is covered by at most two nodes at level l_1. Thus, the verification cost now is O(log log n) and, if t is o(log n) and Ω(log log n), one has an improvement and optimal performance. To further improve the verification cost in the case where t is o(log log n), one basically uses the above exemplary technique to recursively define additional special hash values over the n/log n trees defined by the special hash values in S_1: consider each one of the trees of size log n rooted at level l_1 and apply the above technique, assuming that t < log log n. Define the set S_2 of special hash values to contain the hashes h^2_1, ..., h^2_{m_2}, where m_2 = m_1 log n / log log n, at level l_2 = log l_1 = log log log n of the hash tree, so that answers of size t with log log log n < t < log log n can be authenticated super-efficiently at cost O(log log log n). Proceed as above: at the i-th step of the recursion one defines the set S_i of additional special hash values, stopping before the log* n step of the recursion, effectively at level 2 of the tree (or at some other small constant level of the tree), and sets S = {h_r} ∪ S_1 ∪ S_2 ∪ ... ∪ S_{log* n − 1} as the final set of special hash values, which is of Θ(n) size. In actuality, |S| < n − 1; thus, S has size smaller than the trivial solution of setting as special every hash value in the tree.
[0061] Referring to FIG. 2, the exemplary authentication structure is shown. The set S of special hash values in the hash tree is defined recursively and consists of Θ(n) values residing at log* n levels: h_r at level log n, {h^1_1, ...} at level log log n, {h^2_1, ...} at level log log log n, etc. Super-efficient verification is achieved since the answer A_q of a size at most t = log log n to query q is verified by hashing along O(log t) nodes in the hash tree up to at most two special hash values and by optimally verifying that these hash values are indeed special, i.e., that they belong in S.
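To make the recursive placement of special hash values concrete, the following Python sketch lists the special levels (measured as node heights above the leaves) for a tree with n = 2^d leaves and counts the resulting special values. It is an illustration under assumptions: the helper names special_levels, special_count and covering_nodes are hypothetical, logarithms are base 2 and rounded down, and the recursion is stopped at height 2 as one reading of the stopping rule described above.

```python
import math

def special_levels(n):
    """Heights log n, log log n, log log log n, ... ending at height 2."""
    levels = [int(math.log2(n))]
    while levels[-1] > 2:
        levels.append(max(2, int(math.log2(levels[-1]))))
    return levels

def special_count(n):
    """Number of special hash values: one per tree node at each special height."""
    return sum(n // 2 ** height for height in special_levels(n))

def covering_nodes(first_leaf, num_leaves, height):
    """Indices of the nodes at the given height whose subtrees cover the leaves."""
    return sorted({i >> height for i in range(first_leaf, first_leaf + num_leaves)})

n = 2 ** 16
print(special_levels(n), special_count(n))
print(covering_nodes(10, 12, 4))
```

For n = 2^16 the special heights are 16, 4 and 2, and a run of 12 consecutive leaves (fewer than log n = 16) is covered by at most two nodes of height 4, in line with the covering argument above.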
[0062] Using this exemplary authentication structure, authentication schemes can be designed that achieve super-efficient verification: the verification cost of an answer of size t is O(log t) hashing cost where O(1) special hash values need be authenticated, essentially as being members of the set of special hash values S. In what follows, the case where authentication in S is performed using signatures is considered; that is, each hash value in set S is separately signed by the source S. Updates on D are handled by appropriately updating the hash tree (by hashing and restructuring the tree along a leaf-to-root path; details in Section 4) and having the source sign the O(log* n) updated special values. Replay attacks may be eliminated, for example, by using time-stamps (a standard solution known in the literature) to check the freshness of a valid signature. For hash-based authentication, i.e., in the most practical and widely used setting where only cryptographic hashing is used to produce the authentication strings, the exemplary authentication structure achieves optimal performance with respect to both the verification and the update costs. The following result summarizes the performance of the new, exemplary structure and signature-based authentication scheme (proof in Appendix).
[0063] Theorem 1. There exists a super-efficient authentication scheme for range search queries over a set of n key-value pairs with the following performance, where t denotes the number of pairs returned by a query:
[0064] - a range query is answered in O(log n + t) time;
[0065] - the answer proof has size O(log t) and consists of two signatures, two keys, and O(log t) hash values;
[0066] - the answer to a range query is validated by performing O(t) arithmetic computations, O(t) hash operations, and O(1) signature verifications;
[0067] - an update results in O(log n) hash operations (at both the source and the responder), O(log* n) signature generations (at the source) and O(n) signature renewals (at the source).
[0068] This authentication scheme is secure with respect to data authentication, safe with respect to replay attacks, and optimal with respect to super-efficient verification in the hash-based data authentication model.
4 Exemplary Super-Efficient Dynamic Authentication Schemes
[0069] In this section, an alternative technique is proposed that reduces the high update cost of the previous, optimal but less practical, technique. Essentially, a super-efficient dynamic authentication structure for range queries is presented, which provides reasonable trade-offs between the update and query costs.
[0070] The previous section described the construction of a (perfectly balanced) hash tree for a set D of n key-value pairs that encodes information about some sets of relations appropriate for the authentication of range search queries (the key-value and successor relations in D), and a set S of O(n) special hash values was defined that are sufficient to support super-efficient answer verification, provided there is an optimal (in terms of verification) technique for authenticating set membership queries. Recall that for any query, there are at most two special hash values, out of the total O(n), that need to be authenticated as authentic members of S, and note that only queries with positive answer need to be authenticated: a special hash value must be verified to be in set S (i.e., belong to the set of special hash values S).
[0071] The new authentication structure is now described. The main idea is to use a dynamic accumulator for authenticating set membership queries for the set of special hash values S. This is performed as follows: the set S of special hash values is accumulated to accumulation value a and a is signed by the source. Then, verifying that a special hash value belongs in S is performed in two steps, and still in optimal fashion (O(1) verification cost): first, the hash value together with at least one membership witness are used to verify that the hash value was used by the accumulator in producing a and, second, the signature on a is verified. For security reasons, only the source knows the trapdoor information of the accumulator; the responder does not know this trapdoor. It follows that the verification is (as in the construction of the previous section) super-efficient.
[0072] The dynamization of the structure (i.e., how updates on the data set can be handled) is now described. Assume for simplicity that only values are updated, that is, no keys are inserted or deleted in D. After any update of this type in the data set D, one ends up rehashing over the leaf-to-root path in the hash tree that is appropriate for the update operation. Thus O(log* n) special hash values change. One removes the old hash values from the accumulation a and adds the new ones into it, i.e., one performs O(log* n) element deletions and insertions. Inserting and deleting elements in an accumulator involves some computational cost for updating the new accumulation and for updating the set-membership witnesses of all the elements (e.g., with one or at least one set-membership witness per element). Suppose that the witnesses of the O(n) accumulated special hash values are explicitly maintained in the source and the responder. In a highly dynamic setting, updates can be of cost O(n): the reason is that after any update all n membership witnesses must be updated. The problem of the high update cost becomes more challenging for deletions, especially under the necessary assumption that the responder cannot use the trapdoor information. However, using the RSA accumulator and certain algorithmic techniques one can achieve reasonable update and query costs. The following result summarizes the performance of this new, exemplary authentication scheme (proof in the Appendix, below).
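The membership check against an accumulation value can be illustrated with a toy RSA-style accumulator. The following Python sketch is a non-authoritative illustration: the modulus is far too small to be secure, the accumulated elements are given directly as small primes instead of prime representatives of hash values, signing of the accumulation is omitted, and the names accumulate, witness, verify and delete_with_trapdoor are hypothetical.

```python
from math import prod

P, Q = 1019, 1187            # toy primes; a real modulus would be thousands of bits
N = P * Q                    # public RSA modulus
PHI = (P - 1) * (Q - 1)      # trapdoor information, assumed known to the source only
G = 4                        # public base (a quadratic residue modulo N)

def accumulate(primes):
    """Accumulation value a = G^(product of elements) mod N."""
    return pow(G, prod(primes), N)

def witness(primes, x):
    """Membership witness for x: accumulation of all the other elements."""
    return pow(G, prod(p for p in primes if p != x), N)

def verify(acc, x, wit):
    """O(1) check that x was accumulated: wit^x mod N must equal acc."""
    return pow(wit, x, N) == acc

def delete_with_trapdoor(acc, x):
    """Source-side deletion: take the x-th root of acc using the trapdoor."""
    return pow(acc, pow(x, -1, PHI), N)

S = [3, 5, 7, 11]            # stand-ins for prime representatives of special hashes
a = accumulate(S)
w5 = witness(S, 5)
print(verify(a, 5, w5), verify(a, 13, w5))
```

In this sketch the user performs one modular exponentiation per special hash value, while deletion without recomputing from scratch relies on the trapdoor, which is why it is attributed to the source here.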
[0073] Theorem 2. There exists a dynamic super-efficient authentication structure for range search queries over a set of n key-value pairs with the following performance, where t denotes the number of pairs returned by a query:
[0074] - a range query is answered in O(log n + t) time;
[0075] - the answer proof has size O(log t) and consists of one signature, two field elements, two keys and O(log t) hash values;
[0076] - the answer to a range query is validated by performing O(t) arithmetic computations, O(t) hash operations and O(1) modular exponentiations and verifying O(1) signatures; and
[0077] - an update results in O(log n) hash operations (at both the source and the responder), O(√n log* n) modular operations and O(1) signature generations (at the source).
[0078] This authentication structure is secure with respect to authentication and safe with respect to replay attacks.
5 Detection and Elimination of Replay Attacks
[0079] In Section 3, an authentication structure for one-dimensional range search queries was presented that provides super-efficient answer verification asymptotically optimally in the hash-based data authentication model. In particular, it was seen that the source needs to authenticate a set S of Ω(n) special hash values in order to achieve verification costs that are independent of the size n of the data set. Additionally, Theorem 1 states that when signatures are used to optimally (i.e., in constant time) authenticate elements of set S, the update cost at the source S is O(n), because all signatures need to be refreshed to eliminate replay attacks. On the other hand, Theorem 2 states that if the RSA accumulator is additionally used, the update cost can be reduced to O(√n log* n), but now this cost is incurred at both the source S and the responder R. Both schemes preserve the super-efficient verification and replay-attack safety requirements. It is interesting to examine if one can further improve the update costs and design an authentication scheme that achieves different trade-offs.
[0080] In this section, a new, exemplary scheme is proposed in the three party authentication model (S, R, U) that achieves efficient update costs at S and R (only logarithmic in n) and super-efficient verification costs at U (as before), but uses an alternative solution to the replay attack problem. In particular, the security requirement is slightly relaxed with respect to the time when replay attacks are detected and replayed data is rejected. As before, invalid answers are immediately rejected by U, but answers are checked to be consistent with the update history in an off-line fashion. An exemplary technique is introduced which implements an auditing mechanism and provides delayed consistency checking for detecting and effectively eliminating replay attacks. This mechanism can be used to augment the exemplary authentication scheme of Section 3, so that U can immediately check any received answer for correctness and, at any later time, check, in a batch, all received answers for freshness.
[0081] Justification. Delayed consistency checking is a useful property in application areas where the freshness of answers is not critical to be verified in real time. In many applications, risk management requires that invalid responses must be caught, but this determination does not always have to be immediate, as long as it is certain and sufficiently near-term. Indeed, such swift and sure justice is an ideal circumstance for risk management purposes.
[0082] Auditing Mechanism Model. In the exemplary auditing mechanism, the delayed consistency checking is performed by the user U and in collaboration with the source S, without any direct interaction between the two, however. The auditing mechanism corresponds to securely, compactly and efficiently encoding a series of transactions with the responder R, i.e., updates and queries over data set D, at S and U, respectively. In particular, S maintains an update audit state Σ_u that encodes the history of updates, through information reported after update transactions with R: for any update u performed on the data set D, an update trail T_u is provided to S by R that is used to update Σ_u through operation 'upd_u_state'. Similarly, U maintains a query audit state Σ_q that encodes the history of queries, through information reported after query transactions with R: for any query q issued on D and returned answer-proof pair, a query trail T_q is provided to U by R that is used to update Σ_q through operation 'upd_q_state'. These trails correspond to "receipts" that the auditing mechanism collects. This series of updates of the states Σ_u and Σ_q corresponds to the computation phase of the auditing mechanism.
[0083] Verification of the consistency of the two transaction series (update and query) and, consequently, replay attack detection are performed by U in the audit phase. At any point in time (e.g., predefined or decided instantly), U can invoke a request for checking the consistency of the reported transactions with the current set D that resides at R. This is performed at U through operation 'audit', which receives as input the current query audit state Σ_q of U and the current update audit state Σ_u of S, appropriately updated given the current data set D (provided to S by R), and accepts or rejects its input, accordingly verifying the consistency of transactions. After an audit operation that accepts its input, the audit state remains unchanged and a new computation phase begins. If it rejects, the states are reset and the next computation phase starts for a new data set: in this case, the data source S is responsible for creating the new data set at R. The triplet of algorithms (upd_u_state, upd_q_state, audit) along with the protocols for formatting the trails is called an auditing scheme.
[0084] Security. An auditing scheme (upd_u_state, upd_q_state, audit) is secure if it satisfies the following property: operation audit accepts its input if and only if no malicious action has been performed by R, that is, all query-answer pairs verified by U are consistent with the update history of the data set D and its current states computed using operations upd_u_state, upd_q_state.
Auditing scheme (upd_u_state, upd_q_state, audit), in particular, is secure if the following requirements (for a computationally bounded responder R) are satisfied: (i) completeness, dictating that all valid update and query transactions yield through operations upd_u_state, upd_q_state audit states that when checked by audit with a valid (not corrupted by R) data set D always result in accepting; and (ii) soundness, dictating that when audit accepts its inputs, then the audit states correspond to transactions of valid update and query operations subject to the current state of the data set.
[0085] Usage. Next consider how a secure auditing scheme can be used to detect and eliminate replay attacks. For example, augment the authentication scheme of Section 3 with an auditing scheme (upd_u_state, upd_q_state, audit). After updates, along with the update at S and R of the underlying authentication structure, S runs upd_u_state to update its update audit state, but now no signature refreshing is performed: only O(log* n) hash values are signed by S. After queries, along with the answer verification, U also runs upd_q_state to update its query audit state. Suppose that responder R launches a replay attack at some point in time. This attack will be detected at some point in the future by U: consider the first audit phase that occurs after the attack. By the security property of the auditing scheme, audit will reject its input. Thus, a rejecting audit phase is equivalent to detecting a replay attack launched by R. One has that a misbehaving R that launches replay attacks is always caught and exposed to its victim U. Note that the use of a secure auditing scheme in such a setting does not provide authentication: the auditor cannot pinpoint which query-answer pairs were replayed.
[0086] An Efficient Secure Auditing Scheme. Next described is how to construct a secure auditing scheme. A simple cryptographic solution is used that is inspired by efficient and secure cryptographic mechanisms that provide off-line memory checking. In off-line memory checking, a trusted checker checks the correctness (or consistency) of an untrusted memory, where data is written in and read from the memory through operations 'load' and 'store'. The checker maintains some constant-size state information and augments the data that is written into the untrusted memory, for example, with time-stamps, such that at any point in time, a check can be performed on the memory correctness. The idea is to use a cryptographic primitive A for generating and updating this state information, as a short description of the memory history. This primitive can produce short digests of large sets in an incremental fashion (that is, where elements can be inserted in the set and the new digest can be accordingly updated in O(1) time without recomputing from scratch) and is used as follows. After any (augmented) load or store operation performed in the memory, a special encoding of the operation is created and securely enclosed in the state information through A. In particular, two separate digests are maintained over two sets: a first set encodes the "load" history of the memory (i.e., reading operations); the second set encodes the "store" history (i.e., writing operations) of the memory. Any operation results in updating both sets, e.g., a load(i) operation will add the read item d_i in the "load" history and the written item d_i (but with a new time-stamp) in the "store" history. The crucial observation is that if the memory is correct, the encodings of the two sets are such that the produced digests are the same when the check is performed.
By choosing the cryptographic primitive A such that it is collision-resistant, meaning that it is computationally infeasible to find distinct sets that produce the same digest, the memory checking problem is reduced to an equality testing problem (subject to an appropriate encoding for the operations in the memory). Such primitives A for incrementally computing collision-resistant digests of sets exist; e.g., ε-biased hash functions.
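The off-line memory checking idea described above can be sketched as follows. This is a toy illustration under stated assumptions: the incremental set digest is a sum of SHA-256 values modulo 2^256, which is chosen only because it is easy to update in O(1) time and is not claimed to be a secure instantiation of the primitive A; the class name Checker and its methods are hypothetical.

```python
import hashlib

MOD = 2 ** 256

def enc(addr, value, ts):
    """Encode one (location, item, time-stamp) triple as an integer."""
    digest = hashlib.sha256(repr((addr, value, ts)).encode()).digest()
    return int.from_bytes(digest, "big")

class Checker:
    """Trusted checker keeping O(1) state over an untrusted memory."""

    def __init__(self, memory):
        self.mem = memory        # untrusted: addr -> (value, time-stamp)
        self.loads = 0           # digest of the "load" history
        self.stores = 0          # digest of the "store" history
        self.clock = 0           # monotonically increasing time-stamp

    def _write(self, addr, value):
        self.clock += 1
        self.mem[addr] = (value, self.clock)
        self.stores = (self.stores + enc(addr, value, self.clock)) % MOD

    def _read(self, addr):
        value, ts = self.mem[addr]
        self.loads = (self.loads + enc(addr, value, ts)) % MOD
        return value

    def init(self, addr, value):
        self._write(addr, value)

    def load(self, addr):
        value = self._read(addr)
        self._write(addr, value)     # write back with a fresh time-stamp
        return value

    def store(self, addr, value):
        self._read(addr)             # the old item joins the "load" history
        self._write(addr, value)

    def check(self):
        """Final reading of every cell; accept iff the two digests agree."""
        final = self.loads
        for addr, (value, ts) in self.mem.items():
            final = (final + enc(addr, value, ts)) % MOD
        return final == self.stores

memory = {}
checker = Checker(memory)
checker.init(0, "a")
checker.init(1, "b")
checker.store(0, "c")
print(checker.load(0), checker.check())
```

If the untrusted memory later returns a stale cell (an old value with its old time-stamp), the final reading adds an encoding that is already in the "load" history while the latest write is never read, so the two digests differ and the check rejects.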
[0087] Next an exemplary design of an efficient secure auditing scheme is presented that is based on the memory checking idea. Applying this idea in the three party model is a challenging task, because the checker here should be collaboratively implemented by S and U; and furthermore, it is desirable that super-efficiency at U is not harmed.
[0088] The RSA accumulator is used as a collision-resistant primitive A for incrementally computing digests over sets and A(S) is used to denote the digest of set S. Thus, given A(S) and a new element x not in S, A(S ∪ {x}) can be computed in O(1) time; also, it is hard to find sets S ≠ S' such that A(S) = A(S'). That is, collision resistance holds trivially, since the accumulator is one-way under the strong RSA assumption. A is used to define the audit states Σ_u and Σ_q stored by S and U, respectively. The main idea is as follows. The set S of special values defined over the exemplary super-efficient authentication structure of Section 3 may be viewed as an untrusted memory: with memory locations corresponding to the unique identifiers of the tree nodes (according to a fixed ordering, e.g., in-order tree traversal) and memory items corresponding to the special hash values and their signatures.
[0089] Every transaction (update or query) uniquely defines a subset of special hash values in the tree: for updates, the hashes in the O(log* n) special tree levels in the corresponding leaf-to-root path; for queries, the two hashes of the lowest special tree level that exactly covers the answer. These two subsets of special hashes respectively define the update trail T_u and the query trail T_q that are returned by R. For each tree node v in a subset, the tuple (id_v, h_v, σ_v, t_v) is included in the corresponding trail. Here, id_v is the identifier of v, h_v the hash value, σ_v the corresponding signature and t_v the associated time-stamp. Algorithms upd_u_state and upd_q_state process these trails to update the audit states Σ_u = (A_{u,l}, A_{u,s}) and Σ_q = (A_{q,l}, A_{q,s}); each audit state is a pair of accumulations, one for the "load" history and one for the "store" history. The tuple of v is encoded (according to a fixed way) to a unique string x_v (e.g., by applying a one-way hash function) and for each tuple in the trails the states are updated to Σ'_u = (A'_{u,l}, A'_{u,s}) and Σ'_q = (A'_{q,l}, A'_{q,s}), as follows: A'_{u,l} = A_{u,l} e(x_v) mod φ(N), A'_{u,s} = A_{u,s} e(x'_v) mod φ(N), A'_{q,l} = A_{q,l}^{e(x_v)} mod N, A'_{q,s} = A_{q,s}^{e(x'_v)} mod N, where e(·) is a function for computing prime representative values (as in the proof of Theorem 2), N is the RSA modulus, and x'_v is the encoding that corresponds to x_v but with a fresh time-stamp (e.g., monotonically increasing, synchronized for all parties) and a new identifier, hash value and signature (update case only).
[0090] The audit phase is as follows. First R forwards the request for the audit to S, along with a final audit trail that contains a tuple for each special node in set S (final reading of memory). S updates its update audit state (only the "load" component), signs the final Σ_u and forwards this to U, through R. Given (A_{u,l}, A_{u,s}) and (A_{q,l}, A_{q,s}), audit (run at U) accepts if and only if: A_{q,l}^{A_{u,l}} mod N = A_{q,s}^{A_{u,s}} mod N.
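The load/store bookkeeping of the computation and audit phases can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the exemplary embodiments: the modulus is built from two public Mersenne primes instead of secret strong primes, prime_rep is a hypothetical stand-in for e(·), signatures are omitted from the tuples, and the final equality is one reading of the audit test (every stored tuple must be loaded exactly once). The names SourceState, UserState and run are likewise hypothetical.

```python
import hashlib

# Toy parameters (assumptions): two Mersenne primes stand in for secret strong primes.
P, Q = 2**61 - 1, 2**89 - 1
N, PHI, G = P * Q, (P - 1) * (Q - 1), 3

def is_prime(m):
    """Trial division; adequate for the roughly 32-bit candidates used here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def prime_rep(tup):
    """Hypothetical e(x_v): hash the (id, hash, timestamp) tuple, then step to a nearby odd prime."""
    digest = hashlib.sha256(repr(tup).encode()).digest()
    cand = int.from_bytes(digest[:4], "big") | (1 << 31) | 1
    while not is_prime(cand):
        cand += 2
    return cand

class SourceState:
    """Sigma_u: products of prime representatives, reduced mod phi(N) (known to the source)."""
    def __init__(self):
        self.load, self.store = 1, 1
    def on_load(self, tup):
        self.load = self.load * prime_rep(tup) % PHI
    def on_store(self, tup):
        self.store = self.store * prime_rep(tup) % PHI

class UserState:
    """Sigma_q: accumulations mod N; the user has no trapdoor and exponentiates."""
    def __init__(self):
        self.load, self.store = G, G
    def on_load(self, tup):
        self.load = pow(self.load, prime_rep(tup), N)
    def on_store(self, tup):
        self.store = pow(self.store, prime_rep(tup), N)

def audit(src, usr):
    """Accept iff the accumulated loads equal the accumulated stores."""
    return pow(usr.load, src.load, N) == pow(usr.store, src.store, N)

def run(replay):
    """One update and one query on node v1; with replay=True the responder serves the stale tuple."""
    src, usr, clock, mem = SourceState(), UserState(), 0, {}
    for ident in ("v1", "v2"):                     # initial stores by the source
        clock += 1
        mem[ident] = (ident, "h0", clock)
        src.on_store(mem[ident])
    stale = mem["v1"]
    src.on_load(mem["v1"])                         # update: load old tuple, store new one
    clock += 1
    mem["v1"] = ("v1", "h1", clock)
    src.on_store(mem["v1"])
    seen = stale if replay else mem["v1"]          # query: what the responder returns
    usr.on_load(seen)
    clock += 1
    mem["v1"] = (seen[0], seen[1], clock)          # same content, fresh time-stamp
    usr.on_store(mem["v1"])
    for tup in mem.values():                       # audit phase: final reading of memory
        src.on_load(tup)
    return audit(src, usr)
```

In this sketch an honest run passes the audit, while a run in which the responder replays the pre-update tuple fails it, because that tuple is loaded twice but stored only once.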
[0091] FIG. 3 depicts an exemplary system for detection and elimination of replay attacks. The auditor A keeps audit state Σ of size O(1) about the database DB, which is incrementally updated after any updates or queries on the database occur, using respectively update trails T_u and query trails T_q provided by the responder R and user U (computation phase). At certain points in time, the auditor checks the consistency of its local audit state Σ with the current database DB residing in R, performing an off-line correctness check on the history of transactions on the database (audit phase). Replay attacks are detected, since old data, although verifiable at U, correspond to invalid transactions. Replay attacks are effectively eliminated, since they are detected and expose possible malicious actions by R.
[0092] Note that the exemplary embodiment shown in FIG. 3 illustrates the use of a third party auditor A. As described above, in other exemplary embodiments, functions of the third party auditor A may be fulfilled by the user U and/or source S (e.g., via the responder R).
[0093] The following theorem summarizes the results of this section (proof sketch in the Appendix below).
[0095] Theorem 3. There exists a hash-based, dynamic, super-efficient and audited authentication structure for range search queries over a set of size n with the following performance, where t denotes the number of data items returned by a query:
[0096] - a query is answered in O(log n + t) time;
[0097] - the answer proof has size O(log t) and consists of two signatures, two keys and O(log t) hash values;
[0098] - the answer to a query is validated by performing O(t) hash operations and verifying O(1) signatures;
[0099] - an update results in O(log n) hash operations (at both the source and the responder) and O(log* n) signature generations (at the source);
[00100] - the auditing scheme stores O(1) audit state information, performs O(log n) work per update (at the source) and O(1) work per query (at the user) during the computation phase and performs O(n) work (at the source) and O(1) work (at the user) during the audit phase; and
[00101] - replay attacks performed by the responder are always detectable by the auditor (e.g., the user or a third party auditor) at the audit phase.
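To give a feel for how slowly the O(log* n) signature term in the above bounds grows compared with the O(log n) hashing term, the following sketch computes both. It assumes the standard base-2 iterated logarithm as the meaning of log*; the function name log_star is hypothetical.

```python
import math

def log_star(n):
    """Iterated logarithm: number of times log2 is applied before the value drops to 1 or below."""
    count = 0
    x = float(n)
    while x > 1.0:
        x = math.log2(x)
        count += 1
    return count

if __name__ == "__main__":
    for n in (16, 65536, 10**9):
        print(n, "log n ~", round(math.log2(n), 1), "log* n =", log_star(n))
```

For a data set of 65536 elements, for instance, log n is 16 while log* n is 4.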
6 Extension To Other Query Types
[00102] Consider an exemplary authentication scheme based on the construction of Section 3, which is an exemplary authentication structure for range search queries. As previously mentioned, many other types of queries are related to range searching or consist of more complex search problems that eventually boil down (e.g., may be reduced) to range searching. This suggests that the exemplary authentication schemes discussed herein can be used as general design tools for achieving super-efficient authentication of other types of queries. Indeed, all that is needed is to consider a (different) hashing scheme over the data set D (computed along the hash tree), which should be appropriate for the target type of queries. Similar to the construction in Section 3, the hashing over D should securely encode the relations that are sufficient for verifying the answers to the queries in consideration. Super-efficiency would then follow simply by authenticating at most two special hash values at the appropriate special level of the tree, depending on the exact range defined by the query.
[00103] Two types of queries that fall into this category are briefly discussed. Consider the class of queries that ask for any associative function over a field of data records that lie in a query range (possibly, according to some other field of the records). The canonical members of this class are aggregate queries, such as SUM, MAX, and AVG, as non-limiting examples. A hashing scheme appropriate for these queries could be constructed such that it encodes the information (relations) about ranges, corresponding aggregation values and neighboring data records. In particular, the hash tree node v defining subtree T_v stores a hash value that encodes information about the aggregation value a_v computed over the records that correspond to the leaves of T_v, the left-most and right-most records in T_v and, also, their predecessor and successor records (not in T_v), respectively. Using this hashing scheme, these queries can be authenticated by considering the corresponding allocation nodes in the query range; and again, any query range has at most two allocation nodes in some special level of the tree. Similarly, one can use the exemplary schemes for the class of path property queries, all of which are related to range searching. The exemplary hashing scheme of Section 3 and, accordingly, all of the exemplary authentication schemes can be extended to these classes of queries (e.g., aggregation queries and path property queries), as non-limiting examples.
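One possible shape of such a hashing scheme for SUM queries is sketched below. It is an illustration only: the exact fields and their encoding are assumptions, and the names Node, H and build are hypothetical. Each node digest binds the aggregate a_v, the left-most and right-most keys of T_v, and the neighboring keys just outside T_v.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

def H(*parts) -> str:
    """Hash a tuple of fields; the '|' separator is an arbitrary encoding choice."""
    return hashlib.sha256("|".join(map(str, parts)).encode()).hexdigest()

@dataclass
class Node:
    agg: int               # a_v, here SUM over the leaves of T_v
    first: int             # key of the left-most record in T_v
    last: int              # key of the right-most record in T_v
    pred: Optional[int]    # key just before `first` (outside T_v), if any
    succ: Optional[int]    # key just after `last` (outside T_v), if any
    digest: str            # hash value stored at v

def build(records, lo=0, hi=None):
    """records: list of (key, value) sorted by key. Returns the node covering records[lo:hi]."""
    if hi is None:
        hi = len(records)
    pred = records[lo - 1][0] if lo > 0 else None
    succ = records[hi][0] if hi < len(records) else None
    if hi - lo == 1:
        key, value = records[lo]
        return Node(value, key, key, pred, succ, H("leaf", key, value, pred, succ))
    mid = (lo + hi) // 2
    left, right = build(records, lo, mid), build(records, mid, hi)
    agg = left.agg + right.agg
    digest = H("node", agg, left.first, right.last, pred, succ, left.digest, right.digest)
    return Node(agg, left.first, right.last, pred, succ, digest)
```

Because the neighbors are hashed in, a verifier that recomputes a node digest also learns that no record between the query boundary and the covered range was omitted.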
7 Appendix
[00104] Cryptographic Primitives.
[00105] Collision-resistant hashing. A length-reducing cryptographic hash function h is used over variable-length strings, such that it is computationally infeasible to compute strings x ≠ y such that h(x) = h(y). Hashing operations are particularly lightweight (block-cipher type of computations).
[00106] Hash-tree. An authentication tree, based on the construction due to Merkle, is used, which hierarchically defines a collection of hash values (stored at internal nodes) computed over a data set (stored at leaves). For a set of n elements, a hash tree is a balanced binary tree, where each node stores a hash value computed using a collision-resistant hash function: leaves store the hash of the corresponding element and internal nodes store the hash of the concatenation of the hash values of their children.
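The hash-tree definition above can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 as the collision-resistant hash function and a number of elements that is a power of two; the function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(elements):
    """levels[0] holds the leaf hashes and levels[-1] the root. Assumes len(elements) is a power of two."""
    level = [h(e) for e in elements]
    levels = [level]
    while len(level) > 1:
        # An internal node stores the hash of the concatenation of its children's hashes.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Sibling hashes along the leaf-to-root path, each tagged with whether the sibling is on the left."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path

def verify(element, path, root):
    """Rehash from the leaf upward and compare with the trusted root."""
    current = h(element)
    for sibling, sibling_is_left in path:
        current = h(sibling + current) if sibling_is_left else h(current + sibling)
    return current == root
```

A proof for one element contains one sibling hash per tree level, so its size is logarithmic in n.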
[00107] Signatures. Any signature scheme secure against adaptive chosen-message attack may be used. Typically, signing and verifying a signature involves more expensive operations (e.g., modular exponentiations).
[00108] Accumulators. RSA-based dynamic accumulators are used in conjunction with a dynamization scheme for optimally verifying set membership. These cryptographic primitives produce an efficiently computed accumulation of a set, along with short and efficiently verifiable witnesses for all accumulated items. Set-membership verification takes O(1) time and is one-way: under the strong RSA assumption, it is computationally infeasible to find items that are not accumulated in the set together with fake witnesses that pass the verification test. The underlying computations involve modular exponentiations and multiplications.
[00109] Proof of Theorem 1. The complexity properties follow directly from the construction of the authentication structure. For the static case, where no updates are performed to data set D, one can simply use signatures to authenticate the special hash values in S. In this setting, signatures provide an optimal solution, because every special hash value can be authenticated in O(1) time (recall that at most two special hash values need be authenticated for any query). The authentication scheme is as follows: every hash in S is signed by the source and on any query q the proof for answer A_q of size t contains O(log t) hash values, associating the pairs in the answer with two special hash values at level l, where l = log^(i) n if log^(i) n < t < log^(i-1) n (log^(i) denoting the logarithm iterated i times). The verification cost is O(log t) hashing cost and at most two signature verifications.
[00110] The exemplary authentication structures can achieve super-efficient verification based on the use of O(n) special digests defined hierarchically over the data set. It is shown that this design is optimal for hash-based authentication, i.e., when only cryptographic hashing is used to produce the digests. The proof is based on a result from previous work, stating that for hash-based authentication of set-membership queries, super-efficient verification can be achieved only at an "exponential" growth of the signature cost. See R. Tamassia and N. Triandopoulos. Computational bounds on hierarchical data processing with applications to information security. In Proc. Int. Colloquium on Automata, Languages and Programming (ICALP), volume 3580 of LNCS, pages 153-165. Springer-Verlag, 2005. More specifically, this work showed that, in any hash-based authentication scheme for membership queries in a set of size n, super-efficient verification (sub-logarithmic in n) can be achieved only if Ω(n) digests are authenticated. In particular, for any ε > 0, even O(n^(1-ε)) special hash values do not suffice to yield o(log n) verification cost; they still incur Ω(log n) (simply efficient) verification cost. This result is used to show the first lower bound on the number of digests (e.g., signatures) required for super-efficient range-search verification. One can reduce membership queries to range search queries (indeed, they are a special case of range search queries), and conclude that the above result holds also for range searching. Accordingly, for super-efficient verification of answers to range queries one may need Ω(n) special hash values.
[00111] In the dynamic case, where the data set evolves over time through update operations, the above static structure can be dynamized, but at high (linear in the data set size n) update cost. In particular, after any update in the data set D, the hash tree is updated accordingly, where essentially O(log n) hash values along two leaf-to-root paths are recomputed and the O(log* n) new special hash values along these paths are re-signed by the source (the details are given in Section 4). However, in order for this scheme to be secure against replay attacks, all O(n) (signed) special hash values must be re-signed. One needs to re-sign these values in order to defeat the possibility that an old, out-of-date special hash value is used to (successfully but incorrectly) verify old invalid data. Indeed, without global re-signing of the currently valid set S, replay attacks can be launched by the responder, a serious threat in data authentication. Suppose that D is updated and only O(log* n) special hash values are re-signed by the source. The responder can simply cache (and not destroy) the old signed special hash values. In this case, verifiable digests of old data can be used as proofs for answers to queries over D, thus allowing the verification of previously authentic but currently invalid data. This is against the soundness requirement.
[00112] How can one invalidate old signed digests? One standard, exemplary solution is the use of time-stamps. The idea is very simple: every signature is produced on time-stamped digests, that is, a time-stamp is appended to a digest before it is signed by the source. It turns out that this simple and easily implementable modification in the signing process can eliminate replay attacks: with respect to a fixed time-quantum known to the users, verifiable signatures on digests are accepted by the verification algorithm only if their time-stamps belong to the current (at the time of verification), most recent, time-quantum. Alternatively, the user can judge after the signature verification whether the signature freshness (expressed by the time-stamp) is acceptable. This solution removes the threat of replay attacks, but it introduces a signature refreshing requirement. That is, after any update (or many updates executed in a batch mode) or at the end of the current time-quantum, a new (fresh) time-stamp needs to be associated with all the special hash values and, thus, all signatures need to be renewed; otherwise, not every correct answer can be verified, which is against the completeness requirement. Overall, in the exemplary authentication scheme the update cost includes: O(log n) hashing cost, O(log* n) signature cost and O(n) signature renewal cost, thus, O(n) signature cost in total.
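The time-quantum rule described above can be sketched as follows. This is an illustration under stated assumptions: an HMAC from the Python standard library stands in for the source's public-key signature (a real deployment would use a signature scheme secure against adaptive chosen-message attack), and the quantum length, key and function names are hypothetical.

```python
import hashlib
import hmac

QUANTUM = 60            # assumed time-quantum length in seconds, known to all users
KEY = b"demo-key"       # stand-in secret; HMAC replaces the source's signature in this sketch

def quantum_of(ts: int) -> int:
    return ts // QUANTUM

def sign(digest: bytes, ts: int) -> bytes:
    """Append the time-stamp to the digest before 'signing'."""
    return hmac.new(KEY, digest + ts.to_bytes(8, "big"), hashlib.sha256).digest()

def verify_fresh(digest: bytes, ts: int, sig: bytes, now: int) -> bool:
    """Accept only a valid signature whose time-stamp lies in the current time-quantum."""
    valid = hmac.compare_digest(sig, sign(digest, ts))
    return valid and quantum_of(ts) == quantum_of(now)
```

A signature produced at time 120 is accepted at time 150 but rejected at time 200, which is why every signature must be renewed once the quantum ends.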
[00113] Next it is shown that this technique may be optimal for hash-based data authentication resilient to replay attacks. Suppose that one wishes to eliminate replay attacks using only cryptographic hashing (and signatures), and without employing time-stamping. The problem can be formulated as follows. One wishes to design a mechanism that allows a user to validate the freshness of a verified signature received from the responder, even when the responder is allowed to cache old signed hash values. Consider the special case of the problem, where the set S of special hash values is fixed over time (only values of key-value pairs change over time). Then the problem of the verification of signature freshness is equivalent to a particular data authentication problem. Indeed, consider the following find-last data authentication problem with parameter m, where update operation 'insertType(τ, x)' inserts an element x of type τ ∈ {τ_1, ..., τ_m} in the data structure (there are m types in total), and query operation 'last(τ)' returns the element x of type τ that was most recently inserted in the data structure. One can see that verifying the signature freshness corresponds to verifying the answer of a last(·) query and vice-versa.
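The find-last interface and its correspondence to signature freshness can be sketched as below. The sketch shows only the plain (unauthenticated) data structure; the class name FindLast, the helper is_fresh and the use of node identifiers as types are assumptions made for illustration.

```python
class FindLast:
    """Plain find-last structure over a fixed set of m types."""

    def __init__(self, types):
        self._last = {tau: None for tau in types}

    def insert_type(self, tau, x):
        """insertType(tau, x): record x as the most recent element of type tau."""
        if tau not in self._last:
            raise KeyError(f"unknown type {tau!r}")
        self._last[tau] = x

    def last(self, tau):
        """last(tau): the element of type tau that was inserted most recently."""
        return self._last[tau]

def is_fresh(structure, node_id, signed_digest):
    """A signed digest for a node is fresh iff it is the answer to last(node_id)."""
    return structure.last(node_id) == signed_digest
```

Here each special tree node plays the role of a type and each newly signed digest of that node is an inserted element, so checking freshness amounts to checking the answer of a last(·) query.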
[00114] One can show that this type of query cannot be authenticated using the "hash and sign" paradigm without essentially authenticating membership in a set of size O(m). One simply observes that the authentication of the set-membership problem of size m is reduced to the authentication of the find-last problem with parameter m: one can insert the m elements of the set in the data structure, each as an element of a distinct type, according to some fixed 1-1 mapping. An existing element in the set is tested for set-membership simply by answering a last(τ) query. The argument is complete, in view of the lower bounds for the set-membership problem in the hash-based data authentication model and the fact (shown before) that for super-efficient answer verification m = Ω(n) special hash values may be needed.
[00115] Proof of Theorem 2. The complexity due to authentication holds because of the use of the accumulator. The RSA accumulator is used to accumulate the set S = {h_1, h_2, ..., h_m} of special hash values, where m = O(n). The accumulation function is modular exponentiation, where the RSA modulus is used. That is, set S = {h_1, h_2, ..., h_m} is accumulated to accumulation value A(S) = α = s^(e(h_1) e(h_2) ... e(h_m)) mod N, where N = pq, p and q are strong primes, s is relatively prime to N and e(h_i) is an efficiently computed prime representative value of h_i. The trapdoor information (known only to S) in this case is φ(N), since for any x > 0, s^x mod N = s^(x mod φ(N)) mod N. For any element h_i ∈ S, the witness w_i of its membership in S is the value A(S - {h_i}) and it can be efficiently verified by checking that w_i^(e(h_i)) mod N = A(S). Accumulation A(S) is the unique authentication string that is signed by the source. Accordingly, answer verification is still super-efficient (as in the proof of Theorem 1): only now the two special hash values that authenticate the query are first authenticated to be members of A(S), which is in turn authenticated by verifying its signature. Also, using time-stamps when signing A(S) provides security against replay attacks.
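The accumulation, witness and verification equations above can be exercised with a toy example. This is a sketch under stated assumptions and is not secure: the modulus uses two public Mersenne primes instead of secret strong primes, the base s = 3 is arbitrary, and prime_rep is a hypothetical stand-in for the prime representative function e(·).

```python
import hashlib
from math import prod

# Toy parameters (assumptions): public Mersenne primes in place of secret strong primes.
P, Q = 2**61 - 1, 2**89 - 1
N, PHI, BASE = P * Q, (P - 1) * (Q - 1), 3

def is_prime(m):
    """Trial division; adequate for the roughly 32-bit candidates used here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def prime_rep(value: bytes) -> int:
    """Hypothetical e(h_i): hash the value, then step to a nearby odd prime."""
    cand = int.from_bytes(hashlib.sha256(value).digest()[:4], "big") | (1 << 31) | 1
    while not is_prime(cand):
        cand += 2
    return cand

def accumulate(hashes):
    """A(S) = s^(e(h_1) ... e(h_m)) mod N, computed without the trapdoor."""
    return pow(BASE, prod(prime_rep(x) for x in hashes), N)

def accumulate_with_trapdoor(hashes):
    """Same value, with the exponent first reduced mod phi(N) as the source can do."""
    return pow(BASE, prod(prime_rep(x) for x in hashes) % PHI, N)

def witness(hashes, i):
    """w_i = A(S - {h_i})."""
    return accumulate(hashes[:i] + hashes[i + 1:])

def verify(value, w, acc):
    """Check that w^(e(h)) mod N equals the accumulation value."""
    return pow(w, prime_rep(value), N) == acc
```

Verification costs a single modular exponentiation regardless of the size of S, which is what keeps the answer verification super-efficient.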
[00116] An update in D results in O(log* n) updates in S, thus O(log* n) insertions and deletions in A(S). Previous update techniques are used for the RSA-based accumulator, where it is shown how to achieve an efficient tradeoff between update cost and query cost, namely, how, by allowing some form of preprocessing at S, one can achieve O(√n) cost for updating A(S) at source S after an update in set S and for computing the element witnesses at responder R after a query in S. See M. T. Goodrich, R. Tamassia, and J. Hasic. An efficient dynamic and distributed cryptographic accumulator. In Proc. of Information Security Conference (ISC), volume 2433 of LNCS, pages 372-388. Springer-Verlag, 2002. Thus, one can use cryptographic primitives that are optimal in terms of verification cost to authenticate membership in the set of special hash values S at O(√n) costs incurred after updates and during the generation of the answers' proofs.
[00117] Finally, how to maintain the hash tree in the most dynamic setting is described, where not only do key-value pairs update their values but key-value pairs are also inserted into or deleted from the data set D. The hash tree is maintained as a weight-balanced binary tree, using a BB[α] tree. The main difference is that now one no longer has a perfect height-balanced tree. Thus the definition of the set S of special hash values is slightly relaxed. At any level of the exemplary recursive construction, membership in S is not defined using the rigid method of an exact level in the tree. This would lead to very frequent changes in the set S, thus triggering updates in the accumulator at higher rates (essentially a rotation performed at a node while maintaining the balance of the tree could trigger a number of updates proportional to the size of the subtree defined by this node). Instead, membership in S is defined according to layers of certain width in the hash tree. Using this more flexible definitional method, the set S does not change very often. Note that after any update in D, one performs a leaf-to-root tree traversal where the necessary changes in the structure (rehashing, structural changes through rotations and updates in the accumulation α) are performed at cost at most O(√n log* n). Actually, the log* n factor can be removed by not applying the changes sequentially.
[00118] Proof of Theorem 3. (Sketch.) The complexity follows from Theorem 2 and the use of the RSA accumulator in the auditing mechanism, where updates of the audit and query states each take constant time and where the final auditing operation also takes constant time. Security follows from the security of the RSA accumulator and the off-line checking method: the use of time-stamps in combination with the audit and query states reduces the replay-attack problem to a simple equality test. Note that in the exemplary scheme any special value in S is already authenticated; thus, the only possible attack launched by the responder is a replay attack on the elements in S (i.e., old elements are used, or elements are used not in the correct chronological order).
8 Conclusion
[00119] Herein, exemplary data authentication structures have been considered in a setting where critical information is queried (e.g., at high rates) from a dynamic outsourced database that resides at an untrusted site. New approaches have been presented for query authentication, where, by decoupling the answer-generation and answer-verification procedures, one moves towards super-efficient answer verification, an important property for data authentication, given that many real-life applications involve the querying of critical data (e.g., financial) by computationally limited devices, for example. Exemplary authentication schemes for range search queries are described that achieve super-efficient answer verification, allow for efficient updates on the database and eliminate replay attacks from the database outsourcer. In some exemplary embodiments, any answer of size t is verified in time O(t), using only O(1) modular exponentiations. Also, for eliminating replay attacks on old invalid data, exemplary authentication protocols are discussed that implement exemplary efficient auditing mechanisms that can perform an off-line check on the consistency of an outsourced database that reliably reports any malicious action from the outsourcer. The exemplary schemes may be extended to more general queries.
[00120] In one non-limiting, exemplary embodiment, a method includes: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree; and returning the answer and the proof. See FIG. 4.
[00121] A method as above, and including one or more of the following: wherein the corresponding hash tree comprises a plurality of log* n levels, where n corresponds to a number of data elements and log* corresponds to a nested expression of logarithmic functions having at least one such logarithmic function; where the set of predetermined signed hash values consists of Θ(n) values residing at log* n levels, where n corresponds to a number of data elements and log* corresponds to a nested expression of logarithmic functions having at least one such logarithmic function; where n corresponds to a number of data elements and t corresponds to a number of data elements returned by a query, where a query is answered in O(log n + t) time, where an answer proof has a size O(log t) and the answer proof consists of two signatures, two keys and O(log t) hash values, where an answer to a query is validated by performing O(t) arithmetic computations, O(t) hash operations and O(1) signature verifications, where an update results in O(log n) hash operations, O(log* n) signature generations and O(n) signature renewals.
[00122] A method as in any of the above, further comprising: hashing, based on the answer and the zero or more first hash values, along at least one third hash value of at least one node of the corresponding hash tree to obtain at least one predetermined fourth hash value; comparing the obtained at least one predetermined fourth hash value to the at least one predetermined signed hash value to determine a correspondence; and verifying at least one signature of the at least one predetermined signed hash value by verifying that the at least one predetermined signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree.
[00123] A method as in any of the preceding, wherein the at least one predetermined signed hash value consists of one or two predetermined signed hash values. A method as in any of the preceding claims, wherein the method is implemented by a computer program.
[00124] In another non-limiting, exemplary embodiment, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree; and returning the answer and the proof.
[00125] The computer program product of above and further including one or more of further improvements described herein.
[00126] In another non-limiting, exemplary embodiment, an electronic device includes: a memory configured to store a data set, zero or more first hash values and at least one predetermined hash value; and a data processor configured to receive a query comprising one of a range query or an aggregate query, to determine an answer corresponding to the query, to determine a proof corresponding to the query and the answer, and to return the answer and the proof, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree. The electronic device as above, embodied as a responder in a network. The electronic device as above and further including one or more of further improvements described herein.
[00127] In another non-limiting, exemplary embodiment, a method includes: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first signed hash value for a corresponding hash tree and zero or more second hash values; hashing, based on the answer and the zero or more second hash values, along at least one third hash value of at least one node of the corresponding hash tree to obtain at least one predetermined fourth hash value; comparing the obtained at least one predetermined fourth hash value to the at least one first signed hash value to determine a correspondence; and verifying at least one signature of the at least one first signed hash value by verifying that the at least one first signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree. The method as above, wherein the method is implemented by a computer program. The method as above and further including one or more of further improvements described herein. See FIG. 5.
[00128] In another non-limiting, exemplary embodiment, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first signed hash value for a corresponding hash tree and zero or more second hash values; hashing, based on the answer and the zero or more second hash values, along at least one third hash value of at least one node of the corresponding hash tree to obtain at least one predetermined fourth hash value; comparing the obtained at least one predetermined fourth hash value to the at least one first signed hash value to determine a correspondence; and verifying at least one signature of the at least one first signed hash value by verifying that the at least one first signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree. The computer program product as above and further including one or more of further improvements described herein.
[00129] In another non-limiting, exemplary embodiment, a method includes: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values; and returning the answer and the proof. See FIG. 6.
[00130] A method as above, further comprising: accumulating the set of predetermined hash values to obtain the accumulation value; and signing the accumulation value. A method as in any above, where n corresponds to a number of data elements and t corresponds to a size of an answer returned for a query, where a query is answered in O(log n + t) time, where an answer proof has a size O(log t) and the answer proof consists of one signature, two field elements, two keys and O(log t) hash values, where an answer to a query is validated by performing O(t) arithmetic computations, O(t) hash operations, O(1) modular exponentiation and O(1) signature verifications, where an update results in O(log n) hash operations, O(√n log* n) modular operations and O(1) signature generations. A method as in any above, wherein the method is implemented by a computer program. A method as in any above and further including one or more of further improvements described herein.
[00131] In another non-limiting, exemplary embodiment, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values; and returning the answer and the proof. A computer program product as in any above and further including one or more of further improvements described herein.
[00132] In another non-limiting, exemplary embodiment, an electronic device includes: a memory configured to store a data set, zero or more first hash values and at least one predetermined hash value; and a data processor configured to receive a query on the data set comprising one of a range query or an aggregate query, to determine an answer corresponding to the query, to determine a proof corresponding to the query and the answer, and to return the answer and the proof, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values. An electronic device as in any above and further including one or more of further improvements described herein.
[00133] In another non-limiting, exemplary embodiment, a method includes: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first hash value for a corresponding hash tree and at least one membership witness for the at least one first hash value; hashing, based on the answer, along at least one second hash value of at least one node of the corresponding hash tree to obtain at least one predetermined third hash value; comparing the obtained at least one predetermined third hash value to the at least one first hash value to determine a correspondence; and verifying the proof by utilizing the at least one first hash value and the at least one membership witness to verify that the at least one first hash value was utilized to obtain a predetermined accumulation value and verifying a signature on the predetermined accumulation value, wherein the predetermined accumulation value corresponds to a value obtained by accumulating a set of predetermined third hash values. A method as above, wherein each predetermined third hash value of the set of predetermined third hash values is unsigned. A method as in any above, wherein the method is implemented by a computer program. A method as in any above and further including one or more of further improvements described herein. See FIG. 7.
[00134] In another non-limiting, exemplary embodiment, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first hash value for a corresponding hash tree and at least one membership witness for the at least one first hash value; hashing, based on the answer, along at least one second hash value of at least one node of the corresponding hash tree to obtain at least one predetermined third hash value; comparing the obtained at least one predetermined third hash value to the at least one first hash value to determine a correspondence; and verifying the proof by utilizing the at least one first hash value and the at least one membership witness to verify that the at least one first hash value was utilized to obtain a predetermined accumulation value and verifying a signature on the predetermined accumulation value, wherein the predetermined accumulation value corresponds to a value obtained by accumulating a set of predetermined third hash values. A computer program product as in any above and further including one or more of further improvements described herein.
[00135] In another non-limiting, exemplary embodiment, a method includes: maintaining, by a data source, an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; maintaining, by a query source, a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and invoking, by the query source, an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries. A method as in any above, wherein the method is implemented by a computer program. A method as in any above and further including one or more of further improvements described herein. See FIG. 8.
[00136] A method as in any above, wherein the update representation comprises a first compact cryptographic representation and wherein the query representation comprises a second compact cryptographic representation. A method as in any above, further comprising: receiving a query comprising one of a range query or an aggregate query, wherein the query is on the data set; determining an answer corresponding to the query; hashing, based on the answer, along at least two hash values of at least two nodes of a corresponding hash tree to obtain at least one predetermined signed hash value; verifying the answer by verifying that the at least one predetermined signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree; and returning the verified answer.
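As a non-authoritative illustration of the hashing step described above, the following minimal Python sketch builds a proof for a contiguous range of leaves in a binary hash tree and recomputes the hash of the covering subtree from the answer plus the supplied sibling hashes. The function names (`leaf_hash`, `node_hash`, `subtree_hash`, `prove`, `verify`), the use of SHA-256, the power-of-two leaf count and the tuple-based proof encoding are all assumptions made for illustration. Signature verification on the resulting hash value and the boundary elements needed to show completeness of a range answer are omitted; the recomputed digest is simply compared with a trusted digest that stands in for the predetermined signed hash value.

```python
import hashlib

def leaf_hash(value: bytes) -> bytes:
    # Domain-separated hash of a data item (assumed encoding).
    return hashlib.sha256(b"\x00" + value).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # Domain-separated hash of an internal node (assumed encoding).
    return hashlib.sha256(b"\x01" + left + right).digest()

def subtree_hash(leaves, lo, hi):
    # Hash of the subtree covering leaves[lo:hi]; hi - lo is a power of two.
    if hi - lo == 1:
        return leaf_hash(leaves[lo])
    mid = (lo + hi) // 2
    return node_hash(subtree_hash(leaves, lo, mid),
                     subtree_hash(leaves, mid, hi))

def prove(leaves, lo, hi, i, j):
    # Responder side: proof for the answer leaves[i:j] inside leaves[lo:hi].
    if j <= lo or hi <= i:
        # Subtree disjoint from the range: send only its hash value.
        return ("H", subtree_hash(leaves, lo, hi))
    if hi - lo == 1:
        # Leaf inside the range: send the data item itself.
        return ("L", leaves[lo])
    mid = (lo + hi) // 2
    return ("N", prove(leaves, lo, mid, i, j), prove(leaves, mid, hi, i, j))

def verify(proof):
    # Query-source side: returns (recomputed subtree hash, answer items).
    tag = proof[0]
    if tag == "H":
        return proof[1], []
    if tag == "L":
        return leaf_hash(proof[1]), [proof[1]]
    left_hash, left_items = verify(proof[1])
    right_hash, right_items = verify(proof[2])
    return node_hash(left_hash, right_hash), left_items + right_items

# Usage: the trusted digest stands in for a predetermined signed hash value.
data = [bytes([65 + k]) for k in range(8)]
trusted_digest = subtree_hash(data, 0, 8)
digest, answer = verify(prove(data, 0, 8, 2, 5))
answer_is_valid = digest == trusted_digest
```

In this sketch the proof carries one hash value per subtree that lies wholly outside the range, so its size grows with the tree height rather than with the size of the data set.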
[00137] A method as in any above, further comprising: receiving a query comprising one of a range query or an aggregate query, wherein the query is on the data set; determining an answer corresponding to the query; hashing, based on the answer, along at least two hash values of at least two nodes of a corresponding hash tree to obtain at least one predetermined hash value; verifying the answer by verifying that the at least one predetermined hash value is a member of a set of predetermined hash values by utilizing the at least one predetermined hash value and a membership witness to verify that the at least one predetermined hash value was utilized to obtain an accumulation value and verifying a signature on the accumulation value, wherein the accumulation value is obtained by accumulating the set of predetermined hash values, wherein each predetermined hash value of the set of predetermined hash values is unsigned; and returning the verified answer.
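The membership check described above can be sketched, under stated assumptions, with a toy RSA-style accumulator. The paragraph above does not fix the accumulator construction, so the choice of an RSA-style accumulator, the modulus `N`, the base `G`, the `hash_to_prime` mapping and all function names below are hypothetical and serve only to show how one unsigned hash value plus one witness can be checked against a single accumulation value. Signing and verifying the accumulation value itself is omitted, and the modulus is far too small for real use.

```python
import hashlib
from math import isqrt

N = 1000003 * 1000033  # toy modulus (assumption); real use needs a large RSA modulus
G = 3                  # public base (assumption)

def is_prime(m: int) -> bool:
    # Trial division; adequate for the 32-bit values used in this sketch.
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    return all(m % d for d in range(3, isqrt(m) + 1, 2))

def hash_to_prime(digest: bytes) -> int:
    # Map a hash value to a prime representative (illustrative mapping only).
    candidate = int.from_bytes(digest[:4], "big") | 1
    while not is_prime(candidate):
        candidate += 2
    return candidate

def accumulate(primes) -> int:
    # Accumulation value: G raised to the product of all representatives, mod N.
    acc = G
    for p in primes:
        acc = pow(acc, p, N)
    return acc

def witness(primes, x: int) -> int:
    # Membership witness for x: accumulate every representative except x.
    return accumulate(p for p in primes if p != x)

def verify_membership(x: int, w: int, acc: int) -> bool:
    # One modular exponentiation checks that x was used to obtain acc.
    return pow(w, x, N) == acc

# Usage: accumulate a set of unsigned hash values and check one of them.
hash_values = [hashlib.sha256(v).digest() for v in (b"h1", b"h2", b"h3", b"h4")]
members = sorted({hash_to_prime(hv) for hv in hash_values})
accumulation_value = accumulate(members)
member = hash_to_prime(hash_values[0])
member_ok = verify_membership(member, witness(members, member), accumulation_value)
```

Only the accumulation value needs a signature in this arrangement, which is why the individual hash values can remain unsigned.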
[00138] A method as in any above, where n corresponds to a number of data items and t corresponds to a number of data items returned for a query, where a query is answered in O(log n + t) time, where an answer proof has a size O(log t) and the answer proof consists of two signatures, two keys and O(log t) hash values, where an answer to a query is validated by performing O(t) hash operations and O(1) signature verifications, where an update results in O(log n) hash operations and O(log* n) signature generations, where an auditing scheme stores O(1) audit states, where the auditing scheme performs O(log n) work per update at the data source and O(1) work per query at the query source during a computational phase, where the auditing scheme performs O(n) work at the data source and O(1) work at the query source during an audit phase, wherein replay attacks performed by the responder are always detectable by the query source at the audit phase. A method as in any above, wherein the audit process is invoked at a time when there is no unanswered query. A method as in any above, wherein there is no direct interaction between the data source and the query source.
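To give a feel for how slowly a log*-type bound grows, the following short Python sketch computes the standard iterated logarithm, i.e., the number of times a base-2 logarithm must be applied before the value drops to 1 or below. Reading log* as this standard iterated logarithm, the base 2 and the function name `log_star` are assumptions made for illustration; the claims describe log* only as a nested expression of logarithmic functions.

```python
import math

def log_star(n: float) -> int:
    # Count how many times log2 is applied until the value is at most 1.
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

# Usage: even for 65536 data items the iterated logarithm is only 4
# (65536 -> 16 -> 4 -> 2 -> 1).
levels_for_65536 = log_star(65536)
```

Under this reading, a bound expressed in log* n stays in the single digits for any data set size that could be stored in practice.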
[00139] In another non-limiting, exemplary embodiment, a computer program product comprises program instructions tangibly embodied on a computer-readable medium execution of which results in operations comprising: maintaining, by a data source, an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; maintaining, by a query source, a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and invoking, by the query source, an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries. A computer program product as above and further including one or more of further improvements described herein.
[00140] In another non-limiting, exemplary embodiment, a system includes: a data source configured to maintain an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; a query source configured to maintain a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and a responder, wherein the query source is further configured to invoke an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries. A system as above and further including one or more of further improvements described herein.
[00141] A system as in any above, wherein the update representation comprises a first compact cryptographic representation and wherein the query representation comprises a second compact cryptographic representation. A system as in any above, wherein the audit process is invoked at a time when there is no unanswered query, wherein there is no direct interaction between the data source and the query source.
[00142] Generally, various exemplary embodiments of the invention can be implemented in different mediums, such as software, hardware, logic, special purpose circuits or any combination thereof. As a non-limiting example, some aspects may be implemented in software which may be run on a computing device, while other aspects may be implemented in hardware.
[00143] The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the best method and apparatus presently contemplated by the inventors for carrying out the invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. For example, while various exemplary embodiments above refer to hashing over at most two special hash values, it should be appreciated that the exemplary embodiments of the invention are not all limited in such a manner and that, in fact, other exemplary embodiments may utilize more than two special hash values in the described manner. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention.
[00144] Furthermore, some of the features of the preferred embodiments of this invention could be used to advantage without the corresponding use of other features. As such, the foregoing description should be considered as merely illustrative of the principles of the invention, and not in limitation thereof.

Claims

What is claimed is:
1. A method comprising: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree; and returning the answer and the proof.
2. A method as in claim 1, wherein the corresponding hash tree comprises a plurality of log* n levels, where n corresponds to a number of data elements and log* corresponds to a nested expression of logarithmic functions having at least one such logarithmic function.
3. A method as in any of the preceding claims, where the set of predetermined signed hash values consists of Θ(n) values residing at log* n levels, where n corresponds to a number of data elements and log* corresponds to a nested expression of logarithmic functions having at least one such logarithmic function.
4. A method as in any of the preceding claims, where n corresponds to a number of data elements and t corresponds to a number of data elements returned by a query, where a query is answered in O(log n + t) time, where an answer proof has a size O(log t) and the answer proof consists of two signatures, two keys and O(log t) hash values, where an answer to a query is validated by performing O(t) arithmetic computations, O(t) hash operations and O(1) signature verifications, where an update results in O(log n) hash operations, O(log* n) signature generations and O(n) signature renewals.
5. A method as in any of the preceding claims, further comprising: hashing, based on the answer and the zero or more first hash values, along at least one third hash value of at least one node of the corresponding hash tree to obtain at least one predetermined fourth hash value; comparing the obtained at least one predetermined fourth hash value to the at least one predetermined signed hash value to determine a correspondence; and verifying at least one signature of the at least one predetermined signed hash value by verifying that the at least one predetermined signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree.
6. A method as in any of the preceding claims, wherein the at least one predetermined signed hash value consists of one or two predetermined signed hash values.
7. A method as in any of the preceding claims, wherein the method is implemented by a computer program.
8. An electronic device comprising: a memory configured to store a data set, zero or more first hash values and at least one predetermined hash value; and a data processor configured to receive a query comprising one of a range query or an aggregate query, to determine an answer corresponding to the query, to determine a proof corresponding to the query and the answer, and to return the answer and the proof, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, wherein the proof further comprises at least one predetermined signed hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree.
9. An electronic device as in claim 8, embodied as a responder in a network.
10. A method comprising: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first signed hash value for a corresponding hash tree and zero or more second hash values; hashing, based on the answer and the zero or more second hash values, along at least one third hash value of at least one node of the corresponding hash tree to obtain at least one predetermined fourth hash value; comparing the obtained at least one predetermined fourth hash value to the at least one first signed hash value to determine a correspondence; and verifying at least one signature of the at least one first signed hash value by verifying that the at least one first signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree.
11. A method as in claim 10, wherein the method is implemented by a computer program.
12. A method comprising: receiving a query comprising one of a range query or an aggregate query; determining an answer corresponding to the query; determining a proof corresponding to the query and the answer, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values; and returning the answer and the proof.
13. A method as in claim 12, further comprising: accumulating the set of predetermined hash values to obtain the accumulation value; and signing the accumulation value.
14. A method as in claim 12 or 13, where n corresponds to a number of data elements and t corresponds to a size of an answer returned for a query, where a query is answered in O(log n + t) time, where an answer proof has a size O(log t) and the answer proof consists of one signature, two field elements, two keys and O(log t) hash values, where an answer to a query is validated by performing O(t) arithmetic computations, O(t) hash operations, O(1) modular exponentiation and O(1) signature verifications, where an update results in O(log n) hash operations, O(√n log* n) modular operations and O(1) signature generations.
15. A method as in claim 12, 13 or 14, wherein the method is implemented by a computer program.
16. An electronic device comprising: a memory configured to store a data set, zero or more first hash values and at least one predetermined hash value; and a data processor configured to receive a query comprising one of a range query or an aggregate query, to determine an answer corresponding to the query, to determine a proof corresponding to the query and the answer, and to return the answer and the proof, wherein the proof comprises zero or more first hash values for zero or more nodes of a corresponding hash tree, at least one predetermined hash value that corresponds to a value obtained by hashing along at least one first hash value of the corresponding hash tree, and at least one membership proof for the at least one predetermined hash value, wherein the at least one membership proof is configured to be utilized in conjunction with a predetermined signed accumulation value to verify that the at least one predetermined hash value is a member of a set of predetermined hash values.
17. An electronic device as in claim 16, embodied as a responder in a network.
18. A method comprising: sending a query comprising one of a range query or an aggregate query; receiving an answer and a proof corresponding to the query, wherein the proof comprises at least one first hash value for a corresponding hash tree and at least one membership witness for the at least one first hash value; hashing, based on the answer, along at least one second hash value of at least one node of the corresponding hash tree to obtain at least one predetermined third hash value; comparing the obtained at least one predetermined third hash value to the at least one first hash value to determine a correspondence; and verifying the proof by utilizing the at least one first hash value and the at least one membership witness to verify that the at least one first hash value was utilized to obtain a predetermined accumulation value and verifying a signature on the predetermined accumulation value, wherein the predetermined accumulation value corresponds to a value obtained by accumulating a set of predetermined third hash values.
19. A method as in claim 18, wherein each predetermined third hash value of the set of predetermined third hash values is unsigned.
20. A method comprising: maintaining, by a data source, an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; maintaining, by a query source, a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and invoking, by the query source, an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries.
21. A method as in claim 20, wherein the update representation comprises a first compact cryptographic representation and wherein the query representation comprises a second compact cryptographic representation.
22. A method as in claim 20 or 21, further comprising: receiving a query comprising one of a range query or an aggregate query, wherein the query is on the data set; determining an answer corresponding to the query; hashing, based on the answer, along at least two hash values of at least two nodes of a corresponding hash tree to obtain at least one predetermined signed hash value; verifying the answer by verifying that the at least one predetermined signed hash value belongs to a set of predetermined signed hash values in the corresponding hash tree; and returning the verified answer.
23. A method as in claim 20 or 21, further comprising: receiving a query comprising one of a range query or an aggregate query, wherein the query is on the data set; determining an answer corresponding to the query; hashing, based on the answer, along at least two hash values of at least two nodes of a corresponding hash tree to obtain at least one predetermined hash value; verifying the answer by verifying that the at least one predetermined hash value is a member of a set of predetermined hash values by utilizing the at least one predetermined hash value and a membership witness to verify that the at least one predetermined hash value was utilized to obtain an accumulation value and verifying a signature on the accumulation value, wherein the accumulation value is obtained by accumulating the set of predetermined hash values, wherein each predetermined hash value of the set of predetermined hash values is unsigned; and returning the verified answer.
24. A method as in claim 22 or 23, where n corresponds to a number of data items and t corresponds to a number of data items returned for a query, where a query is answered in O(log n + t) time, where an answer proof has a size O(log t) and the answer proof consists of two signatures, two keys and O(log t) hash values, where an answer to a query is validated by performing O(t) hash operations and O(1) signature verifications, where an update results in O(log n) hash operations and O(log* n) signature generations, where an auditing scheme stores O(1) audit states, where the auditing scheme performs O(log n) work per update at the data source and O(1) work per query at the query source during a computational phase, where the auditing scheme performs O(n) work at the data source and O(1) work at the query source during an audit phase, wherein replay attacks performed by the responder are always detectable by the query source at the audit phase.
25. A method as in any one of claims 20-24, wherein the audit process is invoked at a time when there is no unanswered query.
26. A method as in any one of claims 20-25, wherein there is no direct interaction between the data source and the query source.
27. A method as in any one of claims 20-26, wherein the method is implemented by a computer program.
28. A system comprising: a data source configured to maintain an update audit state comprising an update representation obtained from a history of updates to a data set stored by a responder; a query source configured to maintain a query audit state comprising a query representation obtained from a history of queries and corresponding verified answers for queries by the query source on the data set stored by the responder; and a responder, wherein the query source is further configured to invoke an audit process comprising the query source receiving the update audit state from the data source via the responder and the query source utilizing the update audit state and the query audit state to check consistency of updates, queries and corresponding verified answers to said queries.
29. A system as in claim 28, wherein the update representation comprises a first compact cryptographic representation and wherein the query representation comprises a second compact cryptographic representation.
30. A system as in claim 28 or 29, wherein the audit process is invoked at a time when there is no unanswered query, wherein there is no direct interaction between the data source and the query source.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US83387806P 2006-07-28 2006-07-28
US60/833,878 2006-07-28

Publications (2)

Publication Number Publication Date
WO2008014002A2 true WO2008014002A2 (en) 2008-01-31
WO2008014002A3 WO2008014002A3 (en) 2008-10-16

Family

ID=38982143

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/017042 WO2008014002A2 (en) 2006-07-28 2007-07-30 Super-efficient verification of dynamic outsourced databases

Country Status (1)

Country Link
WO (1) WO2008014002A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070022109A1 (en) * 2005-07-25 2007-01-25 Tomasz Imielinski Systems and methods for answering user questions

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8871471B2 (en) 2007-02-23 2014-10-28 Ibis Biosciences, Inc. Methods for rapid forensic DNA analysis
EP2338127A1 (en) * 2008-08-29 2011-06-29 Brown University Cryptographic accumulators for authenticated hash tables
EP2338127A4 (en) * 2008-08-29 2013-12-04 Univ Brown Cryptographic accumulators for authenticated hash tables
US8726034B2 (en) 2008-08-29 2014-05-13 Brown University Cryptographic accumulators for authenticated hash tables
US9098725B2 (en) 2008-08-29 2015-08-04 Brown University Cryptographic accumulators for authenticated hash tables
US10862690B2 (en) 2014-09-30 2020-12-08 Telefonaktiebolaget Lm Ericsson (Publ) Technique for handling data in a data network
US10511440B2 (en) 2015-02-20 2019-12-17 Telefonaktiebolaget Lm Ericsson (Publ) Methods of proving validity and determining validity, electronic device, server and computer programs
WO2019168557A1 (en) * 2018-02-27 2019-09-06 Visa International Service Association High-throughput data integrity via trusted computing
US11140134B2 (en) 2018-02-27 2021-10-05 Visa International Service Association High-throughput data integrity via trusted computing
US11848914B2 (en) 2018-02-27 2023-12-19 Visa International Service Association High-throughput data integrity via trusted computing

Also Published As

Publication number Publication date
WO2008014002A3 (en) 2008-10-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07810907

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

NENP Non-entry into the national phase in:

Ref country code: RU

122 Ep: pct application non-entry in european phase

Ref document number: 07810907

Country of ref document: EP

Kind code of ref document: A2