WO2002039212A2

WO2002039212A2 - An efficient dynamic and distributed cryptographic accumulator

Info

Publication number: WO2002039212A2
Application number: PCT/US2001/043007
Authority: WO
Inventors: Michael T. Goodrich; Roberto Tamassia
Original assignee: Johns Hopkins University
Priority date: 2000-11-08
Filing date: 2001-11-08
Publication date: 2002-05-16
Also published as: WO2002039212A3; AU2002239245A1

Abstract

A computer implemented method is used to realize an authenticated dictionary by computing and updating the value of an exponential accumulator function in a distributed network (203) in a manner that allows a source computer (200) to quickly update mirror site computers (204-206) that are storing the same data (201) as the source computer. This allows the mirror site computers (204-206) to answer queries (207-212) much faster while not compromising security. The mirror site computers (204-206) answer queries on behalf of the source computer (200) but provide accumulator values so that client software can determine that the answers provided are as accurate as had they come from the source computer (200) itself. The accumulator values are updated as items are inserted and removed from the source computer's database (201). This invention provides a mechanism by which the source computer (200) can use a pipelined binary tree computation to quickly update partial values that, when stored at the mirror sites (204-206), allow the mirror sites to answer queries (207-212) much faster while not compromising security.

Description

AN EFFICIENT DYNAMIC AND DISTRIBUTED CRYPTOGRAPHIC ACCUMULATOR

GOVERNMENT INTERESTS

The work leading to this invention was funded in part by the Defense Advanced Research Projects Agency (DARPA), grant number F30602-00-0509. The U.S. Government may have certain rights in this invention.

DESCRIPTION

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to a mechanism to compute and update the value of an exponential accumulator function in a distributed network so that a source computer can quickly update mirror site computers that are storing the same data as the source while not compromising security.

Background Description

Because of network latency and the risk of denial of service attacks, Internet services, such as Web servers, are often replicated to mirror sites. Thus, a user will in general be much closer to one of these mirror sites than to the source of the repository, and will therefore experience a faster response time from a mirror site than it would by communicating directly with the source. In addition, by off-loading user servicing from the information source, this distributed scheme allows for load balancing across the mirror sites, which further improves performance.

An information security problem arising in the replication of data to mirror sites is the authentication of the information provided by the sites. Indeed, there are applications where the user may require that data coming from a mirror site be cryptographically validated as being as genuine as they would be had the response come directly from the source. For example, a financial speculator who receives NASDAQ stock quotes from the Yahoo! Web site would be well advised to get a proof of the authenticity of the data before making a large trade.

For all applications, and particularly for applications in wireless computing, we desire solutions that involve short responses from a mirror site that can be quickly verified with low computational overhead. More formally, the problem we address involves three parties: a trusted source, an untrusted directory, and a user. The source defines a finite set S of elements that evolves over time through insertions and deletions of items. The directory maintains a copy of set S. It receives time-stamped updates from the source together with update authentication information, such as signed statements about the update and the current elements of the set. The user performs membership queries on the set S of the type "is element e in set S?", but instead of contacting the source directly, it queries the directory. The directory provides the user with a yes/no answer to the query together with query authentication information, which yields a proof of the answer assembled by combining statements signed by the source. The user then verifies the proof by relying solely on its trust in the source and the availability of public information about the source that allows the user to check the source's signature. The data structure used by the directory to maintain set S, together with the protocol for queries and updates, is called an authenticated dictionary. Figure 1 shows a schematic view of an authenticated dictionary. In the use of the authenticated dictionary, a user 10 makes a query 11 to a directory 12, which responds by providing answer authentication information 13. The directory 12, in turn, is provided with update authentication information 14 from the source 15.

The design of an authenticated dictionary should address several goals. These goals include low computational cost, so that the computations performed internally by each entity (source, directory, and user) are simple and fast, and low communication overhead, so that bandwidth utilization is minimized. Since these goals are particularly important for the user, we say that an authenticated dictionary is size oblivious if the response and verification given to each user do not depend in any way on the number of items in the dictionary. We are most interested in solutions to the authenticated dictionary problem that are size oblivious. Such solutions are ideally suited for wireless applications, where user devices have low computational power and low bandwidth. In addition, size-oblivious solutions add an extra level of security, since the size of the dictionary is never revealed to users.

Authenticated dictionaries have a number of applications, including scientific data mining (e.g., genomic querying and astrophysical querying), geographic data servers (e.g., GIS querying), third-party data publication on the Internet, and certificate revocation in public key infrastructure. They are also useful for time stamping online documents, provided the source publishes signed summary information to a trusted and dated archive (such as the New York Times Classified Ads).

Genomic querying tends to be comprised of text searches for various patterns, such as substrings or super-strings. Also, astrophysical querying, such as in the object catalog of the Sloan Digital Sky Survey, tends to be in the form of range searches for points lying within given geometric shapes. Given the significant scientific and economic benefits that can result from such querying, these users need to be certain that the results of their queries are accurate and current.

Another type of data replication problem arises in geographic information systems (GIS) applications, where a large collection of geographic data must be replicated to several web sites so as to provide data querying capability to a widely-dispersed population of users. Such queries are typically geographic or geometric in nature, and often need to be trustworthy, as critical navigation plans are often based on such queries. This application is well-suited for a size-oblivious authenticated dictionary, as the navigational device is likely to be a palm computer equipped with a Global Positioning Satellite (GPS) receiver.

In the third-party publication application, the source is a trusted organization (e.g., a stock exchange) that produces and maintains integrity-critical content (e.g., stock prices) and allows third parties (e.g., Web portals) to publish this content on the Internet so that it is widely disseminated. The publishers store copies of the content produced by the source and process queries on such content made by the users.

In addition to returning the result of a query, a publisher also returns a proof of authenticity of the result, thus providing a validation service. Publishers also perform content updates originating from the source. Even so, the publishers are not assumed to be trustworthy, for a given publisher may be processing updates from the source incorrectly or it may be the victim of a system break-in.

In the certificate revocation application, the source is a certification authority (CA) that digitally signs certificates binding entities to their public keys, thus guaranteeing their validity. Nevertheless, certificates are sometimes revoked (e.g., if a private key is lost or compromised, or if someone loses their authority to use a particular private key). Thus, the user of a certificate must be able to verify that a given certificate has not been revoked. To facilitate such queries, the set of revoked certificates is distributed to certificate revocation directories, which process revocation status queries on behalf of users. The results of such queries need to be trustworthy, for they often form the basis for electronic commerce transactions.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a mechanism to compute and update the value of an exponential accumulator function in a distributed network in a manner that allows the efficient realization of an authenticated dictionary where a source computer can quickly update mirror site computers that are storing the same data as the source computer.

It is another object of the invention to allow mirror site computers to answer queries much faster while not compromising security. According to the invention, the mirror site computers answer queries on behalf of the source computer but provide accumulator values so that client software can determine that the answers provided are as accurate as had they come from the source computer itself. The accumulator values are updated as items are inserted and removed from the source computer's database. This invention provides a mechanism by which the source computer can use a pipelined binary tree computation to quickly update partial values that, when stored at the mirror sites, allow the mirror sites to answer queries much faster while not compromising security. The invention uses one-way accumulators which allow insecure directories to provide cryptographically secure answers to membership queries on a set maintained by a trusted source. Such usage implements the authenticated dictionary abstract data type and it finds applications in certificate management for public key infrastructure, and the publication of data collections on the Internet. From the user's perspective, particularly in wireless applications, the optimal authenticated dictionaries provide small verifications derived from data signed by the source and involve computations that are simple to program and perform. Our new scheme for authenticated dictionaries supports efficient incremental updates of the underlying set and optimal constant-time verification by the user. The invention is based on the dynamic maintenance of a one-way accumulator function over the set elements.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:

Figure 1 is a schematic diagram of an authenticated dictionary;

Figure 2 is a block diagram showing a source computer interconnected to a plurality of directory computers;

Figure 3 is a flow diagram showing the logic of the update algorithm executed by the source computer;

Figure 4 is a flow diagram showing the logic of the query algorithm executed by a directory computer; and

Figure 5 is a flow diagram showing the logic of the validation algorithm executed by a user.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION

Referring again to the drawings, and more particularly to Figure 2, there is shown a block diagram of the environment in which the invention is practiced. A source computer 200 having a database 201 is connected via some network, such as the Internet 203, to a plurality of mirror site computers or directories, here represented by computers 204, 205 and 206. The mirror site computers are, in turn, accessed by a plurality of user devices over a network, such as the Internet 203. These user devices may be desk top computers 207, personal digital assistants (PDAs) 208, 209, hand held computers 210, cellphones 211, 212, and other such devices, including smart cards and the like, having limited computing power. Many of these connections are wireless, requiring short responses from a mirror site computer that can be quickly verified with low computational overhead.

Before we present our technique for authenticated dictionaries, we review previous work on authenticated dictionaries and discuss some cryptographic concepts used in our approach.

Throughout, we denote with n the current number of elements of the set S stored in the authenticated dictionary. Also, we describe the validation of positive answers to membership queries (i.e., validating e ∈ S). The validation of negative answers can be handled with a standard method, as discussed below.

Authenticated dictionaries are related to research in distributed computing (e.g., data replication in a network), data structure design (e.g., program checking and memory checking), and cryptography (e.g., incremental cryptography). The use of one-way accumulators originates with J. Benaloh and M. de Mare (see "One-way accumulators: A decentralized alternative to digital signatures", Advances in Cryptology - EUROCRYPT '93, vol. 765 of Lecture Notes in Computer Science, pp. 274-285, 1993). They show how to utilize an exponential one-way accumulator, which is also known as an RSA accumulator, to summarize a collection of data so that user verification responses have constant size. A refinement of the RSA accumulator approach is given by T. Sander, A. Ta-Shma and M. Yung in "Blind, Auditable Membership Proofs", Proc. Financial Cryptography '00, Lecture Notes in Computer Science, 2000. Such a solution can be used to implement an authenticated dictionary, but in a dynamic setting, where items are inserted and deleted, the standard way of utilizing the exponential accumulator is inefficient. Several other researchers have also noted the inefficiency of this implementation in a dynamic setting (e.g., see B. Schneier, Applied Cryptography: Protocols, Algorithms and Source Code in C, John Wiley and Sons, Inc., New York, 1994). Indeed, our solution can be viewed as refuting this previous intuition to show that a more sophisticated utilization of the exponential accumulator can be made to be efficient even in a dynamic setting.

Previous additional work on authenticated dictionaries has been conducted primarily in the context of certificate revocation. The traditional method for certificate revocation (e.g., see C. Kaufman, R. Perlman, and M. Speciner, Network Security: Private Communications in a Public World, Prentice-Hall, Englewood Cliffs, NJ, 1995) is for the CA (source) to sign a statement consisting of a timestamp plus a hash of the set of all revoked certificates, called certificate revocation list (CRL), and periodically send the signed CRL to the directories. A directory then just forwards that entire signed CRL to any user who requests the revocation status of a certificate. This approach is secure, but it is inefficient, for it requires the transmission of the entire set of revoked certificates for both source-to-directory and directory-to-user communication. This scheme corresponds to an authenticated dictionary where both the update authentication information and the query authentication information have size Θ(n). Thus, this solution is clearly not size-oblivious, and even more recent modifications of this solution, which are based on delta-CRLs (e.g., see D.A. Cooper, "A more efficient use of delta-CRLs", Proceedings of the 2000 IEEE Symposium on Security and Privacy, pp. 190-202, 2000), are not size-oblivious. Because of the inefficiency of the underlying authenticated dictionary, CRLs are not a scalable solution for certificate revocation.

S. Micali (see "Efficient certificate revocation", Technical Report TM-542b, MIT Laboratory for Computer Science, 1996) proposes an alternate approach, where the source periodically sends to each directory the list of all issued certificates, each tagged with the signed timestamped value of a one-way hash function (e.g., see B. Schneier, supra) that indicates if this certificate has been revoked or not. This approach allows the system to reduce the size of the query authentication information to O(1) words: namely just a certificate identifier and a hash value indicating its status. Unfortunately, this scheme requires the size of the update authentication information to increase to Θ(N), where N is the number of all non-expired certificates issued by the certifying authority, which is typically much larger than the number n of revoked certificates. It is size-oblivious for immediate queries, but cannot be used for time stamping for archiving purposes, since no digest of the collection is ever made.

The hash tree scheme introduced by R. C. Merkle (see "Protocols for public key cryptosystems", Proc. Symp. on Security and Privacy, IEEE Computer Society Press, 1980, and "A certified digital signature" in Advances in Cryptology - CRYPTO '89, G. Brassard, editor, vol. 435, Lecture Notes in Computer Science, pp. 218-238, Springer-Verlag, 1990) can be used to implement a static authenticated dictionary, which supports the initial construction of the data structure followed by query operations, but not update operations. A hash tree T for a set S stores the elements of S at the leaves of T and a hash value h(v) at each node v, which combines the hashes of its children. The authenticated dictionary for set S consists of the hash tree plus the signature of a statement consisting of a timestamp and the value h(r) stored at the root r of T. An element e is proven to belong to S by reporting the values stored at the nodes on the path in T from the node storing e to the root, together with the values of all nodes that have siblings on this path. Thus, this solution is not size-oblivious, since the length of this path depends on n. P. C. Kocher (see "On certificate revocation and validation", Proc. International Conference on Financial Cryptography, vol. 1465, Lecture Notes in Computer Science, 1998) also advocates a static hash tree approach for realizing an authenticated dictionary, but simplifies somewhat the processing done by the user to validate that an item is not in the set S, by storing intervals instead of individual elements. Such an interval approach can also be applied to the exponential accumulator.
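To make concrete why a hash tree proof grows with n, the following minimal Python sketch builds a binary hash tree and returns the sibling hashes along one leaf-to-root path. The function names, the use of SHA-256, the restriction to a power-of-two number of leaves, and the choice to hash the leaf contents are assumptions made for illustration only; they are not details of the cited schemes.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash function for the tree (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return all tree levels, leaf hashes first; len(leaves) must be a power of two."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path

def verify(leaf, path, root):
    """Recompute the root from a leaf and its sibling path."""
    node = h(leaf)
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

leaves = [bytes([i]) for i in range(8)]
levels = build_levels(leaves)
root = levels[-1][0]
proof = prove(levels, 5)
```

With 8 leaves the proof holds 3 sibling hashes, and doubling the number of leaves adds one more, which is the dependence on n that a size-oblivious scheme avoids.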

Using techniques from incremental cryptography, M. Naor and K. Nissim (see "Certificate revocation and certificate update", Proceedings of the 7th USENIX Security Symposium (Security-98), pp. 217-228, Berkeley, 1998) dynamize hash trees to support the insertion and deletion of elements. In their scheme, the source and the directory maintain identically-implemented 2-3 trees. Each leaf of such a 2-3 tree T stores an element of set S, and each internal node stores a one-way hash of its children's values. Hence, the source-to-directory communication is reduced to O(1) items, but the directory-to-user communication remains at O(log n). Thus, their solution is still not size oblivious.

Other certificate revocation schemes based on variations of hash trees have been recently proposed, but like the static hash tree, these schemes are also not size oblivious.

One-Way Accumulators

An important cryptographic concept for our invention is that of one-way accumulator functions. Such a function allows a source to digitally sign a collection of objects as opposed to a single document.

The most common form of one-way accumulator is defined by starting with a "seed" value y_0, which signifies the empty set, and then defining the accumulation value incrementally from y_0 for a set of values X = {x_1, ..., x_n}, so that y_i = f(y_{i-1}, x_i), where f is a one-way function whose final value does not depend on the order of the x_i's. In addition, one desires that |y_i| not be much larger to represent than |y_{i-1}|, so that the final accumulation value, y_n, is not too large. Because of the commutative nature of f, a source can digitally sign the value of y_n so as to enable a third party to produce a short proof for any element x_i belonging to X: namely, swap x_i with x_n and recompute y_{n-1} from scratch. The pair (x_i, y_{n-1}) is then a cryptographically-secure assertion for the membership of x_i in set X.

A well-known example of a one-way accumulator function f is the exponential accumulator,

exp(y, x) = y^x mod N, (1)

for suitably-chosen values of the seed y_0 and modulus N. In particular, choosing N = pq for two strong primes p and q makes the accumulator function exp as difficult to break as RSA cryptography. The difficulty in using the function exp in the context of authenticated dictionaries is that it is not associative; hence, any updates to set X require significant re-computations. Indeed, some have mentioned the challenge of using the exponential accumulator function in an incremental setting, where items in the set X are inserted and removed over time.
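The order independence of formula (1) can be checked directly. The sketch below is illustrative only: the tiny primes, the seed value, and the names exp_acc and accumulate are assumptions, and a real modulus would be the product of large secret primes.

```python
from itertools import permutations

# Toy parameters for illustration only.
P, Q = 61, 53
N = P * Q
Y0 = 2  # seed, relatively prime to N

def exp_acc(y, x):
    """One accumulation step: exp(y, x) = y^x mod N."""
    return pow(y, x, N)

def accumulate(values, seed=Y0):
    """Fold exp_acc over the values, starting from the seed."""
    y = seed
    for x in values:
        y = exp_acc(y, x)
    return y

xs = [7, 11, 17]
# Accumulate the same values in every possible order.
results = {accumulate(perm) for perm in permutations(xs)}
```

Every ordering yields the same value because (y^a)^b and (y^b)^a both equal y^(ab) modulo N.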

There is an important technicality involved with use of the exp function, namely in the choice of the seed a = y_0. In particular, we should choose this base of the exponent to be relatively prime with p and q. This choice is dictated by Euler's Theorem, which states:

Theorem 1 (Euler's Theorem): a^{φ(N)} mod N = 1, if a > 1 and N > 1 are relatively prime.

Since a and N are relatively prime in our use of the accumulator function exp, the following well-known corollary to Euler's Theorem will prove useful.

Corollary 2: If a > 1 and N > 1 are relatively prime, then a^x mod N = a^{x mod φ(N)} mod N, for all x > 0.

One implication of this corollary to the authenticated dictionary problem is that the source should never reveal the values of the prime numbers p and q. Such a revelation would allow a directory to compute φ(N), which in turn could result in a false validation at a compromised directory. So, our approach takes care to keep the values of p and q only at the source.
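The following sketch checks Corollary 2 numerically and then shows one way a party holding φ(N) could manufacture a false validation, by extracting a root of the accumulation modulo N. The toy primes, the representative values, and the variable names are assumptions for illustration; the root-extraction step is the standard RSA inversion and is offered here as one plausible reading of the risk described above.

```python
from math import gcd

# Toy parameters for illustration only; a real N uses large secret primes.
P, Q = 61, 53
N = P * Q
PHI = (P - 1) * (Q - 1)  # phi(N), computable only by someone who knows P and Q
A_BASE = 2               # base a, relatively prime to N

# Corollary 2: the exponent may be reduced modulo phi(N).
x = 7 * 11 * 17 * 19
corollary_holds = pow(A_BASE, x, N) == pow(A_BASE, x % PHI, N)

# With phi(N) in hand, a witness can be forged for a non-member.
A = pow(A_BASE, x, N)    # accumulation of the representatives 7, 11, 17, 19
x_fake = 23              # representative of an element that is NOT in the set
assert gcd(x_fake, PHI) == 1
d = pow(x_fake, -1, PHI)           # modular inverse (Python 3.8+)
forged_witness = pow(A, d, N)      # an x_fake-th root of A modulo N
forgery_verifies = pow(forged_witness, x_fake, N) == A
```

Without p and q, computing such a root is believed to be as hard as breaking RSA, which is why only the source holds them.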

The challenge to using the exponential accumulator function, exp, for an authenticated dictionary is that the straightforward approach to its use, particularly for updates, is inefficient. In this paper we show how to significantly improve upon the performance of this straightforward approach. For completeness, however, let us first briefly review the straightforward approach before we describe improvements.

Let S = {e_1, e_2, ..., e_n} be the set of elements stored at the source. The source chooses secure primes p and q that are suitably large. It then chooses a suitably large base a that is relatively prime to N = pq. The source broadcasts the values of a and N to the directories, but it keeps the values p and q secret. For each element e_i of S, the source computes a representative of e_i, which we denote with x_i = h(e_i). The strongest security is obtained when the representative x_i is a prime number. This can be achieved, for example, by the randomized technique used by Sander et al., supra. Alternatively, we can compute x_i by applying to e_i a collision-resistant cryptographic hash function. The source then accumulates the representatives of the elements by computing

A = a^{x_1 x_2 ... x_n} mod N

and broadcasts to the directories a signed message (A, t), where t is a current timestamp.

To prove that some query item e_i is in S, the directory computes the value

A_i = a^{x_1 x_2 ... x_{i-1} x_{i+1} ... x_n} mod N. (2)

That is, A_i is the accumulation of all the representatives of the elements of S besides x_i. After computing A_i, the directory then returns to the user A_i, N, and the signed pair (A, t). Computing A_i is no trivial task for the directory, for it must perform n - 1 exponentiations to answer a query. Making the simplifying assumption that the number of bits needed to represent N is independent of n, the computation performed to answer a single query takes O(n) time. Note that the message sent to the user has constant size; hence, this scheme is size oblivious. The user checks that t is current and that (A, t) is indeed signed by the source. Then it verifies that x_i is the representative of e_i, computes A_i^{x_i} mod N and compares it to A. If A = A_i^{x_i} mod N, then the user is reassured of the validity of the answer. Indeed, it is generally accepted to be computationally infeasible for someone who does not know the values of p and q to compute a value B such that A = B^{x_i} mod N when e_i ∉ S. In particular, it is computationally infeasible for the directory to provide a false justification for some element belonging to S when in fact this is not the case. The validation time is O(1).
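The straightforward query and validation steps can be sketched as follows. This is a minimal illustration under stated assumptions: the toy modulus, the element names, and the table of prime representatives standing in for h are hypothetical, and the signature on (A, t) and the timestamp check are omitted.

```python
# Toy parameters for illustration only.
N = 61 * 53
A_BASE = 2  # base a, relatively prime to N

# Hypothetical representatives x_i = h(e_i); a real h would be a prime-valued
# or collision-resistant mapping.
reps = {"alice": 7, "bob": 11, "carol": 17, "dave": 19}

def accumulate_all():
    """Source side: A = a^(x_1 x_2 ... x_n) mod N."""
    acc = A_BASE
    for x in reps.values():
        acc = pow(acc, x, N)
    return acc

def witness(element):
    """Directory side: A_i, using n - 1 exponentiations (formula (2))."""
    acc = A_BASE
    for e, x in reps.items():
        if e != element:
            acc = pow(acc, x, N)
    return acc

def verify(x_i, a_i, signed_a):
    """User side: one exponentiation, independent of n."""
    return pow(a_i, x_i, N) == signed_a

A = accumulate_all()
bob_ok = verify(reps["bob"], witness("bob"), A)
```

The user's work is a single modular exponentiation regardless of how many elements the directory holds, while the directory's loop grows linearly with n.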

For updates, this simple approach has an asymmetric performance, with insertions being much easier than deletions. To insert a new element e_{n+1} into the set S, the source simply re-computes the accumulation A as follows:

A = A^{x_{n+1}} mod N,

where x_{n+1} = h(e_{n+1}). An updated signed pair (A, t) is then sent to the directories in the next time interval. Thus, an insertion takes O(1) time. The deletion of an element e_i ∈ S, on the other hand, will in general require the source computer to re-compute the new value A by performing n - 1 exponentiations. That is, a deletion takes O(n) time. The performance of this straightforward use of the exponential accumulator is summarized in Table 1.
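The asymmetry between insertion and deletion can be seen in a few lines. The sketch below is illustrative only; the toy modulus, the representative values, and the helper names insert and delete are assumptions.

```python
# Toy parameters for illustration only.
N = 61 * 53
A_BASE = 2  # base a, relatively prime to N

def accumulate(reps):
    """Accumulate a list of representatives from the base."""
    acc = A_BASE
    for x in reps:
        acc = pow(acc, x, N)
    return acc

def insert(acc, x_new):
    """Insertion: one exponentiation of the current accumulation."""
    return pow(acc, x_new, N)

def delete(reps, x_old):
    """Deletion in the straightforward scheme: rebuild from the remaining values."""
    remaining = [x for x in reps if x != x_old]
    return remaining, accumulate(remaining)

reps = [7, 11, 17]
acc = accumulate(reps)
acc_after_insert = insert(acc, 19)
reps_after_delete, acc_after_delete = delete(reps + [19], 19)
```

Inserting 19 and then deleting it returns the original accumulation, but the deletion had to touch every remaining representative.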

Table 1: Straightforward implementation of an authenticated dictionary using an exponential accumulator

The above query time bound is generally considered too slow to be efficient for processing large numbers of queries. Fortunately, we describe an alternative approach that can answer queries much faster.

We present a first improvement that allows for fast query processing. We require the directory to store each of the A_i accumulator values, as defined in formula (2). Thus, to answer a query, a directory looks up the A_i value, rather than computing it from scratch, and it then completes the transaction as described in the previous section. That is, a directory can under this pre-computed accumulations scheme process any query in O(1) time, with the computation for a user remaining unchanged. Unfortunately, a standard way of implementing this approach is inefficient for processing updates. In particular, a directory now requires O(n²) time to process a single insertion or deletion, for after such an update the directory must recompute all the A_i values from scratch. That is, re-computing any single A_i at a directory after an update requires n - 1 exponentiations. Thus, at first blush, this pre-computed accumulations approach appears to be quite inefficient when updates to the set S are required.

We can process updates much faster than O(n²) time, however, by enlisting the help of the source. Our method in fact can be implemented in O(n) time by a simple two-phase approach. The details for the two phases follow.

Let S be the set of n items stored at the source after performing all the insertions and deletions required in the previous time interval. Build a complete binary tree T "on top" of the representatives of the elements of S, so that each leaf of T is associated with the representative x_i = h(e_i) of an element e_i of S. In the first phase, we perform a post-order traversal of T, so that each node v in T is visited only after its children are visited. The main computation performed during the visit of a node v is to compute a value x(v). If v is a leaf of T, storing some x_i, then we compute

x(v) = x_i mod φ(N).

If v is an internal node of T with children u and w (we can assume T is proper, so that each internal node has two children), then we compute

x(v) = x(u)x(w) mod φ(N).

When we have computed x(r), where r denotes the root of T, then we are done with this first phase. Since a post-order traversal takes O(n) time, and each visit computation in our traversals takes O(1) time, this entire first phase runs in O(n) time.

In the second phase, we perform a pre-order traversal of T, where the visit of a node v involves the computation of a value A(v). The value A(v) for a node v is defined to be the accumulation of all representatives stored at leaves that are not descendants of v (a leaf v counts as its own descendant, so its own representative is excluded). Thus, if v is a leaf associated with the representative x_i of some element of S, then A(v) = A_i. Recall that in a pre-order traversal we perform the visit action on each node v before we perform the respective visit actions for v's children. For the root r of T, we define A(r) = a mod N. For any non-root node v, let z denote v's parent and let w denote v's sibling (and note that since T is proper, every node but the root has a sibling). Given A(z) and x(w), we can compute the value A(v) for v as follows:

A(v) = A(z)^{x(w)} mod N.

By Corollary 2 to Euler's Theorem, we can inductively prove that each A(v) equals the accumulation of all the values stored at non-descendants of v. Since a pre-order traversal of T takes O(n) time, and each visit action can be performed in O(1) time, we can compute all the A_i values in O(n) time. Note that implementing this algorithm requires knowledge of the value φ(N), which presumably only the source knows. Thus, this computation can only be performed at the source, which then must transmit all the new A_i values after any updates.
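The two phases can be sketched in Python as follows. This is a minimal illustration, not the implementation of the preferred embodiment: the toy primes, the recursive midpoint split used to form a proper binary tree, the dictionary keyed by leaf ranges, and all function names are assumptions, and each modular operation is treated as a constant-time step.

```python
# Toy parameters for illustration only.
P, Q = 61, 53
N = P * Q
PHI = (P - 1) * (Q - 1)  # phi(N), known only to the source
A_BASE = 2               # base a, relatively prime to N

def subtree_products(xs):
    """Phase 1 (post-order): x(v) for every node, keyed by its leaf range [lo, hi)."""
    prod = {}
    def visit(lo, hi):
        if hi - lo == 1:
            value = xs[lo] % PHI
        else:
            mid = (lo + hi) // 2
            value = visit(lo, mid) * visit(mid, hi) % PHI
        prod[(lo, hi)] = value
        return value
    visit(0, len(xs))
    return prod

def all_witnesses(xs):
    """Phase 2 (pre-order): push A(v) down the tree; each leaf receives its A_i."""
    prod = subtree_products(xs)
    witnesses = [None] * len(xs)
    def visit(lo, hi, a_v):
        if hi - lo == 1:
            witnesses[lo] = a_v
            return
        mid = (lo + hi) // 2
        visit(lo, mid, pow(a_v, prod[(mid, hi)], N))  # left child: raise to x(right sibling)
        visit(mid, hi, pow(a_v, prod[(lo, mid)], N))  # right child: raise to x(left sibling)
    visit(0, len(xs), A_BASE % N)
    return witnesses

def brute_force_witness(xs, i):
    """Reference value of A_i via n - 1 exponentiations."""
    acc = A_BASE
    for j, x in enumerate(xs):
        if j != i:
            acc = pow(acc, x, N)
    return acc

xs = [7, 11, 17, 19, 23]
tree_witnesses = all_witnesses(xs)
```

Each node is visited once per phase, so the source produces all n witnesses with a number of modular operations linear in n, compared with the quadratic number needed when each A_i is rebuilt independently.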

The performance of the pre-computed accumulation scheme is summarized in Table 2.

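space | insertion time | deletion time | update info | query time | query info | verify time
O(n) | O(n) | O(n) | O(n) | O(1) | O(1) | O(1)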

Table 2: Precomputed accumulation scheme for implementing an authenticated dictionary with an exponential accumulator.

Thus, this pre-computed accumulations approach can be implemented to run in constant time for queries and in linear time for updates. If n is very large, however, and updates occur frequently but in small numbers, then even these linear-time computations at the source can take a while. Therefore, we next describe how to combine the two above approaches to design a scheme that is efficient for both updates and queries.

Suppose we are again interested in maintaining a set S = {e_1, e_2, ..., e_n} as described above. We will use an integer parameter 1 ≤ p ≤ n to balance the processing between the source and the directories, depending on their relative computational power. The main idea is to partition the set S into p groups of roughly n/p elements each, performing the straightforward approach inside each group and the pre-computed accumulations approach among the groups. The details are as follows.

The first step is subdividing the dictionary. Divide the set S into p groups, Y_1, Y_2, ..., Y_p, of roughly n/p elements each, balancing the size of the groups as much as possible. For group Y_j, let y_j denote the product of the hash values of the elements in Y_j modulo φ(N). Define B_j as

B_j = a^{y_1 y_2 ... y_{j-1} y_{j+1} ... y_p} mod N.

That is, B_j is the accumulation of the representatives of all the elements that are not in the set Y_j. After any insertion or deletion in a set Y_j, the source can compute a new value y_j in O(n/p) time (we show below how with some effort this bound can be improved to O(log(n/p)) time). Moreover, since the source knows the value of φ(N), it can update all the B_j values after such an update in O(p) time. Thus, the source can process an update operation in O(p + n/p) time, assuming that the update does not require adjusting where the boundaries between the Y_j sets are.
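A minimal sketch of the source-side computation of the y_j and B_j values follows. The toy primes, the example groups, and the function names are assumptions. To keep the sketch short, the B_j values are obtained from prefix and suffix products of the y_j modulo φ(N), which is a substitute chosen here for brevity; the two-phase tree traversal described earlier serves the same purpose with the same linear number of steps in p.

```python
# Toy parameters for illustration only.
P, Q = 61, 53
N = P * Q
PHI = (P - 1) * (Q - 1)  # phi(N), known only to the source
A_BASE = 2               # base a, relatively prime to N

def group_products(groups):
    """y_j: product of the representatives in Y_j, reduced modulo phi(N)."""
    ys = []
    for group in groups:
        y = 1
        for x in group:
            y = y * x % PHI
        ys.append(y)
    return ys

def all_b(ys):
    """B_j = a^(product of all y_k with k != j) mod N, using O(p) multiplications."""
    p = len(ys)
    prefix = [1] * (p + 1)  # prefix[j] = y_1 ... y_j mod phi(N)
    suffix = [1] * (p + 1)  # suffix[j] = y_{j+1} ... y_p mod phi(N)
    for j in range(p):
        prefix[j + 1] = prefix[j] * ys[j] % PHI
    for j in range(p - 1, -1, -1):
        suffix[j] = suffix[j + 1] * ys[j] % PHI
    return [pow(A_BASE, prefix[j] * suffix[j + 1] % PHI, N) for j in range(p)]

def brute_force_b(groups, j):
    """Reference value: accumulate every representative outside Y_j."""
    acc = A_BASE
    for k, group in enumerate(groups):
        if k != j:
            for x in group:
                acc = pow(acc, x, N)
    return acc

groups = [[7, 11], [17, 19], [23, 29]]  # hypothetical representatives, p = 3
b_values = all_b(group_products(groups))
```

After an update inside one group, only that group's y_j changes, so the source recomputes one product and then refreshes the p values B_j.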

Fortunately, maintaining the size of each set Y_j is not a major overhead. We need only keep the invariant that each Y_j has at least ⌈n/p⌉/2 elements and at most a constant multiple of ⌈n/p⌉ elements. If a Y_j set becomes too small, then we either merge it with one of its adjacent sets Y_{j-1} or Y_{j+1}, or (if merging Y_j with such a set would cause an overflow) we "borrow" some of the elements from an adjacent set to bring the size of Y_j to at least 3⌈n/p⌉/4. Likewise, if a Y_j set grows too large, then we simply split it in two. These simple adjustments take O(n/p) time, and will maintain the invariant that each Y_j is of size Θ(n/p). Of course, this assumes that the value of n does not change significantly as we insert and remove elements, but even this condition is easily handled. Specifically, we can maintain the sizes of the Y_j's in a priority queue that keeps track of the smallest and largest Y_j sets.

Whenever we increase n by an insertion, we can check the priority queue to see if the smallest set now must do some merging or borrowing to keep from growing too small. Likewise, whenever we decrease n by a deletion, we can check the priority queue to see if the largest set now must split. A straightforward inductive argument shows that this approach keeps the size of the Y_j's at Θ(n/p).

Keeping the Y_j's at size Θ(n/p) is admittedly an extra overhead. In practice, however, all this overhead can probably be ignored, as it is likely that the Y_j's will grow and shrink at more or less the same rate. Indeed, even if the updates are non-uniform, we can afford to completely redistribute the elements in all the Y_j's as often as every O(min{p, n/p}) updates, amortizing the O(n) cost for this redistribution over the updates that occurred since the last redistribution.
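The split, merge, and borrow adjustments can be sketched as below. This is only an illustration under assumptions: groups are plain Python lists kept in order, the function name `rebalance` is hypothetical, the upper bound of 2⌈n/p⌉ is assumed, only one neighbor is considered (the left one when it exists), and the priority queue of group sizes is omitted.

```python
from math import ceil

def rebalance(groups, j, n, p):
    """Restore the size invariant for groups[j] after an update (sketch)."""
    target = ceil(n / p)
    low, high = target / 2, 2 * target      # 'high' is an assumed upper bound
    g = groups[j]
    if len(g) > high:
        # Too large: split the group in two.
        mid = len(g) // 2
        groups[j:j + 1] = [g[:mid], g[mid:]]
    elif len(g) < low and len(groups) > 1:
        # Too small: look at one adjacent group.
        k = j - 1 if j > 0 else j + 1
        left, right = min(j, k), max(j, k)
        combined = groups[left] + groups[right]
        if len(combined) <= high:
            # Merging does not overflow, so merge.
            groups[left:right + 1] = [combined]
        else:
            # Borrow just enough elements to bring group j up to 3*ceil(n/p)/4.
            want = ceil(3 * target / 4)
            cut = want if j == left else len(combined) - want
            groups[left], groups[right] = combined[:cut], combined[cut:]
    return groups
```

After any such adjustment, the y_j values of the affected groups, and therefore the B_j values, would have to be recomputed.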

Turning to the task at a directory, we recall that a directory receives all p of the B_j values after an update occurs. Thus, a directory can perform its part of an update computation in O(p) time. It validates that some e_i is in S by first determining the group Y_j that e_i belongs to, which can be done by table look-up. Then, it computes A_i as

A_i = B_j^{x_k x_{k+1} ⋯ x_{i-1} x_{i+1} ⋯ x_l} mod N,

where [k, l] is the range of indices for the elements in Y_j and x_m = h(e_m). Thus, a directory can answer a query in O(n/p) time.
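A directory's query computation might look like the following sketch. The toy modulus, the base, the stand-in hash `h`, and the name `witness` are assumptions made for illustration. Because a directory does not know φ(N), the sketch performs one modular exponentiation per remaining element of the group, which matches the O(n/p) query cost stated above.

```python
import hashlib

# Toy public parameters (assumptions for illustration; real moduli are far larger).
N = 1019 * 1187   # the directory sees N but never its factors
A_BASE = 7        # public base a, relatively prime to N

def h(e):
    """Hypothetical representative: 32 bits of SHA-256, forced odd so it is never zero."""
    digest = hashlib.sha256(str(e).encode()).digest()
    return int.from_bytes(digest[:4], "big") | 1

def witness(B_j, group, e_i):
    """Return A_i = B_j^(product of h(e) for e in group, e != e_i) mod N."""
    A_i = B_j
    for e in group:
        if e != e_i:
            A_i = pow(A_i, h(e), N)   # one exponentiation per other group member
    return A_i
```

Raising the returned value to the power h(e_i) modulo N should reproduce the global accumulation A when e_i really belongs to the group.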

The performance of the parameterized accumulation algorithm is summarized in Table 3.

Table 3: Parameterized scheme for implementing an authenticated dictionary using an exponential accumulator. We denote with p an integer such that 1 ≤ p ≤ n.

The parameter p allows us to balance the performance between the source and the directories, and also between the cost for an update and the cost for performing queries. For example, we can balance performance equally by setting p = √n, which implies that both queries and updates in this scheme take O(√n) time. Note that for reasonable values of n, say between 10,000 and 1,000,000, √n is between 100 and 1,000. In many cases, this is enough of a reduction to make the dynamic exponential accumulator practical for the source and directories, while still keeping the user computation to one exponentiation and one signature verification. Indeed, these user computations are simple enough to be embedded even in a smart card, a PDA, or a cellphone.

We describe now how the source can further improve the performance of an update operation in the parameterized scheme. Recall that in this scheme the set S is partitioned into p subsets, Y_1, Y_2, ..., Y_p, and the source maintains for each Y_j a value B_j, on behalf of the directories, that is the accumulation of all the representatives of the elements not in Y_j. Also recall that, for each group Y_j, we let y_j denote the product of the representatives of the elements in Y_j modulo φ(N). In the algorithm description above, the source re-computes y_j from scratch after any update occurs, which takes O(n/p) time. We now describe how this can be done in O(log (n/p)) time.

The method is for the source to store the elements of each Y_j in a balanced binary search tree T_j. For each internal node w in T_j, the source maintains the value y(w), which is the product of all the representatives stored at descendants of w, modulo φ(N). Thus, y(r(T_j)) = y_j, where r(T_j) denotes the root of T_j. Any insertion or deletion will affect only O(log (n/p)) nodes w in T_j, for which we can recompute their y(w) values in O(log (n/p)) total time. Therefore, after any update, the source can recompute a y_j value in O(log (n/p)) time, assuming that the size of the Y_j's does not violate the size invariant. Still, if the size of Y_j after an update violates the size invariant, we can easily adjust it by performing appropriate splits and joins on the trees representing Y_j, Y_{j-1} and/or Y_{j+1}. Moreover, we can rebuild the entire set of trees after every O(n/p) updates, to keep the sizes of the Y_j sets at O(n/p), with the cost for this periodic adjustment (which will probably not even be necessary in practice for most applications) being amortized over the previous updates. The performance of the resulting scheme is summarized in Table 4.
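The idea of keeping a product y(w) at every tree node can be illustrated with the sketch below. Note the substitution: instead of a balanced binary search tree, it uses a fixed-capacity, array-based complete binary tree (a segment tree), which is simpler to write down but does not support the splits and joins mentioned above. The class name `ProductTree` and its methods are hypothetical.

```python
class ProductTree:
    """Complete binary tree; node w stores y(w), the product of the leaves below it mod m."""

    def __init__(self, capacity, modulus):
        self.size = 1
        while self.size < capacity:
            self.size *= 2
        self.m = modulus
        self.node = [1 % modulus] * (2 * self.size)   # empty slots hold 1

    def set(self, slot, value):
        """Store a representative in a leaf (or 1 to delete it) and refresh its ancestors."""
        w = self.size + slot
        self.node[w] = value % self.m
        w //= 2
        while w >= 1:                                  # O(log size) nodes on the root path
            self.node[w] = (self.node[2 * w] * self.node[2 * w + 1]) % self.m
            w //= 2

    def root(self):
        """y(r(T)): the product of all stored representatives mod m."""
        return self.node[1]
```

With the modulus set to φ(N), `root()` plays the role of y_j for one group, and each insertion or deletion touches only the nodes on a single leaf-to-root path.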

Table 4: Enhanced parameterized scheme for implementing an authenticated dictionary using an exponential accumulator. We denote with p an integer such that 1 ≤ p ≤ n.

In this version of our invention, we can achieve a complete tradeoff between the cost of updates at the source and queries at the directories.

Tuning the parameter p over time, therefore, could yield the optimal balance between the relative computational powers of the source and directories. It could also be used to balance between the number of queries and updates in the time intervals.

The update algorithm executed by the source computer is illustrated in the flow diagram of Figure 3. The process begins in function block 301 with the group Y_j where the update occurs. The value y_j, the product of the hash values of the elements in Y_j modulo φ(N), is computed in function block 302; that is, y_j ← ∏_{e ∈ Y_j} h(e) mod φ(N). Before entering a processing loop, the index i is set to 1 in function block 302. The first step in the processing loop is to update B_i using pre-computed accumulations in function block 304. The index i is then incremented by 1 in function block 305. A determination is then made in decision block 306 as to whether i > p, and if not, the process loops back to function block 304; otherwise, a further test is made in decision block 307 to determine if the groups are unbalanced after the update. If so, the groups are re-balanced in function block 308 before sending B_1, ..., B_p and the signed pair (A, t) to the mirror site computers in function block 309. If re-balancing is not required, the process goes directly to function block 309.

The query algorithm executed by a mirror site computer is shown in Figure 4. The group Y_j containing the search element e_i is identified in function block 401. The accumulated value A_i, defined by

A_i = B_j^{∏_{e ∈ Y_j − {e_i}} h(e)} mod N,

is accessed in function block 402. Finally, A_i and the signed pair (A, t) are returned to the user in function block 403.

The validation algorithm executed by the user is shown in Figure 5. A comparison of the response R is made to A_i^{h(e_i)} mod N in function block 501. A determination is made in decision block 502. If R = A, a valid answer is returned in function block 503; otherwise, an invalid answer is returned in function block 504.
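The user-side check can be sketched as follows. This is a hedged illustration, not the algorithm of Figure 5 itself: the stand-in hash `h`, the function name `validate`, the `max_age` freshness window, and the caller-supplied `verify_signature` callable are all assumptions, and no particular signature scheme is implied.

```python
import hashlib

def h(e):
    """Hypothetical representative: 32 bits of SHA-256, forced odd so it is never zero."""
    digest = hashlib.sha256(str(e).encode()).digest()
    return int.from_bytes(digest[:4], "big") | 1

def validate(e, A_i, signed_pair, N, verify_signature, now, max_age):
    """Return True only if (A, t) is authentic and fresh and A_i^h(e) mod N equals A."""
    A, t, signature = signed_pair
    if not verify_signature(A, t, signature):   # caller-supplied check (hypothetical)
        return False
    if now - t > max_age:                       # stale time stamp
        return False
    return pow(A_i, h(e), N) == A               # the single modular exponentiation
```

The work done by the user is one signature verification and one exponentiation, independent of the size of S.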

We have shown how to make the exponential accumulator function the basis for a practical and efficient scheme for authenticated dictionaries, which relies on reasonable cryptographic assumptions similar to those that justify RSA encryption. A distinctive advantage of our approach is that the validation of a query result performed by the user takes constant time and requires computations (a single exponentiation and digital signature verification) simple enough to be performed in devices with very limited computing power, such as a smart card or a cellphone. Our approach also achieves a complete tradeoff between the cost of updates at the source and queries at the directories, with updates taking O(p + log (n/p)) time and queries taking O(n/p) time, for any fixed integer parameter 1 ≤ p ≤ n. For example, we can achieve O(√n) time for both updates and queries.

Our invention can be easily adapted to contexts, such as certificate revocation queries, where one needs to also validate that an item e is not in the set S. In this case, we use the well-known trick of storing in the dictionary not the items themselves, but instead the intervals [e_i, e_{i+1}] in a sorted list of the elements of S. A query for an element e returns an interval I = [e_i, e_{i+1}] containing e, plus a cryptographic validation of interval I. If e is one of the endpoints of this interval, it is in S; if it is strictly inside the interval, e is not in S. Note that this approach also requires that we have a way of representing some notion of −∞ and +∞. Even so, the overhead adds only a constant factor to all the running times for updates, queries, and validations.

While the invention has been described in terms of preferred embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.

Claims

CLAIMS

Having thus described our invention, what we claim as new and desire to secure by Letters Patent is as follows:

1. A computer implemented method for realizing an authenticated dictionary by computing and updating a value of an exponential accumulator function in a distributed network in a manner that allows a source computer to quickly update mirror site computers that are storing the same data as the source computer, the method comprising the steps of: storing a set S of elements {e_1, e_2, ..., e_n} at the source computer; choosing by the source computer secure primes p and q and then choosing a base a that is relatively prime to N = pq; broadcasting by the source computer values of a and N to the mirror site computers while maintaining p and q secret; computing by the source computer a representative for each element stored at the source computer; accumulating by the source computer computed representatives of the elements by computing A = a^{x_1 x_2 ⋯ x_n} mod N; broadcasting to the mirror site computers a signed message (A, t), where t is a current time stamp; in response to a query by a user, computing by a mirror site computer A_i = a^{x_1 x_2 ⋯ x_{i-1} x_{i+1} ⋯ x_n} mod N to prove that some query item e_i is in S, where A_i is the accumulation of all the representatives of the elements of S besides x_i; and returning to the user A_i, N and the signed pair (A, t).

2. The computer implemented method for realizing an authenticated dictionary by computing and updating a value of an exponential accumulator function in a distributed network recited in claim 1, wherein the computed representative for each element stored at the source computer is an associated prime.

3. The computer implemented method for realizing an authenticated dictionary by computing and updating a value of an exponential accumulator function in a distributed network recited in claim 1, wherein the computed representative for each element stored at the source computer is a cryptographic hash function x_i = h(e_i).

4. The computer implemented method for realizing an authenticated dictionary by computing and updating a value of an exponential accumulator function in a distributed network recited in claim 1, further comprising the steps of: checking by the user that t is current and that (A, t) is signed by the source computer; computing by the user A_i^{x_i} mod N; and comparing A_i^{x_i} mod N to A, and if A = A_i^{x_i} mod N, the user is reassured of the validity of the answer to the query.

5. The computer implemented method for realizing an authenticated dictionary by computing and updating a value of an exponential accumulator function in a distributed network recited in claim 1, further comprising the steps of: inserting by the source computer a new element e_{n+1} into the set S by re-computing the accumulation function as A ← A^{x_{n+1}} mod N, where x_{n+1} = h(e_{n+1}); and sending by the source computer an updated signed pair (A, t) to the mirror site computers during a next time interval.

6. The computer implemented method for realizing an authenticated dictionary by computing and updating a value of an exponential accumulator function in a distributed network recited in claim 1, further comprising the steps of: building by the source computer a binary tree T on top of the representatives of the elements of S so that each leaf of T is associated with the representative x_i = h(e_i) of an element e_i of S; and performing a post-order traversal of T so that each node v in T is visited only after its children are visited.

7. The computer implemented method for realizing an authenticated dictionary by computing and updating a value of an exponential accumulator function in a distributed network recited in claim 6, wherein the step of performing a post-order traversal of T comprises the steps of: computing x(v) = x_i mod φ(N) if v is a leaf of T storing some x_i; and computing x(v) = x(u)x(w) mod φ(N) if v is an internal node of T with children u and w.

8. The computer implemented method for realizing an authenticated dictionary by computing and updating a value of an exponential accumulator function in a distributed network recited in claim 6, further comprising the steps of: performing by the source computer a pre-order traversal of T, where a visit of a node v involves computing a value A(v) defined as the accumulation of all values stored at nodes that are not descendants of v; and transmitting by the source computer to the mirror site computers all new A_i values after any updates.

9. The computer implemented method for realizing an authenticated dictionary by computing and updating a value of an exponential accumulator function in a distributed network recited in claim 8, further comprising the steps of: dividing the set S into p groups Y_1, Y_2, ..., Y_p of approximately n/p elements in each group; and maintaining by the source computer a value B_j for each Y_j on behalf of each mirror site computer, where B_j = a^{y_1 y_2 ⋯ y_{j-1} y_{j+1} ⋯ y_p} mod N and y_j denotes the product of the hash values of the elements in Y_j modulo φ(N).

10. The computer implemented method for realizing an authenticated dictionary by computing and updating a value of an exponential accumulator function in a distributed network recited in claim 9, further comprising the steps of: validating by the source computer that some element e_i is in S by determining the group Y_j that e_i belongs to; and computing by the source computer A_i = B_j^{x_k x_{k+1} ⋯ x_{i-1} x_{i+1} ⋯ x_l} mod N, where [k, l] is the range of indices for the elements in Y_j and x_m = h(e_m).

11. The computer implemented method for realizing an authenticated dictionary by computing and updating a value of an exponential accumulator function in a distributed network recited in claim 6, further comprising the steps of: storing by the source computer the elements of Y_j in a balanced binary search tree T_j, wherein for each node w in T_j the source computer maintains a value y(w) which is the product of all the items stored at descendants of w, modulo φ(N), so that y(r(T_j)) = y_j, where r(T_j) denotes the root of T_j; and adjusting the size of Y_j after an update if the update violates a size invariant.