WO2010015063A1 - System and method for privacy preserving query verification - Google Patents

System and method for privacy preserving query verification

Info

Publication number
WO2010015063A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
client
query
node
query result
Prior art date
Application number
PCT/CA2008/001436
Other languages
French (fr)
Inventor
Xiaonan Ma
Hong Chen
Windsor Wee Sun Hsu
Original Assignee
International Business Machines Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corporation filed Critical International Business Machines Corporation
Priority to PCT/CA2008/001436 priority Critical patent/WO2010015063A1/en
Publication of WO2010015063A1 publication Critical patent/WO2010015063A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3247Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving digital signatures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/321Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving a third party or a trusted authority

Definitions

  • This invention relates to the field of data publishing, and particularly to solutions for the preservation of privacy in query verification of outsourced third-party data publishing models.
  • the data publisher's server could be compromised — resulting in the data publisher losing control of the security of their own server.
  • the securing of large online data systems has proven to be a daunting task. Therefore, it is most critical for a client to ensure that the query result that is received from a publisher that is not trusted is both authentic and complete.
  • the ability to prove the authenticity and completeness of query results can also be very useful in defeating server spoofing attacks, where attackers try to impersonate legitimate servers with their own data servers and feed the clients with malicious information.
  • Current solutions that are implemented to guarantee the authenticity and completeness of the query results may result in unforeseen problems.
  • a publisher may inadvertently leak information in regard to data records that are outside of a prescribed query space. This result may conflict with implemented access control policies and a client may obtain information that he or she is not allowed to access — thus the privacy of the data is not preserved within the transaction.
  • the shortcomings of the prior art are overcome and additional advantages are provided through the provision of a method for proving the correctness of a query result produced by a data publisher while preserving data privacy.
  • the method comprises delivering a public key of a public key/private key pair from a data owner to a client and delivering data and cryptographic metadata to at least one data publisher, wherein the cryptographic metadata is associated both with the data and the public key of the public key/private key pair.
  • the method further comprises receiving a query from the client, returning a query result and a verification object from the data publisher to the client in response to the query, and verifying the correctness of the query result, wherein the correctness of the query result is verified utilizing the verification object and the public key.
  • FIG. 1 illustrates one example of a data publishing architecture for outsourced data publishing.
  • FIG. 2 illustrates one example of a one-dimensional CRT in accordance with exemplary embodiments of the present invention.
  • FIG. 3 illustrates one example of a two-dimensional CRT in accordance with exemplary embodiments of the present invention.
  • FIG. 4 shows an exemplary algorithmic function for the insertion of a record into a CRT in accordance with exemplary embodiments of the present invention.
  • FIG. 5 shows an exemplary algorithmic function for the deletion of a record from a CRT in accordance with exemplary embodiments of the present invention.
  • FIG. 6 shows an exemplary algorithmic function for providing evidence of the existence of a record within a cube in accordance with exemplary embodiments of the present invention.
  • FIG. 7 shows an exemplary algorithmic function for providing evidence of the existence of a shell in accordance with exemplary embodiments of the present invention.
  • Exemplary embodiments of the present invention provide a solution for proving the correctness of query results that have been produced by data publishers that are not trusted, while preserving the privacy of the published data. This ensures that the procedure that is used to verify the correctness of any query results does not require the disclosure of any information that is outside an access control area that is assigned to a query requester. Further, the exemplary embodiments of the present invention are configured to efficiently process multi-dimensional query results while continuing to preserve the privacy of the published data.
  • in FIG. 1 a data publishing architecture 100 for the publishing of outsourced data is shown.
  • the system of FIG. 1 comprises three parties: a data owner 105, a data publisher 110 and a client 115.
  • the architecture as shown is exemplary in nature; in actual data publishing environments there can be more than one data owner 105 in addition to multiple data publishers 110.
  • data is generated or collected by the data owner 105.
  • the data owner 105 delivers the data and any data updates to the data publisher 110.
  • the client 115 queries the data publisher 110 to retrieve data instead of directly querying the data owner 105.
  • the data owner 105 has possession of a pair of public/private keys.
  • the data owner 105 uses the private key of the public/private key pair to perform computational cryptographic techniques over a prescribed dataset wherein cryptographic metadata related to the dataset is produced as a result.
  • the data and metadata 106 are delivered to the data publisher 110.
  • the client 115 queries 108 the data publisher 110
  • the data publisher 110 returns the query result and a proof called a Verification Object (VO) 109 to the client 115.
  • the VO being constructed based on the generated metadata.
  • the correctness of the query result is verified using the corresponding VO along with the data owner's 105 public key that has been previously transmitted 107 to the client 115.
  • each data owner 105 maintains at least one private-public key pair that the data owner 105 uses to sign data. It is yet further assumed that all data publishers 110 and clients 115 obtain the correct public keys from each data owner 105 via a trusted communication channel. Since the possibility exists that a data publisher 110 could be compromised, a client 115 is assumed to only trust query results that can be verified using the public key of the corresponding data owner 105. As such, data publishers 110 enforce access control policies to prevent respective clients 115 from gaining access to information that the client 115 does not have the right to access. Additionally, since various data publishers 110 may operate independently of each other, the data publishers may have different access control policies, which may be periodically updated.
  • Each point in the k-space is equivalent to a record comprised within a dataset.
  • $A_i(r)$ denote the value of the i-th attribute of the record.
  • a client 115 may issue a range query $Q(L_1, R_1, \ldots, L_k, R_k)$, wherein the query Q defines a sub-space q of the k-space:
  • the query space of the query Q is thereafter referred to as q.
  • the client 115 issues Q to get the result:
  • Upon receiving the query Q, the data publisher 110 returns the result T' along with a verification object (VO). The VO is returned along with the result T' in order to guarantee the authenticity and completeness of the query result.
  • VO verification object
  • the data publisher 110 enforces a prescribed set of access control policies against the client 115. For example, suppose there is a payroll database wherein each record within the payroll database contains the payroll information belonging to specific individuals. As such, each record contains information in regard to the salary, age and additional miscellaneous information about each person contained within the record. Enforced access policies ensure that a client 115 can only have access to the records wherein the salary is in the range between $10,000 and $15,000 and the age of the individual is in the range between 20 and 30 years old. This series of ranges is defined as the accessible space of the client 115.
  • the access policy enforced on a client 115 can be represented as $AC(L_1, R_1, \ldots, L_k, R_k)$.
  • the accessible space ac of a client is a sub-space of the k-space, wherein:
  • Authenticity is defined as meaning that every record in a query result should be from the data owner's 105 database. For example, suppose the result of a query is T' and the database is T. The result of the query is authentic in the event that $T' \subseteq T$. This aspect can be assured by having a data owner 105 sign every record in their database.
  • Completeness is defined as meaning that every record within a query space should be part of the query result. For example, assume that a range query space is q. The query result is complete in the event that the following equation is satisfied, wherein:
  • Privacy preserving or the preservation of privacy is defined as meaning that a client 115 should not have access to or receive any information about the points/records that are outside of the accessible space of the client 115.
  • $r_0 \in [0, N)^k \setminus ac$ represent some point outside of the client's accessible space.
  • Qs be a query sequence and Res(Qs) be the corresponding sequence of query results (which are combined with the corresponding VOs).
  • a sub-space of the k-space in the following form is defined as a cube, wherein:
  • a query space is a cube. Additionally, the accessible space of a client 115 is also referred to as a cube.
  • a sub-space of the k-space in the form $c_1 \setminus c_2$ is defined as a shell.
  • $c_1$ and $c_2$ are both k-dimensional cubes and $c_2 \subseteq c_1$.
  • the data owner 105 can sign every record to guarantee authenticity. Since the client 115 acquired the public key of a private-public key pair from the data owner 105, the client 115 can verify the authenticity of the records within the query results. In further exemplary embodiments, the data owner 105 can organize the data utilizing data structures such as Merkle hash trees, in which case the data owner only needs to sign the root of the hash tree.
  • the VO comprises three components: the authentication data structure, which proves the authenticity of the data records in the query result; the number of records in the accessible space of the client 115, which is signed by the data owner 105; and the number of records in the shell which is also authenticated by the data owner 105.
  • the shell is a function of the query, but the exemplary embodiments do not require data publishers 110 to contact the data owner for each query.
  • the authentication data structure is implemented to allow data publishers 110 to efficiently prove to a client 115 the number of data records that exist within a particular shell.
  • a VO is constructed such that the VO only depends on the records outside of the query space and inside the accessible space of the client 115.
  • a range tree is a data structure that is used in computational geometry to store points in k-space.
  • a data structure that is a modified version of the range tree is utilized — this structure being referred to as a CRT.
  • CRT can be constructed as single (FIG. 2) or multi-dimensional (FIG. 3) computational models.
  • the CRT is used to store a list of numbers $x_1, \ldots, x_n$.
  • a one dimensional CRT is a binary tree, wherein each node of the tree corresponds to an interval.
  • the CRT node stores the information of interval [node.l, node.r). For each node, there is also a counter to store the number of points in the interval. Further, node.cnt stores the number of points in the interval [node.l, node.r).
  • the size of the interval of a node node.r - node.l is always a power of 2.
  • n' records out of node.cnt fall in the left sub-interval.
  • node will have a left child node_1 in the event that n' > 0:
  • node will have a right child node_2:
  • node.c1 and node.c2 store the left and right child of node. Each one could be nil; further, if the size of the interval for a node is 1, the node does not have any child node.
  • the root node of the tree corresponds to the interval [0, N).
  • An exemplary one-dimensional CRT for the value set {5, 12, 15} is shown in FIG. 2.
  • a CRT can also be constructed in multiple dimensions.
  • to construct a CRT in two-dimensional space, initially assume we have a list of points $(x_1, y_1), \ldots, (x_n, y_n)$.
  • a one dimensional CRT is constructed for the list of numbers $x_1, \ldots, x_n$.
  • This tree is referred to as the primary structure.
  • node.cnt = n'.
  • a one dimensional CRT is then built for this node in order to store information for the numbers $y'_1, \ldots, y'_{n'}$. In this way a primary structure is built, and for every node of the primary structure a secondary structure is built. For each node of the primary structure, we use another field node.sec to record the root of the secondary CRT structure.
  • FIG. 3 shows an example of a two-dimensional CRT. Using this technique higher dimensional CRTs can be constructed.
  • a node of the primary structure is referred to as a first order node and a node of the secondary structure is referred to as a second order node.
  • a first order node stores the number of points in the area [node.l, node.r) x [0, N).
  • if node' is a node that belongs to the secondary structure attached to node, then node' stores the number of points in the area [node.l, node.r) x [node'.l, node'.r).
  • a node of a k dimensional CRT stores the number of points in a k-dimensional cube.
  • An exemplary two-dimensional CRT for the value set {(5, 10), (12, 19), (15, 14)} is shown in FIG. 3.
  • Exemplary embodiments of the present invention support a variety of CRT operational functions. For example, assume that it is desired to insert a record r into a CRT. The root of the k-dimensional CRT will be node_0. If the tree is empty, then a node is constructed such that node_0 comprises the following:
  • node_0.c1 = nil
  • node_0.c2 = nil
  • node_0.sec = nil
  • FIG. 4 shows an exemplary algorithmic function that can be utilized within exemplary embodiments of the present invention to insert a record into a CRT or to create a new node.
  • function CRT Insert(r, node, t) serves as a recursive function to insert a record r to a t-th order node that is named node.
  • the function CRT Insert(r, node_0, 1) is initially called within node insertion or creation procedures.
  • a list of k-th order nodes is provided as counting proof. Therefore, the data owner needs to sign the k-th order nodes. Assume that node_k is a k-th order node, and node_k is in the secondary structure of the (k-1)-th order node node_{k-1}, and similarly for node_{k-2}, ..., node_1, wherein node_k holds the number of records in the cube:
  • the algorithm records the path and has the data owner sign the pair (c, node_k.cnt).
  • CRT Delete(r, node, t) deletes information in regard to record r in the t-th order node that is named node in addition to the secondary structure of the node.
  • a CRT can be utilized to provide evidence of the existence of records in a cube.
  • the evidence is a list of non-overlapping kth order CRT nodes signed by the data owner.
  • a recursive function CRT Count Cube(node, t, c) can be utilized to return a list of CRT nodes as evidence.
  • a data owner 105 will maintain a k-dimensional CRT for all the records. For example, if there are n records in the database, the data owner 105 can build an empty CRT and insert all of the data to the CRT. The data owner 105 also signs all the kth order nodes. Additionally, the data owner 105 maintains a counter for each access control space.
  • a CRT can use a small number of non-overlapping nodes that are completely within S to prove that there are at least a points in S. This property is very useful for constructing the VO.
  • the data owner 105 gives a signed CRT and the signed list of access control counters to the data publisher 110.
  • the access control space of the client 115 is ac.
  • the data publisher 110 returns the query result to the client 115 with the VO comprising the signature of each record in the query result, the signed number of records in the access control space ac, and the evidence of the existence of all the records in the shell $ac \setminus q$.
  • a data owner 105 desires to update T
  • the data owner 105 can add a new record into the table, or they could delete a record from the current table.
  • the table updating will change counters of some of the nodes within the CRT structure.
  • the data owner 105 will communicate to the data publishers 110 the desire to update T.
  • the data publishers 110 will receive a set of signed nodes, wherein these signed nodes will be used to replace the existing nodes.
  • because the data publishers 110 would have different versions of the signed nodes, the client 115 should be assured of the freshness of the data. In other words, the client should make sure the publisher does not use an outdated VO to verify the query results. Therefore, instead of signing each individual node, the data owner can have a digest scheme (e.g., a Merkle Tree) to have a root hash of the whole CRT, and make the client aware of the root hash. Also, to keep the client 115 aware of the root hash, the data owner 105 can either sign the root hash periodically, or publish the root hash on their own server.
  • a digest scheme e.g., a Merkle Tree
  • each role will have its own access control space.
  • the accessible space for the client 115 is the union of the access control spaces of all the roles.
  • the client 115 is assigned with r roles.
  • the solution we discussed in previous sections assumes that the accessible space for a client is a cube.
  • a way to extend the solution to a client with multiple roles is to use the same solution as if the client submits r queries and activates one role each time, thus allowing the client 115 to combine all the query results to get the final answer.
  • a potential limitation in regard to the aforementioned approach is that two queries in the series of queries can share the same query result records. This would incur redundant communication and computational operations.
  • the client 115 can divide the original query space into a set of smaller (non-overlapping) cube query spaces, which are within different access control spaces. Then the client 115 can submit queries for those smaller cube query spaces, thus ensuring there would be no redundant communication and/or computation.
  • the capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof.
  • one or more aspects of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media.
  • the media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention.
  • the article of manufacture can be included as a part of a computer system or sold separately.
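As an illustration of the one-dimensional CRT and the counting evidence described in the list above, the following is a minimal sketch and not the algorithm of FIGS. 4 to 7. It assumes a domain of N = 16, uses the value set {5, 12, 15} of FIG. 2, and picks a hypothetical accessible space and query space. The names Node, crt_insert, crt_cover and count are illustrative. Signatures are omitted, so the count over ac merely stands in for the counter signed by the data owner.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    l: int                       # interval start (inclusive)
    r: int                       # interval end (exclusive); r - l is a power of 2
    cnt: int = 0                 # number of stored values in [l, r)
    c1: Optional["Node"] = None  # left child, or None (nil)
    c2: Optional["Node"] = None  # right child, or None (nil)


def crt_insert(node: Node, x: int) -> None:
    """Insert value x below node, creating child nodes only where needed."""
    node.cnt += 1
    if node.r - node.l == 1:
        return  # unit interval: the node has no children
    mid = (node.l + node.r) // 2
    if x < mid:
        if node.c1 is None:
            node.c1 = Node(node.l, mid)
        crt_insert(node.c1, x)
    else:
        if node.c2 is None:
            node.c2 = Node(mid, node.r)
        crt_insert(node.c2, x)


def crt_cover(node: Optional[Node], lo: int, hi: int) -> List[Node]:
    """Return non-overlapping nodes that lie completely inside [lo, hi)."""
    if node is None or hi <= node.l or node.r <= lo:
        return []  # missing child or disjoint interval
    if lo <= node.l and node.r <= hi:
        return [node]  # the whole node interval is inside the range
    return crt_cover(node.c1, lo, hi) + crt_cover(node.c2, lo, hi)


def count(root: Node, lo: int, hi: int) -> int:
    """Number of stored values in [lo, hi), summed over the covering nodes."""
    return sum(n.cnt for n in crt_cover(root, lo, hi))


N = 16                  # ASSUMPTION: domain [0, N) with N a power of 2
values = [5, 12, 15]    # value set of FIG. 2
root = Node(0, N)
for v in values:
    crt_insert(root, v)

ac = (4, 16)            # hypothetical accessible space
q = (8, 14)             # hypothetical query space, inside ac
result = [v for v in values if q[0] <= v < q[1]]

# In one dimension the shell ac \ q is [ac.l, q.l) together with [q.r, ac.r).
shell_count = count(root, ac[0], q[0]) + count(root, q[1], ac[1])
ac_count = count(root, ac[0], ac[1])  # stands in for the signed counter

# Completeness check: result records plus shell records account for all of ac.
complete = len(result) + shell_count == ac_count
```

Every node used as evidence lies inside the shell, so under these assumptions the check only involves counts of records within the accessible space and outside the query space.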

Abstract

The present invention relates to a method for proving the correctness of a query result produced by a data publisher while preserving data privacy. The method comprises delivering a public key of a public key/private key pair from a data owner to a client and delivering data and cryptographic metadata to at least one data publisher, wherein the metadata is associated both with the data and the public key of the public key/private key pair. The method further comprises receiving a query from the client, returning a query result and a verification object from the data publisher to the client in response to the query, and verifying the correctness of the query result, wherein the correctness of the query result is verified utilizing the verification object and the public key.

Description

SYSTEM AND METHOD FOR PRIVACY PRESERVING QUERY VERIFICATION
BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
[0001] This invention relates to the field of data publishing, and particularly to solutions for the preservation of privacy in query verification of outsourced third-party data publishing models.
DESCRIPTION OF BACKGROUND
[0002] Due to the large amounts of data that are available for publication over the Internet or large scale Intranets and the high frequency of query requests for such data, many data owners may find themselves seeking the services of third-party data publishers. In order to provide better service to their clients, data owners typically provide data for publication to one or more third-party data publishers. Problems with the use of third-party data publishers can arise in the event that the publisher or publishers are not trusted. For example, in some instances a publisher may be malicious, meaning that the publisher has the capability to modify the data and as a result return bogus query results to an unsuspecting client.
[0003] In a further example, the data publisher's server could be compromised, resulting in the data publisher losing control of the security of their own server. Typically, the securing of large online data systems has proven to be a daunting task. Therefore, it is most critical for a client to ensure that the query result that is received from a publisher that is not trusted is both authentic and complete. The ability to prove the authenticity and completeness of query results can also be very useful in defeating server spoofing attacks, where attackers try to impersonate legitimate servers with their own data servers and feed the clients with malicious information.
[0004] Current solutions that are implemented to guarantee the authenticity and completeness of the query results may result in unforeseen problems. For example, in some instances in order to guarantee the completeness of a dataset a publisher may inadvertently leak information in regard to data records that are outside of a prescribed query space. This result may conflict with implemented access control policies and a client may obtain information that he or she is not allowed to access; thus the privacy of the data is not preserved within the transaction.
SUMMARY OF THE INVENTION
[0005] The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a method for proving the correctness of a query result produced by a data publisher while preserving data privacy. The method comprises delivering a public key of a public key/private key pair from a data owner to a client and delivering data and cryptographic metadata to at least one data publisher, wherein the cryptographic metadata is associated both with the data and the public key of the public key/private key pair. The method further comprises receiving a query from the client, returning a query result and a verification object from the data publisher to the client in response to the query, and verifying the correctness of the query result, wherein the correctness of the query result is verified utilizing the verification object and the public key.
[0006] Computer program products corresponding to the above-summarized methods are also described and claimed herein.
[0007] Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and to the drawings.
TECHNICAL EFFECTS
[0008] As a result of the summarized invention, technically we have achieved a solution which results in the increased security and the preservation of privacy of a query verification from a third-party data publishing source.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
[0010] FIG. 1 illustrates one example of a data publishing architecture for outsourced data publishing.
[0011] FIG. 2 illustrates one example of a one-dimensional CRT in accordance with exemplary embodiments of the present invention.
[0012] FIG. 3 illustrates one example of a two-dimensional CRT in accordance with exemplary embodiments of the present invention.
[0013] FIG. 4 shows an exemplary algorithmic function for the insertion of a record into a CRT in accordance with exemplary embodiments of the present invention.
[0014] FIG. 5 shows an exemplary algorithmic function for the deletion of a record from a CRT in accordance with exemplary embodiments of the present invention.
[0015] FIG. 6 shows an exemplary algorithmic function for providing evidence of the existence of a record within a cube in accordance with exemplary embodiments of the present invention.
[0016] FIG. 7 shows an exemplary algorithmic function for providing evidence of the existence of a shell in accordance with exemplary embodiments of the present invention.
[0017] The detailed description explains the preferred embodiments of the invention, together with advantages and features, by way of example with reference to the drawings.
DETAILED DESCRIPTION OF THE INVENTION
[0018] One or more exemplary embodiments of the invention are described below in detail. The disclosed embodiments are intended to be illustrative only since numerous modifications and variations therein will be apparent to those of ordinary skill in the art.
[0019] Exemplary embodiments of the present invention provide a solution for proving the correctness of query results that have been produced by data publishers that are not trusted, while preserving the privacy of the published data. This ensures that the procedure that is used to verify the correctness of any query results does not require the disclosure of any information that is outside an access control area that is assigned to a query requester. Further, the exemplary embodiments of the present invention are configured to efficiently process multi-dimensional query results while continuing to preserve the privacy of the published data.
[0020] Turning now to the drawings in greater detail, it will be seen that in FIG. 1 a data publishing architecture 100 for the publishing of outsourced data is shown. As shown, the system of FIG. 1 comprises three parties: a data owner 105, a data publisher 110 and a client 115. The architecture as shown is exemplary in nature; in actual data publishing environments there can be more than one data owner 105 in addition to multiple data publishers 110. In general, data is generated or collected by the data owner 105. The data owner 105 delivers the data and any data updates to the data publisher 110. Thereafter, the client 115 queries the data publisher 110 to retrieve data instead of directly querying the data owner 105.
[0021] The data owner 105 has possession of a pair of public/private keys. Using the private key of the public/private key pair, the data owner 105 performs computational cryptographic techniques over a prescribed dataset wherein cryptographic metadata related to the dataset is produced as a result. The data and metadata 106 are delivered to the data publisher 110. In the event that the client 115 queries 108 the data publisher 110, the data publisher 110 returns the query result and a proof called a Verification Object (VO) 109 to the client 115. The VO is constructed based on the generated metadata. The correctness of the query result is verified using the corresponding VO along with the data owner's 105 public key that has been previously transmitted 107 to the client 115.
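The message flow of FIG. 1 can be sketched roughly as follows. This is an illustration of the authenticity check only, under a loud assumption: Python's standard library offers no public-key signature primitive, so an HMAC with a shared key stands in for the data owner's signature. In the described scheme the client would instead verify with the data owner's public key and would never hold the signing key. The names OWNER_KEY, sign and verify are hypothetical, and the sketch does not cover completeness.

```python
import hashlib
import hmac
from typing import Tuple

Record = Tuple[int, ...]

# ASSUMPTION: a shared-key HMAC stands in for the data owner's public-key
# signature; it only illustrates where metadata is created and checked.
OWNER_KEY = b"hypothetical-owner-key"


def sign(record: Record) -> bytes:
    """Owner side: produce per-record metadata (stand-in signature)."""
    return hmac.new(OWNER_KEY, repr(record).encode(), hashlib.sha256).digest()


def verify(record: Record, tag: bytes) -> bool:
    """Client side: check a record against its metadata."""
    return hmac.compare_digest(sign(record), tag)


# Owner -> publisher: data and metadata (106 in FIG. 1).
data = [(12, 25), (11, 29)]
published = [(r, sign(r)) for r in data]

# Publisher -> client: query result and verification object (109 in FIG. 1).
result, vo = zip(*published)
all_ok = all(verify(r, t) for r, t in zip(result, vo))

# A record altered by the publisher no longer matches its metadata.
tampered_ok = verify((12, 99), vo[0])
```

Here all_ok holds for the untouched records, while tampered_ok is false because the altered record does not match the metadata produced by the owner.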
[0022] Within the exemplary embodiments of the present invention an assumption is made that all data owners 105 are trusted and secure entities. Further, it is assumed that each data owner 105 maintains at least one private-public key pair that the data owner 105 uses to sign data. It is yet further assumed that all data publishers 110 and clients 115 obtain the correct public keys from each data owner 105 via a trusted communication channel. Since the possibility exists that a data publisher 110 could be compromised, a client 115 is assumed to only trust query results that can be verified using the public key of the corresponding data owner 105. As such, data publishers 110 enforce access control policies to prevent respective clients 115 from gaining access to information that the client 115 does not have the right to access. Additionally, since various data publishers 110 may operate independently of each other, the data publishers may have different access control policies, which may be periodically updated.
[0023] Following is a general discussion of exemplary embodiments of the present invention. For example, assume that a data owner 105 delivers a table to a data publisher 110, and there are k attributes $A_1, \ldots, A_k$ comprised in the table schema. Each of the k attributes is of integer type and the attribute range is [0, N). Therefore, each record can be represented by a point in the k-space. We let T denote the set of all the points so that:
$T \subseteq [0, N)^k$ Equation 1
[0024] Each point in the k-space is equivalent to a record comprised within a dataset. Given any record $r \in T$, we let $A_i(r)$ denote the value of the i-th attribute of the record. A client 115 may issue a range query $Q(L_1, R_1, \ldots, L_k, R_k)$, wherein the query Q defines a sub-space q of the k-space:
$q = [L_1, R_1) \times \cdots \times [L_k, R_k) \subseteq [0, N)^k$ Equation 2
[0025] The query space of the query Q is thereafter referred to as q. The client 115 issues Q to get the result:
$T' = \{\, r \mid r \in T \wedge r \in q \,\}$ Equation 3
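Equations 2 and 3 can be illustrated with a minimal Python sketch. The names `Record`, `Cube`, `in_cube` and `range_query`, as well as the sample table, are hypothetical and do not appear in the specification; a cube is assumed to be stored as a list of half-open (L, R) pairs, one per attribute.

```python
from typing import List, Sequence, Tuple

Record = Tuple[int, ...]        # one point in k-space (assumed representation)
Cube = List[Tuple[int, int]]    # [(L1, R1), ..., (Lk, Rk)], each interval half-open

def in_cube(r: Sequence[int], cube: Cube) -> bool:
    """True when L_i <= A_i(r) < R_i holds for every attribute i."""
    return all(lo <= v < hi for v, (lo, hi) in zip(r, cube))

def range_query(T: List[Record], q: Cube) -> List[Record]:
    """Equation 3: T' = { r | r in T and r in q }."""
    return [r for r in T if in_cube(r, q)]

# Hypothetical table with k = 2 attributes.
T = [(5, 10), (12, 19), (15, 14)]
q = [(4, 13), (8, 20)]          # query space [4, 13) x [8, 20)
result = range_query(T, q)
```

Here `result` holds the two records whose first attribute lies in [4, 13) and whose second attribute lies in [8, 20).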
[0026] Upon receiving the query Q, the data publisher 110 returns the result T' along with a verification object (VO). The VO is returned along with the result T' in order to guarantee the authenticity and completeness of the query result.
[0027] To protect the privacy of the data owner's 105 records, the data publisher 110 enforces a prescribed set of access control policies against the client 115. For example, suppose there is a payroll database wherein each record contains the payroll information belonging to a specific individual, such as the salary, the age and additional miscellaneous information about that person. The enforced access policies may ensure that a client 115 can only access the records wherein the salary is in the range between $10,000 and $15,000 and the age of the individual is in the range between 20 and 30 years old. This set of ranges is defined as the accessible space of the client 115. The access policy enforced on a client 115 can be represented as AC(L_1, R_1, ..., L_k, R_k). The accessible space ac of a client is a sub-space of the k-space, wherein:
$ac = [L_1, R_1) \times \cdots \times [L_k, R_k) \subseteq [0, N)^k$ Equation 4
[0028] If the query space of a query Q is q, the query is valid only in the event that q is a sub-space of ac, that is, $q \subseteq ac$. Within the exemplary embodiments of the present invention any records that exist outside the accessible space of a client 115 are invisible to the client 115. Further, each client 115 is assigned a set of roles, and each role has an accessible space. The accessible space of the client 115 is represented by the union of all accessible spaces of the assigned roles.
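The validity condition $q \subseteq ac$ reduces to a per-attribute interval containment test when both spaces are cubes. The following sketch is an illustration only: `is_subcube` is a hypothetical helper, the payroll bounds are borrowed from the example in paragraph [0027], and treating those ranges as half-open is an assumption.

```python
from typing import List, Tuple

Cube = List[Tuple[int, int]]    # [(L1, R1), ..., (Lk, Rk)], each interval half-open

def is_subcube(q: Cube, ac: Cube) -> bool:
    """True when every interval of q lies inside the matching interval of ac."""
    return len(q) == len(ac) and all(
        a_lo <= q_lo and q_hi <= a_hi
        for (q_lo, q_hi), (a_lo, a_hi) in zip(q, ac)
    )

# Accessible space over (salary, age), assumed half-open.
ac = [(10_000, 15_000), (20, 30)]
valid_q = [(11_000, 12_000), (25, 30)]
invalid_q = [(9_000, 12_000), (25, 30)]   # salary range starts below ac

valid = is_subcube(valid_q, ac)
invalid = is_subcube(invalid_q, ac)
```

Under these assumptions a publisher would reject `invalid_q` before evaluating it, since part of its query space lies outside the accessible space.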
[0029] In order to prove the correctness of a query result, three requirements must be satisfied: authenticity of the query result, completeness of the query result, and preservation of privacy. Authenticity is defined as meaning that every record in a query result should be from the data owner's 105 database. For example, suppose the result of a query is T' and the database is T. The result of the query is authentic in the event that $T' \subseteq T$. This aspect can be assured by having a data owner 105 sign every record in their database.
[0030] Completeness is defined as meaning that every record within a query space should be part of the query result. For example, assume that a range query space is q. We say that the query result is complete in the event that the following condition is satisfied:
$\forall r \in T:\; r \in q \Rightarrow r \in T'$ Equation 5
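The two definitions can be written down directly as predicates over a fully known table T. This is only a sketch of the definitions, not of the verification protocol: a client never holds T, which is why the specification relies on signatures and a VO instead. All names and sample data below are hypothetical.

```python
from typing import List, Sequence, Tuple

Record = Tuple[int, ...]
Cube = List[Tuple[int, int]]    # half-open intervals, one per attribute

def in_cube(r: Sequence[int], cube: Cube) -> bool:
    return all(lo <= v < hi for v, (lo, hi) in zip(r, cube))

def is_authentic(result: List[Record], T: List[Record]) -> bool:
    """Authenticity: T' is a subset of T."""
    return set(result) <= set(T)

def is_complete(result: List[Record], T: List[Record], q: Cube) -> bool:
    """Equation 5: every r in T that lies in q also appears in T'."""
    returned = set(result)
    return all(r in returned for r in T if in_cube(r, q))

T = [(5, 10), (12, 19), (15, 14)]
q = [(0, 16), (0, 16)]

honest = [(5, 10), (15, 14)]             # exactly the records of T inside q
dropped = [(5, 10)]                      # a record was withheld
forged = [(5, 10), (15, 14), (1, 1)]     # a record was invented
```

The `dropped` result is authentic but incomplete, while the `forged` result is complete but not authentic, which shows that the two properties are independent.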
[0031] Privacy preserving or the preservation of privacy is defined as meaning that a client 115 should not have access to or receive any information about the points/records that are outside of the accessible space of the client 115. For example, we assume that the accessible space for a client 115 is ac. All of the points/records that are within the client's accessible space can be represented as $v = T \cap ac$. Further, let $r_0 \in [0, N)^k \setminus ac$ represent some point outside of the client's accessible space. Let Qs be a query sequence and Res(Qs) be the corresponding sequence of query results (which are combined with the corresponding VOs). We say the privacy is preserved if for any $r_0$ and any Qs, we have the following:
$P(r_0 \in T \mid (Qs, Res(Qs))) = P(r_0 \in T \mid v)$ Equation 6
Intuitively, this means a client's 115 guess of the record distribution outside their accessible space will not be affected by the query results.
[0032] Within the exemplary embodiments of the present invention the following concepts are defined in k-space. A sub-space of the k-space in the following form is defined as a cube, wherein:
[L1 , R1) x ... x [Lk, Rk) Equation 7
From the problem definition, a query space is a cube. Additionally, the accessible space of a client 115 is also a cube. A sub-space of the k-space in the form $c_1 \setminus c_2$ is defined as a shell. Here $c_1$ and $c_2$ are both k-dimensional cubes and $c_2 \subseteq c_1$.
[0033] Within exemplary embodiments, the data owner 105 can sign every record to guarantee authenticity. Since the client 115 has acquired the public key of a private-public key pair from the data owner 105, the client 115 can verify the authenticity of the records within the query results. In further exemplary embodiments, the data owner 105 can organize the data utilizing data structures such as Merkle hash trees, in which case the data owner 105 only needs to sign the root of the hash tree.
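As an illustration of the Merkle hash tree option, the sketch below computes a root digest over a list of serialized records with SHA-256. The specification does not fix a hash function, a record encoding, or a padding rule, so the prefixes 0x00 and 0x01 and the duplication of the last node on odd levels are assumptions made for this example. Signing the root is not shown.

```python
import hashlib
from typing import List

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: List[bytes]) -> bytes:
    """Root digest of a binary hash tree built over the given leaves."""
    if not leaves:
        return _h(b"")
    # Leaf and inner hashes use different prefixes (an assumed convention).
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])   # assumed rule: duplicate the last node
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]

# Hypothetical serialized records.
records = [b"5,10", b"12,19", b"15,14"]
root = merkle_root(records)
```

Because any change to a record changes the root, a single signature over `root` would cover all records, which is the saving the paragraph describes.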
[0034] Assume that the accessible space of the client 115 is ac, and the query space of the client 115 is q. Further, assume that there are $n_{ac}$ records in ac, and there are $n_q$ records in q. Thus $ac \setminus q$ is a shell and there are $n_{ac} - n_q$ records in the shell. In order to guarantee completeness, the publisher will prove to the client that there are $n_{ac}$ records in ac and that there exist at least $n_{ac} - n_q$ records in the shell $ac \setminus q$. Given the above-mentioned proofs in combination with the query result, which is a list of $n_q$ records, the client is assured that those $n_q$ records are the only records in the query space q.
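The counting argument can be made explicit. The number of records in q equals $n_{ac}$ minus the number of records in the shell; if the shell is proven to hold at least $n_{ac} - n_q$ records, then q holds at most $n_q$ records, and the $n_q$ authentic records already received must be all of them. A minimal sketch of the client-side check follows; the function name and the sample numbers are hypothetical, and the check assumes the returned records are distinct, authentic and inside q.

```python
def result_is_complete(n_ac: int, shell_proven: int, n_q: int) -> bool:
    """Client-side completeness check.

    n_ac         -- signed number of records in the accessible space ac
    shell_proven -- number of records proven to exist in the shell ac \\ q
    n_q          -- number of distinct, authentic records returned for q
    """
    return shell_proven >= n_ac - n_q

# Honest publisher: 10 records in ac, 3 returned, 7 proven in the shell.
honest = result_is_complete(n_ac=10, shell_proven=7, n_q=3)

# Publisher withholds one record: only 2 returned, but the shell still
# contains just 7 records, so at most 7 can be proven.
cheating = result_is_complete(n_ac=10, shell_proven=7, n_q=2)
```

In the second case the publisher would need evidence for 8 shell records, which does not exist, so the omission is detected.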
[0035] In order to guarantee authenticity and completeness it is possible to have a data owner sign the number of records in the accessible space of every client 115. What remains is an efficient proof of the existence of a given number of records in the shell. A trivial solution would be to give the client all the records in the shell; this would be resource intensive and expensive, and therefore impractical. As a solution to this problem, within exemplary embodiments of the present invention Canonical Range Trees (CRT) are implemented; the usage of CRTs is discussed further below.
[0036] Within the exemplary embodiments of the present invention the VO comprises three components: the authentication data structure, which proves the authenticity of the data records in the query result; the number of records in the accessible space of the client 115, which is signed by the data owner 105; and the number of records in the shell, which is also authenticated by the data owner 105. It must be noted that although the shell is a function of the query, the exemplary embodiments do not require data publishers 110 to contact the data owner 105 for each query. The authentication data structure is implemented to allow data publishers 110 to efficiently prove to a client 115 the number of data records that exist within a particular shell. In order to preserve privacy as defined in Equation 6, the VO must not leak any information about records outside ac. Therefore, a VO is constructed such that the VO only depends on the records outside of the query space and inside the accessible space of the client 115.
[0037] A range tree is a data structure that is used in computational geometry to store points in k-space. In the present solution a modified version of the range tree is utilized; this structure is referred to as a CRT. We use a CRT to store the counting information for data points, and we use a set of nodes of the tree as proof of the existence of records in the shell.
[0038] CRTs can be constructed as single-dimensional (FIG. 2) or multi-dimensional (FIG. 3) structures. In the instance of a one-dimensional CRT, the CRT is used to store a list of numbers $x_1, \ldots, x_n$. A one-dimensional CRT is a binary tree, wherein each node of the tree corresponds to an interval. Suppose a CRT node is labeled node. The node stores the interval [node.l, node.r), together with a counter node.cnt that stores the number of points in the interval [node.l, node.r).
[0039] The size of the interval of a node, node.r - node.l, is always a power of 2. We will call the interval [node.l, (node.r + node.l)/2) the left sub-interval and the interval [(node.r + node.l)/2, node.r) the right sub-interval. Assume that n' of the node.cnt records fall in the left sub-interval. Then node will have a left child node1 in the event that n' > 0:
node1.l = node.l, node1.r = (node.r + node.l)/2, node1.cnt = n'
[0040] Similarly, suppose n'' records fall in the right sub-interval, and n'' > 0; then node will have a right child node2:
node2.l = (node.l + node.r)/2, node2.r = node.r, node2.cnt = n''
[0041] We use node.c1 and node.c2 to store the left and right child of node. Each one could be nil. Further, if the size of the interval for a node is 1, the node does not have any child node. The root node of the tree corresponds to the interval [0, N). An exemplary one-dimensional CRT for the value set {5, 12, 15} is shown in FIG. 2.
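The one-dimensional construction of paragraphs [0038] to [0041] can be sketched as follows. The field names follow the text (l, r, cnt, c1, c2); the `build` function, the choice N = 16 and the batch (rather than incremental) construction are assumptions of this example, and FIG. 2 itself is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    l: int                          # interval is [l, r)
    r: int
    cnt: int                        # number of points in [l, r)
    c1: Optional["Node"] = None     # left child, or None (nil)
    c2: Optional["Node"] = None     # right child, or None (nil)

def build(points: List[int], l: int, r: int) -> Optional[Node]:
    """Build the CRT node for [l, r); r - l is assumed to be a power of 2."""
    inside = [x for x in points if l <= x < r]
    if not inside:
        return None                 # a child exists only when its count is > 0
    node = Node(l, r, len(inside))
    if r - l > 1:                   # intervals of size 1 have no children
        mid = (l + r) // 2
        node.c1 = build(inside, l, mid)
        node.c2 = build(inside, mid, r)
    return node

N = 16                              # assumed attribute range [0, 16)
root = build([5, 12, 15], 0, N)
```

For this value set the root covers [0, 16) with a count of 3, its left child covers [0, 8) with a count of 1, and its right child covers [8, 16) with a count of 2; the sub-interval [8, 12) contains no value, so the corresponding child is nil.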
[0042] As mentioned above, a CRT can also be constructed in multiple dimensions. As an example, in order to construct a CRT in two-dimensional space, assume we have a list of points $(x_1, y_1), \ldots, (x_n, y_n)$. First, a one-dimensional CRT is constructed for the list of numbers $x_1, \ldots, x_n$. This tree is referred to as the primary structure. Thereafter, for every node of the primary structure, assume that there are n' points of which the first coordinate is in the interval [node.l, node.r), thus node.cnt = n'. Let $(x'_1, y'_1), \ldots, (x'_{n'}, y'_{n'})$ be these points. A one-dimensional CRT is then built for this node in order to store information for the numbers $y'_1, \ldots, y'_{n'}$. In this way a primary structure is built, and for every node of the primary structure a secondary structure is built. For each node of the primary structure, we use another field node.sec to record the root of the secondary CRT structure. FIG. 3 shows an example of a two-dimensional CRT. Using this technique higher dimensional CRTs can be constructed.
[0043] For a two-dimensional CRT, a node of the primary structure is referred to as a first order node and a node of the secondary structure is referred to as a second order node. A first order node stores the number of points in the area [node.l, node.r) x [0, N). Assume that node' is a node belonging to the secondary structure attached to node; then node' stores the number of points in the area [node.l, node.r) x [node'.l, node'.r). Similarly, a node of a k-dimensional CRT stores the number of points in a k-dimensional cube. An exemplary two-dimensional CRT for the value set {(5, 10), (12, 19), (15, 14)} is shown in FIG. 3.
[0044] Exemplary embodiments of the present invention support a variety of CRT operational functions. For example, assume that it is desired to insert a record r into a CRT. The root of the k-dimensional CRT will be node0. If the tree is empty, then a node is constructed such that node0 comprises the following:
node0.l = 0, node0.r = N, node0.cnt = 0
node0.c1 = nil, node0.c2 = nil, node0.sec = nil
This node construct can also be used in the event that it is desired to create a new node. FIG. 4 shows an exemplary algorithmic function that can be utilized within exemplary embodiments of the present invention to insert a record into a CRT or to create a new node. As shown, function CRT Insert(r, node, t) serves as a recursive function to insert a record r into a t-th order node that is named node. As such, the function CRT Insert(r, node0, 1) is initially called within node insertion or creation procedures.
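Since FIG. 4 is not reproduced in this text, the following is only one plausible reading of a recursive insert into a k-dimensional CRT, not the figure's actual algorithm. The function `crt_insert` takes an extra parameter N (the attribute range) that the CRT Insert(r, node, t) signature in the text does not show; that parameter, the 1-based order t, and the choice N = 32 are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    l: int
    r: int
    cnt: int = 0
    c1: Optional["Node"] = None     # left child in the same order
    c2: Optional["Node"] = None     # right child in the same order
    sec: Optional["Node"] = None    # root of the next-order structure

def crt_insert(r: Sequence[int], node: Node, t: int, N: int) -> None:
    """Insert record r into the t-th order node `node` (t is 1-based)."""
    k = len(r)
    x = r[t - 1]                    # coordinate handled at order t
    node.cnt += 1
    # Every node below order k carries a structure over the next attribute.
    if t < k:
        if node.sec is None:
            node.sec = Node(0, N)
        crt_insert(r, node.sec, t + 1, N)
    # Descend within the same order until the interval has size 1.
    if node.r - node.l > 1:
        mid = (node.l + node.r) // 2
        if x < mid:
            if node.c1 is None:
                node.c1 = Node(node.l, mid)
            crt_insert(r, node.c1, t, N)
        else:
            if node.c2 is None:
                node.c2 = Node(mid, node.r)
            crt_insert(r, node.c2, t, N)

N = 32                              # assumed range; the value 19 requires N > 16
node0 = Node(0, N)                  # empty root: cnt = 0, children and sec nil
for rec in [(5, 10), (12, 19), (15, 14)]:
    crt_insert(rec, node0, 1, N)
```

With the value set used for FIG. 3, the first order node for [8, 16) ends up with a count of 2, and its secondary structure records one point with second coordinate in [0, 16) and one in [16, 32).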
[0045] Within exemplary embodiments, a list of k-th order nodes is provided as counting proof. Therefore, the data owner needs to sign the k-th order nodes. Assume that $node_k$ is a k-th order node, and $node_k$ is in the secondary structure of the (k-1)-th order node $node_{k-1}$, and similarly for $node_{k-2}, \ldots, node_1$. Then $node_k$ holds the number of records in the cube:
$c = [node_1.l, node_1.r) \times \cdots \times [node_k.l, node_k.r)$
Thus, the algorithm records the path and has the data owner sign the pair (c, nodek.cnt).
[0046] The deletion of a CRT record is similar to the insertion of a CRT record, but performed in reverse. As shown in FIG. 5, the recursive function CRT Delete(r, node, t) deletes information in regard to record r in the t-th order node that is named node, in addition to the secondary structure of the node.
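FIG. 5 is not reproduced here, so the sketch below shows only a one-dimensional simplification of deletion: counters along the path are decremented and children whose count drops to zero are removed. The names `insert` and `delete` are hypothetical, the value being deleted is assumed to be present, and the handling of secondary structures described for CRT Delete(r, node, t) is omitted.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    l: int
    r: int
    cnt: int = 0
    c1: Optional["Node"] = None
    c2: Optional["Node"] = None

def insert(x: int, node: Node) -> None:
    """Add value x to the one-dimensional CRT rooted at node."""
    node.cnt += 1
    if node.r - node.l > 1:
        mid = (node.l + node.r) // 2
        if x < mid:
            node.c1 = node.c1 or Node(node.l, mid)
            insert(x, node.c1)
        else:
            node.c2 = node.c2 or Node(mid, node.r)
            insert(x, node.c2)

def delete(x: int, node: Node) -> None:
    """Remove value x (assumed present) and prune children left empty."""
    node.cnt -= 1
    if node.r - node.l > 1:
        mid = (node.l + node.r) // 2
        if x < mid:
            delete(x, node.c1)
            if node.c1.cnt == 0:
                node.c1 = None
        else:
            delete(x, node.c2)
            if node.c2.cnt == 0:
                node.c2 = None

root = Node(0, 16)                  # N = 16 is an assumed range
for value in (5, 12, 15):
    insert(value, root)
delete(12, root)
```

After the deletion the subtree for [12, 14) is gone, while the node for [14, 16) still records the value 15.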
[0047] Within further exemplary embodiments of the present invention a CRT can be utilized to provide evidence of the existence of records in a cube. The evidence is a list of non-overlapping k-th order CRT nodes signed by the data owner. Suppose the cube to be "counted" is $c = [L_1, R_1) \times \cdots \times [L_d, R_d)$. In this instance, as shown in FIG. 6, a recursive function CRT Count Cube(node, t, c) can be utilized to return a list of CRT nodes as evidence. In the event that it is desired to provide evidence of a shell, that is, the space outside of a query space and inside an accessible space, then the function CRT Count Shell(node, t, c, c') (FIG. 7) can be utilized to provide a list of non-overlapping k-th order nodes as evidence of records in the shell $c \setminus c'$, wherein:
$c = [L_1, R_1) \times \cdots \times [L_d, R_d)$, $c' = [L'_1, R'_1) \times \cdots \times [L'_d, R'_d)$, and $c' \subseteq c$.
[0048] A data owner 105 will maintain a k-dimensional CRT for all the records. For example, if there are n records in the database, the data owner 105 can build an empty CRT and insert all of the records into the CRT. The data owner 105 also signs all the k-th order nodes. Additionally, the data owner 105 maintains a counter for each access control space. Assuming that there are m access control spaces $ac_1, \ldots, ac_m$, the data owner maintains and signs the pairs $(ac_1, cnt_1), \ldots, (ac_m, cnt_m)$. The number of records in access control space $ac_i$ is represented by $cnt_i$. Further, given any k-dimensional rectangular space S, assume that there are a points from T that are inside S.
[0049] A CRT can use a small number of non-overlapping nodes that are completely within S to prove that there are at least a points in S. This property is very useful for constructing the VO. The data owner 105 gives a signed CRT and the signed list of access control counters to the data publisher 110. Suppose a client 115 whose access control space is ac submits a query with query space q. The data publisher 110 returns the query result to the client 115 with the VO comprising the signature of each record in the query result, the signed number of records in the access control space ac, and the evidence of the existence of all the records in the shell ac\q.
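The evidence for a cube and for a shell can be sketched in one dimension, where a cube is an interval and a shell is the pair of intervals left over when q is removed from ac. FIG. 6 and FIG. 7 are not reproduced in this text, so `count_interval` and `count_shell` below are hypothetical one-dimensional stand-ins for CRT Count Cube and CRT Count Shell, and the sample data is invented. Signatures on the nodes are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    l: int
    r: int
    cnt: int
    c1: Optional["Node"] = None
    c2: Optional["Node"] = None

def build(points: List[int], l: int, r: int) -> Optional[Node]:
    inside = [x for x in points if l <= x < r]
    if not inside:
        return None
    node = Node(l, r, len(inside))
    if r - l > 1:
        mid = (l + r) // 2
        node.c1, node.c2 = build(inside, l, mid), build(inside, mid, r)
    return node

def count_interval(node: Optional[Node], L: int, R: int) -> List[Node]:
    """Non-overlapping nodes lying entirely inside [L, R)."""
    if node is None or L >= R or node.r <= L or R <= node.l:
        return []
    if L <= node.l and node.r <= R:
        return [node]               # whole node is inside: use it as evidence
    return count_interval(node.c1, L, R) + count_interval(node.c2, L, R)

def count_shell(root: Node, ac: Tuple[int, int], q: Tuple[int, int]) -> List[Node]:
    """Evidence for the shell ac \\ q, assuming q is contained in ac."""
    return count_interval(root, ac[0], q[0]) + count_interval(root, q[1], ac[1])

root = build([1, 5, 6, 9, 12, 15], 0, 16)   # hypothetical data, N = 16
ac, q = (4, 16), (8, 13)
evidence = count_shell(root, ac, q)
shell_count = sum(n.cnt for n in evidence)
```

In this example the evidence consists of the node for [4, 8) with count 2 and the node for [14, 16) with count 1. Both lie inside ac and outside q, so they reveal nothing about the value 1, which is outside the accessible space, and their counts sum to $n_{ac} - n_q = 5 - 2 = 3$.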
[0050] In the event that a data owner 105 desires to update T, the data owner 105 can add a new record into the table or delete a record from the current table. The update will change the counters of some of the nodes within the CRT structure. The data owner 105 will communicate the update of T to the data publishers 110. Thus the data publishers 110 will receive a set of signed nodes, which are used to replace the existing nodes.
[0051] Since the data publishers 110 could hold different versions of the signed nodes, the client 115 should be assured of the freshness of the data. In other words, the client should make sure the publisher does not use an outdated VO to verify the query results. Therefore, instead of signing each individual node, the data owner can use a digest scheme (e.g., a Merkle tree) to compute a root hash of the whole CRT. To keep the client 115 aware of the current root hash, the data owner 105 can either sign the root hash periodically, or publish the root hash on their own server.
[0052] In the event that a client 115 is assigned a set of roles, each role will have its own access control space. Thus the accessible space for the client 115 is the union of the access control spaces of all the roles. Suppose the client 115 is assigned r roles. The solution discussed in the previous paragraphs assumes that the accessible space for a client is a cube. One way to extend the solution to a client with multiple roles is to have the client submit r queries and activate one role each time, and then combine all the query results to get the final answer. A potential limitation of this approach is that two queries in the series can share the same query result records, which would incur redundant communication and computation. Therefore, the client 115 can instead divide the original query space into a set of smaller, non-overlapping cube query spaces, each of which lies within a single access control space. The client 115 can then submit queries for those smaller cube query spaces, thus ensuring there is no redundant communication or computation.
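The division of a query space into non-overlapping pieces can be sketched in one dimension, where each role's access control space and the query space are intervals. The specification does not describe a particular splitting procedure, so the approach below (cutting the query at every role boundary and keeping the pieces covered by some role) is an assumption, as are the name `split_query` and the sample intervals.

```python
from typing import List, Tuple

Interval = Tuple[int, int]          # half-open [L, R)

def split_query(q: Interval, roles: List[Interval]) -> List[Interval]:
    """Cut q at every role boundary; keep pieces that fit inside one role."""
    q_lo, q_hi = q
    cuts = {q_lo, q_hi}
    for lo, hi in roles:
        for p in (lo, hi):
            if q_lo < p < q_hi:
                cuts.add(p)
    points = sorted(cuts)
    pieces = []
    for a, b in zip(points, points[1:]):
        # A piece is usable when a single role's space contains all of it.
        if any(lo <= a and b <= hi for lo, hi in roles):
            pieces.append((a, b))
    return pieces

roles = [(0, 20), (15, 30), (30, 50)]   # overlapping role spaces (hypothetical)
pieces = split_query((10, 40), roles)
```

The resulting pieces are pairwise disjoint, so no record is returned twice even though the first two role spaces overlap on [15, 20). Adjacent pieces that fall under the same role could be merged to reduce the number of queries; that refinement is left out here.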
[0053] The capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof. As one example, one or more aspects of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention. The article of manufacture can be included as a part of a computer system or sold separately.
[0054] Additionally, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided. [0055] While the preferred embodiment to the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.

Claims

What is claimed is:
1. A method for proving the correctness of a query result produced by a data publisher while preserving data privacy, the method comprising:
delivering a public key of a public key/private key pair from a data owner to a client;
delivering data and cryptographic metadata to at least one data publisher, wherein the metadata is associated both with the data and the public key of the public key/private key pair;
receiving a query from the client;
returning a query result and a verification object from the data publisher to the client in response to the query; and
verifying the correctness of the query result, wherein the correctness of the query result is verified utilizing the verification object and the public key.
2. The method of claim 1, wherein the client is assigned an accessible space in which to perform query searches in accordance with a determined access control policy.
3. The method of claim 2, wherein the verification object is generated in accordance with the determined access control policy that is assigned to the client and the query results that are comprised within the accessible space of the client.
4. The method of claim 3, wherein the data owner provides a digital signature stating that there are nac data points comprised within the accessible space (ac) that is assigned to a client.
5. The method of claim 4, wherein a verification object comprises a data point authentication data structure, a signature from the data owner stating the number of data points nac within an accessible space (ac), and additional verification data stating that ac-q comprises at least nac-nq data points.
6. The method of claim 5, wherein verifying the correctness of a query result comprises verifying the authenticity and completeness of query result data.
7. The method of claim 6, wherein a valid query space (q) is comprised of a subspace of the client's accessible space (ac).
8. The method of claim 7, wherein a client does not have access to information in regard to data points outside of the accessible space (ac) that has been assigned to the client.
9. The method of claim 8, wherein a client does not have access to information in regard to the access control policies of additional clients of the data publisher.
10. The method of claim 9, wherein a dataset, an access control policy, and a query are multi-dimensional.
11. A computer program product that includes a computer readable medium useable by a processor, the medium having stored thereon a sequence of instructions which, when executed by the processor, causes the processor to verify the correctness of a query result while preserving data privacy by:
receiving data and cryptographic metadata that is associated with the data and the public key of a public key/private key pair from a data owner;
receiving a query from a client;
returning a query result and a verification object from at least one data publisher to the client in response to the query; and
verifying the correctness of the query result, wherein the correctness of the query result is verified utilizing the verification object and the public key.
12. The computer program product of claim 11, wherein the client is assigned an accessible space in which to perform query searches in accordance with a determined access control policy.
13. The computer program product of claim 12, wherein the verification object is generated in accordance with the determined access control policy that is assigned to the client and the query results that are comprised within the accessible space of the client.
14. The computer program product of claim 13, wherein a digital signature is received from the data owner stating that there are nac data points comprised within an accessible space (ac) that is assigned to a client.
15. The computer program product of claim 14, wherein a verification object comprises a data point authentication data structure, a signature from the data owner stating the number of data points nac within an accessible space (ac), and additional verification data stating that ac-q comprises at least nac-nq data points.
16. The computer program product of claim 15, wherein verifying the correctness of a query result comprises verifying the authenticity and completeness of query result data.
17. The computer program product of claim 16, wherein a valid query space (q) is comprised of a subspace of the client's accessible space (ac).
18. The computer program product of claim 17, wherein a client is not permitted to have access to information in regard to data points outside of the accessible space (ac) that has been assigned to the client.
19. The computer program product of claim 18, wherein a client is not permitted to have access to information in regard to the access control policies of additional clients of the data publisher.
20. The computer program product of claim 19, wherein a dataset, an access control policy, and a query are multi-dimensional.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CA2008/001436 WO2010015063A1 (en) 2008-08-08 2008-08-08 System and method for privacy preserving query verification

Publications (1)

Publication Number Publication Date
WO2010015063A1 true WO2010015063A1 (en) 2010-02-11

Family

ID=41663239

Country Status (1)

Country Link
WO (1) WO2010015063A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070067403A1 (en) * 2005-07-20 2007-03-22 Grant Holmes Data Delivery System
US20070282843A1 (en) * 2006-04-11 2007-12-06 Medox Exchange, Inc. Systems and methods of managing specification, enforcement, or auditing of electronic health information access or use

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ITRM20130114A1 (en) * 2013-02-28 2014-08-29 Marcello Bertozzi IT SYSTEM FOR THE MANAGEMENT AND TRANSMISSION OF INFORMATION AND IMAGES IN RELATIONS BETWEEN INSTITUTIONS AND USERS
WO2014132223A1 (en) * 2013-02-28 2014-09-04 Giano Telesystems Srl System for managing and displaying information and/or images

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2011521413

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08876690

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase

Ref document number: 08876690

Country of ref document: EP

Kind code of ref document: A1