WO2024077734A1 - Method and client for constructing a confusion set - Google Patents

Method and client for constructing a confusion set

Info

Publication number
WO2024077734A1
WO2024077734A1 PCT/CN2022/135252 CN2022135252W WO2024077734A1 WO 2024077734 A1 WO2024077734 A1 WO 2024077734A1 CN 2022135252 W CN2022135252 W CN 2022135252W WO 2024077734 A1 WO2024077734 A1 WO 2024077734A1
Authority
WO
WIPO (PCT)
Prior art keywords
client
server
field
encrypted
encryption
Prior art date
Application number
PCT/CN2022/135252
Inventor
吴炜
魏长征
陆林鹏
吴行行
闫莺
张辉
Original Assignee
蚂蚁区块链科技(上海)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 蚂蚁区块链科技(上海)有限公司
Publication of WO2024077734A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/40 Network security protocols

Definitions

  • The embodiments of this specification belong to the field of privacy-preserving computing, and in particular relate to a method and a client for constructing a confusion set.
  • Privacy-Preserving Computing is a collection of technologies that implement data analysis and computing on the premise of protecting the data itself from external disclosure, making the data available but invisible. Through privacy-preserving computing technology, the value of data can be transformed and released on the premise of fully protecting data and privacy security.
  • The mainstream technologies for privacy-preserving computing fall into three directions: the first is cryptography-based privacy computing, represented by Secure Multi-Party Computation (SMPC); the second is the technology derived from the integration of artificial intelligence and privacy protection, represented by Federated Learning (FL); the third is Confidential Computing (CC) based on trusted hardware, represented by the Trusted Execution Environment (TEE).
  • Differential Privacy (DP) protects the calculation results rather than the calculation process, whereas Federated Learning, Secure Multi-Party Computation and Confidential Computing protect the calculation process and its intermediate results.
  • The first category, secure multi-party computation, builds on four basic technologies: Garbled Circuit (GC), Secret Sharing, Oblivious Transfer and Homomorphic Encryption (HE).
  • Homomorphic encryption is a class of encryption algorithms that allows calculations to be performed directly on ciphertext, such that decrypting the result gives the same value as performing the calculation on the plaintext. It includes Partially Homomorphic Encryption (PHE) and Fully Homomorphic Encryption (FHE).
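  • As a small illustration of the homomorphic property described above, the following sketch uses textbook RSA without padding, which is multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. The primes, exponent and function names are toy values assumed for this example only and are not taken from the specification.

```python
# Toy textbook-RSA parameters (assumed for illustration; far too small for real use).
p, q = 61, 53
n = p * q                    # modulus
phi = (p - 1) * (q - 1)      # Euler totient of n
e = 17                       # public exponent, coprime to phi
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

def encrypt(m: int) -> int:
    """Unpadded RSA encryption: m^e mod n."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Unpadded RSA decryption: c^d mod n."""
    return pow(c, d, n)

m1, m2 = 7, 12
# Multiply the ciphertexts only; the plaintexts are never combined directly.
c_product = (encrypt(m1) * encrypt(m2)) % n
recovered = decrypt(c_product)   # equals (m1 * m2) mod n
```

  • This is a partially homomorphic scheme in the sense used above: it supports multiplication on ciphertexts but not addition.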
  • Secure multi-party computing provides privacy protection for input secret data with its solid security theoretical foundation, thus achieving the security of the privacy-preserving computing process.
  • There are two main implementation routes for secure multi-party computation: general-purpose secure multi-party computation and problem-specific secure multi-party computation.
  • The former can solve a wide range of computing problems, but this "universal" route usually entails a large system and high overhead. The latter designs special protocols for specific problems, such as Private Set Intersection (PSI) and Private Information Retrieval (PIR). These protocols can often obtain results at lower cost than general protocols, but they require domain experts to design them carefully for each application scenario, so they are generally not applicable to general scenarios and have high design costs.
  • Private set intersection is a method for two parties to obtain the intersection of their data without revealing any additional information. Additional information refers to any information other than the intersection of the data of both parties. Private set intersection is very useful in real-world scenarios, such as data alignment in vertical federated learning, or friend discovery through address books in social software.
  • Private information retrieval is a method by which a client retrieves information from a database. During retrieval, the querying party hides the identifier of the query target, and the data service provider returns matching results without learning which object was queried.
  • the purpose of this specification is to provide a method and client for constructing a confusion set, including: a method for constructing a confusion set, wherein the client receives a query base sent by a server, and the query base is obtained by encrypting the database; the encryption/decryption performed by the client and the server on the same target adopts an encryption/decryption algorithm with an interchangeable order; the client sends a sensitive field encrypted by itself to the server, and obtains the same sensitive field encrypted by the server through interaction with the server; the client searches the query base according to the sensitive field encrypted by the server to obtain a first identification set of matching records; the client selects at least one point in the query base except the ID field and the field of interest, uses the selected at least one point as a retrieval condition, constructs a retrieval statement according to the retrieval condition and the original field of interest, and performs the retrieval on the query base to obtain a second identification set of matching records; the client constructs a confusion set according to the first identification set and the second identification set.
  • a client for constructing a confusion set wherein the encryption/decryption performed by the client and the server on the same target adopts an encryption/decryption algorithm with an interchangeable order, and: the client is configured with a query base, and the query base is obtained by the server after encrypting a database; the client sends a sensitive field encrypted by itself to the server, and obtains the same sensitive field encrypted by the server through interaction with the server; the query base is searched according to the sensitive field encrypted by the server to obtain a first identification set of matching records; at least one point in the query base except an ID field and an interest field is selected, the selected at least one point is used as a search condition, a search statement is constructed according to the search condition and the original interest field, and the search is performed on the query base to obtain a second identification set of matching records; and a confusion set is constructed according to the first identification set and the second identification set.
  • a client for constructing a confusion set includes: a processor, a memory, and a program stored therein, wherein when the processor executes the program, the above method is executed.
  • a storage medium is used to store a program, wherein the program, when executed, enables a client to execute the above method.
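  • The construction summarized above can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the strings such as "E(34)" are placeholders standing in for server-encrypted cells, the column name "Region" and the function name are hypothetical, a single decoy point is chosen at random, and the two identification sets are combined by a shuffled union, which is one plausible reading of "constructs a confusion set according to the first identification set and the second identification set".

```python
import random

def build_confusion_set(query_base, sensitive_column, encrypted_value,
                        decoy_columns, rng):
    """Return (first_ids, second_ids, confusion_set) for one query."""
    # First identification set: records whose sensitive column matches
    # the server-encrypted search value.
    first_ids = {rid for rid, row in query_base.items()
                 if row[sensitive_column] == encrypted_value}

    # Pick one decoy point: a cell from a column other than the ID and the
    # field of interest, and use its value as a second retrieval condition.
    decoy_column = rng.choice(decoy_columns)
    decoy_value = rng.choice([row[decoy_column] for row in query_base.values()])
    second_ids = {rid for rid, row in query_base.items()
                  if row[decoy_column] == decoy_value}

    # Assumption: merge by union and shuffle so the true matches are not
    # distinguishable by position.
    confusion_set = sorted(first_ids | second_ids)
    rng.shuffle(confusion_set)
    return first_ids, second_ids, confusion_set

# Placeholder query base; "Name" plays the role of the field of interest.
query_base = {
    "id_3": {"Name": "E(D)", "Age": "E(46)", "Region": "E(henan)"},
    "id_4": {"Name": "E(E)", "Age": "E(34)", "Region": "E(shandong)"},
    "id_5": {"Name": "E(F)", "Age": "E(54)", "Region": "E(shanghai)"},
    "id_6": {"Name": "E(G)", "Age": "E(24)", "Region": "E(beijing)"},
    "id_7": {"Name": "E(H)", "Age": "E(34)", "Region": "E(shandong)"},
}

rng = random.Random(0)
first_ids, second_ids, confusion_set = build_confusion_set(
    query_base, "Age", "E(34)", ["Age", "Region"], rng)
```

  • The confusion set always contains the true matches, so a server that later sees it cannot tell which identifiers the client actually needs.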
  • FIG. 1 is a schematic flowchart of an embodiment;
  • FIG. 2 is a schematic flowchart of an embodiment;
  • FIG. 3 is a schematic flowchart of an embodiment.
  • PIR is a method for clients to retrieve information from a database.
  • the PIR scheme was proposed by Chor B et al. in 1995 to protect the privacy of user queries.
  • the main purpose of the PIR scheme is to ensure that the query request submitted by the querying user to the database on the server is completed without leaking the user's private information, that is, during the retrieval process, the server does not know the user's specific query information and the retrieved data items.
  • An application scenario of private information retrieval is as follows: a patient wants to query the treatment drugs for a disease through a medical system. If the disease name is used as the query condition, the medical system learns that the patient may have this disease, and the patient's privacy is leaked. Private information retrieval avoids this kind of leakage.
  • a simple implementation scheme is that the database sends all data to the client, but it cannot protect the database security, that is, it cannot guarantee the privacy of the server.
  • PIR that guarantees the privacy of both the client and the database is called symmetric PIR (SPIR); PIR that protects only the client is called asymmetric PIR (APIR).
  • PIR whose security is computational is called computational PIR (CPIR).
  • the client often searches based on keywords (without knowing the specific location of the keyword in the database), and hopes to retrieve a string (multi-bit).
  • a practical PIR usually needs to meet multiple conditions such as symmetry, single copy, keyword search, and string return at the same time, and achieve a balance between computational efficiency and communication efficiency.
  • the above conditions can be met or partially met through cryptographic techniques such as homomorphic encryption, oblivious transfer (OT), and one-way trapdoor function.
  • This specification provides an embodiment of a method for implementing private information retrieval.
  • the server may encrypt the database in advance to obtain a query base, and send the query base to the client.
  • the server has a local database that can be queried by the client.
  • the local database of the server is as follows:
  • the server can encrypt the database to obtain the query base.
  • The encryption method can be RSA (a widely used asymmetric encryption algorithm proposed by Ronald Rivest, Adi Shamir and Leonard Adleman in 1977) or ECC (Elliptic Curve Cryptography).
  • The server can encrypt the data with its RSA or ECC private key, that is, encrypt each field except the ID column (the data in each cell) with that key.
  • In the ECC case, the server can generate a secret value and store it securely. This secret value also serves as the server's ECC private key.
  • The server can convert the value of the Name field into a point on the elliptic curve through a hash function, which can be expressed as Hash(C) or H(C).
  • The product of the secret value and H(C) is easy to compute by scalar multiplication on the elliptic curve, but it is hard to deduce the secret value from that product and H(C).
  • It is likewise hard to recover H(C) from the product alone.
  • Rows of the query base (every cell after the ID is the server's secret value multiplied by the hash point shown):
  • id_3: H(D), H(46), H(henan)
  • id_4: H(E), H(34), H(shandong)
  • id_5: H(F), H(54), H(shanghai)
  • id_6: H(G), H(24), H(beijing)
  • id_7: H(H), H(34), H(shandong)
  • id_8: H(I), H(42), H(guangdong)
  • id_9: H(J), H(56), H(zhejiang)
  • the above hash function can not only convert the original input into an output of fixed length and format, but also convert the output into the x-axis coordinate of a point on the elliptic curve.
  • any 256-bit data can be used as a legal x-axis coordinate on this elliptic curve.
  • For this purpose, sha256 or sha3-256 can be used, or 256 bits can be truncated from the output of sha384, sha512, sha3-384 or sha3-512.
  • Alternatively, any hash value (not limited to 256-bit results) can be reduced modulo the order of the elliptic curve, and the scalar product of the reduced value and the generator is a point on the elliptic curve.
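  • The second mapping described above (hash, reduce modulo the curve order, multiply the generator) can be sketched in pure Python. The specification does not name a curve; secp256k1 and SHA-256 are assumptions made here for illustration, and the function names are hypothetical. The arithmetic is not constant-time and is not meant for production use.

```python
import hashlib

# secp256k1 domain parameters (the choice of this curve is an assumption).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two affine points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                      # a + (-a) = infinity
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mult(k, point):
    """Double-and-add scalar multiplication k * point."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

def hash_to_point(value: str):
    """Hash the value, reduce modulo the curve order, multiply the generator."""
    digest = hashlib.sha256(value.encode()).digest()
    scalar = int.from_bytes(digest, "big") % N
    return scalar_mult(scalar, G)

def on_curve(point) -> bool:
    x, y = point
    return (y * y - x * x * x - 7) % P == 0

h25 = hash_to_point("25")
```

  • A server-side cell of the query base would then be a further scalar multiplication of such a point by the server's secret value, for example `scalar_mult(server_secret, h25)` with a hypothetical `server_secret`.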
  • the server can send the query base to the client that needs to perform the search.
  • the server can directly send the query base to the client, such as directly to the client's device, or to the client's proxy server; in another way, the server can publish the query base on a Uniform Resource Locator (URL), and then the client can obtain the query base from the URL.
  • the client can receive the query base and save the received query base locally.
  • In the RSA case, the server can likewise generate a secret value and store it securely. This secret value is the RSA private key.
  • the server can convert the value of the name field into a point on the elliptic curve through a hash function, which can be expressed as Hash(C) or H(C).
  • the interaction process between the client and the server may include the following steps: S110 : The client sends a sensitive field encrypted by itself to the server, and obtains the same sensitive field encrypted by the server through interaction with the server.
  • the client's search condition is that the value of the Age field is 25, but 25 is a sensitive field, that is, the client does not want to let the other party know.
  • the client can encrypt the 25.
  • RSA/ECC private key encryption is used, and the encryption algorithm used by the client is the same as the encryption algorithm used by the server to generate the query base.
  • When RSA private key encryption is used, the client generates its own secret value and stores it securely, then encrypts 25 with this private key. Either 25 itself or the hash value of 25 can be encrypted.
  • Encrypting the hash value of 25 is used as the example here.
  • Encrypting 25 directly is similar. The client and the server use the same hash algorithm, and the client uses the same large prime q as the server as the modulus.
  • The client can RSA-encrypt the hash value of 25 by raising H(25) to the power of its private key modulo q.
  • The sensitive field sent by the client to the server is this ciphertext of the value 25.
  • the client can also construct a search statement, encrypt the sensitive fields in the search statement to obtain the privacy fields, replace the sensitive fields with the privacy fields, and send the replaced privacy search statement to the server.
  • the result is as follows:
  • ? represents the search statement after replacement.
  • For example, the client can encrypt 25 with an RSA private key: it hashes 25 with the same hash function as the server and then raises H(25) to the power of its own private key.
  • The query statement sent by the client to the server is, for example, as follows:
  • H(25) raised to the power of the client's private key is the ciphertext represented by “?” in the above search statement. After receiving it, the server can learn neither the client's private key nor the value 25.
  • In the ECC case, the client uses the same elliptic curve as the server, that is, the same curve parameters and generator.
  • The client generates its own secret value and stores it securely.
  • The client can use this private key to encrypt 25.
  • Specifically, the hash value of 25 can be encrypted, with the client and the server using the same hash algorithm.
  • The client performs ECC encryption by scalar-multiplying the point H(25) by its private key.
  • The sensitive field sent by the client to the server is the resulting point, which is the ciphertext of the value 25.
  • As in the RSA case, the client can also construct a search statement, encrypt the sensitive field in it with its own ECC private key to obtain the privacy field, replace the sensitive field with the privacy field, and send the resulting privacy search statement to the server.
  • The query statement sent by the client to the server is, for example, as follows:
  • The product of the client's private key and H(25) is the ciphertext represented by “?” in the above search statement. After receiving it, the server can learn neither the client's private key nor the value 25.
  • the client obtains the same sensitive field encrypted by the server through interaction with the server, which may include the server using its own key to re-encrypt the sensitive field encrypted by the client and then sending it to the client, and the client using its own key to decrypt the sensitive field encrypted twice to obtain the sensitive field encrypted by the server.
  • The core of this step is an encryption algorithm for which two successive encryptions (one by each party) can be decrypted in either order.
  • In the ECC case, the two parties agree on the same elliptic curve, that is, the same curve parameters and generator, and each holds its own private key. Encryption is scalar multiplication by the private key.
  • Whichever party encrypts first, the doubly encrypted result can be decrypted in either order.
  • In the RSA case, both parties agree on the same large prime q and primitive root g, and each holds its own private key.
  • Encryption is exponentiation by the private key modulo q. Whether the client encrypts first and the server second, or the reverse, the result can be decrypted in the same or a different order.
  • the encryption/decryption performed by the client and the server on the same target uses an encryption/decryption algorithm with interchangeable order.
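  • The interchangeable-order property can be checked with a short sketch of the exponentiation variant. The modulus (the Mersenne prime 2^127 - 1), the key generation routine and all function names are assumptions made for this example; the specification only requires a shared large prime q and one private key per party. Each key must be coprime to q - 1 so that its inverse exponent exists.

```python
import hashlib
import secrets
from math import gcd

Q = 2**127 - 1   # shared prime modulus (toy choice, assumed for illustration)

def hash_to_group(value: str) -> int:
    """Map a value to an integer in [1, Q - 1] using SHA-256."""
    digest = hashlib.sha256(value.encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1

def generate_key() -> int:
    """Pick a random exponent that is invertible modulo Q - 1."""
    while True:
        k = secrets.randbelow(Q - 3) + 2
        if gcd(k, Q - 1) == 1:
            return k

def encrypt(x: int, key: int) -> int:
    return pow(x, key, Q)

def decrypt(c: int, key: int) -> int:
    # Exponentiating by the inverse of the key modulo Q - 1 undoes encrypt().
    return pow(c, pow(key, -1, Q - 1), Q)

client_key, server_key = generate_key(), generate_key()
x = hash_to_group("25")

client_then_server = encrypt(encrypt(x, client_key), server_key)
server_then_client = encrypt(encrypt(x, server_key), client_key)

# Removing only the client's layer leaves the value encrypted by the server.
server_only = decrypt(client_then_server, client_key)
```

  • Because exponents multiply and multiplication commutes, both encryption orders give the same value, and either layer can be removed first.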
  • the server may encrypt the privacy field again and return it to the client, or after receiving the sensitive field sent by the client and encrypted by the client itself, the server may encrypt the encrypted sensitive field again with the server's own key and return it to the client. Then, the client uses its own key to decrypt the twice encrypted sensitive field to obtain the sensitive field encrypted by the server.
  • Case 1: the server receives the client's RSA ciphertext, that is, H(25) raised to the power of the client's private key.
  • The server re-encrypts this privacy field with its own RSA private key, raising it to the power of the server's key, and returns the result to the client. The returned value is H(25) raised to the product of both private keys.
  • Case 2: the server receives the client's ECC ciphertext, that is, the point H(25) multiplied by the client's private key.
  • The server re-encrypts this privacy field with its own ECC private key by a further scalar multiplication and returns the result to the client. The returned value is H(25) multiplied by both private keys.
  • After the server returns the twice-encrypted privacy field, the client can use its own key to decrypt it and obtain the sensitive field encrypted only by the server.
  • For example, in the RSA case the client applies the inverse element of its own private key to the twice-encrypted sensitive field, which removes the client's layer of encryption. In this way, the client obtains the same sensitive field encrypted by the server, namely H(25) raised to the power of the server's private key.
  • S120 The client searches the query base according to the sensitive field encrypted by the server, obtains the identifier of the matching record, and returns the identifier to the server.
  • the client can obtain the same sensitive field encrypted by the server.
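  • Steps S110 and S120 can be put together in one end-to-end sketch using the exponentiation variant. Everything concrete here is assumed for illustration: the modulus, the two fixed toy keys, the function names, and the sample rows (only the Name and Age columns are used). In a real deployment the two halves would run on different machines and the keys would be large random secrets.

```python
import hashlib
from math import gcd

Q = 2**127 - 1          # shared prime modulus (toy choice)
SERVER_KEY = 65537      # server's private exponent (toy value)
CLIENT_KEY = 257        # client's private exponent (toy value)
assert gcd(SERVER_KEY, Q - 1) == 1 and gcd(CLIENT_KEY, Q - 1) == 1

def h(value) -> int:
    """Hash a cell value into [1, Q - 1]."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1

def enc(x: int, key: int) -> int:
    return pow(x, key, Q)

def remove_layer(c: int, key: int) -> int:
    return pow(c, pow(key, -1, Q - 1), Q)

# Server side: plaintext database and the query base derived from it.
database = {
    "id_3": {"Name": "D", "Age": 46},
    "id_4": {"Name": "E", "Age": 34},
    "id_5": {"Name": "F", "Age": 54},
    "id_6": {"Name": "G", "Age": 24},
    "id_7": {"Name": "H", "Age": 34},
}
query_base = {rid: {col: enc(h(val), SERVER_KEY) for col, val in row.items()}
              for rid, row in database.items()}

# S110, client: blind the sensitive value with the client's own key.
blinded = enc(h(34), CLIENT_KEY)
# S110, server: re-encrypt without learning the value 34.
double_encrypted = enc(blinded, SERVER_KEY)
# S110, client: remove its own layer, leaving the server-encrypted value.
server_encrypted = remove_layer(double_encrypted, CLIENT_KEY)

# S120, client: search the local query base for matching identifiers.
matches = [rid for rid, row in query_base.items()
           if row["Age"] == server_encrypted]
```

  • The client ends up with the identifiers of the matching records without the server seeing the plaintext search value and without the client seeing any plaintext cell of the database.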
  • the client returns the identifier of the matching record to the server, which may include two situations.
  • the client constructs a search statement, encrypts the sensitive fields in the search statement to obtain the private fields, replaces the sensitive fields with the private fields, and sends the replaced private search statement to the server.
  • the client can directly return the identifier of the matching record to the server.
  • the client sends the value of the sensitive field encrypted by itself to the server.
  • the client can construct a search statement, for example, the search statement is:
  • The client may send the constructed search statement to the server. The search statement includes the identifier of the matching record and indicates that the field of interest is Name, that is, the field name immediately following select.
  • S130 The server returns the value of the field of interest in the record corresponding to the identifier in the database to the client.
  • the client can locate the identifier of the field to be queried in the query base by interacting with the server and the query base without exposing the plain text of the database, and further initiate a query to the server according to the identifier to obtain the value of the field of interest in the record corresponding to the identifier.
  • This embodiment does not need to know the specific position (bit position) of the keyword to be retrieved in the database, supports string queries, and can support Structured Query Language (SQL).
  • the database is still kept on the server, and the query base obtained by encrypting the database is configured to the client, so that the client can locate the data based on the query base to obtain the identifier of the record when searching.
  • the encryption characteristics of the query base prevent the client from obtaining the content of the database, ensuring the privacy protection of the database by the server.
  • the form of the database and query base in this embodiment can be called "asymmetric dual copies" when a server configures the database and a client configures the query base, and can be called "asymmetric multiple copies" when multiple clients configure the query base.
  • the client can initiate a query on the field of interest, such as the Name field to be queried in the above select Name... This exposes the client's fields of interest to a certain extent.
  • Alternatively, the client can query the records that meet the conditions, that is, the entire row of data that meets the conditions. This protects the client's privacy but requires the server to return the entire record, which exposes the entire row of data on the server to a certain extent.
  • the result returned by the server can be the record of id_1, for example as follows:
  • the server can encrypt and return the record corresponding to the identifier in the database/the value of the field of interest in the corresponding record to the client.
  • The server can encrypt the record corresponding to the identifier in the database, or the value of the field of interest in that record, with a symmetric key negotiated with the client and return it to the client. Alternatively, it can encrypt with the public key of the client's asymmetric key pair so that the client decrypts with its own private key, or use a digital envelope, among other methods.
  • In the above embodiment, the client directly returns the matched ID to the server.
  • Although the record corresponding to the ID, or the field of interest in that record, can then be obtained from the server as in S130, this exposes the client's privacy to a certain extent: the server learns that the identifier the client wants to query is id_1.
  • To avoid this, the following embodiment can be used:
  • S210 The client sends the sensitive field encrypted by itself to the server, and obtains the same sensitive field encrypted by the server through interaction with the server.
  • As in S110, the client's search condition is that the value of the Age field is 25, where 25 is a sensitive field that the client does not want the server to learn.
  • The remaining details of this step are the same as in S110 and are not repeated here: the client encrypts the hash value of 25 with its own RSA or ECC private key using the same algorithm the server used to generate the query base, the server re-encrypts the received ciphertext with its own private key and returns it, and the client decrypts the twice-encrypted field with the inverse element of its own private key to obtain the sensitive field encrypted only by the server.
  • S220 The client searches the query base according to the sensitive field encrypted by the server to obtain the identifier of the matching record.
  • the client can obtain the same sensitive field encrypted by the server.
  • S230 The server uses oblivious transfer to return to the client the values of the field of interest in the database records corresponding to a set of identifiers of a predetermined size that includes the matching identifier.
  • the client does not return the matched ID to the server, so the server cannot know which record or records the client wants to find; in S210, the sensitive field sent by the client after client encryption makes the server also unable to know which record or records the sensitive field searched by the client will hit, and only the client knows it. In this way, the privacy of the client is protected. However, the search still needs to be completed in the end, which requires the server to return the record that the client wants to query to the client.
  • For this purpose, the server can use oblivious transfer (OT).
  • Oblivious Transfer can be implemented based on RSA, ECC, etc., and can implement multiple OTs such as 2-choose-1, n-choose-1, m-choose-1, m-choose-k (k ⁇ m ⁇ n).
  • 2-choose-1 OT as an example to illustrate its principle, the sender has two secrets, m1 and m2, and needs to send two secrets to the receiver. The receiver can only choose to decrypt one of them and cannot know the other. At the same time, the sender cannot know which one the receiver has chosen.
  • RSA a simple implementation process of 2-choose-1 is as follows:
  • the sender generates two different pairs of public and private keys and makes both public keys public. These two public keys are public key 1 and public key 2. Assume that the receiver wants to know m1, but does not want the sender to know that he wants m1.
  • the receiver generates a random number r, encrypts r with public key 1, and sends it to the sender.
  • the sender decrypts the encrypted r with both of its private keys, obtaining r1 with private key 1 and r2 with private key 2.
  • r1 is equal to r.
  • r2 is a string of meaningless numbers (it is still a decryption result, but not the real r).
  • the sender does not know which public key the receiver used for encryption, so it does not know which of r1 and r2 is the real r.
  • the sender symmetrically encrypts m1 with r1 and m2 with r2, and sends both symmetric encryption results to the receiver.
  • the receiver decrypts the first result with r to recover m1; it does not know r2 and therefore cannot recover m2, and the sender does not know which of m1 and m2 the receiver has obtained.
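The 2-choose-1 flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the helper names (`make_rsa_key`, `xor_cipher`, `ot_2_choose_1`), the use of textbook RSA without padding, the Mersenne primes chosen as key material, and the SHA-256-based XOR cipher standing in for "symmetric encryption" are all hypothetical choices made for readability. A production OT protocol would need additional safeguards (for example, the two moduli here differ in size, which a careful design would avoid).

```python
import hashlib
import secrets

E = 65537  # common RSA public exponent


def make_rsa_key(p, q):
    """Textbook RSA key pair from two primes (no padding; illustration only)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, E), (n, d)


def xor_cipher(data, key):
    """Toy symmetric cipher: XOR with a SHA-256 pad derived from an integer key.

    Handles messages up to 32 bytes; encryption and decryption are the same call.
    """
    pad = hashlib.sha256(str(key).encode()).digest()
    return bytes(a ^ b for a, b in zip(data, pad))


def ot_2_choose_1(m1, m2, choice):
    # Sender: two key pairs built from known Mersenne primes; public keys published.
    pk1, sk1 = make_rsa_key(2**61 - 1, 2**89 - 1)
    pk2, sk2 = make_rsa_key(2**107 - 1, 2**127 - 1)

    # Receiver: picks a random r and encrypts it under the public key of its choice.
    r = secrets.randbelow(2**128)  # smaller than both moduli
    n, e = pk1 if choice == 1 else pk2
    c = pow(r, e, n)

    # Sender: decrypts c with both private keys; it cannot tell which result is r.
    r1 = pow(c, sk1[1], sk1[0])
    r2 = pow(c, sk2[1], sk2[0])
    e1, e2 = xor_cipher(m1, r1), xor_cipher(m2, r2)

    # Receiver: decrypts only the chosen ciphertext with r.
    return xor_cipher(e1 if choice == 1 else e2, r)


got_first = ot_2_choose_1(b"secret m1", b"secret m2", 1)
got_second = ot_2_choose_1(b"secret m1", b"secret m2", 2)
```

In this sketch the receiver recovers exactly the message matching the public key it used, because only that private key maps the ciphertext back to r.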
  • for n-choose-1, the 2 public-private key pairs can be extended to n public-private key pairs, which yields n-choose-1 OT.
  • the core of n-choose-1 is that the server uses n different keys to encrypt n records in the data table/the values of the fields of interest in the corresponding records to obtain n encryption results, and sends the n encryption results to the client; the client uses the key corresponding to the matching identifier to decrypt the 1 encryption result corresponding to the matching identifier among the n encryption results sent by the server.
  • S231 The server generates n different public and private key pairs in advance and publishes the public key.
  • n is equal to the number of records in the database.
  • the server generates n different public-private key pairs (pk-sk; pk is the public key, which can be made public; sk is the secret key, i.e., the private key, which must be kept secret), for example, pk 0 -sk 0 , pk 1 -sk 1 , pk 2 -sk 2 , ..., pk n-1 -sk n-1 , and makes the n public keys pk 0 , pk 1 , pk 2 , ..., pk n-1 public.
  • the client can obtain these n public keys.
  • S232 The client generates a random number r, encrypts r with the public key corresponding to the desired ID, and sends the encrypted number to the server.
  • the client wants to obtain the record with id_1, but does not want the server to know that the record the client wants to obtain is the one with id_1.
  • the client can use pk 1 to encrypt r and send it to the server.
  • the above order mainly means that there is a corresponding relationship between ID and public key, and such a corresponding relationship can be known by the client.
  • the client wants to obtain the record with id_1 but does not want the server to know that the record the client wants to obtain is the one with id_1.
  • the client can use pk 1 corresponding to id_1 to encrypt r and send it to the server; similarly, the client wants to obtain the record with id_t but does not want the server to know that the record the client wants to obtain is the one with id_t.
  • the client can use pk t corresponding to id_t to encrypt r and send it to the server.
  • the server uses sk 0 , sk 1 , sk 2 , ..., sk n-1 to decrypt the random number r encrypted by pk 1.
  • the server uses sk 0 to decrypt to obtain r0, uses sk 1 to decrypt to obtain r1, ..., and uses sk n-1 to decrypt to obtain r(n-1).
  • r1 is equal to r, because sk 1 is the only private key that corresponds to pk 1 , the key used for encryption; the results r0, r2, ..., r(n-1) obtained by decrypting with sk 0 , sk 2 , ..., sk n-1 , which do not correspond to pk 1 , will not be equal to r.
  • the server obtains only decryption results of the same form. It does not know which public key the client used to encrypt r, so it does not know which of the n decryption results r0, r1, r2, ..., r(n-1) is the real r.
  • S234 The server symmetrically encrypts each record in the database according to the serial number using the decryption result of the corresponding serial number, and sends the symmetrically encrypted result to the client.
  • the server symmetrically encrypts the record id_0 using r0, symmetrically encrypts the record id_1 using r1, ..., symmetrically encrypts the record id_n-1 using r(n-1), and sends the n symmetric encryption results to the client.
  • S235 The client uses the random number r to symmetrically decrypt, among the symmetric encryption results, the one corresponding to the ID it expects to obtain, thereby obtaining the retrieval result.
  • the client uses the random number r to symmetrically decrypt the encryption result corresponding to the ID expected to be obtained in the symmetric encryption result.
  • suppose the client expects to obtain the record corresponding to id_1 (or the value of the field of interest in that record); the client then encrypts the random number r with the corresponding public key pk 1 ;
  • the server uses r0, r1, r2, ..., r(n-1) to symmetrically encrypt the records of id_0, id_1, ..., id_n-1 (or the values of the fields of interest in those records), respectively, and sends the n symmetric encryption results to the client;
  • the client can then obtain the correct search result, that is, the corresponding record or the value of the field of interest in it.
  • the client only needs to use r to decrypt, among the n symmetric encryption results, the one corresponding to the ID it expects to obtain. That is, the client uses r only on the encryption result of id_1 to obtain the record of id_1 (or the value of its field of interest). It does not need to decrypt the results of id_0, id_2, ..., id_n-1, because it knows that these were not encrypted with r, so decrypting them with r cannot yield a correct result.
  • the client may also use the random number r to symmetrically decrypt all n symmetric encryption results, as explained below:
  • the server uses r0, r1, r2, ..., r(n-1) to symmetrically encrypt the records id_0, id_1, ..., id_n-1 (or the values of the fields of interest in those records), respectively: Enc(id_0, r0), Enc(id_1, r1), Enc(id_2, r2), ..., Enc(id_n-1, r(n-1)).
  • here Enc denotes encryption (Encrypt); the first argument of Enc(), i.e., id_0, id_1, id_2, ..., id_n-1, represents the n records (or the values of the fields of interest in the n records), and the second argument, i.e., r0, r1, r2, ..., r(n-1), represents the encryption key.
  • the client uses the random number r to symmetrically decrypt each encryption result from S234, that is: Dec(Enc(id_0, r0), r), Dec(Enc(id_1, r1), r), Dec(Enc(id_2, r2), r), ..., Dec(Enc(id_n-1, r(n-1)), r). Only the result for id_1 decrypts correctly, because only r1 is equal to r.
  • here Dec denotes decryption (Decrypt); the first argument of Dec() is the object to be decrypted, i.e., the encryption result above, and the second argument is the key used for decryption.
  • S231 may be after S230 or before S230, which is not limited here.
  • the server does not know which ID or IDs the client is querying, but encrypts all records in the database and returns them to the client, protecting the privacy of the client.
  • the server uses n private keys to decrypt the received encrypted r respectively, so a large number of asymmetric decryption calculations are performed, which consumes a large amount of CPU and memory resources.
  • the transmission of n symmetric encrypted results in S234 will also occupy a large amount of bandwidth. Especially when the number n is relatively large, the server's calculation amount is large and the bandwidth occupancy is also large.
  • for n-choose-k: if the number of possible matching results is greater than 1, for example k (k>1), retrieval can be achieved through n-choose-k oblivious transfer.
  • one implementation of n-choose-k is to group every k of the n records into a set, with each set corresponding to one public-private key pair, so that there are C(n, k) sets in total (C denotes the combination formula, i.e., the number of ways to choose any k out of n).
  • a C(n, k)-choose-1 oblivious transfer then transmits to the client the records corresponding to the set of identifiers that includes the matching identifiers (or the values of the fields of interest in those records).
  • the implementation of C(n, k)-choose-1 oblivious transfer is similar to that of n-choose-1 oblivious transfer.
  • the server uses C(n, k) different keys to encrypt each group of k records in the data table (or the values of the fields of interest in those records) to obtain C(n, k) encryption results, and sends the C(n, k) encryption results to the client; the client uses the key corresponding to the matching identifiers to decrypt the corresponding encryption result.
  • the specific implementation is similar to S231-S235 above and is not repeated here.
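To make the size of the grouping concrete, the following Python sketch enumerates every k-element group of record IDs and assigns each group an index that a key pair could be attached to. The function name `k_subsets`, the sample IDs, and the indexing scheme are illustrative assumptions; the text above does not prescribe a particular enumeration order.

```python
from itertools import combinations
from math import comb


def k_subsets(ids, k):
    """Enumerate every k-element group of record IDs in a fixed, reproducible order."""
    return [tuple(group) for group in combinations(ids, k)]


ids = ["id_0", "id_1", "id_2", "id_3"]  # hypothetical record identifiers
groups = k_subsets(ids, 2)

# Each group would be associated with one public-private key pair;
# here only the group-to-index mapping is built.
index_of = {group: i for i, group in enumerate(groups)}

total_groups = len(groups)
expected_groups = comb(len(ids), 2)
```

Because the number of groups grows as C(n, k), this approach multiplies the key generation, decryption, and bandwidth costs already noted for n-choose-1, which is one reason a smaller candidate set is attractive.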
  • the above r can also be the public key of an asymmetric key pair.
  • in that case, the server uses r0 to asymmetrically encrypt the record id_0, r1 to asymmetrically encrypt the record id_1, ..., and r(n-1) to asymmetrically encrypt the record id_n-1, and sends these n encryption results to the client.
  • after the client receives these encryption results, it can use its own private key to asymmetrically decrypt the encryption result corresponding to the ID it expects to obtain. The same applies below and is not repeated.
  • this specification provides the following implementation method that adds the construction of a confusion set: S310: The client sends the sensitive field encrypted by itself to the server, and obtains the same sensitive field encrypted by the server through interaction with the server.
  • the client's search condition is that the value of the Age field is 25, but 25 is a sensitive field, that is, the client does not want to let the other party know.
  • the client can encrypt the 25.
  • RSA/ECC private key encryption is used, and the encryption algorithm used by the client is the same as the encryption algorithm used by the server to generate the query base.
  • when RSA private key encryption is used, the client generates a secret β itself and keeps it safe. The client can then encrypt 25 with its own private key β. Specifically, either 25 or the hash value of 25 can be encrypted.
  • encrypting the hash value of 25 is used as the example here.
  • encrypting 25 directly is similar. The client and the server use the same hash algorithm and, for example, the same large prime number q as the modulus.
  • the client can perform RSA encryption on the hash value of 25 using β to obtain (H(25))^β.
  • the sensitive field sent by the client to the server can be (H(25))^β, which is the ciphertext of the sensitive field value 25.
  • when ECC private key encryption is used, the client uses the same elliptic curve as the server, that is, the same elliptic curve parameters and generator.
  • the client generates the secret β itself and keeps it safe.
  • the client can use its own private key β to encrypt 25.
  • specifically, the hash value of 25 can be encrypted, with the client and the server using the same hash algorithm.
  • the client can use β to perform ECC encryption on the hash value of 25 to obtain βH(25).
  • the sensitive field sent by the client to the server can be βH(25), which is the ciphertext of the sensitive field value 25.
  • the client obtains the same sensitive field encrypted by the server through interaction with the server, which may include the server using its own key to re-encrypt the sensitive field encrypted by the client and then sending it to the client, and the client using its own key to decrypt the sensitive field encrypted twice to obtain the sensitive field encrypted by the server.
  • the core of this step is an encryption algorithm for which, after two successive encryptions (one by each party), the decryptions can be applied in either order.
  • with ECC, the two parties agree on the same elliptic curve, that is, the same elliptic curve parameters and generator, and hold private keys α and β respectively; the encryption operation is scalar multiplication by α (or β).
  • whether encryption is performed first with α and then with β, or first with β and then with α, the result can be decrypted in either order.
  • with RSA, both parties agree on the same large prime number q and primitive root g, and hold private keys α and β respectively.
  • the encryption operation is exponentiation by α (or β) modulo q. Whether encryption is performed first with α and then with β, or first with β and then with α, the result can be decrypted in either order.
  • the encryption/decryption performed by the client and the server on the same target uses an encryption/decryption algorithm with interchangeable order.
  • after the server receives the sensitive field encrypted by the client, it encrypts that field again with its own key and returns the result to the client. The client then uses its own key to decrypt the twice-encrypted sensitive field and obtains the sensitive field encrypted only by the server.
  • in case 1, the server receives (H(25))^β sent by the client.
  • the server can re-encrypt the encrypted sensitive field (i.e., the privacy field) and return the result to the client. Specifically, the server re-encrypts the privacy field (H(25))^β with its own RSA private key α to obtain ((H(25))^β)^α.
  • in case 2, the server receives βH(25) sent by the client.
  • the server can re-encrypt the privacy field and return the result to the client. Specifically, the server re-encrypts the privacy field βH(25) with its own ECC private key α to obtain αβH(25).
  • the server uses its own key to re-encrypt the sensitive field (i.e., the privacy field) already encrypted by the client and sends the result to the client.
  • the client can then use its own key to decrypt the twice-encrypted privacy field and obtain the sensitive field encrypted only by the server.
  • the client can use the inverse element β^(-1) of its own private key β to decrypt the twice-encrypted sensitive field as follows: (((H(25))^β)^α)^(β^(-1)) = (H(25))^α. In this way, the client obtains the sensitive field encrypted by the server alone, namely (H(25))^α.
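The exchange in S310 and the lookup in S320 can be sketched with modular exponentiation, writing the server's key as α (`alpha`) and the client's key as β (`beta`). This is a minimal sketch under stated assumptions: the modulus `Q` (a known Mersenne prime), the SHA-256-based hash `h`, the helper names, and the three sample rows are hypothetical, and the ECC variant would replace exponentiation with scalar multiplication.

```python
import hashlib
import secrets
from math import gcd

Q = 2**127 - 1  # shared prime modulus (a known Mersenne prime; assumption)


def h(value):
    """Hash a field value to an integer modulo Q."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest, "big") % Q


def keygen():
    """Pick a random exponent that is invertible modulo Q - 1."""
    while True:
        k = secrets.randbelow(Q - 3) + 2
        if gcd(k, Q - 1) == 1:
            return k


def enc(x, key):
    """Commutative encryption: exponentiation modulo Q."""
    return pow(x, key, Q)


def dec(x, key):
    """Remove one encryption layer using the inverse exponent modulo Q - 1."""
    return pow(x, pow(key, -1, Q - 1), Q)


alpha, beta = keygen(), keygen()  # server key, client key

# Server side: the query base maps server-encrypted Age values to record IDs.
ages = {"id_0": 41, "id_1": 25, "id_2": 34}  # hypothetical rows
query_base = {enc(h(age), alpha): rid for rid, age in ages.items()}

# S310: the client blinds H(25), the server re-encrypts, the client unblinds.
blinded = enc(h(25), beta)      # (H(25))^beta, sent to the server
twice = enc(blinded, alpha)     # ((H(25))^beta)^alpha, returned to the client
server_only = dec(twice, beta)  # (H(25))^alpha

# S320: the client looks up the server-encrypted value in its local query base.
match = query_base.get(server_only)
```

The server only ever sees the blinded value, and the client never learns α; the match is found because exponentiation modulo a prime commutes.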
  • S320 The client searches the query base according to the sensitive field encrypted by the server to obtain the identifier of the matching record.
  • the client can obtain the same sensitive field encrypted by the server.
  • S330 The server returns to the client, by oblivious transfer, the values of the fields of interest in the database records corresponding to a set of identifiers of predetermined size that includes the matching identifier.
  • the sensitive fields sent by the client in S310 are encrypted by the client so that the server cannot know which record the sensitive fields searched by the client will hit, and only the client knows it. In this way, the privacy of the client is protected. However, the search still needs to be completed in the end, which requires the server to return the record that the client wants to query to the client.
  • FIG. 2 provides an implementation of n-choose-1 oblivious transfer.
  • alternatively, m-choose-1 oblivious transfer can be adopted, where m ≤ n.
  • in this case the client still does not return the matched ID alone to the server; instead, it mixes the matched ID with other forged IDs to form a confusion set and sends the confusion set to the server. The server then cannot tell exactly which record in the confusion set the client wants to find. At the same time, it must be ensured that the client can obtain only the record it is looking for and none of the other records.
  • the confusion set can also be sent in the following S332, which is not limited here.
  • the server generates n different public-private key pairs (pk-sk; pk is the public key; sk is the secret key, i.e., the private key), for example, pk 0 -sk 0 , pk 1 -sk 1 , pk 2 -sk 2 , ..., pk n-1 -sk n-1 , and publishes the n public keys pk 0 , pk 1 , pk 2 , ..., pk n-1 . After the server publishes these n ordered public keys, the client can obtain them.
  • S332 The client generates a confusion set of size m including the desired ID, generates a random number r, encrypts r with the public key corresponding to the desired ID, and sends the encrypted number r together with the confusion set to the server.
  • the client wants to obtain the record with id_1, but does not want the server to know that the record the client wants to obtain is the record with id_1, so a confusion set of size m is generated.
  • the confusion set is, for example: {id_1, id_2, id_3, id_4}.
  • the four IDs and the public-private key pairs correspond as follows: id_1 to pk 1 -sk 1 ; id_2 to pk 2 -sk 2 ; id_3 to pk 3 -sk 3 ; id_4 to pk 4 -sk 4 .
  • the client can use pk 1 to encrypt r and send it to the server together with the confusion set. For example:
  • the client can send the confusion set together with the search statement to the server.
  • the client can use pk 1 to encrypt r and send it together with the search statement containing the confusion set, for example:
  • the server uses sk 1 , sk 2 , sk 3 , and sk 4 to decrypt the random number r encrypted by pk 1. For example, the server uses sk 1 to decrypt to get r1, sk 2 to decrypt to get r2, sk 3 to decrypt to get r3, and sk 4 to decrypt to get r4.
  • r1 is equal to r, because sk 1 is the only private key that corresponds to pk 1 , the key used for encryption; the results r2, r3, and r4 obtained by decrypting with sk 2 , sk 3 , and sk 4 , which do not correspond to pk 1 , will not be equal to r.
  • the server obtains only decryption results of the same form. It does not know which public key the client used to encrypt r, so it does not know which of the four decryption results r1, r2, r3, and r4 is the real r.
  • after the server receives the confusion set {id_1, id_2, id_3, id_4}, it can learn that the data the client wants is one of the four IDs in the confusion set, but it cannot be sure which one, which protects the client's privacy.
  • S334 The server symmetrically encrypts each record specified in the confusion set using the decryption result with the corresponding serial number, and sends the symmetric encryption results to the client.
  • the server symmetrically encrypts the record of id_1 using r1, symmetrically encrypts the record of id_2 using r2, symmetrically encrypts the record of id_3 using r3, and symmetrically encrypts the record of id_4 using r4, and sends the four symmetric encryption results to the client.
  • S335 The client uses the random number r to symmetrically decrypt the encryption result corresponding to the ID expected to be obtained in the symmetrical encryption result to obtain a retrieval result.
  • the client uses the random number r to symmetrically decrypt the encryption result corresponding to the ID expected to be obtained in the symmetric encryption result.
  • suppose the client expects to obtain the record corresponding to id_1 (or the value of the field of interest in that record); the client then encrypts the random number r with the corresponding public key pk 1 ;
  • the server uses r1, r2, r3, and r4 to symmetrically encrypt the records of id_1, id_2, id_3, and id_4 (or the values of the fields of interest in those records), respectively, and sends the four symmetric encryption results to the client;
  • the client can then obtain the correct search result, that is, the corresponding record or the value of the field of interest in it.
  • the client only needs to use r to decrypt, among the four symmetric encryption results, the one corresponding to the ID it expects to obtain. That is, the client uses r only on the encryption result of id_1 to obtain the record of id_1 (or the value of its field of interest). It does not need to decrypt the results of id_2, id_3, and id_4, because it knows that these were not encrypted with r, so decrypting them with r cannot yield a correct result.
  • the construction and transmission of the confusion set can be decoupled from the execution of the OT protocol, with the OT protocol used only to transmit a key.
  • the client can send a confusion set of size m to the server.
  • the client knows which identifier in the confusion set of size m corresponds to the result it really wants.
  • the server can generate m symmetric keys. Through m-choose-1 OT, the client obtains one specified symmetric key, namely the one corresponding to the identifier of the result it really wants.
  • the server can encrypt the records corresponding to the m identifiers in the client's confusion set with the corresponding symmetric keys and send them to the client; the client then uses the symmetric key it obtained to decrypt the result it really wants.
  • the server can generate m symmetric keys in advance, so that the key preparation work can be completed in batches before the OT interaction, without occupying the time of the OT protocol execution.
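The decoupled variant can be sketched as the following data flow. Everything here is an illustrative assumption: the record contents, the XOR-based toy cipher, and in particular `ot_stub`, which merely returns the selected key and is not oblivious. It stands in for a real m-choose-1 OT protocol so that the remaining steps can be shown end to end.

```python
import hashlib
import secrets


def xor_cipher(data, key):
    """Toy symmetric cipher (XOR with a SHA-256 pad); handles up to 32 bytes."""
    pad = hashlib.sha256(key).digest()
    return bytes(a ^ b for a, b in zip(data, pad))


def ot_stub(keys, choice):
    """Placeholder for m-choose-1 OT: hands over keys[choice] only.

    A real OT would also hide `choice` from the server; this stub does not.
    """
    return keys[choice]


# Hypothetical server-side records, keyed by identifier.
records = {"id_1": b"Alice", "id_2": b"Bob", "id_3": b"Carol", "id_4": b"Dave"}

# Client: confusion set of size m = 4 that contains the wanted identifier id_1.
confusion_set = ["id_1", "id_2", "id_3", "id_4"]

# Server: m symmetric keys, which can be prepared in batches ahead of time.
keys = [secrets.token_bytes(16) for _ in confusion_set]
ciphertexts = [xor_cipher(records[rid], k) for rid, k in zip(confusion_set, keys)]

# Client: obtains exactly one key through OT and decrypts only its target.
wanted = confusion_set.index("id_1")
key = ot_stub(keys, wanted)
result = xor_cipher(ciphertexts[wanted], key)
```

Since the keys are independent of any client message, the server can generate them before the OT interaction begins, which is the benefit the passage describes.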
  • the server transmits to the client, through m-choose-1 oblivious transfer, the m records corresponding to the identifiers specified in the confusion set (or the values of the fields of interest in those records).
  • if the number of possible matching results is greater than 1, for example k (k>1), this can be achieved through m-choose-k oblivious transfer.
  • S310 to S330 provide a solution for constructing a confusion set to protect the privacy of the client.
  • however, if the confusion set is not constructed reasonably, the server may still easily guess the ID that the client really wants to query.
  • for example, Age and Native_place are the same in the two rows id_4 and id_7, that is, the two rows cannot be distinguished by Age and Native_place.
  • suppose the client constructs the confusion set {id_1, id_4} and the following search statement:
  • after the server receives the search statement containing the confusion set, it can learn that the field of interest in the search statement is Name, not Age or Native_place, which means that the sensitive field used by the client in S310 and S320 should be Age and/or Native_place.
  • the element id_4 in the confusion set cannot be distinguished from the row id_7 by Age and/or Native_place. In other words, if the elements of the confusion set were located through the sensitive fields Age and/or Native_place, it is unreasonable for the set to contain id_4 but not id_7: if id_4 is in the confusion set, id_7 should be there as well.
  • the server can therefore infer that, in the confusion set {id_1, id_4}, id_4 is a fake element and id_1 is the ID actually being queried, which leaks the client's privacy to a certain extent.
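The server's inference can be expressed as a simple consistency check. In the sketch below, the table contents and the function name `suspicious_ids` are hypothetical; only the fact that id_4 and id_7 share the same Age and Native_place comes from the text.

```python
from collections import defaultdict

# Hypothetical plaintext table on the server; id_4 and id_7 agree on both fields.
TABLE = {
    "id_1": {"Age": 25, "Native_place": "shanghai"},
    "id_4": {"Age": 34, "Native_place": "beijing"},
    "id_7": {"Age": 34, "Native_place": "beijing"},
}


def suspicious_ids(table, confusion_set, condition_fields):
    """Return IDs in the confusion set whose indistinguishable rows are missing.

    A row located through the condition fields would bring all rows with the
    same field values along with it, so an ID that appears without them is
    probably a forged element.
    """
    def key_of(rid):
        return tuple(table[rid][f] for f in condition_fields)

    groups = defaultdict(set)
    for rid in table:
        groups[key_of(rid)].add(rid)

    chosen = set(confusion_set)
    return sorted(rid for rid in chosen if not groups[key_of(rid)] <= chosen)


flagged = suspicious_ids(TABLE, ["id_1", "id_4"], ["Age", "Native_place"])
clean = suspicious_ids(TABLE, ["id_1", "id_4", "id_7"], ["Age", "Native_place"])
```

With the confusion set {id_1, id_4}, the check flags id_4, leaving id_1 as the likely target; once id_7 is included, nothing is flagged.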
  • the method for constructing a confusion set may include, as shown in FIG. 3, the following steps: S410: the client sends a sensitive field encrypted by itself to the server, and obtains the same sensitive field encrypted by the server through interaction with the server.
  • S420 The client searches the query base according to the sensitive field encrypted by the server to obtain a first identification set of matching records.
  • S410 and S420 are similar to the aforementioned S310 and S320, respectively, and will not be repeated here.
  • the sensitive field sent by the client to the server is (H(25))^β or βH(25)
  • the twice-encrypted sensitive field that the client receives from the server is ((H(25))^β)^α or αβH(25).
  • the client can use its own private key β to decrypt the twice-encrypted sensitive field, obtaining, for example, (H(25))^α or αH(25).
  • the client searches the local query base for the server-encrypted sensitive field (H(25))^α or αH(25), and obtains id_1 as the identifier of the matching record, so the first identification set is {id_1}.
  • the following explanation mainly takes Table 2 as an example.
  • S430 The client selects at least one point in the query base outside the ID field and the field of interest, uses the selected point or points as a search condition, constructs a search statement from the search condition and the original field of interest, and executes it on the query base to obtain a second identification set of matching records.
  • the embodiments of this specification can generate a confusion set consisting of n elements; for example, each element is an identifier, and the n identifiers include the identifiers of the matching records, that is, the aforementioned first identification set.
  • for example, the client selects one point in the query base outside the ID field and the field of interest, specifically the point αH(34).
  • a point in the query base can be defined as a field value determined by a row and a column.
  • the client can use the selected at least one point as a search condition, and construct a search statement based on the search condition and the original field of interest.
  • the original field of interest is the field of interest that the client originally wants to query, for example, Name.
  • the constructed search statement can be as follows:
  • the client can execute the search statement on the query base and obtain id_4 and id_7 as the identifiers of the matching records, which form the second identification set {id_4, id_7}.
  • the constructed search statement is as follows:
  • the identifiers of the matching records are id_0 and id_2.
  • S440 The client constructs a confusion set according to the first identification set and the second identification set.
  • the order of the elements in the confusion set is not limited; for example, it can be {id_4, id_1, id_7}, {id_4, id_7, id_1}, {id_7, id_1, id_4}, {id_7, id_4, id_1}, {id_1, id_7, id_4}, etc. The order can be random, as long as the ID to be transmitted can be specified in the subsequent OT. The same applies to the following examples, which are not enumerated again.
  • n can also be 2; for example, the confusion set is {id_1, {id_4, id_7}}, that is, the second element of the confusion set is the set {id_4, id_7}.
  • the server can return the value of the field of real interest to the client through 2-choose-1 oblivious transmission.
  • the first identification set obtained is {id_1}.
  • the client constructs a confusion set based on the first identification set and the second identification set; the confusion set can be formed by flattening the elements of the first identification set and the second identification set.
  • the confusion set is, for example, {id_1, id_3, id_9}.
  • if the number of elements in the first identification set is k, n-choose-k oblivious transfer can be used.
  • the elements of the first identification set can also be flattened and combined with the second identification set, kept as a single element, to form the confusion set.
  • for example, the first identification set is {id_0, id_6}.
  • if the second identification set is still {id_3, id_9}, the confusion set can be {id_0, id_6, {id_3, id_9}}.
  • if the number of elements in the first identification set is 1, n-choose-k becomes n-choose-1.
  • alternatively, the elements of the second identification set can be flattened and combined with the first identification set, kept as a single element, to form the confusion set.
  • for example, the first identification set is {id_0, id_6}.
  • the confusion set can then be {{id_0, id_6}, id_3, id_9}.
  • the IDs of two or more rows that cannot be distinguished by the selected point or points (outside the ID field and the field of interest) must not be split into different sets.
  • for example, the first identification set is {id_0, id_6}
  • and the second identification set is {{id_4, id_7}, id_3, id_9}.
  • the remaining fields Age and Native_place cannot distinguish id_4 from id_7, so id_4 and id_7 must be placed in the same set, {id_4, id_7}, and cannot be split.
  • in this way, the identifiers of rows that cannot be distinguished outside the ID field and the field of interest are placed in the same minimum set within the confusion set. This requirement also applies to the other implementations and avoids an unreasonably constructed confusion set.
  • the elements of the first identification set and the second identification set can also be mixed to form the confusion set.
  • for example, the first identification set is {id_0, id_6}
  • and the second identification set is still {id_3, id_9};
  • the confusion set can then be {{id_0, id_6}, {id_0, id_3}, id_9}.
  • id_3 and id_9 can be distinguished outside the ID field and the field of interest, so the identifiers of these rows can be split, that is, they need not be placed in the same minimum set.
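The grouping rule above can be sketched as a small helper that keeps indistinguishable rows together. The function name `build_confusion_set`, the opaque ciphertext placeholders such as `"c34"`, and the sample query base are hypothetical. As an additional assumption not stated in the text, the sketch also pulls in any row of the query base that is indistinguishable from a selected ID, so that such rows are never left out.

```python
def build_confusion_set(query_base, first_ids, second_ids, condition_fields):
    """Combine two identification sets into a confusion set.

    Rows that agree on every condition field (all fields other than the ID
    and the field of interest) are kept together in one element.
    """
    def key_of(rid):
        return tuple(query_base[rid][f] for f in condition_fields)

    # Index every row of the query base by its condition-field values.
    by_key = {}
    for rid in query_base:
        by_key.setdefault(key_of(rid), []).append(rid)

    elements, seen = [], set()
    for rid in [*first_ids, *second_ids]:
        k = key_of(rid)
        if k not in seen:
            seen.add(k)
            elements.append(tuple(by_key[k]))
    return elements


# Hypothetical query base; values stand for server-encrypted field contents.
QUERY_BASE = {
    "id_1": {"Age": "c25", "Native_place": "cSH"},
    "id_3": {"Age": "c41", "Native_place": "cGZ"},
    "id_4": {"Age": "c34", "Native_place": "cBJ"},
    "id_7": {"Age": "c34", "Native_place": "cBJ"},
    "id_9": {"Age": "c52", "Native_place": "cHZ"},
}
FIELDS = ["Age", "Native_place"]

grouped = build_confusion_set(QUERY_BASE, ["id_1"], ["id_4"], FIELDS)
larger = build_confusion_set(QUERY_BASE, ["id_1"], ["id_7", "id_3"], FIELDS)
```

The element order follows the input order here for reproducibility; since the order of elements is not limited, a caller could shuffle the result before sending it.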
  • the above example shows a situation of one conditional field, and two or more conditional fields are similar.
  • for example, the client selects two points in the query base outside the ID field and the field of interest, specifically the two points αH(34) and αH(shanghai).
  • the client can use the two selected points as search conditions and construct a search statement from the search conditions and the original field of interest.
  • the original field of interest is the field that the client originally wanted to query, such as Name. For example, if the two points αH(34) and αH(shanghai) are selected, and the corresponding field names are Age and Native_place respectively, the constructed search statement can be as follows:
  • the client can execute the search statement on the query base and obtain id_4, id_7, id_1, and id_5 as the identifiers of the matching records, which form the second identification set {id_4, id_7, id_1, id_5}.
  • the connective between the two predicates here is "or"; it can also be "and". In other words, the connective can be chosen at random. More generally, when there are more than two predicates, the connective between each pair of adjacent predicates is chosen at random.
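Choosing the connectives at random can be sketched as follows. The SQL-like statement format, the table name `query_base`, the placeholder ciphertext values, and the function name `build_search_statement` are all assumptions made for illustration; the text does not fix the statement syntax.

```python
import random


def build_search_statement(field_of_interest, predicates, rng=random):
    """Build a SQL-like statement whose predicates are joined by random connectives.

    `predicates` is a list of (field name, encrypted value) pairs.
    """
    parts = [f"{field} = '{value}'" for field, value in predicates]
    where = parts[0]
    for part in parts[1:]:
        where += f" {rng.choice(['and', 'or'])} {part}"
    return f"select {field_of_interest} from query_base where {where}"


single = build_search_statement("Name", [("Age", "c34")])
double = build_search_statement(
    "Name", [("Age", "c34"), ("Native_place", "cSH")], random.Random(0)
)
```

With one predicate the statement is fixed; with two or more, each adjacent pair is joined by either connective, so repeated queries over the same points need not look identical.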
  • the following describes a client for implementing the construction of a confusion set in an embodiment of this specification.
  • the encryption/decryption performed by the client and the server on the same target uses an encryption/decryption algorithm with an interchangeable order, and:
  • the client is configured with a query base, and the query base is obtained by the server after encrypting the database;
  • the client sends the sensitive field encrypted by itself to the server, and obtains the same sensitive field encrypted by the server through interaction with the server; retrieves the sensitive field encrypted by the server in the query base to obtain a first identification set of matching records; selects at least one point in the query base except the ID field and the field of interest, uses the selected at least one point as a search condition, constructs a search statement according to the search condition and the original field of interest, and performs the search on the query base to obtain a second identification set of matching records; constructs a confusion set according to the first identification set and the second identification set.
  • the following introduces a client for implementing the construction of a confusion set in an embodiment of this specification, including: a processor, a memory, and a program stored therein, wherein when the processor executes the program, the method in FIG. 3 is executed.
  • the following introduces a storage medium in an embodiment of this specification, which is used to store a program, wherein the program, when executed, enables the client to execute the method in FIG. 3 above.
  • a programmable logic device (PLD), such as a field programmable gate array (FPGA)
  • hardware description languages (HDL), of which there are many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, RHDL (Ruby Hardware Description Language), VHDL (Very-High-Speed Integrated Circuit Hardware Description Language), and Verilog
  • the controller may be implemented in any suitable manner, for example, the controller may take the form of a microprocessor or processor and a computer readable medium storing a computer readable program code (e.g., software or firmware) executable by the (micro)processor, a logic gate, a switch, an application specific integrated circuit (ASIC), a programmable logic controller, and an embedded microcontroller, examples of which include but are not limited to the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320, and the memory controller may also be implemented as part of the control logic of the memory.
  • the controller may be implemented in the form of a logic gate, a switch, an application specific integrated circuit, a programmable logic controller, and an embedded microcontroller by logically programming the method steps. Therefore, such a controller may be considered as a hardware component, and the means for implementing various functions included therein may also be considered as a structure within the hardware component. Or even, the means for implementing various functions may be considered as both a software module for implementing the method and a structure within the hardware component.
  • the systems, devices, modules or units described in the above embodiments may be implemented by computer chips or entities, or by products with certain functions.
  • a typical implementation device is a server system.
  • the computer that implements the functions of the above embodiments may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • although one or more embodiments of the present specification provide method operation steps as described in the embodiments or flow charts, more or fewer operation steps may be included based on conventional or non-creative means.
  • the order of steps listed in the embodiments is only one way of executing the order of many steps, and does not represent the only execution order.
  • when an actual device or terminal product executes the steps, they can be executed in sequence or in parallel according to the method shown in the embodiments or the drawings (for example, in a parallel processor or multi-threaded processing environment, or even a distributed data processing environment).
  • each module can be implemented in one or more pieces of software and/or hardware, or a module implementing a given function can be implemented by a combination of multiple sub-modules or sub-units, etc.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division. There may be other division methods in actual implementation.
  • multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
  • memory volatile and non-volatile memory
  • Memory may include non-permanent storage in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer readable media include permanent and non-permanent, removable and non-removable media that can be implemented by any method or technology to store information.
  • Information can be computer readable instructions, data structures, program modules or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage, graphene storage or other magnetic storage devices, or any other non-transmission media that can be used to store information that can be accessed by a computing device.
  • computer readable media does not include temporary computer readable media (transitory media), such as modulated data signals and carrier waves.
  • one or more embodiments of the present specification may be provided as a method, system or computer program product. Therefore, one or more embodiments of the present specification may take the form of a complete hardware embodiment, a complete software embodiment or an embodiment combining software and hardware. Moreover, one or more embodiments of the present specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • One or more embodiments of this specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communication network.
  • program modules may be located in local and remote computer storage media, including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

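One or more embodiments of this specification provide a method and a client for constructing a confusion set. In the method, the client sends a sensitive field encrypted by itself to the server and, through interaction with the server, obtains the same sensitive field encrypted by the server; the client searches the query base according to the sensitive field encrypted by the server to obtain a first identifier set of matching records; the client selects at least one point in the query base other than the ID field and the field of interest, uses the selected at least one point as a search condition, constructs a search statement according to the search condition and the original field of interest, and executes the search on the query base to obtain a second identifier set of matching records; the client constructs the confusion set according to the first identifier set and the second identifier set.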

Description

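A method and client for constructing a confusion set
Technical Field
The embodiments of this specification belong to the field of privacy computing technology, and in particular relate to a method and a client for constructing a confusion set.
Background
Privacy-preserving computing is a collection of technologies that enable data analysis and computation while protecting the data itself from external disclosure, making data usable but invisible. With privacy-preserving computing, the value of data can be transformed and released while data and privacy security are fully protected.
Current mainstream technologies for privacy-preserving computing fall into three main directions. The first is cryptography-based privacy computing, represented by secure multi-party computation (SMPC). The second is technology derived from combining artificial intelligence with privacy protection, represented by federated learning (FL). The third is confidential computing (CC) based on trusted hardware, represented by the trusted execution environment (TEE). In addition, there is differential privacy (DP) and the like. Differential privacy in fact protects the computation result rather than the computation process, whereas federated learning, secure multi-party computation, and confidential computing protect the computation process and its intermediate results.
The first category, secure multi-party computation, comprises four basic technologies: garbled circuits (GC), secret sharing, oblivious transfer, and homomorphic encryption (HE). Homomorphic encryption is a special kind of encryption algorithm: computing directly on ciphertext yields the same result as computing on the decrypted plaintext. It includes partially homomorphic encryption (PHE) and fully homomorphic encryption (FHE).
Secure multi-party computation, backed by solid security theory, protects the privacy of secret input data and secures the privacy-preserving computation process. There are currently two main technical routes: general-purpose secure multi-party computation and problem-specific secure multi-party computation. The former can solve all kinds of computing problems, but this "universal" route usually involves a large system and high overhead. The latter designs dedicated protocols for specific problems, such as private set intersection (PSI) and private information retrieval (PIR), and can often obtain results at lower cost than general-purpose protocols; however, it requires domain experts to design it carefully for the application scenario, generally cannot be applied to general scenarios, and is costly to design.
Private set intersection allows two parties to obtain the intersection of the data they hold without revealing any additional information, where additional information means any information other than the intersection itself. It is very useful in practice, for example for data alignment in vertical federated learning, or for friend discovery through address books in social software.
Private information retrieval is a method by which a client retrieves information from a database. During retrieval, the querying party hides the identifier of the query target, and the data service provider returns matching query results without learning the specific query object.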
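Summary of the Invention
The purpose of this specification is to provide a method and a client for constructing a confusion set, including:
A method for constructing a confusion set, in which a client receives a query base sent by a server, the query base being obtained by encrypting a database; the encryption/decryption performed by the client and the server on the same target uses an encryption/decryption algorithm whose order can be exchanged; the client sends a sensitive field encrypted by itself to the server and, through interaction with the server, obtains the same sensitive field encrypted by the server; the client searches the query base according to the sensitive field encrypted by the server to obtain a first identifier set of matching records; the client selects at least one point in the query base other than the ID field and the field of interest, uses the selected at least one point as a search condition, constructs a search statement according to the search condition and the original field of interest, and executes the search on the query base to obtain a second identifier set of matching records; the client constructs the confusion set according to the first identifier set and the second identifier set.
A client for constructing a confusion set, where the encryption/decryption performed by the client and the server on the same target uses an encryption/decryption algorithm whose order can be exchanged, and: the client is configured with a query base, the query base being obtained by the server by encrypting a database; the client sends a sensitive field encrypted by itself to the server and, through interaction with the server, obtains the same sensitive field encrypted by the server; searches the query base according to the sensitive field encrypted by the server to obtain a first identifier set of matching records; selects at least one point in the query base other than the ID field and the field of interest, uses the selected at least one point as a search condition, constructs a search statement according to the search condition and the original field of interest, and executes the search on the query base to obtain a second identifier set of matching records; and constructs the confusion set according to the first identifier set and the second identifier set.
A client for constructing a confusion set, comprising: a processor, and a memory storing a program, wherein the above method is executed when the processor executes the program.
A storage medium for storing a program, wherein the program, when executed, causes a client to execute the above method.
In the above embodiments, a search statement is constructed and executed on the query base to obtain the second identifier set, from which the confusion set is constructed. This makes the confusion set more reasonable and avoids leakage of client privacy caused by an unreasonably constructed confusion set.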
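Brief Description of the Drawings
To explain the technical solutions of the embodiments of this specification more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in this specification, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of an embodiment;
FIG. 2 is a schematic flow chart of an embodiment;
FIG. 3 is a schematic flow chart of an embodiment.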
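Detailed Description
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative effort shall fall within the scope of protection of this specification.
As mentioned above, PIR is a method by which a client retrieves information from a database. The PIR scheme was proposed by Chor B et al. in 1995 to protect the privacy of user queries. Its main purpose is to ensure that a query submitted by a user to a database on a server is completed without leaking the user's private query information, i.e., during retrieval the server learns neither what the user queried nor which data items were retrieved.
Application scenarios of private information retrieval include the following. A patient wants to look up drugs for a disease through a medical system; if the disease name is used as the query condition, the medical system learns that the patient may have that disease, and the patient's privacy is leaked. Private information retrieval avoids this kind of leakage.
During a domain name or trademark application, the user first needs to submit the domain name or trademark to be applied for to the relevant database to check whether it already exists, but does not want the service provider to learn the name and register it first.
In the securities market, a user wants to query information on a stock but cannot reveal the stock of interest to the service provider, since that could affect the stock price and expose the user's preferences.
A simple implementation is for the database to send all data to the client, but this cannot protect the database, i.e., it cannot guarantee the server's privacy. A PIR that protects the privacy of both the client and the database is called symmetric PIR (SPIR); a PIR that protects the privacy of only one of the two is called asymmetric PIR (APIR). By the number of database replicas, PIR is divided into multi-replica PIR and single-replica PIR. Multi-replica PIR protocols require that the replicas do not collude, which is hard to satisfy in practice, so single-replica PIR receives more attention. Single-replica PIR can only achieve computational security (computational PIR, CPIR). Most PIR schemes assume that the client knows which bit of the database it wants to retrieve (a single bit). In practice, however, the client usually retrieves by keyword (without knowing where the keyword is located in the database) and wants to get back a string (multiple bits). In short, a practical PIR should ideally be symmetric, single-replica, keyword-based, and string-returning all at once, while balancing computation and communication efficiency. Cryptographic techniques such as homomorphic encryption, oblivious transfer (OT), and one-way trapdoor functions can satisfy, or partially satisfy, these conditions.
This specification provides an embodiment of a method for implementing private information retrieval.
In this embodiment, the server can encrypt the database in advance to obtain a query base and send the query base to the client.
Generally, the server holds a local database that clients can query. The server's local database is, for example, as follows: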
ID Name Age Native_place
id_0 A 24 anhui
id_1 B 25 shanghai
id_2 C 30 anhui
id_3 D 46 henan
id_4 E 34 shandong
id_5 F 54 shanghai
id_6 G 24 beijing
id_7 H 34 shandong
id_8 I 42 guangdong
id_9 J 56 zhejiang
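Table 1. Database held by the server
The example in Table 1 contains four fields, ID, Name, Age, and Native_place, and ten records, id_0 through id_9, one per row. id_0, ..., id_9 are the identifiers of the records.
To allow the client to search without compromising the server's privacy, the server can encrypt the database to obtain the query base. The encryption can use RSA (a widely used asymmetric encryption algorithm proposed in 1977 by Ron Rivest, Adi Shamir, and Leonard Adleman) or ECC (elliptic curve cryptography). Specifically, the server can encrypt the data with an RSA/ECC private key α, i.e., encrypt every field other than the ID column (the data in each cell) with α.
With ECC, the server can generate a secret value α and keep it safe; this secret value α is the ECC private key. In addition, the server can convert the value of a field such as Name into a point on the elliptic curve through a hash function, written Hash(C) or H(C).
By the properties of scalar multiplication on elliptic curves, given a point P on the curve and an integer k, computing Q=kP is easy and the result Q is also a point on the curve; conversely, given a pair of points P and Q on the curve, finding the k that satisfies Q=kP is hard.
Here, α·H(C) is easy to compute by scalar multiplication, but it is hard to derive α from α·H(C) and H(C). When α is hard to obtain, it is also hard to obtain H(C) from α·H(C).
The database encrypted by the server with the secret value α is then as follows: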
ID Name Age Native_place
id_0 α·H(A) α·H(24) α·H(anhui)
id_1 α·H(B) α·H(25) α·H(shanghai)
id_2 α·H(C) α·H(30) α·H(anhui)
id_3 α·H(D) α·H(46) α·H(henan)
id_4 α·H(E) α·H(34) α·H(shandong)
id_5 α·H(F) α·H(54) α·H(shanghai)
id_6 α·H(G) α·H(24) α·H(beijing)
id_7 α·H(H) α·H(34) α·H(shandong)
id_8 α·H(I) α·H(42) α·H(guangdong)
id_9 α·H(J) α·H(56) α·H(zhejiang)
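Table 2. Query base encrypted by the server with the ECC private key
Note that the hash function above not only converts the original input into an output of fixed length and format, but can also map that output to the x-coordinate of a point on the elliptic curve. For example, with an elliptic curve such as curve25519, any 256 bits of data can serve as a valid x-coordinate on the curve. Accordingly, sha256 or sha3-256 can be used, or 256 bits can be truncated from the result of sha384, sha512, sha3-384, or sha3-512. More broadly, any hash value (not limited to 256-bit results) can be reduced modulo the order of the elliptic curve, and the product of that result with the generator (scalar multiplication) is a point on the curve.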
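The two hashing options just described can be sketched as follows, using only Python's standard library. This is an illustrative sketch, not the patent's code: the function names are hypothetical, and the final scalar multiplication with the generator is deliberately left out because the standard library has no elliptic-curve arithmetic. The constant below is the well-known order of the prime-order subgroup of curve25519.

```python
import hashlib

# Order of the prime-order subgroup of curve25519.
CURVE_ORDER = 2**252 + 27742317777372353535851937790883648493

def hash_256(value: str, algorithm: str = "sha256") -> bytes:
    """Hash a field value and keep 256 bits (truncating longer digests)."""
    digest = hashlib.new(algorithm, value.encode("utf-8")).digest()
    return digest[:32]

def hash_to_scalar(value: str, algorithm: str = "sha512") -> int:
    """Reduce an arbitrary-length hash modulo the curve order.

    Multiplying the generator by this scalar (not shown here) would give
    a point on the curve, as the text describes.
    """
    digest = hashlib.new(algorithm, value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % CURVE_ORDER

x_candidate = hash_256("shanghai", "sha3_512")  # 256 bits cut from a 512-bit digest
scalar = hash_to_scalar("shanghai")
```

Both parties would have to agree on the same algorithm and truncation rule, since the protocol relies on the client and server hashing equal values to equal points.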
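The server can then send the query base to the client that needs to search. In one approach, the server sends the query base directly to the client, for example directly to the client's device or to the client's proxy server; in another approach, the server publishes the query base at a uniform resource locator (URL), from which the client can fetch it.
Accordingly, the client receives the query base and stores it locally.
Similarly, with RSA, the server can generate a secret value α and keep it safe; this secret value is the RSA private key. In addition, the server can convert the value of a field such as Name through a hash function into an integer, written Hash(C) or H(C).
By the properties of modular exponentiation, given a secret value α, a large prime q, and a base g, computing p = g^α mod q is easy; conversely, given p, q, and the base g, finding the α that satisfies p = g^α mod q is hard. The base g is also called a primitive root.
Here, (H(C))^α mod q is easy to compute by modular exponentiation, but it is hard to derive α from (H(C))^α mod q together with H(C) and q. When α is hard to obtain, it is also hard to obtain H(C) from (H(C))^α mod q. In what follows, mod q is omitted from expressions of the form (H(C))^α mod q, which are abbreviated as (H(C))^α.
The database encrypted by the server with the secret value α is then as follows:
ID Name Age Native_place
id_0 (H(A))^α (H(24))^α (H(anhui))^α
id_1 (H(B))^α (H(25))^α (H(shanghai))^α
id_2 (H(C))^α (H(30))^α (H(anhui))^α
id_3 (H(D))^α (H(46))^α (H(henan))^α
id_4 (H(E))^α (H(34))^α (H(shandong))^α
id_5 (H(F))^α (H(54))^α (H(shanghai))^α
id_6 (H(G))^α (H(24))^α (H(beijing))^α
id_7 (H(H))^α (H(34))^α (H(shandong))^α
id_8 (H(I))^α (H(42))^α (H(guangdong))^α
id_9 (H(J))^α (H(56))^α (H(zhejiang))^α
Table 3. Query base encrypted by the server with the RSA private key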
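As a rough sketch of how a query base like Table 3 could be produced, the snippet below encrypts every non-ID cell as H(value)^α mod q. It is an illustration under stated assumptions rather than the patent's implementation: the toy modulus `Q` (a Mersenne prime), the hash-to-integer rule in `h`, and the helper names are all hypothetical, and real deployments would need vetted parameters.

```python
import hashlib
import math
import secrets

Q = 2**127 - 1  # toy prime modulus (assumption, for illustration only)

ROWS = [
    ("id_0", "A", "24", "anhui"), ("id_1", "B", "25", "shanghai"),
    ("id_2", "C", "30", "anhui"), ("id_3", "D", "46", "henan"),
    ("id_4", "E", "34", "shandong"), ("id_5", "F", "54", "shanghai"),
    ("id_6", "G", "24", "beijing"), ("id_7", "H", "34", "shandong"),
    ("id_8", "I", "42", "guangdong"), ("id_9", "J", "56", "zhejiang"),
]

def h(value: str) -> int:
    """Hash a field value to an integer in [1, Q-1]."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1

def keygen() -> int:
    """Pick a secret exponent that is invertible modulo Q-1."""
    while True:
        k = secrets.randbelow(Q - 3) + 2
        if math.gcd(k, Q - 1) == 1:
            return k

def build_query_base(rows, alpha):
    """Encrypt every field except the ID column as H(value)^alpha mod Q."""
    return {
        rid: tuple(pow(h(v), alpha, Q) for v in fields)
        for rid, *fields in rows
    }

alpha = keygen()                            # server's secret value
query_base = build_query_base(ROWS, alpha)  # rid -> (Name, Age, Native_place) ciphertexts
```

Because the encryption is deterministic, equal plaintext cells (for example the Age of id_0 and id_6) map to equal ciphertext cells, which is what lets the client match on the query base without seeing plaintext.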
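The server can then send the query base to the client that needs to search. Similarly, the server can send the query base directly to the client, for example directly to the client's device or to the client's proxy server; in another approach, the server publishes the query base at a uniform resource locator (URL), from which the client can fetch it.
Accordingly, the client receives the query base and stores it locally.
As shown in FIG. 1, the interaction between the client and the server can include the following steps:
S110: The client sends the sensitive field encrypted by itself to the server and, through interaction with the server, obtains the same sensitive field encrypted by the server.
For example, the client's search condition is that the Age field equals 25, and 25 is a sensitive field, i.e., the client does not want the other party to learn it. To keep the server from learning that the search condition is the value 25 of the Age field, the client can encrypt 25, for example with an RSA/ECC private key, using the same encryption algorithm the server used to generate the query base.
Specifically, with RSA private-key encryption, the client generates its own secret β and keeps it safe. The client can then encrypt 25 with its private key β, either 25 itself or the hash of 25. Encrypting the hash of 25 is used as the example here (encrypting 25 directly is similar), with the client and the server using the same hash algorithm. For example, the client uses the same large prime q as the server as the modulus. The client RSA-encrypts the hash of 25 with β, obtaining (H(25))^β. The sensitive field sent to the server can then be (H(25))^β, which is the ciphertext of the sensitive field value 25.
Alternatively, the client can construct a search statement, encrypt the sensitive field in it to obtain a privacy field, replace the sensitive field with the privacy field, and send the resulting privacy search statement to the server.
For example, the query statement constructed by the client is select Name where Age=25.
To protect privacy, i.e., to keep the server from learning that the condition being queried is Age=25, the value 25 is hidden, giving:
select Name where Age=?
Here, ? stands for the content substituted into the search statement.
Specifically, the client can encrypt 25 with its RSA private key. For example, the client hashes 25 with the same hash function as the server and then RSA-encrypts the hash of 25 with β, obtaining (H(25))^β. The query statement sent to the server is then, for example:
select Name where Age=(H(25))^β
As mentioned above, (H(25))^β is ciphertext, i.e., the content represented by "?" in the search statement above; the server cannot learn β or 25 from it.
With ECC private-key encryption, the client uses the same elliptic curve as the server, i.e., the same curve parameters and generator. The client generates its own secret β and keeps it safe. The client can then encrypt 25 with its private key β; specifically, it can encrypt the hash of 25, with the client and the server using the same hash algorithm. For example, the client ECC-encrypts the hash of 25 with β, obtaining β·H(25). The sensitive field sent to the server can then be β·H(25), which is the ciphertext of the sensitive field value 25.
Alternatively, the client can construct a search statement, encrypt the sensitive field in it to obtain a privacy field, replace the sensitive field with the privacy field, and send the resulting privacy search statement to the server.
For example, the query statement constructed by the client is select Name where Age=25.
To protect privacy, i.e., to keep the server from learning that the condition being queried is Age=25, the value 25 is hidden, giving:
select Name where Age=?
Here, ? stands for the content substituted into the search statement.
Specifically, the client can encrypt 25 with its ECC private key. For example, the client uses the same elliptic curve as the server, i.e., the same curve parameters and generator. The client encrypts the sensitive field in the search statement with its own ECC private key, substitutes it, and sends the resulting privacy search statement to the server. For example, the client generates its own secret β and keeps it safe, hashes 25 with the same hash function as the server, and then ECC-encrypts the hash of 25 with β, obtaining β·H(25). The query statement sent to the server is then, for example:
select Name where Age=β·H(25)
As mentioned above, β·H(25) is ciphertext, i.e., the content represented by "?" in the search statement above; the server cannot learn β or 25 from it.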
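Obtaining the same sensitive field encrypted by the server through interaction with the server can include: the server encrypts the client-encrypted sensitive field again with its own key and sends it to the client, and the client decrypts the doubly encrypted sensitive field with its own key to obtain the sensitive field encrypted by the server. The core requirement is an encryption algorithm in which two consecutive encryptions (by the two parties in turn) can be decrypted in either order. By the cryptographic properties of ECC, the two parties agree on the same elliptic curve, i.e., the same curve parameters and generator, and hold private keys α and β respectively; encryption is scalar multiplication by α (or β). Whether α is applied first and then β, or β first and then α, the result can be decrypted in the same or a different order. Similarly, by the cryptographic properties of RSA, the two parties agree on the same large prime q and primitive root g and hold private keys α and β respectively; encryption is exponentiation by α (or β) modulo q, and again the result can be decrypted in either order regardless of the order of encryption. Overall, the encryption/decryption performed by the client and the server on the same target uses an encryption/decryption algorithm whose order can be exchanged.
Specifically, the server may re-encrypt the privacy field after receiving the privacy search statement and return it to the client, or the server may, after receiving the sensitive field encrypted by the client, encrypt it again with the server's own key and return it to the client. The client then decrypts the doubly encrypted sensitive field with its own key to obtain the sensitive field encrypted by the server.
For example, case 1: the server receives (H(25))^β from the client.
The server can encrypt the encrypted sensitive field (i.e., the privacy field) again and return the result to the client. Specifically, the server encrypts the privacy field (H(25))^β again with its own RSA private key α, obtaining ((H(25))^β)^α.
For example, case 1': the server receives the privacy search statement select Name where Age=(H(25))^β from the client. The server thereby obtains the privacy field (H(25))^β in the statement.
The server can encrypt the privacy field again and return the result to the client.
Specifically, the server encrypts the privacy field (H(25))^β again with its own RSA private key α, obtaining ((H(25))^β)^α. The process is similar to the above and is not repeated here.
For example, case 2: the server receives β·H(25) from the client.
The server can encrypt the privacy field again and return the result to the client. Specifically, the server encrypts the privacy field β·H(25) again with its own ECC private key α, obtaining α·β·H(25).
For example, case 2': the server receives the privacy search statement select Name where Age=β·H(25) from the client. The server thereby obtains the privacy field β·H(25) in the statement.
The server can encrypt the privacy field again and return the result to the client.
Specifically, the server encrypts the privacy field β·H(25) again with its own ECC private key α, obtaining α·β·H(25). The process is similar to the above and is not repeated here.
After the server has encrypted the client-encrypted sensitive field (i.e., the privacy field) again with its own key and sent it to the client, the client can decrypt the doubly encrypted privacy field with its own key to obtain the sensitive field encrypted by the server.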
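For example, for cases 1 and 1' above, the client receives ((H(25))^β)^α from the server. Exponentiation has the following property: ((H(25))^β)^α = (H(25))^(βα) = (H(25))^(αβ) = ((H(25))^α)^β. The client can therefore decrypt the doubly encrypted sensitive field with the inverse β^-1 of its own private key β, as follows: (((H(25))^α)^β)^(β^-1) = (H(25))^α.
In this way, the client obtains the same sensitive field encrypted by the server, namely (H(25))^α.
For cases 2 and 2' above, the client receives α·β·H(25) from the server. Scalar multiplication has the following property: α·β·H(25) = β·α·H(25). The client can therefore decrypt the doubly encrypted sensitive field with the inverse β^-1 of its own private key β, as follows: β^-1·α·β·H(25) = β^-1·β·α·H(25) = α·H(25). In this way, the client likewise obtains the same sensitive field encrypted by the server, namely α·H(25).
Note that in RSA, by Euler's theorem, pk·sk = 1 mod (p-1)·(q-1), where p and q are two large primes, so pk and sk are inverses of each other. Similarly, in ECC, pk = sk·G, where G is a generator on the chosen ECC curve, so pk and sk are likewise inverses of each other.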
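The exchange for the modular-exponentiation case (cases 1 and 1') can be sketched as below. This is a minimal illustration under assumptions that are not in the source: a toy prime modulus `Q`, a hypothetical hash-to-integer helper `h`, and the inverse β^-1 taken modulo Q-1 so that exponentiating by β·β^-1 acts as the identity on nonzero values modulo Q (Fermat's little theorem).

```python
import hashlib
import math
import secrets

Q = 2**127 - 1  # toy prime modulus (assumption, for illustration only)

def h(value: str) -> int:
    """Hash a value to an integer in [1, Q-1]."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1

def keygen() -> int:
    """Pick a secret exponent that has an inverse modulo Q-1."""
    while True:
        k = secrets.randbelow(Q - 3) + 2
        if math.gcd(k, Q - 1) == 1:
            return k

alpha = keygen()  # server's secret
beta = keygen()   # client's secret

# Client -> server: the sensitive value encrypted by the client, (H(25))^beta.
client_encrypted = pow(h("25"), beta, Q)

# Server -> client: the same field encrypted again, ((H(25))^beta)^alpha.
double_encrypted = pow(client_encrypted, alpha, Q)

# Client: strip its own layer with beta^-1 (inverse taken modulo Q-1).
beta_inv = pow(beta, -1, Q - 1)
server_encrypted = pow(double_encrypted, beta_inv, Q)

# What the server would have produced directly, (H(25))^alpha.
expected = pow(h("25"), alpha, Q)
```

The client ends up holding (H(25))^α without ever learning α, and the server never sees H(25) itself, only (H(25))^β.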
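S120: The client searches the query base according to the sensitive field encrypted by the server, obtains the identifier of the matching record, and returns the identifier to the server.
After S110, the client has the same sensitive field encrypted by the server.
The client can search the query base with this server-encrypted sensitive field. For example, after decryption the client has the server-encrypted privacy field α·H(25) or (H(25))^α and searches the query base with it, in Table 2 or Table 3 respectively. It finds that the record whose Age contains this privacy field is the record with ID=id_1, the ID being the identifier of the record. In this way, the client searches the query base with the server-encrypted privacy field and, once a record matches, obtains the identifier of the matching record.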
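Locating the matching identifier can be sketched as follows, again for the modular-exponentiation variant and only for the Age column. The parameters and helper names (`Q`, `h`, `keygen`, `private_lookup`) are illustrative assumptions, and client and server run in one process here purely for demonstration.

```python
import hashlib
import math
import secrets

Q = 2**127 - 1  # toy prime modulus (assumption, for illustration only)
AGES = {"id_0": "24", "id_1": "25", "id_2": "30", "id_3": "46", "id_4": "34",
        "id_5": "54", "id_6": "24", "id_7": "34", "id_8": "42", "id_9": "56"}

def h(value: str) -> int:
    """Hash a value to an integer in [1, Q-1]."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1

def keygen() -> int:
    """Pick a secret exponent that has an inverse modulo Q-1."""
    while True:
        k = secrets.randbelow(Q - 3) + 2
        if math.gcd(k, Q - 1) == 1:
            return k

alpha = keygen()  # known only to the server
age_column = {rid: pow(h(age), alpha, Q) for rid, age in AGES.items()}  # client's query base

def private_lookup(value: str, column: dict) -> list:
    """Return the IDs whose encrypted cell equals the server-encrypted value."""
    beta = keygen()                                   # client secret
    blinded = pow(h(value), beta, Q)                  # client -> server
    double = pow(blinded, alpha, Q)                   # server -> client
    target = pow(double, pow(beta, -1, Q - 1), Q)     # client removes its own layer
    return [rid for rid, cell in column.items() if cell == target]

matches = private_lookup("25", age_column)
```

For the value 25 the lookup yields id_1, as in the text; a value such as 34 would yield two identifiers, id_4 and id_7.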
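Returning the identifier of the matching record to the server can occur in two situations.
In the first, in S110 the client constructed a search statement, encrypted the sensitive field in it to obtain a privacy field, replaced the sensitive field with the privacy field, and sent the resulting privacy search statement to the server. In this case, the client can simply return the identifier of the matching record to the server.
In the second, in S110 the client sent only the value of the sensitive field encrypted by itself. In this case, in S120 the client can construct a search statement, for example:
select Name where ID=id_1
In S120 the client then sends this search statement to the server; it contains the identifier of the matching record and indicates that the field of interest is Name, the field name immediately following select.
In other words, the search statement, which contains the field of interest, can be sent in either of the two steps S110 and S120.
S130: The server returns to the client the value of the field of interest in the record of the database corresponding to the identifier.
Continuing the example, after receiving the identifier from the client, the server can look up the corresponding record in the database, extract the value of the field of interest given in S110 or S120 from that record, and return it to the client. For example, it returns Name=B from the record for id_1, i.e., returns B to the client.
In the above embodiment, the query base is configured on the client in advance, so that, without exposing the database plaintext, the client can locate the identifier of the field to be queried through interaction with the server and the query base, and then query the server by identifier to obtain the value of the field of interest in the corresponding record. Compared with traditional multi-replica PIR, the assumption that multiple database replicas do not collude is not needed, which makes the scheme more practical. Compared with traditional single-replica PIR, which can only retrieve by bit position, this embodiment does not need to know the exact position (bit position) of the keyword in the database, supports string queries, and supports Structured Query Language (SQL) statements. The database remains on the server, while the query base obtained by encrypting the database is configured on the client so that the client can locate data in the query base and obtain record identifiers; because the query base is encrypted, the client does not learn the database contents, which preserves the server's privacy. Overall, the database/query-base arrangement of this embodiment can be called "asymmetric dual-replica" when one server holds the database and one client holds the query base, and "asymmetric multi-replica" when multiple clients hold the query base.
In the above embodiment, the client can query a field of interest through an SQL statement, such as the Name field in select Name... above. This exposes the client's field of interest to some extent. Alternatively, the client can query for the records that satisfy the condition, i.e., the entire matching rows. This protects the client's privacy, but the server has to return whole records, which exposes the server's entire rows to some extent. For example, S110/S120 can use a search statement such as "select * where Age=?" or "select * where ID=id_1". The server's result is then the record for id_1, for example:
id_1 B 25 shanghai
In addition, to secure the transmission, the server can encrypt the record, or the value of the field of interest in the record, before returning it to the client. For example, the server can encrypt it with a symmetric key negotiated with the client, or with the public key of the client's asymmetric key pair so that the client can decrypt it with its own private key, or use a digital envelope, and so on.
In S120 above, the client returns the matched ID directly to the server. Although the client can then obtain the record for that ID, or the field of interest in it, from the server as in S130, this exposes the client's privacy to some extent: the server learns that the identifier the client wants to query is id_1. To protect the client's privacy, the following embodiment can be used: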
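S210: The client sends the sensitive field encrypted by itself to the server and, through interaction with the server, obtains the same sensitive field encrypted by the server.
For example, the client's search condition is that the Age field equals 25, and 25 is a sensitive field, i.e., the client does not want the other party to learn it. To keep the server from learning that the search condition is the value 25 of the Age field, the client can encrypt 25, for example with an RSA/ECC private key, using the same encryption algorithm the server used to generate the query base.
Specifically, with RSA private-key encryption, the client generates its own secret β and keeps it safe. The client can then encrypt 25 with its private key β, either 25 itself or the hash of 25. Encrypting the hash of 25 is used as the example here (encrypting 25 directly is similar), with the client and the server using the same hash algorithm. For example, the client uses the same large prime q as the server as the modulus. The client RSA-encrypts the hash of 25 with β, obtaining (H(25))^β. The sensitive field sent to the server can then be (H(25))^β, which is the ciphertext of the sensitive field value 25.
With ECC private-key encryption, the client uses the same elliptic curve as the server, i.e., the same curve parameters and generator. The client generates its own secret β and keeps it safe. The client can then encrypt 25 with its private key β; specifically, it can encrypt the hash of 25, with the client and the server using the same hash algorithm. For example, the client ECC-encrypts the hash of 25 with β, obtaining β·H(25). The sensitive field sent to the server can then be β·H(25), which is the ciphertext of the sensitive field value 25.
Obtaining the same sensitive field encrypted by the server through interaction with the server can include: the server encrypts the client-encrypted sensitive field again with its own key and sends it to the client, and the client decrypts the doubly encrypted sensitive field with its own key to obtain the sensitive field encrypted by the server. The core requirement is an encryption algorithm in which two consecutive encryptions (by the two parties in turn) can be decrypted in either order. By the cryptographic properties of ECC, the two parties agree on the same elliptic curve, i.e., the same curve parameters and generator, and hold private keys α and β respectively; encryption is scalar multiplication by α (or β). Whether α is applied first and then β, or β first and then α, the result can be decrypted in the same or a different order. Similarly, by the cryptographic properties of RSA, the two parties agree on the same large prime q and primitive root g and hold private keys α and β respectively; encryption is exponentiation by α (or β) modulo q, and again the result can be decrypted in either order regardless of the order of encryption. Overall, the encryption/decryption performed by the client and the server on the same target uses an encryption/decryption algorithm whose order can be exchanged.
Specifically, after receiving the sensitive field encrypted by the client, the server encrypts it again with the server's own key and returns it to the client. The client then decrypts the doubly encrypted sensitive field with its own key to obtain the sensitive field encrypted by the server.
For example, case 1: the server receives (H(25))^β from the client.
The server can encrypt the encrypted sensitive field (i.e., the privacy field) again and return the result to the client. Specifically, the server encrypts the privacy field (H(25))^β again with its own RSA private key α, obtaining ((H(25))^β)^α.
For example, case 2: the server receives β·H(25) from the client.
The server can encrypt the privacy field again and return the result to the client. Specifically, the server encrypts the privacy field β·H(25) again with its own ECC private key α, obtaining α·β·H(25).
After the server has encrypted the client-encrypted sensitive field (i.e., the privacy field) again with its own key and sent it to the client, the client can decrypt the doubly encrypted privacy field with its own key to obtain the sensitive field encrypted by the server.
For example, for case 1 above, the client receives ((H(25))^β)^α from the server. Exponentiation has the following property: ((H(25))^β)^α = (H(25))^(βα) = (H(25))^(αβ) = ((H(25))^α)^β. The client can therefore decrypt the doubly encrypted sensitive field with the inverse β^-1 of its own private key β, as follows: (((H(25))^α)^β)^(β^-1) = (H(25))^α.
In this way, the client obtains the same sensitive field encrypted by the server, namely (H(25))^α.
For case 2 above, the client receives α·β·H(25) from the server. Scalar multiplication has the following property: α·β·H(25) = β·α·H(25). The client can therefore decrypt the doubly encrypted sensitive field with the inverse β^-1 of its own private key β, as follows: β^-1·α·β·H(25) = β^-1·β·α·H(25) = α·H(25). In this way, the client likewise obtains the same sensitive field encrypted by the server, namely α·H(25).
Note that in RSA, by Euler's theorem, pk·sk = 1 mod (p-1)·(q-1), where p and q are two large primes, so pk and sk are inverses of each other. Similarly, in ECC, pk = sk·G, where G is a generator on the chosen ECC curve, so pk and sk are likewise inverses of each other.
S220: The client searches the query base according to the sensitive field encrypted by the server and obtains the identifier of the matching record.
After S210, the client has the same sensitive field encrypted by the server.
The client can search the query base with this server-encrypted sensitive field. For example, after decryption the client has the server-encrypted privacy field α·H(25) or (H(25))^α and searches the query base with it, in Table 2 or Table 3 respectively. It finds that the record whose Age contains this privacy field is the record with ID=id_1, the ID being the identifier of the record. In this way, the client searches the query base with the server-encrypted privacy field and, once a record matches, obtains the identifier of the matching record.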
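S230: Using oblivious transfer, the server returns to the client the values of the field of interest in the records of the database corresponding to an identifier set of predetermined size that includes the matching identifier.
In S220 the client does not return the matched ID to the server, so the server cannot learn which record or records the client is looking for. In S210 the sensitive field sent by the client is encrypted by the client, so the server also cannot learn which record or records the sensitive field will hit; only the client knows. This protects the client's privacy. The search still has to be completed, however, which requires the server to return the records the client wants.
Here, the server can use oblivious transfer.
Oblivious transfer (OT) can be implemented based on RSA, ECC, and so on, and comes in several forms such as 1-out-of-2, 1-out-of-n, 1-out-of-m, and k-out-of-m (k<m<n). Taking 1-out-of-2 OT as an example: the sender has two secrets, m1 and m2, and sends both to the receiver; the receiver can decrypt only one of them and learns nothing about the other, while the sender does not learn which one the receiver chose. With RSA, a simple 1-out-of-2 flow is as follows:
First, the sender generates two different public/private key pairs and publishes the two public keys, denoted public key 1 and public key 2. Suppose the receiver wants m1 but does not want the sender to know this. The receiver generates a random number r, encrypts r with public key 1, and sends it to the sender. The sender decrypts the encrypted r with both of its private keys, obtaining r1 with private key 1 and r2 with private key 2. Clearly only r1 equals r; r2 is a meaningless string of digits (though still a decryption result). The sender does not know which public key the receiver used, so it does not know which of r1 and r2 is the real r. The sender then symmetrically encrypts m1 with r1 and m2 with r2 and sends both ciphertexts to the receiver. Since the receiver's local r equals r1, decrypting the two ciphertexts with r yields m1 but not m2, because r≠r2 and the receiver therefore lacks the correct symmetric key for m2. Throughout, the sender does not learn which of m1 and m2 the receiver obtained.
With 1-out-of-2 as the basis, the two key pairs can be extended to n key pairs, giving 1-out-of-n OT. The core of 1-out-of-n is that the server encrypts the n records in the data table (or the values of the field of interest in them) with n different keys to obtain n ciphertexts and sends them to the client; the client uses the key corresponding to the matching identifier to decrypt the one ciphertext, among the n, that corresponds to the matching identifier.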
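In line with the above embodiments, suppose the server's data table has n records in total, so the client's query base likewise has n encrypted records. For convenience, the record IDs are labeled in order as id_0, id_1, id_2, ..., id_n-1. A simple flow is as follows:
S231: The server generates n different public/private key pairs in advance and publishes the public keys.
Here n equals the number of records in the database.
The server generates n different key pairs (pk-sk; pk is the public key, which can be published, and sk is the secret key, i.e., the private key, which must be kept confidential), for example pk_0-sk_0, pk_1-sk_1, pk_2-sk_2, ..., pk_n-1-sk_n-1, and publishes the n public keys pk_0, pk_1, pk_2, ..., pk_n-1. Once the server has published these n ordered public keys, the client can obtain them.
S232: The client generates a random number r, encrypts r with the public key corresponding to the desired ID, and sends it to the server.
Suppose the client wants the record id_1 without the server learning this. The client can encrypt r with pk_1 and send it to the server. "Ordered" above mainly means that IDs and public keys correspond to each other, and that this correspondence is known to the client. In general, if the client wants the record id_t without the server learning this, it encrypts r with the public key pk_t corresponding to id_t and sends it to the server.
S233: After receiving the encrypted r, the server decrypts it with each of the n private keys.
The server decrypts the random number r encrypted under pk_1 with sk_0, sk_1, sk_2, ..., sk_n-1 in turn: sk_0 yields r0, sk_1 yields r1, ..., and sk_n-1 yields r(n-1).
Clearly only r1 equals r, because only sk_1 matches the pk_1 used for encryption; the results r0, r2, ..., r(n-1) obtained with sk_0, sk_2, ..., sk_n-1, which do not match pk_1, all differ from r. The server only obtains decryption results of identical form; it does not know what the real r is or which public key the client used, and therefore does not know which of the n results r0, r1, r2, ..., r(n-1) is the real r.
S234: The server symmetrically encrypts each record in the database with the decryption result of the corresponding index and sends the ciphertexts to the client.
For example, the server symmetrically encrypts record id_0 with r0, record id_1 with r1, ..., and record id_n-1 with r(n-1), and sends these n ciphertexts to the client.
S235: The client uses the random number r to symmetrically decrypt the ciphertext corresponding to the desired ID, obtaining the search result.
Specifically, suppose that in S232 the client wants the record for id_1 (or the value of the field of interest in it), so it encrypts r with the corresponding public key pk_1. In S233 the server's decryption with the matching private key sk_1 gives r1=r, while the results r0, r2, ..., r(n-1) obtained with the non-matching sk_0, sk_2, ..., sk_n-1 all differ from r. In S234 the server symmetrically encrypts the records id_0, id_1, ..., id_n-1 (or the values of the field of interest in them) with r0, r1, r2, ..., r(n-1) respectively and sends the n ciphertexts to the client. In S235 the client symmetrically decrypts the n ciphertexts with r. Of the n ciphertexts, only the one for id_1 was encrypted with r, so only that one decrypts to the correct value. The client thus obtains the correct search result, i.e., the corresponding record or the value of the field of interest in it.
To reduce computation, the client can of course decrypt only the ciphertext corresponding to the desired ID, i.e., use r only on the ciphertext for id_1, without decrypting the ciphertexts for id_0, id_2, ..., id_n-1: the client knows these were not encrypted with r and would not decrypt correctly.
For clarity, the client's symmetric decryption of the n ciphertexts with r is explained as follows. In S234 the server symmetrically encrypts the records id_0, id_1, ..., id_n-1 (or the values of the field of interest in them) with r0, r1, r2, ..., r(n-1) respectively:
Enc(id_0,r0), where r0≠r;
Enc(id_1,r1), where r1=r;
Enc(id_2,r2), where r2≠r;
...
Enc(id_n-1,r(n-1)), where r(n-1)≠r;
Enc denotes encryption; the first argument in Enc(), id_0, id_1, id_2, ..., id_n-1, denotes the n records (or the values of the field of interest in them), and the second argument, r0, r1, r2, ..., r(n-1), denotes the encryption key.
In S235 the client symmetrically decrypts the ciphertexts from S234 with the random number r, i.e., it computes:
Dec(Enc(id_0,r0),r), where r0≠r;
Dec(Enc(id_1,r1),r), where r1=r;
Dec(Enc(id_2,r2),r), where r2≠r;
...
Dec(Enc(id_n-1,r(n-1)),r), where r(n-1)≠r;
Dec denotes decryption; the first argument in Dec() is the object to decrypt, here the ciphertexts above, and the second argument is the decryption key.
As can be seen, the client can decrypt only the record id_1 and cannot infer the other records. This is because the server used the random number r only for the record id_1 and used other values for the remaining IDs, and the client cannot obtain r0, r2, ..., r(n-1), only r1=r.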
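Steps S231 to S235 can be sketched roughly as follows. This is a toy illustration only, not the patent's implementation and not a secure protocol: the key pairs are textbook RSA built from a handful of fixed Mersenne primes, the symmetric cipher `sym_crypt` is an XOR with a SHAKE-256 keystream, and all function names are hypothetical.

```python
import hashlib
import math
import secrets

# Toy primes (Mersenne primes); real RSA needs large random primes.
PRIMES = [2**61 - 1, 2**89 - 1, 2**107 - 1, 2**127 - 1]

def make_keypair(p, q):
    """Return ((n, e), (n, d)) for a toy textbook-RSA key built from p and q."""
    n, phi = p * q, (p - 1) * (q - 1)
    e = 65537
    while math.gcd(e, phi) != 1:
        e += 2
    return (n, e), (n, pow(e, -1, phi))

def sym_crypt(data: bytes, key: int) -> bytes:
    """Toy symmetric cipher: XOR with a keystream derived from the key."""
    stream = hashlib.shake_256(str(key).encode()).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def server_keygen(ids):
    """S231: one key pair per record ID (at most 6 IDs with these toy primes)."""
    pairs = [(p, q) for i, p in enumerate(PRIMES) for q in PRIMES[i + 1:]]
    return {rid: make_keypair(*pairs[k]) for k, rid in enumerate(ids)}

def client_request(public_key):
    """S232: pick a random r and encrypt it under the wanted ID's public key."""
    n, e = public_key
    r = secrets.randbits(128)
    return r, pow(r, e, n)

def server_respond(records, keys, encrypted_r):
    """S233/S234: decrypt with every private key, then encrypt record i under r_i."""
    response = {}
    for rid, record in records.items():
        n, d = keys[rid][1]
        r_i = pow(encrypted_r, d, n)  # equals r only for the key the client used
        response[rid] = sym_crypt(record.encode(), r_i)
    return response

def client_recover(response, wanted_id, r):
    """S235: only the wanted record was encrypted under r."""
    return sym_crypt(response[wanted_id], r).decode()

records = {"id_1": "B|25|shanghai", "id_2": "C|30|anhui",
           "id_3": "D|46|henan", "id_4": "E|34|shandong"}
keys = server_keygen(records)
public_keys = {rid: pair[0] for rid, pair in keys.items()}
r, encrypted_r = client_request(public_keys["id_1"])
response = server_respond(records, keys, encrypted_r)
result = client_recover(response, "id_1", r)
```

The server treats every record identically, so nothing in its computation reveals which ID the client picked. Passing the whole table as `records` corresponds to the 1-out-of-n case; passing only a subset of IDs corresponds to a 1-out-of-m transfer over that subset.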
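Note that S231 can be performed either after or before S230; this is not limited here.
The above covers the case where the client's search of the query base yields one matching record: through 1-out-of-n oblivious transfer, the records (or the values of the field of interest in them) for all identifiers in the database, including the matching identifier, are transmitted to the client.
In the above embodiment the server does not know which ID or IDs the client is querying and returns all records in the database to the client in encrypted form, which protects the client's privacy. However, in S233 the server decrypts the received encrypted r with n private keys, and this large number of asymmetric decryptions consumes considerable CPU and memory. Transmitting the n ciphertexts in S234 also consumes considerable bandwidth. Especially when n is large, both the server's computation and the bandwidth usage are high.
In addition, there may be more than one match, say k matches (k>1), which can be handled by k-out-of-n oblivious transfer. One implementation groups every k of the n records into a set, each set corresponding to one public/private key pair, giving C(n,k) sets in total (C denotes the combination formula, the number of ways to choose k out of n). Then 1-out-of-C(n,k) oblivious transfer is used to transmit to the client the records (or the values of the field of interest in them) corresponding to all identifiers in the database, including the matching identifiers.
1-out-of-C(n,k) oblivious transfer is implemented like the 1-out-of-n oblivious transfer above: the server encrypts every group of k records in the data table (or the values of the field of interest in them) with C(n,k) different keys, obtaining C(n,k) ciphertexts, and sends these C(n,k) ciphertexts to the client; the client uses the key corresponding to the matching identifiers to decrypt the one ciphertext, among the C(n,k), that corresponds to them. The details are similar to S231-S235 above and are not repeated here.
Note that r above can also be a public key of an asymmetric key pair, so that after receiving a result encrypted with r the client can decrypt it with its own private key. That is, in S234 and S235 the server asymmetrically encrypts the record id_0 with r0, id_1 with r1, ..., and id_n-1 with r(n-1), and sends these n ciphertexts to the client; the client then uses its own private key to asymmetrically decrypt the ciphertext corresponding to the desired ID. The same applies below and is not repeated.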
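On this basis, this specification gives the following implementation, which adds the construction of a confusion set:
S310: The client sends the sensitive field encrypted by itself to the server and, through interaction with the server, obtains the same sensitive field encrypted by the server.
For example, the client's search condition is that the Age field equals 25, and 25 is a sensitive field, i.e., the client does not want the other party to learn it. To keep the server from learning that the search condition is the value 25 of the Age field, the client can encrypt 25, for example with an RSA/ECC private key, using the same encryption algorithm the server used to generate the query base.
Specifically, with RSA private-key encryption, the client generates its own secret β and keeps it safe. The client can then encrypt 25 with its private key β, either 25 itself or the hash of 25. Encrypting the hash of 25 is used as the example here (encrypting 25 directly is similar), with the client and the server using the same hash algorithm. For example, the client uses the same large prime q as the server as the modulus. The client RSA-encrypts the hash of 25 with β, obtaining (H(25))^β. The sensitive field sent to the server can then be (H(25))^β, which is the ciphertext of the sensitive field value 25.
With ECC private-key encryption, the client uses the same elliptic curve as the server, i.e., the same curve parameters and generator. The client generates its own secret β and keeps it safe. The client can then encrypt 25 with its private key β; specifically, it can encrypt the hash of 25, with the client and the server using the same hash algorithm. For example, the client ECC-encrypts the hash of 25 with β, obtaining β·H(25). The sensitive field sent to the server can then be β·H(25), which is the ciphertext of the sensitive field value 25.
Obtaining the same sensitive field encrypted by the server through interaction with the server can include: the server encrypts the client-encrypted sensitive field again with its own key and sends it to the client, and the client decrypts the doubly encrypted sensitive field with its own key to obtain the sensitive field encrypted by the server. The core requirement is an encryption algorithm in which two consecutive encryptions (by the two parties in turn) can be decrypted in either order. By the cryptographic properties of ECC, the two parties agree on the same elliptic curve, i.e., the same curve parameters and generator, and hold private keys α and β respectively; encryption is scalar multiplication by α (or β). Whether α is applied first and then β, or β first and then α, the result can be decrypted in the same or a different order. Similarly, by the cryptographic properties of RSA, the two parties agree on the same large prime q and primitive root g and hold private keys α and β respectively; encryption is exponentiation by α (or β) modulo q, and again the result can be decrypted in either order regardless of the order of encryption. Overall, the encryption/decryption performed by the client and the server on the same target uses an encryption/decryption algorithm whose order can be exchanged.
Specifically, after receiving the sensitive field encrypted by the client, the server encrypts it again with the server's own key and returns it to the client. The client then decrypts the doubly encrypted sensitive field with its own key to obtain the sensitive field encrypted by the server.
For example, case 1: the server receives (H(25))^β from the client.
The server can encrypt the encrypted sensitive field (i.e., the privacy field) again and return the result to the client. Specifically, the server encrypts the privacy field (H(25))^β again with its own RSA private key α, obtaining ((H(25))^β)^α.
For example, case 2: the server receives β·H(25) from the client.
The server can encrypt the privacy field again and return the result to the client. Specifically, the server encrypts the privacy field β·H(25) again with its own ECC private key α, obtaining α·β·H(25).
After the server has encrypted the client-encrypted sensitive field (i.e., the privacy field) again with its own key and sent it to the client, the client can decrypt the doubly encrypted privacy field with its own key to obtain the sensitive field encrypted by the server.
For example, for case 1 above, the client receives ((H(25))^β)^α from the server. Exponentiation has the following property: ((H(25))^β)^α = (H(25))^(βα) = (H(25))^(αβ) = ((H(25))^α)^β. The client can therefore decrypt the doubly encrypted sensitive field with the inverse β^-1 of its own private key β, as follows: (((H(25))^α)^β)^(β^-1) = (H(25))^α.
In this way, the client obtains the same sensitive field encrypted by the server, namely (H(25))^α.
For case 2 above, the client receives α·β·H(25) from the server. Scalar multiplication has the following property: α·β·H(25) = β·α·H(25). The client can therefore decrypt the doubly encrypted sensitive field with the inverse β^-1 of its own private key β, as follows: β^-1·α·β·H(25) = β^-1·β·α·H(25) = α·H(25). In this way, the client likewise obtains the same sensitive field encrypted by the server, namely α·H(25).
Note that in RSA, by Euler's theorem, pk·sk = 1 mod (p-1)·(q-1), where p and q are two large primes, so pk and sk are inverses of each other. Similarly, in ECC, pk = sk·G, where G is a generator on the chosen ECC curve, so pk and sk are likewise inverses of each other.
S320: The client searches the query base according to the sensitive field encrypted by the server and obtains the identifier of the matching record.
After S310, the client has the same sensitive field encrypted by the server.
The client can search the query base with this server-encrypted sensitive field. For example, after decryption the client has the server-encrypted privacy field α·H(25) or (H(25))^α and searches the query base with it, in Table 2 or Table 3 respectively. It finds that the record whose Age contains this privacy field is the record with ID=id_1, the ID being the identifier of the record. In this way, the client searches the query base with the server-encrypted privacy field and, once a record matches, obtains the identifier of the matching record.
S330: Using oblivious transfer, the server returns to the client the values of the field of interest in the records of the database corresponding to an identifier set of predetermined size that includes the matching identifier.
In S310 the sensitive field sent by the client is encrypted by the client, so the server cannot learn which record the sensitive field will hit; only the client knows. This protects the client's privacy. The search still has to be completed, however, which requires the server to return the record the client wants.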
上述图2对应的实施例给出了n选1不经意传输的实现方式,这里,可以采用m选1的不经意传输,其中m<n。S320中,客户端还可以不将匹配得到的ID单独返回至服务端,而是将匹配得到的ID与其它一些伪造的ID混淆组合在一起构造成混淆集,将混淆集发送至服务端,这样服务端无法准确获知客户端想要查找的是混淆集中的哪一条记录,并且需要保证客户端只能获得其中要查找的那一条记录,而无法获得其它记录。S320中,客户端发送的混淆集可以连同检索语句一并发送,例如select Name where ID=混淆集。或者,混淆集也可以在下面的S332中发送,这里并不限定。
结合本说明书上述实施例,假设服务端的数据表中具有总计n条记录,这样,客户端的查询基中相应的也具有n条加密的记录。为了方便,数据记录的ID按照顺序标识为id_0、id_1、id_2、...id_n-1。一个简单的实施流程如下:S331:服务端预先生成n对不同的公私钥对并公开公钥。
服务端生成n对不同的公私钥对(pk-sk;pk是publick key,表示公钥;sk是secret key,表示私钥),例如分别是pk 0-sk 0,pk 1-sk 1,pk 2-sk 2,...,pk n-1-sk n-1,并公开这n个公钥,即公开pk 0,pk 1,pk 2,...,pk n-1。服务端公开这n个有序的公钥后,客户端可以获得这n个公钥。
S332:客户端生成包含期望获得ID在内的m大小的混淆集,并生成随机数r,并用期望获得的ID对应的公钥对r进行加密后与混淆集一并发送至服务端。
这里假设客户端希望获得id_1的那条记录,同时不希望服务端知道客户端想要获得的记录是id_1的那条,便生成m大小的混淆集,m=4时这个混淆集例如为:{id_1,id_2,id_3,id_4}。
这4个ID与公钥对例如存在以下对应关系:
pk 1,id_1
pk 2,id_2
pk 3,id_3
pk 4,id_4
客户端可以采用pk 1对r进行加密后与混淆集一并发送至服务端。例如:
此外,客户端可以将混淆集连同检索语句一并发送至服务端。这样,客户端可以采用pk 1对r进行加密后与包含混淆集的检索语句一并发送,例如:
select Name where ID={id_1,id_2,id_3,id_4}|Enc(r,pk 1)
其中,“|”用于分割前面的检索语句和后面的加密后的随机数,下同。
S333:服务端接收到混淆集和加密的r后,用对应的m个私钥分别对加密的r进行解密。
服务端分别用sk 1,sk 2,sk 3,sk 4,分别解密经过pk 1加密的随机数r。例如,服务端用sk 1解密得到r1,用sk 2解密得到r2,用sk 3解密得到r3,用sk 4解密得到r4。
显然,只有r1是和r相等的,因为只有用sk 1进行解密的才是用对应pk 1加密的;而用不对应pk 1的sk 2、sk 3、sk 4解密得到的结果r2、r3、r4都不会与r相同。通过解密,服务端只是得到形式相同的解密结果,并不知道真正的r是什么,也不知道客户端是用哪个公钥进行加密的。换句话说,服务端不知道客户端加密r时用的哪个公钥,因此服务端也不知道解密得到的4个结果r1、r2、r3、r4中的哪个才是真正的r。
此外,服务端接收到混淆集{id_1,id_2,id_3,id_4}后,可以从混淆集中得知客户端想要获取的数据是混淆集中4个ID中的1个,但不确定是其中哪一个,从而保护了客户端隐私。
S334:服务端将混淆集中指定的记录采用对应序号的解密结果进行对称加密,将对称加密后的结果发送至客户端。
例如,服务端将id_1这条记录采用r1进行对称加密,将id_2这条记录采用r2进行对称加密,将id_3这条记录采用r3进行对称加密,将id_4这条记录采用r4进行对称加密,并将这4个对称加密结果发送至客户端。
S335:客户端采用所述随机数r对所述对称加密结果中期望获得的ID对应的加密结果进行对称解密,得到检索结果。
客户端采用所述随机数r对所述对称加密结果中期望获得的ID对应的加密结果进行对称解密。具体的,例如上述S332中客户端期望获得的是id_1对应的那条记录/记录中的感兴趣字段的值,则客户端采用对应的公钥pk 1对所述随机数r进行加密;S333中,服务端用对应的私钥sk 1对解密结果进行解密,得到的r1=r,而用不对应pk 1的sk 2、sk 3、sk 4解密得到的结果r2、r3、r4都不会与r相同;S334中,服务端用r1、r2、r3、r4分别对对应的id_1、id_2、id_3、id_4这些记录/记录中的感兴趣字段的值进行对称加密,并将这个对称加密结果发送至客户端;S335中,客户端采用所述随机数r对所述4个对称加密结果进行对称解密。其中,4个对称解密结果中,只有id_1的加密结果是用r对称加密的,因此这里只有对id_1的加密结果采用r进行解密才能得到正确的值。从而,客户端可以获得正确的检索结果,即获得对应记录/对应记录中感兴趣字段的值。
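Of course, to reduce computation, the client may use r to decrypt only the one of the 4 symmetric encryption results that corresponds to the desired ID. That is, the client decrypts only the encryption result of id_1 with r and thereby obtains the record of id_1 or the value of the field of interest in it, without decrypting the results of id_2, id_3 and id_4. The client knows that those results were not encrypted under r, so decrypting them with r would not produce correct values anyway.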
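The flow of S331 to S335 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of this specification: the key pairs are toy textbook RSA with tiny hard-coded primes, the symmetric cipher is a simple XOR with a SHA-256-derived pad, and the record contents and all function names are hypothetical.

```python
import hashlib

# Toy textbook-RSA parameters (tiny primes; for illustration only, NOT secure).
PRIMES = {"id_1": (61, 53), "id_2": (67, 71), "id_3": (73, 79), "id_4": (83, 89)}
E = 17  # public exponent, coprime to (p-1)*(q-1) for every pair above


def make_keys():
    """S331: the server creates one key pair per record ID (only pk is published)."""
    keys = {}
    for rec_id, (p, q) in PRIMES.items():
        n, phi = p * q, (p - 1) * (q - 1)
        keys[rec_id] = {"pk": (E, n), "sk": (pow(E, -1, phi), n)}
    return keys


def sym(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256-derived pad (it is its own inverse)."""
    pad = hashlib.sha256(str(key).encode()).digest()
    return bytes(b ^ pad[i % len(pad)] for i, b in enumerate(data))


def client_encrypt_r(keys, wanted_id, r):
    """S332: encrypt r under the public key of the wanted ID."""
    e, n = keys[wanted_id]["pk"]
    return pow(r, e, n)


def server_respond(keys, records, confusion_set, enc_r):
    """S333 and S334: decrypt enc_r with every sk_i, then encrypt record i under r_i."""
    response = {}
    for rec_id in confusion_set:
        d, n = keys[rec_id]["sk"]
        r_i = pow(enc_r, d, n)
        response[rec_id] = sym(r_i, records[rec_id])
    return response


def client_recover(response, wanted_id, r):
    """S335: only the wanted entry was encrypted under the real r."""
    return sym(r, response[wanted_id])


keys = make_keys()
records = {"id_1": b"name-1", "id_2": b"name-2", "id_3": b"name-3", "id_4": b"name-4"}
r = 1234  # must be smaller than the smallest modulus (61 * 53 = 3233)
enc_r = client_encrypt_r(keys, "id_1", r)
response = server_respond(keys, records, ["id_1", "id_2", "id_3", "id_4"], enc_r)
result = client_recover(response, "id_1", r)
```

The server performs the same operations for every ID in the confusion set, so nothing in its own computation reveals which key the client used.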
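S331 to S335 above are merely one exemplary implementation. In another implementation, the construction and transmission of the confusion set can be decoupled from the execution of the OT protocol, and the OT protocol is used to transfer keys. Specifically, on the one hand, the client sends the confusion set of size m to the server, and the client knows which position in the confusion set holds the identifier of the result it really wants. On the other hand, the server generates m symmetric keys, and through 1-out-of-m OT the client obtains the one symmetric key it specifies, namely the key corresponding to the identifier of the result it really wants. The server then encrypts the records corresponding to the m identifiers in the confusion set with the corresponding symmetric keys and sends them to the client, and the client decrypts the result it really wants with the correct symmetric key. The server may generate the m symmetric keys in advance, so that key preparation is completed in batches before the OT interaction and does not take up execution time of the OT protocol.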
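The above describes the case in which the client retrieves exactly 1 matching record from the query base, and the server transfers to the client, through 1-out-of-m oblivious transfer, the records (or the values of the field of interest in the records) corresponding to the m identifiers indicated in the confusion set. The number of matching results may also be greater than 1, for example k records (k>1), in which case k-out-of-m oblivious transfer can be used. The core of k-out-of-m oblivious transfer is as follows. The client constructs a confusion set of size m, where 1<m<n, and sends it to the server; one of the m subsets in the confusion set contains the identifiers of the k matching records. For example, with m=4, k=2 and matching identifiers id_1 and id_3, the constructed confusion set may be {{id_1,id_3},{id_2,id_4},{id_3,id_4},{id_2}}, in which the first subset consists of the matching identifiers. The server then generates m symmetric keys, and a 1-out-of-m OT protocol lets the client obtain the symmetric key it specifies. The server encrypts the m subsets in the confusion set with the m different symmetric keys to obtain m encrypted-result subsets and sends them to the client. The client uses the symmetric key it obtained to decrypt the subset formed by the k matching identifiers and thereby obtains the correct decryption result. The implementation is similar to the process described above of decoupling the construction and transmission of the confusion set from the execution of the OT protocol and using the OT protocol to transfer keys, and is not elaborated further.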
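The subset-based variant can be sketched as follows. The sketch assumes that the 1-out-of-m OT is available as a primitive and models it with a placeholder function `ot_select`; the XOR cipher, the record contents and all names are hypothetical and serve only to show the data flow.

```python
import hashlib
import secrets


def sym(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256-derived pad (NOT secure)."""
    pad = hashlib.sha256(key).digest()
    return bytes(b ^ pad[i % len(pad)] for i, b in enumerate(data))


def server_encrypt_subsets(records, confusion_set, sym_keys):
    """Encrypt every subset of the confusion set under its own symmetric key."""
    return [
        {rec_id: sym(key, records[rec_id]) for rec_id in subset}
        for subset, key in zip(confusion_set, sym_keys)
    ]


def ot_select(sym_keys, index):
    """Placeholder for a 1-out-of-m OT: the client learns only the chosen key."""
    return sym_keys[index]


records = {f"id_{i}": f"name-{i}".encode() for i in range(5)}
confusion_set = [["id_1", "id_3"], ["id_2", "id_4"], ["id_3", "id_4"], ["id_2"]]
sym_keys = [secrets.token_bytes(16) for _ in confusion_set]  # prepared in advance

encrypted = server_encrypt_subsets(records, confusion_set, sym_keys)
key = ot_select(sym_keys, 0)  # the client wants subset 0 = {id_1, id_3}
plain = {rec_id: sym(key, ct) for rec_id, ct in encrypted[0].items()}
```

Because the keys are independent of the query, the server can generate them before any OT interaction takes place.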
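The examples of S310 to S330 above provide a scheme for constructing a confusion set in order to protect client privacy. In some cases, however, the confusion set may be constructed unreasonably, so that the server can still easily guess the ID that the client really wants to query.
Take the example in Table 1 above. Note that the rows id_4 and id_7 have the same Age and the same Native_place, so these two rows cannot be distinguished by Age and Native_place. Suppose that in S332 the client constructs the confusion set {id_1,id_4} and the retrieval statement: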
select Name where ID={id_1,id_4}
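After receiving this retrieval statement containing the confusion set, the server learns that the field of interest in the statement is Name rather than Age or Native_place, which implies that the sensitive field used by the client in S310 and S320 must have been Age and/or Native_place. The element id_4 in the confusion set, however, cannot be distinguished from the row id_7 by Age and/or Native_place. In other words, if the elements of the confusion set had been located through the sensitive fields Age and/or Native_place, it would be unreasonable for the set to contain id_4 but not id_7: if id_4 appears in this retrieval statement, id_7 should appear as well. The server can therefore infer that, in the confusion set {id_1,id_4}, id_4 is a forged element and id_1 is the ID the client really wants to query, which leaks the privacy of the client to a certain extent.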
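The inference described above can be sketched as a small check. The table contents below are hypothetical apart from the fact, taken from the text, that id_4 and id_7 share the same Age and Native_place; the function names are likewise assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows; only "id_4 and id_7 agree on Age and Native_place" is from the text.
TABLE = {
    "id_1": {"Name": "n1", "Age": 25, "Native_place": "shanghai"},
    "id_4": {"Name": "n4", "Age": 34, "Native_place": "beijing"},
    "id_7": {"Name": "n7", "Age": 34, "Native_place": "beijing"},
}


def indistinguishable_groups(table, condition_fields):
    """Group record IDs whose values agree on every possible condition field."""
    groups = defaultdict(set)
    for rec_id, row in table.items():
        groups[tuple(row[f] for f in condition_fields)].add(rec_id)
    return list(groups.values())


def suspicious_ids(table, confusion_set, condition_fields):
    """Return IDs whose indistinguishable siblings are missing from the set.

    Such IDs cannot be the result of a real query on the condition fields,
    so an observer may conclude that they are forged.
    """
    chosen = set(confusion_set)
    flagged = set()
    for group in indistinguishable_groups(table, condition_fields):
        present = group & chosen
        if present and present != group:
            flagged |= present
    return flagged


leaky = suspicious_ids(TABLE, ["id_1", "id_4"], ["Age", "Native_place"])
sound = suspicious_ids(TABLE, ["id_1", "id_4", "id_7"], ["Age", "Native_place"])
```

With the confusion set {id_1,id_4} the check flags id_4, whereas {id_1,id_4,id_7} yields no suspicious element.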
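The above is an example of client privacy being leaked through an unreasonably constructed confusion set. To avoid unreasonable construction of the confusion set, this specification further provides an embodiment of constructing a confusion set.
As shown in FIG. 3, the method for constructing a confusion set may include:
S410: The client sends a sensitive field encrypted by itself to the server, and obtains the same sensitive field encrypted by the server through interaction with the server.
S420: The client searches the query base according to the sensitive field encrypted by the server, and obtains a first identifier set of matching records.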
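S410 and S420 are similar to S310 and S320 above respectively and are not described again. Assume that the sensitive field the client sends to the server is (H(25))^β or β·H(25); through interaction with the server the client then receives the same sensitive field encrypted by the server, ((H(25))^β)^α or α·β·H(25). The client can use its own private key β to remove its layer from the doubly encrypted sensitive field, obtaining (H(25))^α or α·H(25). The client then searches its local query base according to the sensitive field encrypted by the server, (H(25))^α or α·H(25), and obtains the first identifier of the matching record, id_1. Table 2 is mainly used as the example in what follows.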
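The commutative encryption step can be illustrated with the exponentiation form. The following sketch assumes a multiplicative group modulo the Mersenne prime 2^127-1 and small hypothetical keys ALPHA (server) and BETA (client); a real deployment would use a properly chosen group or an elliptic curve, which this specification does not fix here.

```python
import hashlib

P = 2**127 - 1   # a Mersenne prime; exponents are taken modulo P - 1
ALPHA = 65537    # server key (hypothetical value, coprime to P - 1)
BETA = 257       # client key (hypothetical value, coprime to P - 1)


def h(value) -> int:
    """Hash a field value to a non-zero element modulo P."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(element: int, key: int) -> int:
    """Add one encryption layer: element^key mod P."""
    return pow(element, key, P)


def decrypt(element: int, key: int) -> int:
    """Remove one layer by raising to the inverse of key modulo P - 1."""
    return pow(element, pow(key, -1, P - 1), P)


client_blinded = encrypt(h(25), BETA)     # (H(25))^beta, sent to the server
double = encrypt(client_blinded, ALPHA)   # ((H(25))^beta)^alpha, returned
lookup_key = decrypt(double, BETA)        # (H(25))^alpha, used to search the query base
```

Because exponentiation commutes, the value the client ends up with equals what the server stores in the query base for Age 25, although the server never sees H(25) and the client never learns ALPHA.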
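S430: The client selects at least one point in the query base other than the ID field and the field of interest, uses the selected point or points as the retrieval condition, constructs a retrieval statement from this retrieval condition and the original field of interest, executes the retrieval on the query base, and obtains a second identifier set of matching records.
Assume that the size of the confusion set is n, i.e. the confusion set contains n elements. The embodiments of this specification can generate a confusion set of n elements in which, for example, each element is an identifier and the n identifiers include the identifiers of the matching records, i.e. the first identifier set described above.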
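In S430, still taking Table 2 or Table 3 as the example, the client selects at least one point in the query base other than the ID field and the field of interest, for example 1 point, specifically the point α·H(34). A point in the query base can be defined as a single field value determined by a row and a column.
The client then uses the selected point or points as the retrieval condition and constructs a retrieval statement from this retrieval condition and the original field of interest. The original field of interest is the field the client originally wished to query, for example Name. If the point α·H(34) is selected and the corresponding field name is Age, the constructed retrieval statement may be: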
select Name where Age=α·H(34)
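The client executes this retrieval statement on the query base and obtains the second identifiers of the matching records, id_4 and id_7, which form the second identifier set {id_4,id_7}.
In addition, further points other than the ID field and the field of interest can be selected in the query base and used as retrieval conditions; a retrieval statement is constructed from each such condition and the original field of interest and executed on the query base, and the identifiers of the matching records are added to the second identifier set. For example, the constructed retrieval statement is: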
select Name where Native_place=α·H(anhui)
The identifiers of the matching records are then id_0 and id_2.
Adding id_0 and id_2 to the second identifier set gives the second identifier set {{id_4,id_7},{id_0,id_2}}.
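The decoy retrieval of S430 can be sketched as follows. The query base below is hypothetical: strings such as "aH(34)" stand in for the server-encrypted values α·H(34), and the rows are chosen only so that the two statements above return {id_4,id_7} and {id_0,id_2}.

```python
# Hypothetical query base; "aH(x)" is a stand-in for the encrypted value α·H(x).
QUERY_BASE = {
    "id_0": {"Age": "aH(21)", "Native_place": "aH(anhui)"},
    "id_1": {"Age": "aH(25)", "Native_place": "aH(shanghai)"},
    "id_2": {"Age": "aH(30)", "Native_place": "aH(anhui)"},
    "id_4": {"Age": "aH(34)", "Native_place": "aH(beijing)"},
    "id_7": {"Age": "aH(34)", "Native_place": "aH(beijing)"},
}


def select_ids(query_base, field, encrypted_value):
    """Local equivalent of: select Name where <field>=<encrypted_value>."""
    return sorted(
        rec_id for rec_id, row in query_base.items() if row[field] == encrypted_value
    )


def second_id_set(query_base, points):
    """Run one decoy retrieval per selected point and keep each result as a group."""
    return [select_ids(query_base, field, value) for field, value in points]


second = second_id_set(
    QUERY_BASE, [("Age", "aH(34)"), ("Native_place", "aH(anhui)")]
)
```

Since every group is the complete result of a real retrieval on the query base, rows that the condition field cannot distinguish always end up in the same group.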
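S440: The client constructs the confusion set from the first identifier set and the second identifier set.
The client may use the first identifier id_1 together with the second identifiers id_4 and id_7 as the confusion set, i.e. the confusion set is {id_1,id_4,id_7}, which gives a confusion set with n=3. The remaining steps of S331 to S335 are then executed. The confusion set constructed in this way is more reasonable, which avoids leaking client privacy through an unreasonably constructed confusion set. Finally, the server can return the value of the real field of interest to the client through 1-out-of-3 oblivious transfer.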
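The order of the elements in the confusion set is not limited; it may be, for example, {id_4,id_1,id_7}, {id_4,id_7,id_1}, {id_7,id_1,id_4}, {id_7,id_4,id_1} or {id_1,id_7,id_4}, and may be random, as long as the ID to be transferred can be specified in the subsequent OT. The same applies to the examples below and is not repeated.
Alternatively, n may be 2, for example with the confusion set {id_1,{id_4,id_7}}, in which the second element is the set {id_4,id_7}. The server can then return the value of the real field of interest to the client through 1-out-of-2 oblivious transfer. The confusion set may also be {id_1,{id_4,id_7},{id_0,id_2}}, in which case n=3, i.e. 1-out-of-3.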
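In addition, if the sensitive field in S410 and S420 is, for example, α·H(25), the first identifier set obtained is {id_1}. Suppose the second identifier set is the result of executing the constructed retrieval statements select Name where Age=α·H(46) and select Name where Age=α·H(56), i.e. {id_3,id_9}. When the client constructs the confusion set from the first and second identifier sets, it may flatten the elements of both sets into the confusion set, which is then, for example, {id_1,id_3,id_9}. 1-out-of-n oblivious transfer can then be used to return the value of the real field of interest to the client; here n=3 and the number of matching records in the confusion set is 1, so the value of the Name field of id_1 can be returned to the client through 1-out-of-3 oblivious transfer. When the first identifier set contains k elements, k-out-of-n oblivious transfer can be used.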
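Besides flattening the elements of both the first and the second identifier set, the elements of the first identifier set may be flattened and combined with the second identifier set as a whole. For example, the first identifier set is {id_0,id_6} and the second identifier set is still {id_3,id_9}; the confusion set may then be {id_0,id_6,{id_3,id_9}}. k-out-of-n oblivious transfer can then be used to return the values of the real field of interest to the client; here n=3 and the number of matching records in the confusion set is 2, so the values of the Name field of id_0 and id_6 can be returned to the client through 2-out-of-3 oblivious transfer. When the first identifier set contains 1 element, k-out-of-n reduces to 1-out-of-n.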
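Similarly, the elements of the second identifier set may be flattened and combined with the first identifier set as a whole. For example, the first identifier set is {id_0,id_6}, and the second identifier set is the result of executing the constructed retrieval statements select Name where Age=α·H(46) and select Name where Age=α·H(56), i.e. {id_3,id_9}. The confusion set may then be {{id_0,id_6},id_3,id_9}. Here n=3, and the identifiers of the 2 matching records form one set that becomes a single element of the confusion set, so the number of matching elements is 1 and the values of the Name field of {id_0,id_6} can be returned to the client through 1-out-of-3 oblivious transfer.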
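The three ways of combining the two identifier sets, and the resulting OT parameters, can be sketched as follows. The function names are hypothetical, and nested Python lists are used as an assumed representation of the sets inside a confusion set.

```python
def flatten_both(first, second):
    """Every ID of both sets becomes its own element of the confusion set."""
    return list(first) + list(second)


def flatten_first(first, second):
    """IDs of the first set are separate elements; the second set stays whole."""
    return list(first) + [list(second)]


def flatten_second(first, second):
    """The first set stays whole; IDs of the second set are separate elements."""
    return [list(first)] + list(second)


def ot_parameters(confusion_set, first):
    """Return (n, k): n elements in total, k of which carry matching IDs."""
    wanted = set(first)
    k = 0
    for element in confusion_set:
        members = set(element) if isinstance(element, list) else {element}
        if members & wanted:
            k += 1
    return len(confusion_set), k


first, second = ["id_0", "id_6"], ["id_3", "id_9"]
case_a = flatten_both(["id_1"], second)   # {id_1, id_3, id_9}
case_b = flatten_first(first, second)     # {id_0, id_6, {id_3, id_9}}
case_c = flatten_second(first, second)    # {{id_0, id_6}, id_3, id_9}
```

Here `ot_parameters` gives (3, 1) for `case_a`, (3, 2) for `case_b` and (3, 1) for `case_c`, matching the 1-out-of-3, 2-out-of-3 and 1-out-of-3 transfers described above.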
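However, the IDs of two or more rows that cannot be distinguished by the selected point or points outside the ID field and the field of interest must not be split into different sets. For example, the first identifier set is {id_0,id_6} and the second identifier set is {{id_4,id_7},id_3,id_9}. As described above, apart from the ID field and the field of interest Name, the remaining fields Age and Native_place cannot distinguish id_4 from id_7, so id_4 and id_7 must be placed in the same set {id_4,id_7} and cannot be split. Put the other way round, the identifiers of rows in the confusion set that cannot be distinguished except by the ID field and the field of interest are placed in the same minimal set. This requirement also holds for the other implementations and prevents an unreasonable construction of the confusion set.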
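In addition, the elements of the first identifier set and the second identifier set may be mixed to form the confusion set. For example, the first identifier set is {id_0,id_6} and the second identifier set is still {id_3,id_9}; the confusion set may then be {{id_0,id_6},{id_0,id_3},id_9}. id_3 and id_9 can be distinguished by fields other than the ID field and the field of interest, so the identifiers of these rows may be split, i.e. they need not be placed in the same minimal set. Here n=3, and the identifiers of the 2 matching records form one set that becomes a single element of the confusion set, so the values of the Name field of {id_0,id_6} can be returned to the client through 1-out-of-3 oblivious transfer.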
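The examples above use a single condition field; the case of two or more condition fields is similar. Specifically, in S430, still taking Table 2 as the example, the client selects two points in the query base other than the ID field and the field of interest, for example the 2 points α·H(34) and α·H(shanghai).
The client then uses the 2 selected points as the retrieval condition and constructs a retrieval statement from this retrieval condition and the original field of interest. The original field of interest is the field the client originally wished to query, for example Name. With the 2 points α·H(34) and α·H(shanghai), whose field names are Age and Native_place respectively, the constructed retrieval statement may be: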
select Name where Age=α·H(34) or Native_place=α·H(shanghai)
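The client executes this retrieval statement on the query base and obtains the matching second identifiers id_4, id_7, id_1 and id_5, which form the second identifier set {id_4,id_7,id_1,id_5}.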
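In the where condition of the above retrieval statement, Age=α·H(34) is one SQL predicate and Native_place=α·H(shanghai) is another. The connective between the two predicates is or here, but it may also be and or the like; in other words, the connective may be a randomly chosen one. More generally, when there are more than two predicates, the connective between each pair of adjacent predicates is a randomly chosen connective.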
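A retrieval statement with randomly chosen connectives can be assembled as in the following sketch. The function name and the restriction to the two connectives and/or are assumptions made for illustration.

```python
import random


def build_statement(interest_field, predicates, rng=random):
    """Join (field, value) predicates with a randomly chosen connective each."""
    first_field, first_value = predicates[0]
    parts = [f"{first_field}={first_value}"]
    for field, value in predicates[1:]:
        parts.append(rng.choice(["and", "or"]))  # random connective per adjacent pair
        parts.append(f"{field}={value}")
    return f"select {interest_field} where " + " ".join(parts)


statement = build_statement(
    "Name",
    [("Age", "α·H(34)"), ("Native_place", "α·H(shanghai)")],
    random.Random(0),  # seeded generator so that the sketch is repeatable
)
```

Passing a `random.Random` instance keeps the choice repeatable for testing; in practice an unpredictable source of randomness would be preferable.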
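The following describes a client for constructing a confusion set according to an embodiment of this specification. The encryption/decryption performed by the client and the server on the same target uses an encryption/decryption algorithm whose order is commutative, and:
the client is configured with a query base, the query base being obtained by the server encrypting a database;
the client sends a sensitive field encrypted by itself to the server, and obtains the same sensitive field encrypted by the server through interaction with the server; searches the query base according to the sensitive field encrypted by the server to obtain a first identifier set of matching records; selects at least one point in the query base other than the ID field and the field of interest, uses the selected point or points as the retrieval condition, constructs a retrieval statement from this retrieval condition and the original field of interest and executes the retrieval on the query base to obtain a second identifier set of matching records; and constructs the confusion set from the first identifier set and the second identifier set.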
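The following describes a client for constructing a confusion set according to an embodiment of this specification, including: a processor; and a memory storing a program, wherein the method of FIG. 3 above is performed when the processor executes the program.
The following describes a storage medium according to an embodiment of this specification, used for storing a program, wherein the program, when executed, causes a client to perform the method of FIG. 3 above.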
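In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor or a switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can readily be obtained simply by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.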
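A controller can be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by that (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, an application-specific integrated circuit, a programmable logic controller, an embedded microcontroller and the like. Such a controller can therefore be regarded as a hardware component, and the apparatuses it contains for implementing various functions can also be regarded as structures within the hardware component. Alternatively, the apparatuses for implementing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.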
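The systems, apparatuses, modules or units set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementing device is a server system. Of course, this specification does not exclude that, with the future development of computer technology, the computer implementing the functions of the above embodiments may be, for example, a personal computer, a laptop computer, an in-vehicle human-machine interaction device, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.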
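Although one or more embodiments of this specification provide the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive means. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual apparatus or terminal product executes the method, the steps may be executed sequentially or in parallel according to the embodiments or the methods shown in the drawings (for example, in an environment with parallel processors or multithreaded processing, or even in a distributed data processing environment). The terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, product or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product or device. Without further limitation, the presence of additional identical or equivalent elements in the process, method, product or device that includes the stated elements is not excluded. Words such as first and second, where used, denote names and do not indicate any particular order.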
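For convenience of description, the above apparatuses are described as being divided into various modules by function. Of course, when one or more embodiments of this specification are implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware, and a module implementing a given function may also be implemented by a combination of multiple submodules or subunits. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical, mechanical or of another form.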
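This specification is described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to its embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.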
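In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface and memory.
The memory may include non-permanent memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage, graphene storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.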
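Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, a system or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform specific tasks or implement specific abstract data types. One or more embodiments of this specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.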
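The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments. In this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" mean that the specific features, structures, materials or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of this specification. Such illustrative expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
The above are merely embodiments of one or more embodiments of this specification and are not intended to limit them. Those skilled in the art may make various modifications and changes to one or more embodiments of this specification. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of this specification shall fall within the scope of the claims.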

Claims (11)

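  1. A method for constructing a confusion set, wherein a client receives a query base sent by a server, the query base being obtained by encrypting a database; the encryption/decryption performed by the client and the server on the same target uses an encryption/decryption algorithm whose order is commutative;
    the client sends a sensitive field encrypted by itself to the server, and obtains the same sensitive field encrypted by the server through interaction with the server;
    the client searches the query base according to the sensitive field encrypted by the server, and obtains a first identifier set of matching records;
    the client selects at least one point in the query base other than the ID field and the field of interest, uses the selected at least one point as a retrieval condition, constructs a retrieval statement from the retrieval condition and the original field of interest and executes the retrieval on the query base, and obtains a second identifier set of matching records;
    the client constructs the confusion set from the first identifier set and the second identifier set.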
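  2. The method according to claim 1, wherein the client constructing the confusion set from the first identifier set and the second identifier set comprises:
    the client flattening the elements of the first identifier set and the second identifier set to form the confusion set.
  3. The method according to claim 1, wherein the client constructing the confusion set from the first identifier set and the second identifier set comprises:
    the client flattening the elements of the first identifier set and forming the confusion set together with the second identifier set.
  4. The method according to claim 1, wherein the client constructing the confusion set from the first identifier set and the second identifier set comprises:
    the client flattening the elements of the second identifier set and forming the confusion set together with the first identifier set.
  5. The method according to claim 1, wherein the client constructing the confusion set from the first identifier set and the second identifier set comprises:
    the client mixing the elements of the first identifier set and the second identifier set to form the confusion set.
  6. The method according to claim 1, wherein, when the retrieval condition comprises at least two predicates, the connective between two adjacent predicates is a randomly chosen connective.
  7. The method according to claim 1, wherein the order of the elements in the confusion set is random.
  8. The method according to any one of claims 1-7, wherein the identifiers of rows in the confusion set that cannot be distinguished except by the ID field and the field of interest are placed in the same minimal set.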
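  9. A client for constructing a confusion set, wherein the encryption/decryption performed by the client and a server on the same target uses an encryption/decryption algorithm whose order is commutative, and:
    the client is configured with a query base, the query base being obtained by the server encrypting a database;
    the client sends a sensitive field encrypted by itself to the server, and obtains the same sensitive field encrypted by the server through interaction with the server; searches the query base according to the sensitive field encrypted by the server to obtain a first identifier set of matching records; selects at least one point in the query base other than the ID field and the field of interest, uses the selected at least one point as a retrieval condition, constructs a retrieval statement from the retrieval condition and the original field of interest and executes the retrieval on the query base to obtain a second identifier set of matching records; and constructs the confusion set from the first identifier set and the second identifier set.
  10. A client for constructing a confusion set, comprising:
    a processor,
    a memory storing a program, wherein the method according to any one of claims 1-8 is performed when the processor executes the program.
  11. A storage medium for storing a program, wherein the program, when executed, causes a client to perform the method according to any one of claims 1-8.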
PCT/CN2022/135252 2022-10-09 2022-11-30 Method and client for constructing a confusion set WO2024077734A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211229255.6 2022-10-09
CN202211229255.6A CN115801233A (zh) 2022-10-09 2022-10-09 Method and client for constructing a confusion set

Publications (1)

Publication Number Publication Date
WO2024077734A1 true WO2024077734A1 (zh) 2024-04-18

Family

ID=85432668

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/135252 WO2024077734A1 (zh) 2022-10-09 2022-11-30 Method and client for constructing a confusion set

Country Status (2)

Country Link
CN (1) CN115801233A (zh)
WO (1) WO2024077734A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170323118A1 (en) * 2016-05-05 2017-11-09 The Johns Hopkins University Apparatus and Method for Private Information Retrieval
US10289816B1 (en) * 2018-06-08 2019-05-14 Gsfm Llc Methods, systems, and devices for an encrypted and obfuscated algorithm in a computing environment
US20210194668A1 (en) * 2019-12-18 2021-06-24 International Business Machines Corporation Weighted partial matching under homomorphic encryption
CN114036565A (zh) * 2021-11-19 2022-02-11 上海勃池信息技术有限公司 隐私信息检索系统及隐私信息检索方法
CN114065252A (zh) * 2021-11-19 2022-02-18 北京数牍科技有限公司 一种带条件检索的隐私集合求交方法、装置及计算机设备
CN114386089A (zh) * 2021-12-07 2022-04-22 北京数牍科技有限公司 一种基于多方条件检索的隐私集合求交方法
US20220198048A1 (en) * 2020-12-18 2022-06-23 Seagate Technology Llc Search and access pattern hiding verifiable searchable encryption for distributed settings with malicious servers

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ATUKURI VEERA RAGHAVA RAO; PRASAD RAMINENI SIVA RAMA: "A novel approach: Reliable and secure data storage and retrieval in a cloud", 2017 INTERNATIONAL CONFERENCE ON ENERGY, COMMUNICATION, DATA ANALYTICS AND SOFT COMPUTING (ICECDS), IEEE, 1 August 2017 (2017-08-01), pages 1296 - 1300, XP033360004, DOI: 10.1109/ICECDS.2017.8389653 *

Also Published As

Publication number Publication date
CN115801233A (zh) 2023-03-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22961913

Country of ref document: EP

Kind code of ref document: A1