CN114329527A - Intersection data acquisition method, equipment and system - Google Patents


Info

Publication number
CN114329527A
Authority
CN
China
Prior art keywords
hash
privacy
intersection
compression
encrypted
Legal status
Pending
Application number
CN202111555247.6A
Other languages
Chinese (zh)
Inventor
刘巍然
梁子原
彭力强
张磊
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd
Priority to CN202111555247.6A
Publication of CN114329527A
Pending (current legal status)

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The application provides a method, device and system for acquiring intersection data. The system comprises a first device corresponding to a sender and a second device corresponding to a receiver. The first device encrypts a local first private data set with a key obtained through an OPRF to obtain a first encrypted private data set, compresses the first encrypted private data set with a compression function to obtain a first compressed value set, and sends the first compressed value set to the second device. The compression function reduces the length of its input data to no more than a set length, and the set length is determined according to a differential privacy budget and the number of data items in the first private data set. The second device obtains a second encrypted private data set, produced by encrypting its second private data set with the same key, compresses it with the compression function to obtain a second compressed value set, and determines the intersection of the second private data set and the first private data set from the first and second compressed value sets, thereby obtaining an intersection protected by differential privacy.

Description

Intersection data acquisition method, equipment and system
Technical Field
The invention relates to the technical field of data processing, and in particular to a method, equipment and system for acquiring intersection data.
Background
The Private Set Intersection (PSI) protocol is a specific secure multi-party computation protocol. It allows two parties holding data to compute the intersection of their data sets; the intersection result can be obtained by one party or by both parties, but no information about either data set is exposed beyond the intersection. That is, apart from the data in the intersection, no data in either set is leaked to the other party.
Since its introduction, the PSI protocol has attracted a great deal of attention in both academia and industry, and in recent years it has been applied to many practical computing scenarios, such as advertisement placement and contact discovery. However, when the PSI protocol is actually deployed, attention must be paid to whether its output, that is, the intersection itself, contains sensitive information.
In some privacy-preserving computing scenarios, the set intersection may contain sensitive information, such as user identity information. Existing PSI schemes cannot ensure the privacy of the intersection result.
Disclosure of Invention
The embodiments of the present invention provide a method, equipment and system for acquiring intersection data, which are used to protect private data while the intersection data is being acquired.
In a first aspect, an embodiment of the present invention provides a system for acquiring intersection data, including:
a first device corresponding to a sender and a second device corresponding to a receiver, the sender and the receiver executing a private set intersection (PSI) protocol;
the first device is configured to obtain a first private data set corresponding to the sender, obtain a key through an oblivious pseudorandom function (OPRF), and encrypt the first private data set with the key to obtain a first encrypted private data set; to compress the first encrypted private data set with a set compression function to obtain a first compressed value set; and to send the first compressed value set to the second device, wherein the compression function reduces the length of its input data to no more than a set length, and the set length is determined according to a differential privacy budget and the number of data items contained in the first private data set;
the second device is configured to obtain a second private data set corresponding to the receiver and obtain a second encrypted private data set produced by encrypting the second private data set with the key generated through the oblivious pseudorandom function; to compress the second encrypted private data set with the compression function to obtain a second compressed value set; and to determine the intersection of the second private data set and the first private data set according to the first compressed value set and the second compressed value set.
In a second aspect, an embodiment of the present invention provides an intersection data acquisition method, applied to a first device corresponding to a sender that executes a private set intersection protocol, the method including:
acquiring a first private data set corresponding to the sender;
obtaining a key through an oblivious pseudorandom function;
encrypting the first private data set with the key to obtain a first encrypted private data set;
compressing the first encrypted private data set with a set compression function to obtain a first compressed value set, wherein the compression function reduces the length of its input data to no more than a set length, and the set length is determined according to a differential privacy budget and the number of data items contained in the first private data set;
and sending the first compressed value set to a second device corresponding to a receiver that executes the private set intersection protocol, so that the second device determines the intersection of a second private data set corresponding to the receiver and the first private data set according to the first compressed value set and a second compressed value set, wherein the second compressed value set is obtained by the second device compressing, with the compression function, a second encrypted private data set that it obtains by encrypting the second private data set based on the key.
In a third aspect, an embodiment of the present invention provides an intersection data acquisition apparatus, applied to a first device corresponding to a sender that executes a private set intersection protocol, the apparatus including:
an obtaining module, configured to obtain a first private data set corresponding to the sender;
a processing module, configured to obtain a key through an oblivious pseudorandom function, encrypt the first private data set with the key to obtain a first encrypted private data set, and compress the first encrypted private data set with a set compression function to obtain a first compressed value set, wherein the compression function reduces the length of its input data to no more than a set length, and the set length is determined according to a differential privacy budget and the number of data items contained in the first private data set;
and a sending module, configured to send the first compressed value set to a second device corresponding to a receiver that executes the private set intersection protocol, so that the second device determines the intersection of a second private data set corresponding to the receiver and the first private data set according to the first compressed value set and a second compressed value set, wherein the second compressed value set is obtained by the second device compressing, with the compression function, a second encrypted private data set that it obtains by encrypting the second private data set based on the key.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor, a communication interface; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to implement at least the intersection data acquisition method of the second aspect.
In a fifth aspect, an embodiment of the present invention provides an intersection data acquisition method, applied to a first device corresponding to a receiver that executes a private set intersection protocol, the method including:
acquiring a first private data set corresponding to the receiver;
obtaining a first encrypted private data set in which the first private data set is encrypted with a key generated through an oblivious pseudorandom function;
compressing the first encrypted private data set with a set compression function to obtain a first compressed value set, wherein the compression function reduces the length of its input data to no more than a set length, the set length is determined according to a differential privacy budget and the number of data items contained in a second private data set, and the second private data set corresponds to a sender that executes the private set intersection protocol;
receiving a second compressed value set sent by a second device corresponding to the sender, wherein the second compressed value set is obtained by the second device obtaining a key through the oblivious pseudorandom function, encrypting the second private data set with the key to obtain a second encrypted private data set, and compressing the second encrypted private data set with the compression function;
and determining the intersection of the first private data set and the second private data set according to the first compressed value set and the second compressed value set.
In a sixth aspect, an embodiment of the present invention provides an intersection data acquisition apparatus, applied to a first device corresponding to a receiver that executes a private set intersection protocol, the apparatus including:
an acquiring module, configured to acquire a first private data set corresponding to the receiver and to acquire a first encrypted private data set obtained by encrypting the first private data set with a key generated through an oblivious pseudorandom function;
a processing module, configured to compress the first encrypted private data set with a set compression function to obtain a first compressed value set, wherein the compression function reduces the length of its input data to no more than a set length, the set length is determined according to a differential privacy budget and the number of data items contained in a second private data set, and the second private data set corresponds to a sender that executes the private set intersection protocol;
a receiving module, configured to receive a second compressed value set sent by a second device corresponding to the sender, wherein the second compressed value set is obtained by the second device obtaining a key through the oblivious pseudorandom function, encrypting the second private data set with the key to obtain a second encrypted private data set, and compressing the second encrypted private data set with the compression function;
the processing module is further configured to determine the intersection of the first private data set and the second private data set according to the first compressed value set and the second compressed value set.
In a seventh aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor, a communication interface; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to implement at least the intersection data acquisition method of the fifth aspect.
In an eighth aspect, the present invention provides a non-transitory machine-readable storage medium, on which executable code is stored, and when the executable code is executed by a processor of an electronic device, the processor is enabled to implement at least the intersection data acquisition method according to the second aspect or the fifth aspect.
In a ninth aspect, an embodiment of the present invention provides a system for acquiring intersection data, including:
a terminal device corresponding to a user side and a server corresponding to a server side, which execute a private set intersection protocol;
the terminal device is configured to acquire a contact set corresponding to the user side, obtain a key through an oblivious pseudorandom function, and encrypt the contact set with the key to obtain an encrypted contact set; to compress the encrypted contact set with a set compression function to obtain a first compressed value set; and to send the first compressed value set to the server, wherein the compression function reduces the length of its input data to no more than a set length, and the set length is determined according to the differential privacy budget and the number of contacts contained in the contact set;
the server is configured to acquire the registered user set stored on the server and obtain an encrypted registered user set produced by encrypting the registered user set with the key; to compress the encrypted registered user set with the compression function to obtain a second compressed value set; to determine the intersection of the registered user set and the contact set according to the first compressed value set and the second compressed value set; and to send the registered accounts corresponding to the contacts in the intersection to the terminal device.
In a tenth aspect, an embodiment of the present invention provides a system for acquiring intersection data, including:
a first financial server and a second financial server that execute a private set intersection protocol;
the first financial server is configured to acquire a first registered user set corresponding to a first financial service provider, obtain a key through an oblivious pseudorandom function, and encrypt the first registered user set with the key to obtain a first encrypted registered user set; to compress the first encrypted registered user set with a set compression function to obtain a first compressed value set; and to send the first compressed value set to the second financial server, wherein the compression function reduces the length of its input data to no more than a set length, and the set length is determined according to the differential privacy budget and the number of registered users contained in the first registered user set;
the second financial server is configured to acquire a second registered user set corresponding to a second financial service provider and obtain a second encrypted registered user set produced by encrypting the second registered user set with the key; to compress the second encrypted registered user set with the compression function to obtain a second compressed value set; and to determine the intersection of the first registered user set and the second registered user set according to the first compressed value set and the second compressed value set.
Suppose a sender and a receiver execute the PSI protocol, the sender holding a first private data set on its corresponding first device and the receiver holding a second private data set on its corresponding second device. When the sender and the receiver want to obtain the intersection of their data sets, they can do so with the scheme provided by the embodiments of the invention. Specifically, the first device obtains a key through an oblivious pseudorandom function (OPRF), encrypts the first private data set with the key to obtain a first encrypted private data set, then compresses the first encrypted private data set with a set compression function (such as a first hash function) to obtain a first compressed value set (such as a first hash value set), and sends the first compressed value set to the second device. The compression function reduces the length of its input data to no more than a set length, which is determined according to the differential privacy budget and the number of data items contained in the first private data set. The second device obtains a second encrypted private data set produced by encrypting the second private data set with the key, that is, the encrypted form of its own set, and then compresses the second encrypted private data set with the compression function to obtain a second compressed value set (for example, a second hash value set). Finally, the second device determines the intersection of its second private data set with the sender's first private data set from the received first compressed value set and the locally generated second compressed value set.
In the embodiments of the invention, noise satisfying the definition of differential privacy is added to the intersection output in order to protect the privacy of the intersection. Specifically, differential privacy noise is introduced into the intersection result by defining a set compression function (such as the first hash function). The compression function converts longer input data (such as each item in the first and second encrypted private data sets) into data of a shorter set length. The set length is determined based on the differential privacy budget and the number of data items in the sender's first private data set, and is chosen so that the intersection result satisfies the differential privacy requirement, thereby protecting the privacy of the intersection result.
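The end-to-end flow just described can be sketched in Python as a single-process simulation of both devices. This is a minimal sketch under stated assumptions, not the patented implementation: HMAC-SHA256 under a random key stands in for encryption with the OPRF key c (in the actual scheme the receiver obtains its encrypted values through the OPRF without ever seeing c), the hypothetical `compress` function (SHA-256 keeping the top bits) stands in for the first hash function, and the 64-bit set length is an arbitrary illustrative value rather than one derived from a differential privacy budget.

```python
import hashlib
import hmac
import secrets

def encrypt(key: bytes, item: str) -> bytes:
    # Stand-in for F(c, item): a keyed PRF (HMAC-SHA256). In the described
    # scheme the receiver gets these values via the OPRF and never sees the key.
    return hmac.new(key, item.encode("utf-8"), hashlib.sha256).digest()

def compress(value: bytes, set_length_bits: int) -> int:
    # Hypothetical compression function: SHA-256, keeping only the top
    # `set_length_bits` bits, so every output fits within the set length.
    if not 1 <= set_length_bits <= 256:
        raise ValueError("set_length_bits must be in [1, 256]")
    digest = hashlib.sha256(value).digest()
    return int.from_bytes(digest, "big") >> (256 - set_length_bits)

def sender_compressed_set(key: bytes, sender_set: set, set_length_bits: int) -> set:
    # First device: encrypt each element, compress it, and send the resulting set.
    return {compress(encrypt(key, y), set_length_bits) for y in sender_set}

def receiver_intersection(receiver_encrypted: dict, first_compressed: set,
                          set_length_bits: int) -> set:
    # Second device: compress its own encrypted elements and keep every
    # element whose compressed value appears in the received set.
    return {x for x, enc in receiver_encrypted.items()
            if compress(enc, set_length_bits) in first_compressed}

key_c = secrets.token_bytes(32)  # key c, known only to the sender
sender_set = {"user a", "user b", "user c", "user d"}
receiver_set = {"user b", "user c", "user e", "user f"}
SET_LENGTH_BITS = 64  # illustrative only; not derived from a privacy budget

first_compressed = sender_compressed_set(key_c, sender_set, SET_LENGTH_BITS)
# Simulated OPRF output: the receiver's encrypted elements F(c, x).
receiver_encrypted = {x: encrypt(key_c, x) for x in receiver_set}
result = receiver_intersection(receiver_encrypted, first_compressed, SET_LENGTH_BITS)
print(sorted(result))
```

With a long set length such as 64 bits, accidental matches are extremely unlikely, so the output should equal the plain intersection of user b and user c; it is shortening the set length that lets unrelated elements collide.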
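One way to see why a shorter set length injects noise: if the compressed values behave like independent, uniformly random L-bit strings (an idealizing assumption), a receiver element outside the sender's set still matches at least one of the sender's n compressed values with probability 1 - (1 - 2^-L)^n. The sketch below only computes this false-positive side. It does not reproduce the rule by which the set length is derived from the privacy budget, which this section does not give, and it does not model false negatives; the function name is hypothetical.

```python
def false_match_probability(set_length_bits: int, n_sender: int) -> float:
    # Probability that a non-intersection receiver element collides with at
    # least one of the sender's compressed values, assuming uniform L-bit outputs.
    return 1.0 - (1.0 - 2.0 ** -set_length_bits) ** n_sender

probs = {L: false_match_probability(L, n_sender=1000) for L in (8, 16, 24, 32)}
for L, p in probs.items():
    print(f"L={L:2d} bits -> false-match probability {p:.3g}")
```

For a sender set of 1000 items, an 8-bit set length makes a spurious match almost certain, while each additional 8 bits cuts the rate by roughly a factor of 256.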
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a system for acquiring intersection data according to an embodiment of the present invention;
Fig. 2 is a diagram illustrating an OPRF implementation according to an embodiment of the present invention;
Fig. 3 is an interaction flowchart of an intersection data acquisition method according to an embodiment of the present invention;
Fig. 4 is a flowchart of an intersection data acquisition method according to an embodiment of the present invention;
Fig. 5 is a flowchart of an intersection data acquisition method according to an embodiment of the present invention;
Fig. 6 is an application schematic diagram of an intersection data acquisition method according to an embodiment of the present invention;
Fig. 7 is an application schematic diagram of an intersection data acquisition method according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an intersection data acquisition apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of an intersection data acquisition apparatus according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The features of the embodiments and examples described below may be combined with each other where they do not conflict. In addition, the order of steps in each method embodiment below is only an example and is not strictly limited.
The intersection data acquisition method provided by the embodiment of the invention can be executed by an electronic device, the electronic device can be a server or a user terminal, and the server can be a physical server or a virtual server (virtual machine) of a cloud.
In practical applications, when the two participants of the PSI protocol use personal identity information as input (that is, each participant's data set consists of user identity information it has collected), a conventional PSI protocol lets the participant that obtains the intersection result learn the user identity information in the intersection, and all of this information is real, valid user information of the other party. For example, when the participants are institutions such as hospitals or financial companies, the identity information of their registered users is highly sensitive (i.e., private data). It is therefore desirable to provide an optimized PSI protocol that protects each participant's private data during execution while still yielding an intersection result whose accuracy is acceptable to the participants.
In this embodiment, a Differential Privacy (DP) mechanism is used to protect the private data of both participants; that is, the intersection result actually output contains differential privacy noise. This noise can be introduced by producing a controllable false-positive (false alarm) rate and a controllable false-negative (missed report) rate.
For example, assume the true intersection of the two participants' sets contains data 1, data 2 and data 3. A missed report (false negative) means that data that should be in the intersection is absent from the computed intersection: for example, the computed intersection contains only data 1 and data 2, so data 3 is missing. A false alarm (false positive) means that data that should not be in the intersection appears in the computed intersection: for example, the computed intersection additionally contains data 4.
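The two error types can be made concrete with plain set differences, using the data items from the example above. The function name `classify_errors` is illustrative only.

```python
def classify_errors(true_intersection: set, computed: set) -> tuple:
    # Missed reports (false negatives): in the true intersection but not output.
    missed = true_intersection - computed
    # False alarms (false positives): output but not in the true intersection.
    false_alarms = computed - true_intersection
    return missed, false_alarms

true_i = {"data 1", "data 2", "data 3"}
missed, _ = classify_errors(true_i, {"data 1", "data 2"})
_, extra = classify_errors(true_i, {"data 1", "data 2", "data 3", "data 4"})
print(missed)  # {'data 3'}
print(extra)   # {'data 4'}
```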
The following describes in detail an execution process of the intersection data acquisition method provided by the embodiment of the present invention.
Fig. 1 is a schematic diagram of a system for acquiring intersection data according to an embodiment of the present invention, and as shown in fig. 1, the system includes two parties participating in executing a PSI protocol, which are respectively called a sender and a receiver. It is assumed that the sender performs the PSI protocol through its first device and the receiver performs the PSI protocol through its second device.
In practical applications, which of the two PSI participants acts as sender and which as receiver can be assigned arbitrarily in advance. The sender and the receiver then perform their respective processing steps according to the agreed security model.
Security model: according to how participants behave during protocol execution, security models are divided into the semi-honest adversary model and the malicious adversary model. The semi-honest model assumes that each participant follows the steps specified by the protocol but may try to infer the other party's input data from what it observes during execution. The malicious model assumes that a participant may deviate from the protocol arbitrarily in order to obtain the other party's input data. In the embodiments of the invention, the sender and the receiver are assumed to follow the semi-honest adversary model.
The sender can collect the private data of the sender, and form a first private data set to be stored in the first device. The receiving party can collect the same kind of private data of the receiving party, and the second private data set is formed and stored in the second device.
In practical applications, each item in the first private data set is, for example, user identity information such as a user name or phone number; likewise, each item in the second private data set is user identity information.
It should be noted that the embodiments of the present invention emphasize private data sets, each composed of multiple items of private data collected by one PSI participant, because such data has particularly strong privacy protection requirements. This does not, however, limit the intersection method provided by the embodiments to sets composed of private data.
In Fig. 1, assume the first private data set contains the identities (such as names and phone numbers) of user a, user b, user c and user d, and the second private data set contains the identities of user b, user c, user e and user f. Under a conventional PSI protocol, the final intersection result would be user b and user c. With the scheme provided by the embodiments of the present invention, however, the final intersection result may instead be, for example, only user b, or user b, user c and user e.
In order to complete intersection processing of private data sets corresponding to a sender and a receiver, a processing procedure of a first device and a second device corresponding to the sender and the receiver (that is, an execution step of both parties specified by an optimized PSI protocol provided in an embodiment of the present invention) includes:
the first device is configured to obtain a key c through the OPRF and encrypt the first private data set with the key c to obtain a first encrypted private data set; and to compress the first encrypted private data set with a set compression function to obtain a first compressed value set and send the first compressed value set to the second device.
the second device is configured to obtain a second encrypted private data set produced by encrypting the second private data set with the key c, compress the second encrypted private data set with the compression function to obtain a second compressed value set, and determine the intersection of the second private data set and the first private data set according to the first compressed value set and the second compressed value set.
The compression function reduces the length of its input data to no more than a set length, which is determined according to the differential privacy budget ε and the number of data items contained in the first private data set.
In the embodiment of the present invention, optionally, the compression function may be a hash function, which is referred to as a first hash function to distinguish from other hash functions used hereinafter. The following description will be given taking an example in which the compression function is the first hash function. It will be appreciated that when a first hash function is employed, the above-described compression calculation is embodied as a hash calculation, whereby the first and second sets of compressed values will become the first and second sets of hash values.
It is to be understood that, for the first device and the second device, the input data of the first hash function is the respective pieces of encrypted privacy data in the first set of encrypted privacy data and the respective pieces of encrypted privacy data in the second set of encrypted privacy data, respectively.
A brief description of OPRF follows. OPRF is a basic cryptographic protocol for constructing PSI protocols. In the OPRF protocol, suppose the receiver's input is a message m and the sender has no input. After the OPRF protocol is executed, the receiver obtains the ciphertext F(c, m), i.e., the message m encrypted with a key c, and the sender obtains the key c. The receiver learns no information about the key c.
Before performing the OPRF, optionally, the first private data set and the second private data set may be respectively subjected to hash calculation, and the original private data is mapped to a hash value of a fixed length, so as to ensure that data of two parties participating in the PSI are equal in length.
Specifically, the first device may perform hash calculation on the first private data set by using a second hash function to obtain a third hash value set, and the second device also performs hash calculation on the second private data set by using the second hash function to obtain a fourth hash value set. And the third hash value set is composed of hash values obtained by performing hash calculation on each piece of private data in the first private data set, and the fourth hash value set is in the same way.
In other words, hashing a private data set means applying the hash function to each item in it to obtain the corresponding hash value. For example, H(user i) denotes the hash value obtained by applying the second hash function to the identity of user i, where user i is any item in the first or second private data set and H() denotes the second hash function.
In practice, the second hash function may be defined to map input data to a binary string of a preset length s (e.g., 128 bits). That is, after processing by the second hash function, each piece of private data in the first and second private data sets becomes a binary string of length s.
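As a rough illustration, the second hash function can be modelled by a standard cryptographic hash truncated to s bits. The use of SHA-256 and the name h0 are assumptions for illustration only; the text does not fix a particular hash function.

```python
import hashlib

S_BITS = 128  # output length s of the second hash function (example value from the text)


def h0(data: bytes, s_bits: int = S_BITS) -> bytes:
    """Map an arbitrary-length input to a fixed s-bit binary string.

    Assumption: SHA-256 truncated to s bits stands in for the unspecified
    second hash function H().
    """
    if s_bits % 8 != 0 or not 0 < s_bits <= 256:
        raise ValueError("s_bits must be a multiple of 8 in (0, 256]")
    return hashlib.sha256(data).digest()[: s_bits // 8]


# Both parties apply the same H() to their identifiers before running the OPRF.
sender_ids = [b"alice@example.com", b"bob@example.com"]
third_hash_set = [h0(y) for y in sender_ids]
```

Because both sides use the same deterministic function, equal identifiers yield equal s-bit strings, which is what the later OPRF step relies on.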
The first device then obtains a key c via the OPRF and encrypts the third hash value set with the key c to obtain the first encrypted private data set. The second device obtains the second encrypted private data set, i.e., the fourth hash value set encrypted with the key c.
Fig. 2 illustrates an optional implementation of the OPRF. In Fig. 2, the first private data set of the sender is denoted Y, the second private data set of the receiver is denoted X, and H() denotes the second hash function. As part of the OPRF, the receiver generates a random factor z, multiplies it by its own H(X), and sends the product z·H(X) to the sender. The sender generates a secret key c, multiplies c by its own H(Y) and by the received z·H(X), and sends the result c·z·H(X) back to the receiver. The receiver multiplies c·z·H(X) by z^{-1}, the inverse of z, which cancels z and yields c·H(X).
In the example of Fig. 2, H(Y) and H(X) are the sender's third hash value set and the receiver's fourth hash value set, respectively; c·H(Y) and c·H(X) are the sender's first encrypted private data set and the receiver's second encrypted private data set, respectively.
Assuming that the receiver's second private data set contains n1 pieces of private data and the sender's first private data set contains n2 pieces, in practice each piece of private data encrypted with the key c may be λ + log(n1 × n2) bits long, where λ is a preset value such as 40.
Note that Fig. 2 merely assumes that encryption with the key c means multiplying it with the data to be encrypted; the present invention is not limited to this.
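A minimal sketch of the Fig. 2 blinding round trip follows. It assumes a Diffie-Hellman style instantiation in which "multiplying by c" is realized as modular exponentiation in the multiplicative group modulo the Mersenne prime 2**127 - 1. The group, hash_to_group and the function names are illustrative assumptions, and the parameters are toy values, not secure choices.

```python
import hashlib
import math
import secrets

P = 2**127 - 1   # Mersenne prime; toy modulus, not a secure parameter choice
ORDER = P - 1    # exponent arithmetic is done modulo the group order


def hash_to_group(data):
    """Hypothetical hash-to-group map: SHA-256 digest reduced into [2, P - 1]."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 2) + 2


def random_exponent():
    """Random exponent that is invertible modulo the group order."""
    while True:
        e = secrets.randbelow(ORDER - 2) + 2
        if math.gcd(e, ORDER) == 1:
            return e


def receiver_blind(hx):
    z = random_exponent()                 # random factor z
    return pow(hx, z, P), z               # "z * H(x)" is sent to the sender


def sender_evaluate(blinded, c):
    return pow(blinded, c, P)             # "c * z * H(x)" is returned


def receiver_unblind(evaluated, z):
    z_inv = pow(z, -1, ORDER)             # z^-1 modulo the order (Python 3.8+)
    return pow(evaluated, z_inv, P)       # cancels z, leaving "c * H(x)"


c = random_exponent()                     # sender's OPRF key
hx = hash_to_group(b"alice@example.com")  # receiver's H(x)
blinded, z = receiver_blind(hx)
cx = receiver_unblind(sender_evaluate(blinded, c), z)
# The sender computes "c * H(y)" locally; it matches cx exactly when y == x.
cy = pow(hash_to_group(b"alice@example.com"), c, P)
```

Unblinding works because z·z^{-1} ≡ 1 modulo the group order, so raising H(x)^{zc} to z^{-1} leaves H(x)^{c}. The sender only ever sees the blinded value, and the receiver never sees c.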
Next, the first device hashes the first encrypted private data set with a first hash function to obtain a first hash value set and sends it to the second device. The second device hashes the second encrypted private data set with the first hash function to obtain a second hash value set.
As described above, the first hash function reduces the length of its input data to no more than a set length, which is determined by the differential privacy budget and the amount of data contained in the first private data set.
In an alternative embodiment, the set length is determined according to the following:
determining a collision factor k = ln(1 + e^{-ε}), where ε is the differential privacy budget, and then taking the set length to be the quotient of the amount of data contained in the first private data set and the collision factor k, rounded down.
Assuming that the receiver's second private data set contains n1 pieces of private data and the sender's first private data set contains n2 pieces of private data, the set length is:
N = ⌊n2 / k⌋,
where ⌊·⌋ denotes rounding down.
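Taking the collision factor as k = ln(1 + e^{-ε}) and the set length as N = ⌊n2 / k⌋, the arithmetic can be sketched as follows. The function names are hypothetical, and reading "log" as a base-2 logarithm rounded up to whole bits is an assumption.

```python
import math


def collision_factor(epsilon):
    """Collision factor k = ln(1 + e^{-epsilon})."""
    return math.log1p(math.exp(-epsilon))


def set_length(n2, epsilon):
    """Set length N = floor(n2 / k): first-hash outputs lie in {1, ..., N}."""
    return math.floor(n2 / collision_factor(epsilon))


def oprf_output_bits(n1, n2, lam=40):
    """Bit length lambda + log2(n1 * n2) of an encrypted item.

    Assumption: the logarithm is base 2 and is rounded up to whole bits.
    """
    return lam + math.ceil(math.log2(n1 * n2))


# Example: about a million items on each side and a budget of epsilon = 1.
n1 = n2 = 2**20
N = set_length(n2, epsilon=1.0)
print(oprf_output_bits(n1, n2), "bits in ->", math.ceil(math.log2(N)), "bits out")
```

For these example values the script prints `80 bits in -> 22 bits out`. A larger ε gives a smaller k and hence a larger N, i.e., fewer collisions and less noise.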
To distinguish it from the second hash function, the first hash function is denoted H1(). As described above, the first hash function maps its input data to a value no larger than the set length, which can be written as:
H1: {0,1}^{λ+log(n1*n2)} -> {1, 2, …, N}, where N = ⌊n2 / k⌋.
The inputs to the first hash function are the individual pieces of data in the first and second encrypted private data sets (for example, the binary strings obtained by encrypting each hash value in the third and fourth hash value sets with the key c), each λ + log(n1 × n2) bits long. The first hash function maps each such input to a value in the range [1, N]; that is, over an encrypted private data set, its outputs are uniformly distributed over 1 to N.
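One possible stand-in for H1 is sketched below. Reducing a SHA-256 digest modulo N is an assumption made for illustration; the text only requires outputs uniformly distributed over {1, ..., N}. The name h1 is hypothetical.

```python
import hashlib


def h1(ciphertext, n_range):
    """Hypothetical first hash function H1: map an OPRF output to {1, ..., N}.

    Assumption: SHA-256 reduced modulo N; the modulo bias is negligible
    while N is far below 2**256.
    """
    if n_range < 1:
        raise ValueError("N must be at least 1")
    return int.from_bytes(hashlib.sha256(ciphertext).digest(), "big") % n_range + 1


# An 80-bit encrypted item is compressed to a value in {1, ..., N} (example N).
value = h1(bytes.fromhex("00112233445566778899"), n_range=1_000_000)
```

Each output fits in about log2 N bits instead of λ + log(n1 × n2) bits, which is where the communication saving comes from.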
The first device performs hash calculation on each piece of data included in the first encrypted private data set by using a first hash function to obtain a first hash value set formed by the calculated hash values. Similarly, the second device performs hash calculation on each piece of data included in the second encrypted private data set by using the first hash function to obtain a second hash value set formed by the calculated hash values.
The first device may further shuffle the first hash value set, for example by randomly permuting its hash values, and send the shuffled first hash value set to the second device.
The second device determines the intersection of the second private data set and the first private data set from the first hash value set received from the first device and the second hash value set it computed locally.
Specifically, for any piece of private data in the second private data set, if its corresponding hash value in the second hash value set is contained in the first hash value set, the second device determines that this piece of private data is in the intersection; otherwise, it determines that it is not. By checking each piece of private data in the second private data set in turn, the second device obtains the intersection of the second private data set and the first private data set.
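The receiver-side decision rule can be sketched in a few lines. The function name intersect and the list-based representation are illustrative assumptions; only the membership rule itself comes from the text.

```python
def intersect(second_private_set, second_hash_values, first_hash_set):
    """Receiver-side decision: keep each item whose first-hash value appears in the sender's set.

    second_hash_values[i] is the first-hash-function value of the encrypted form
    of second_private_set[i]; the receiver knows this pairing because it
    computed both lists locally.
    """
    received = set(first_hash_set)  # constant-time membership checks
    return [x for x, h in zip(second_private_set, second_hash_values) if h in received]


print(intersect(["u1", "u2", "u3"], [17, 4, 903], [903, 52, 17]))  # ['u1', 'u3']
```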
Because the second device computes and stores both the second private data set and the second hash value set locally, it knows which hash value in the second hash value set corresponds to each piece of private data in the second private data set.
As can be seen from the above, the input to the first hash function is λ + log(n1 × n2) bits long, while its output is a value in [1, N] that needs only about log N bits, and log N is generally much smaller than λ + log(n1 × n2). In this embodiment of the present invention, the first hash function is defined in order to introduce a controllable false alarm (false positive) rate into the intersection result. Specifically, suppose y1 is any piece of private data in the first private data set and x1 is any piece of private data in the second private data set with x1 ≠ y1, and suppose x1 is not contained in the first private data set. After the second hash function, H(x1) ≠ H(y1). Denote the results of encrypting H(x1) and H(y1) with the key c as C1 and C2; then C1 ≠ C2, and each is λ + log(n1 × n2) bits long. Applying the first hash function yields H1(C1) and H1(C2). Because the output is much shorter, H1(C1) and H1(C2) are more likely to be equal, i.e., the first hash function increases the probability that they collide. If H1(C1) = H1(C2), then when the second device evaluates x1, it finds that the corresponding H1(C1) equals the value H1(C2) received from the first device and therefore judges x1 to be in the intersection. This is a false alarm, since x1 is not actually contained in the first private data set.
As can be seen from the above, in this embodiment differential privacy noise (e.g., false alarms) is introduced into the intersection result, so the receiver obtains differentially private intersection data. The output range of the first hash function is determined by the differential privacy budget and the number of pieces of private data in the sender's private data set, i.e., it is chosen so that the intersection result satisfies the differential privacy requirement. A differentially private intersection result differs from the true intersection by a controlled error, and this error does not undermine the usability of the intersection data.
In addition to introducing a false alarm rate as in the above embodiment, the intersection result can also be made to satisfy differential privacy by introducing a missed report (false negative) rate.
Specifically, the first device may replace private data in the first private data set with randomly generated strings with a set probability p, determined by the differential privacy budget, and then encrypt the resulting first private data set with the key c to obtain the first encrypted private data set. When the second hash function is applied first, the first device hashes the first private data set with the second hash function to obtain the third hash value set, replaces hash values in the third hash value set with randomly generated strings with the set probability p, obtains the key c, and encrypts each hash value of the resulting third hash value set with the key c to obtain the first encrypted private data set.
Optionally, the set probability is p = e^{-ε}, where ε is the preset differential privacy budget.
For the third hash value set, the probability p means that each hash value in the set is replaced, with probability p, by a randomly generated binary string of the same length; for example, if the hash value before replacement is s bits long, the replacement string is also s bits long.
In this embodiment, random replacement with probability p serves to introduce a controllable missed report (false negative) rate into the intersection result. Specifically, suppose y1 is any piece of private data in the first private data set and x1 is any piece of private data in the second private data set with x1 = y1. After the second hash function, H(x1) = H(y1). If H(y1) is replaced with an equal-length string D1, then H(x1) and D1 are unequal, and so are their encryptions under the key c, denoted C1 and C3. For simplicity, ignore the subsequent first hash function: the sender's first device sends C3 to the receiver, and the receiver's second device finds no received value equal to C1 and therefore concludes that x1 is not in the intersection. Thus x1, which should be in the intersection, is judged to be outside it; this is a missed report (false negative).
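The sender-side replacement step with p = e^{-ε} might look like the sketch below. The function name randomize_hashes is hypothetical, and drawing the replacement strings from a system random source is an assumption; the text only requires random strings of the same length.

```python
import math
import random
import secrets


def randomize_hashes(third_hash_set, epsilon, s_bits=128, rng=None):
    """Replace each s-bit hash value by a fresh random s-bit string with probability p = e^{-epsilon}.

    Replacement strings have the same length as the originals, so an entry's
    form does not reveal whether it was replaced.
    """
    if rng is None:
        rng = secrets.SystemRandom()
    p = math.exp(-epsilon)
    out = []
    for h in third_hash_set:
        if rng.random() < p:
            out.append(rng.getrandbits(s_bits).to_bytes(s_bits // 8, "big"))
        else:
            out.append(h)
    return out


# Reproducible example with a seeded generator (for illustration only).
hashes = [bytes(16)] * 1000
noisy = randomize_hashes(hashes, epsilon=1.0, rng=random.Random(7))
replaced = sum(h != bytes(16) for h in noisy)  # expected around 1000 * e^{-1}, i.e. about 368
```

A seeded random.Random is used only to make the example repeatable; a real deployment would keep the default cryptographic source.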
In summary, this embodiment of the present invention provides a differential-privacy-based set intersection mechanism in which the receiver obtains an intersection result containing differential privacy noise, thereby protecting the privacy of the intersection data itself. Moreover, when the mechanism is implemented with an OPRF, the first hash function is applied to the OPRF outputs (the encrypted private data set); because it maps longer inputs to shorter hash values, it reduces the amount of data the sender sends to the receiver and thus the communication overhead of the protocol.
To make the execution of the differential privacy set intersection protocol provided by this embodiment easier to follow, an example is described with reference to Fig. 3.
The relevant definitions referred to in fig. 3 are explained first.
Assume that the data set input by the receiver R is X = {x_1, …, x_n1} ⊆ {0,1}^*, containing n1 pieces of private data, and the data set input by the sender S is Y = {y_1, …, y_n2} ⊆ {0,1}^*, containing n2 pieces of private data, where {0,1}^* denotes binary strings of arbitrary length. Further, let ε denote the preset differential privacy budget; the replacement probability is p = e^{-ε} and the collision factor is k = ln(1 + e^{-ε}). A hash function H_0: {0,1}^* -> {0,1}^s maps a binary string of any length to a binary string of fixed length s. Another hash function is defined as H: {0,1}^{λ+log(n1*n2)} -> {1, 2, …, N}, where N = ⌊n2 / k⌋.
The hash function H maps a binary string of length λ + log(n1 × n2) to a value uniformly distributed in the range 1 to N. Since information is stored and processed in electronic devices as binary strings, the description is given in terms of binary strings.
As shown in fig. 3, in performing the differential privacy set intersection protocol:
1. The receiver R and the sender S first hash their sets X and Y, respectively, with the hash function H_0, obtaining:
{H_0(x_i)},i=1,…,n1
{H_0(y_i)},i=1,…,n2
where x_i is any piece of data in the set X and y_i is any piece of data in the set Y.
2. The sender S replaces each H_0(y_i) with a random string r_i ∈ {0,1}^s with the replacement probability p; the resulting set is {H_0(y_i)'}, i = 1, …, n2.
3. The sender S and the receiver R invoke the OPRF, with S acting as the OPRF sender and R as the OPRF receiver. When the OPRF completes, R obtains the set {T_i}, i = 1, …, n1, where T_i = F(c, H_0(x_i)) is the ciphertext obtained by encrypting H_0(x_i) with the key c, and S obtains the key c. Each T_i is λ + log(n1 × n2) bits long.
4. The sender S locally computes the set {Q_i}, i = 1, …, n2, where Q_i = F(c, H_0(y_i)').
5. The sender S hashes each element of {Q_i} with the hash function H to obtain the set O = {H(Q_i)}, i = 1, …, n2, randomly shuffles the elements of O, and sends O to the receiver R.
6. The receiver R computes H(T_i) for each T_i and outputs two sets:
X_1 = {x_i ∈ X | H(T_i) ∈ O},
X_0 = {x_i ∈ X | H(T_i) ∉ O},
where X_1 is the intersection set and X_0 is the non-intersection set.
These expressions mean: for any x_i in the receiver's set X, if the hash value H(T_i) corresponding to x_i is contained in the set O, x_i is placed in the intersection set, i.e., x_i is treated as intersection data of the receiver R's set X and the sender S's set Y. Otherwise, x_i is placed outside the intersection set.
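Steps 1 to 6 can be strung together in a single-process toy simulation, which may help to check the data flow. This is a sketch under loudly stated assumptions: SHA-256 stands in for H_0 and H, HMAC-SHA256 keyed with c stands in for the OPRF output F(c, ·), and the OPRF is replaced by its ideal result rather than executed obliviously (a real run would use an exchange such as the one in Fig. 2). The function name run_dp_psi is hypothetical.

```python
import hashlib
import hmac
import math
import random
import secrets


def run_dp_psi(X, Y, epsilon, s_bits=128, lam=40, rng=None):
    """Toy simulation of steps 1-6; returns (X_1, X_0) as output by receiver R.

    Assumptions: SHA-256 for H_0 and H; HMAC-SHA256 as the ideal OPRF F(c, .).
    epsilon should be moderate (e.g. below ~700) so that e^{-epsilon} > 0.
    """
    rng = rng if rng is not None else random.SystemRandom()
    n1, n2 = len(X), len(Y)
    p = math.exp(-epsilon)                              # replacement probability
    k = math.log1p(math.exp(-epsilon))                  # collision factor
    N = max(1, math.floor(n2 / k))                      # output range of H
    ell = lam + math.ceil(math.log2(max(2, n1 * n2)))   # OPRF output bits
    nbytes = min(32, (ell + 7) // 8)                    # rounded up to whole bytes

    def H0(v):
        return hashlib.sha256(v).digest()[: s_bits // 8]

    def F(key, m):
        return hmac.new(key, m, hashlib.sha256).digest()[:nbytes]

    def H(t):
        return int.from_bytes(hashlib.sha256(t).digest(), "big") % N + 1

    hx = [H0(x) for x in X]                             # step 1, receiver
    hy = [H0(y) for y in Y]                             # step 1, sender
    hy = [rng.getrandbits(s_bits).to_bytes(s_bits // 8, "big")
          if rng.random() < p else h for h in hy]       # step 2
    c = secrets.token_bytes(32)                         # step 3: sender's key c
    T = [F(c, h) for h in hx]                           # step 3: R's OPRF outputs
    Q = [F(c, h) for h in hy]                           # step 4: S computes locally
    O = [H(q) for q in Q]                               # step 5: compress
    rng.shuffle(O)                                      # step 5: random order
    received = set(O)                                   # what R receives
    X_1 = [x for x, t in zip(X, T) if H(t) in received] # step 6
    X_0 = [x for x, t in zip(X, T) if H(t) not in received]
    return X_1, X_0


X_1, X_0 = run_dp_psi([b"a", b"b", b"c"], [b"b", b"c", b"d"], epsilon=50.0)
```

With a very large budget such as ε = 50 the noise is negligible and X_1 is, with overwhelming probability, the true intersection [b"b", b"c"]; with a small ε, X_1 visibly contains both false alarms and misses.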
The above embodiments present one implementation of the differential privacy set intersection protocol provided herein. The following provides a general description of the principle of the set intersection protocol based on differential privacy.
Assume that the data set input by the receiver R is X = {x_1, …, x_n1} ⊆ {0,1}^*, containing n1 pieces of private data, and the data set input by the sender S is Y = {y_1, …, y_n2} ⊆ {0,1}^*, containing n2 pieces of private data. Generate a binary vector V ∈ {0,1}^{n1} of length n1 where, for each element x_i of X, V[i] = 1 if x_i belongs to Y and V[i] = 0 otherwise. Noise is then added to V according to the definition of ε-differential privacy (ε-DP), so that the noisy vector V' satisfies the ε-DP security definition. The receiver can then output the following two sets:
X_1 = {x_i ∈ X | V'[i] = 1} and X_0 = {x_i ∈ X | V'[i] = 0}, where X_1 is the intersection and X_0 is the non-intersection.
Adding noise is the process of obtaining V' from V. This transformation is in effect a flipping of 1s and 0s: each 1 in V is changed to 0 with some probability, and each 0 in V is changed to 1 with some probability. Changing 1 to 0 produces a missed report (false negative) rate, and changing 0 to 1 produces a false alarm rate.
Specifically, if an element x_i of the receiver R's set X is in the sender S's set Y, then V[i] = 1, meaning x_i is an element of the intersection; changing V[i] from 1 to 0 moves an element that should be in the intersection out of it, producing a missed report (false negative).
Conversely, if an element x_i of R's set X is not in S's set Y, then V[i] = 0, meaning x_i is not an element of the intersection; changing V[i] from 0 to 1 moves an element that should not be in the intersection into it, producing a false alarm (false positive).
Thus, the transformation from V to V' exists to satisfy the ε-DP security definition, and introducing false alarm and false negative rates is the means of achieving it. An intersection result that satisfies ε-DP is therefore not exactly correct but carries a certain error rate, which protects the privacy of the true intersection.
The replacement probability p and the first hash function processing described above form one specific implementation of this principle.
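As a consistency check, the two parameters can be related to ε under simplifying assumptions that the text does not state: H behaves like a random function whose outputs are independent and uniform on {1, …, N}, the rounding in N = ⌊n2 / k⌋ is ignored, and n2 is large. With p = e^{-ε} and k = ln(1 + e^{-ε}), so that e^{-k} = e^{ε}/(1 + e^{ε}), a sketch of the flip probabilities for a receiver element x_i is:

```latex
\begin{aligned}
\Pr[V'_i = 1 \mid V_i = 0] &= 1 - \left(1 - \tfrac{1}{N}\right)^{n_2} \approx 1 - e^{-n_2/N} = 1 - e^{-k} = \frac{1}{1 + e^{\varepsilon}},\\
\Pr[V'_i = 0 \mid V_i = 1] &\approx p \, e^{-k} = e^{-\varepsilon} \cdot \frac{e^{\varepsilon}}{1 + e^{\varepsilon}} = \frac{1}{1 + e^{\varepsilon}},\\
\frac{\Pr[V'_i = 1 \mid V_i = 1]}{\Pr[V'_i = 1 \mid V_i = 0]} &= \frac{\Pr[V'_i = 0 \mid V_i = 0]}{\Pr[V'_i = 0 \mid V_i = 1]} \approx \frac{e^{\varepsilon}/(1 + e^{\varepsilon})}{1/(1 + e^{\varepsilon})} = e^{\varepsilon}.
\end{aligned}
```

Under these assumptions each bit is flipped with probability 1/(1 + e^{ε}), which is the behaviour of binary randomized response. Rounding N down makes the effective k slightly larger, which keeps the first ratio at or below e^{ε}.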
Fig. 4 is a flowchart of an intersection data acquisition method provided by an embodiment of the present invention. The method is applied to a first device corresponding to the sender executing the privacy set intersection protocol and, as shown in Fig. 4, includes the following steps:
401. Acquire a first private data set corresponding to the sender.
402. Obtain a key through an oblivious pseudorandom function.
403. Encrypt the first private data set with the key to obtain a first encrypted private data set.
404. Hash the first encrypted private data set with a first hash function to obtain a first hash value set, where the first hash function reduces the length of its input data to no more than a set length, and the set length is determined by the differential privacy budget and the amount of data contained in the first private data set.
405. Send the first hash value set to a second device corresponding to the receiver executing the privacy set intersection protocol, so that the second device determines the intersection of the receiver's second private data set and the first private data set from the first hash value set and a second hash value set. The second device obtains the second hash value set by acquiring a second encrypted private data set, produced by encrypting the second private data set with the key, and hashing it with the first hash function.
Optionally, encrypting the first private data set with the key to obtain the first encrypted private data set includes:
hashing the first private data set with a second hash function to obtain a third hash value set;
replacing hash values in the third hash value set with randomly generated strings with a set probability, where the set probability is determined by the differential privacy budget; and
encrypting the resulting third hash value set with the key to obtain the first encrypted private data set.
Correspondingly, the second device also hashes the second private data set with the second hash function to obtain a fourth hash value set, and obtains the second encrypted private data set by encrypting the fourth hash value set with the key.
This embodiment describes the processing steps on the sender side; for details, refer to the relevant descriptions in the foregoing embodiments, which are not repeated here.
Fig. 5 is a flowchart of an intersection data acquisition method provided by an embodiment of the present invention. The method is applied to a first device corresponding to the receiver executing the privacy set intersection protocol and, as shown in Fig. 5, includes the following steps:
501. Acquire a first private data set corresponding to the receiver.
502. Obtain a first encrypted private data set produced by encrypting the first private data set with a key generated through an oblivious pseudorandom function.
503. Hash the first encrypted private data set with a first hash function to obtain a first hash value set, where the first hash function reduces the length of its input data to no more than a set length, the set length is determined by the differential privacy budget and the amount of data contained in a second private data set, and the second private data set corresponds to the sender executing the privacy set intersection protocol.
504. Receive a second hash value set sent by a second device corresponding to the sender, where the second device obtains the key through the oblivious pseudorandom function, encrypts the second private data set with the key to obtain a second encrypted private data set, and hashes the second encrypted private data set with the first hash function to obtain the second hash value set.
505. Determine the intersection of the first private data set and the second private data set from the first hash value set and the second hash value set.
The second private data set is encrypted as follows: the second device replaces private data in the second private data set with randomly generated strings with a set probability determined by the differential privacy budget, and encrypts the resulting second private data set with the key to obtain the second encrypted private data set.
This embodiment describes the processing steps on the receiver side; for details, refer to the relevant descriptions in the foregoing embodiments, which are not repeated here. Note that the roles of the first device, the second device, the first private data set and the second private data set in this embodiment differ from those in the other embodiments: here, the sender corresponds to the second device and the second private data set, and the receiver corresponds to the first device and the first private data set.
The differential-privacy-based set intersection method provided by embodiments of the present invention can be applied to various computing scenarios, such as advertisement delivery and contact discovery.
Take contact discovery as an example. When a user registers for a new communication service, the user often wants to find which of their existing contacts have already registered for the service, a common friend-finding requirement. To meet it, the user's contact information can serve as one data set and the registered user information held by the service provider as the other, and the differential-privacy-based set intersection of this embodiment is executed on them. This accomplishes contact discovery while preventing information outside the intersection from being leaked to either party.
Specifically, to implement contact discovery, an embodiment of the present invention provides a system for acquiring intersection data. As shown in Fig. 6, the system includes a terminal device corresponding to the user side and a server corresponding to the server side, which together execute the PSI protocol. The user side may be the user in the above example, and the server side the provider of the communication service. The server may be a server cluster deployed in the cloud.
The terminal device is configured to acquire the contact set corresponding to the user side, obtain a key through the OPRF, and encrypt the contact set with the key to obtain an encrypted contact set; compress the encrypted contact set with a set compression function, such as the first hash function, to obtain a first compressed value set (such as the first hash value set); and send the first compressed value set to the server. The compression function reduces the length of its input data to no more than a set length, which is determined by the differential privacy budget and the number of contacts in the contact set;
the server is configured to acquire the registered user set it stores, obtain an encrypted registered user set produced by encrypting the registered user set with the key, hash the encrypted registered user set with the compression function to obtain a second compressed value set (such as the second hash value set), determine the intersection of the registered user set and the contact set from the first compressed value set and the second compressed value set, and send the registered accounts corresponding to the contacts in the intersection to the terminal device.
The contact set may consist of the information of each contact in the address book on the user terminal, where the information of a contact includes a name and a phone number. The registered user set includes the registration information of each user registered for the communication service, including a name and a phone number.
For any piece of registered user information i in its registered user set, if the hash value corresponding to i in the second hash value set is contained in the first hash value set, the server determines that i is in the intersection. Having found the intersection of the registered user set and the contact set in this way, the server looks up the communication service accounts (such as nicknames) corresponding to the registered user information in the intersection and sends them to the user's terminal device, so that the user can send friend requests.
In this embodiment, the execution processes of the terminal device and the server follow those of the first device and the second device in the foregoing embodiments and are not repeated here.
As described above, the first hash function is introduced to add a controllable false alarm rate to the output intersection result so that it satisfies the differential privacy requirement. Consequently, if 20 people in the user's contact set have actually registered for the communication service, the intersection output by the server will not necessarily contain exactly the registration information of those 20 people; it may also include registration information of other people. Introducing this false alarm rate gives the intersection result differential privacy protection. In addition, the first hash function reduces the amount of data the terminal device sends to the server, saving communication overhead.
Take a financial scenario as another example. Now that Internet financial services and products are widely accepted, many users use not only the services provided by banks, such as savings and mobile or online bank transfers, but also the financial services of various Internet financial service providers. This gives rise to requirements such as the following: a financial service provider W1 wants to count the users who use financial services (e.g., payment services) provided by both W1 and another financial service provider W2.
An embodiment of the present invention provides a system for acquiring intersection data that meets this requirement. As shown in Fig. 7, the system includes a first financial server and a second financial server that execute the PSI protocol. The first financial server corresponds to the first financial service provider, and the second financial server to the second financial service provider. Either may be a server cluster deployed in the cloud.
The first financial server is configured to acquire a first registered user set corresponding to the first financial service provider, obtain a key through an oblivious pseudorandom function, and encrypt the first registered user set with the key to obtain a first encrypted registered user set; compress the first encrypted registered user set with a set compression function to obtain a first compressed value set; and send the first compressed value set to the second financial server. The compression function reduces the length of its input data to no more than a set length, which is determined by the differential privacy budget and the number of registered users in the first registered user set;
the second financial server is configured to acquire a second registered user set corresponding to the second financial service provider, obtain a second encrypted registered user set produced by encrypting the second registered user set with the key, compress the second encrypted registered user set with the compression function to obtain a second compressed value set, and determine the intersection of the first registered user set and the second registered user set from the first compressed value set and the second compressed value set.
It should be understood that the first and second registered user sets contain the unique identifiers that users registered when signing up for the financial services of the corresponding financial service providers.
As described above, the compression function may be a first hash function and the compression calculation a hash calculation, in which case the first compressed value set is a first hash value set and the second compressed value set is a second hash value set.
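The sender/receiver flow above can be illustrated with a minimal Python sketch. It rests on assumptions not found in this section: HMAC-SHA256 under a shared key stands in for the key obtained through the OPRF (a real OPRF would let the receiver evaluate the function on its own items without learning the key), and the first hash function is taken to be SHA-256 truncated to `set_length_bits` bits. All function names are hypothetical.

```python
import hashlib
import hmac


def prf(key: bytes, item: str) -> bytes:
    """Hypothetical stand-in for OPRF-based encryption: HMAC-SHA256 under `key`.

    In a real OPRF exchange the receiver would learn prf(key, y) for its own
    items without learning `key`; this sketch simply calls the function directly.
    """
    return hmac.new(key, item.encode("utf-8"), hashlib.sha256).digest()


def compress(encrypted: bytes, set_length_bits: int) -> int:
    """Hypothetical compression: the first `set_length_bits` bits of SHA-256(encrypted)."""
    if not 1 <= set_length_bits <= 256:
        raise ValueError("set_length_bits must be between 1 and 256")
    digest = int.from_bytes(hashlib.sha256(encrypted).digest(), "big")
    return digest >> (256 - set_length_bits)


def sender_compressed_set(key: bytes, users: set[str], set_length_bits: int) -> set[int]:
    """First financial server: encrypt each registered user, then compress."""
    return {compress(prf(key, u), set_length_bits) for u in users}


def receiver_intersection(key: bytes, users: set[str], first_compressed: set[int],
                          set_length_bits: int) -> set[str]:
    """Second financial server: keep users whose compressed value the sender also sent.

    Truncation can let a non-shared user collide with a sender value, so the
    result may contain false positives when set_length_bits is small.
    """
    return {u for u in users
            if compress(prf(key, u), set_length_bits) in first_compressed}


if __name__ == "__main__":
    key = b"hypothetical-oprf-key"
    w1_users = {"alice", "bob", "carol"}
    w2_users = {"bob", "carol", "dave", "erin"}
    first = sender_compressed_set(key, w1_users, set_length_bits=32)
    print(sorted(receiver_intersection(key, w2_users, first, set_length_bits=32)))
```

Because the compressed values are truncated, a receiver item outside the true intersection can collide with a sender value; the smaller `set_length_bits` is, the more likely such false positives become.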
Optionally, the first financial server is further configured to: hash the first registered user set with a second hash function to obtain a third hash value set; with a set probability, replace hash values in the third hash value set with randomly generated strings, the set probability being determined according to the differential privacy budget; and encrypt the third hash value set with the key to obtain the first encrypted registered user set. Similarly, the second financial server is further configured to hash the second registered user set with the second hash function to obtain a fourth hash value set, and to obtain the second encrypted registered user set by encrypting the fourth hash value set with the key.
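The optional randomization step can be sketched as follows. The section states only that the replacement probability is determined by the differential privacy budget; the randomized-response style mapping p = 1 / (e^epsilon + 1) used here is an assumption, as is the choice of SHA-256 for the second hash function. The randomized third hash value set would then be encrypted with the key and compressed as before.

```python
import hashlib
import math
import random


def second_hash(user_id: str) -> bytes:
    """Hypothetical second hash function: SHA-256 of the UTF-8 identifier."""
    return hashlib.sha256(user_id.encode("utf-8")).digest()


def replacement_probability(epsilon: float) -> float:
    """ASSUMED mapping from privacy budget to replacement probability.

    Randomized-response style p = 1 / (e^epsilon + 1), computed as
    e^-epsilon / (1 + e^-epsilon) so that large budgets underflow to 0
    instead of overflowing.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    q = math.exp(-epsilon)
    return q / (1.0 + q)


def randomize(hashes: list[bytes], epsilon: float, rng: random.Random) -> list[bytes]:
    """Replace each hash value, with probability p, by a random string of equal length."""
    p = replacement_probability(epsilon)
    out = []
    for h in hashes:
        if rng.random() < p:
            out.append(rng.getrandbits(8 * len(h)).to_bytes(len(h), "big"))
        else:
            out.append(h)
    return out


if __name__ == "__main__":
    rng = random.Random(7)  # seeded only to make the demo reproducible
    third = [second_hash(u) for u in ["alice", "bob", "carol"]]
    noisy = randomize(third, epsilon=1.0, rng=rng)
    print(sum(a != b for a, b in zip(third, noisy)), "of", len(third), "values replaced")
```

A smaller budget epsilon raises p toward 0.5, so more of a party's own records are replaced and cannot appear in the intersection; a larger budget keeps the result closer to the exact intersection.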
The above user-intersection process takes, as an example, the requirement of different financial service providers to identify their common registered users. For the specific implementation, refer to the related descriptions in the foregoing embodiments, which are not repeated here.
In practice, intersection requirements in the financial field are not limited to the above example. For instance, a credit evaluation institution may want to find out whether certain users have failed to repay their credit cards on time. The institution creates a user set 1 containing the identity of each user to be evaluated and acts as the sender of the PSI protocol. Each bank acts as the other participant (the receiver) and maintains a user set 2 of all its credit card holders. By executing the intersection data acquisition scheme, each bank obtains the users in user set 1 who hold a credit card at that bank (the intersection users), queries their credit card records to determine whether any are overdue and unpaid, and reports the result to the credit evaluation institution.
Taking a medical scenario as an example, a patient often seeks treatment at more than one hospital. When one hospital wants to know the past medical history of some patients, this can be achieved with the following system provided by an embodiment of the present invention:
a system for obtaining intersection data, comprising a first medical server and a second medical server that execute a PSI protocol. The first medical server corresponds to a first hospital, and the second medical server corresponds to a second hospital.
The first medical server is configured to acquire a first visiting user set corresponding to the first hospital, obtain a key through an oblivious pseudorandom function, and encrypt the first visiting user set with the key to obtain a first encrypted visiting user set; it applies a set compression function to the first encrypted visiting user set to obtain a first compressed value set and sends the first compressed value set to the second medical server. The compression function compresses its input to a length not exceeding a set length, and the set length is determined according to a differential privacy budget and the number of visiting users in the first visiting user set;
the second medical server is configured to acquire a second visiting user set corresponding to the second hospital and obtain a second encrypted visiting user set, i.e. the second visiting user set encrypted with the same key; it applies the compression function to the second encrypted visiting user set to obtain a second compressed value set, and determines the intersection of the first and second visiting user sets according to the first and second compressed value sets.
The second medical server can then query its local treatment records for the users in the intersection, encrypt those records, and transmit them to the first medical server.
Optionally, the compression function may be a first hash function, and the compression calculation includes a hash calculation, such that the first set of compressed values includes a first set of hash values and the second set of compressed values includes a second set of hash values.
Optionally, the first medical server is further configured to: hash the first visiting user set with a second hash function to obtain a third hash value set; with a set probability, replace hash values in the third hash value set with randomly generated strings, the set probability being determined according to the differential privacy budget; and encrypt the third hash value set with the key to obtain the first encrypted visiting user set. Similarly, the second medical server is further configured to hash the second visiting user set with the second hash function to obtain a fourth hash value set, and to obtain the second encrypted visiting user set by encrypting the fourth hash value set with the key.
In fact, the scheme provided by the embodiments of the present invention can be used to obtain intersection data in any application scenario that requires intersection calculation and is not limited to the scenarios above.
The intersection data acquisition apparatus of one or more embodiments of the present invention is described in detail below. Those skilled in the art will appreciate that each such apparatus can be constructed from commercially available hardware components configured through the steps taught in this solution.
Fig. 8 is a schematic structural diagram of an intersection data acquisition apparatus according to an embodiment of the present invention. The apparatus is applied to a first device corresponding to the sender that executes the privacy set intersection protocol and, as shown in fig. 8, includes an acquisition module 11, a processing module 12 and a sending module 13.
The acquisition module 11 is configured to obtain a first privacy data set corresponding to the sender.
The processing module 12 is configured to obtain a key through an oblivious pseudorandom function, encrypt the first privacy data set with the key to obtain a first encrypted privacy data set, and apply a set compression function to the first encrypted privacy data set to obtain a first compressed value set. The compression function compresses its input to a length not exceeding a set length, and the set length is determined according to a differential privacy budget and the amount of data in the first privacy data set.
The sending module 13 is configured to send the first compressed value set to a second device corresponding to the receiver that executes the privacy set intersection protocol, so that the second device determines, according to the first compressed value set and a second compressed value set, the intersection of the receiver's second privacy data set and the first privacy data set. The second compressed value set is obtained by the second device applying the compression function to a second encrypted privacy data set, i.e. the second privacy data set encrypted with the key.
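This section does not give the formula that derives the set length from the differential privacy budget and the data volume, so the sketch below models only the accuracy side of that choice. Treating compressed values as uniform l-bit strings (an assumption), a union bound estimates how many receiver items outside the intersection will collide with a sender value. The helper names are hypothetical.

```python
def expected_false_matches(n_sender: int, n_receiver_outside: int,
                           set_length_bits: int) -> float:
    """Union-bound estimate of spurious matches after truncation to `set_length_bits`.

    Modelling compressed values as uniform l-bit strings, a receiver element
    outside the intersection collides with at least one of n_sender values with
    probability at most n_sender / 2**l; summing over those receiver elements
    gives the bound returned here.
    """
    return n_receiver_outside * n_sender / 2 ** set_length_bits


def min_length_bits(n_sender: int, n_receiver_outside: int,
                    max_expected_false: float) -> int:
    """Smallest length whose false-match bound does not exceed `max_expected_false`."""
    if max_expected_false <= 0:
        raise ValueError("max_expected_false must be positive")
    length = 1
    while expected_false_matches(n_sender, n_receiver_outside, length) > max_expected_false:
        length += 1
    return length


if __name__ == "__main__":
    print(min_length_bits(10**6, 10**6, 0.01))  # 47
```

For example, with 10^6 items on each side, keeping the expected number of false matches at or below 0.01 requires at least 47 bits. How the differential privacy budget enters the choice of length is not modelled here.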
Optionally, the compression function comprises a first hash function, and the compression calculation comprises a hash calculation; the first set of compressed values comprises a first set of hash values and the second set of compressed values comprises a second set of hash values.
Optionally, the processing module 12 is specifically configured to: hash the first privacy data set with a second hash function to obtain a third hash value set; with a set probability, replace hash values in the third hash value set with randomly generated strings, the set probability being determined according to the differential privacy budget; and encrypt the third hash value set with the key to obtain the first encrypted privacy data set.
In this case, the second device also hashes the second privacy data set with the second hash function to obtain a fourth hash value set, and obtains the second encrypted privacy data set by encrypting the fourth hash value set with the key.
The apparatus shown in fig. 8 can perform the steps performed by the sender in the foregoing embodiments; for the detailed process and technical effects, refer to the foregoing description, which is not repeated here.
Fig. 9 is a schematic structural diagram of an intersection data acquisition apparatus according to an embodiment of the present invention. The apparatus is applied to a first device corresponding to the receiver that executes the privacy set intersection protocol and, as shown in fig. 9, includes an acquisition module 21, a processing module 22 and a receiving module 23.
The acquisition module 21 is configured to obtain a first privacy data set corresponding to the receiver, and to obtain a first encrypted privacy data set, i.e. the first privacy data set encrypted with a key generated through an oblivious pseudorandom function.
The processing module 22 is configured to apply a set compression function to the first encrypted privacy data set to obtain a first compressed value set. The compression function compresses its input to a length not exceeding a set length, the set length is determined according to the differential privacy budget and the amount of data in a second privacy data set, and the second privacy data set corresponds to the sender that executes the privacy set intersection protocol.
The receiving module 23 is configured to receive a second compressed value set sent by a second device corresponding to the sender. The second compressed value set is obtained by the second device obtaining a key through an oblivious pseudorandom function, encrypting the second privacy data set with the key to obtain a second encrypted privacy data set, and applying the compression function to the second encrypted privacy data set.
The processing module 22 is further configured to determine an intersection of the first private data set and the second private data set according to the first compressed value set and the second compressed value set.
Optionally, the compression function comprises a first hash function, and the compression calculation comprises a hash calculation; the first set of compressed values comprises a first set of hash values and the second set of compressed values comprises a second set of hash values.
Optionally, the second encrypted privacy data set is obtained as follows:
the second device, with a set probability, replaces privacy data in the second privacy data set with randomly generated strings, and encrypts the resulting second privacy data set with the key to obtain the second encrypted privacy data set, the set probability being determined according to the differential privacy budget.
The apparatus shown in fig. 9 can perform the steps performed by the receiver in the foregoing embodiments; for the detailed process and technical effects, refer to the foregoing description, which is not repeated here.
In one possible design, the intersection data acquisition apparatus may be implemented as an electronic device. As shown in fig. 10, the electronic device may include a processor 31, a memory 32 and a communication interface 33. The memory 32 stores executable code which, when executed by the processor 31, causes the processor 31 to implement at least the intersection data acquisition method performed by the first device or the second device in the foregoing embodiments.
In addition, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to implement at least the intersection data acquisition method performed by the first device or the second device as provided in the foregoing embodiments.
The apparatus embodiments described above are merely illustrative: modules described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without inventive effort.
From the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or by a combination of hardware and software. Based on this understanding, the above technical solutions, in essence or in the part contributing to the prior art, may be embodied as a computer program product on one or more computer-usable storage media containing computer-usable program code, including but not limited to disk storage, CD-ROM and optical storage.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (14)

1. A system for obtaining intersection data, comprising:
a first device corresponding to a sender and a second device corresponding to a receiver, which execute a privacy set intersection protocol;
the first device is configured to obtain a first privacy data set corresponding to the sender, obtain a key through an oblivious pseudorandom function, and encrypt the first privacy data set with the key to obtain a first encrypted privacy data set; apply a set compression function to the first encrypted privacy data set to obtain a first compressed value set; and send the first compressed value set to the second device, wherein the compression function compresses its input to a length not exceeding a set length, and the set length is determined according to a differential privacy budget and the amount of data in the first privacy data set;
the second device is configured to obtain a second privacy data set corresponding to the receiver and obtain a second encrypted privacy data set by encrypting the second privacy data set with the key generated through the oblivious pseudorandom function; apply the compression function to the second encrypted privacy data set to obtain a second compressed value set; and determine the intersection of the second privacy data set and the first privacy data set according to the first compressed value set and the second compressed value set.
2. The system of claim 1, wherein the compression function comprises a first hash function;
the first device is configured to perform hash calculation on the first encrypted private data set by using a first hash function to obtain a first hash value set, and send the first hash value set to the second device;
the second device is configured to perform hash calculation on the second encrypted private data set by using the first hash function to obtain a second hash value set, and determine an intersection of the second private data set and the first private data set according to the first hash value set and the second hash value set.
3. The system according to claim 2, wherein, for any private data in the second private data set, the second device is configured to determine that the private data belongs to the intersection if its corresponding hash value in the second hash value set is contained in the first hash value set.
4. The system according to claim 2, wherein the first device is configured to randomly shuffle the hash values in the first hash value set before sending them to the second device.
5. The system according to claim 1, wherein the first device is configured to replace, with a set probability, private data in the first private data set with randomly generated strings, the set probability being determined according to the differential privacy budget, and to encrypt the resulting first private data set with the key to obtain the first encrypted private data set.
6. The system according to any one of claims 1 to 5, wherein, before encrypting the first private data set, the first device is configured to hash the first private data set with a second hash function to obtain a third hash value set, and the first encrypted private data set is obtained by encrypting the third hash value set with the key;
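Claim 4's random ordering can be sketched in a few lines. The rationale offered here is our reading rather than the patent's: presumably, shuffling keeps the receiver from linking a hash value's position to the order of the sender's original records. The helper name and sample values are hypothetical.

```python
import random


def shuffled_for_sending(hash_values, rng=None):
    """Return the first hash value set in a uniformly random order.

    random.SystemRandom draws from the operating system's entropy source;
    pass a seeded random.Random only for reproducible tests.
    """
    values = list(hash_values)
    (rng if rng is not None else random.SystemRandom()).shuffle(values)
    return values


if __name__ == "__main__":
    first_hash_values = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081]  # hypothetical hash values
    sent = shuffled_for_sending(first_hash_values)
    # The receiver only tests membership, so the order does not affect the result.
    print(set(sent) == set(first_hash_values))  # True
```

Because the receiver only checks whether each of its own hash values appears in the received set, shuffling changes nothing about the computed intersection.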
the second device is configured to perform hash calculation on the second private data set by using the second hash function to obtain a fourth hash value set, where the second encrypted private data set is obtained by encrypting the fourth hash value set based on the key.
7. An intersection data acquisition method, applied to a first device corresponding to a sender that executes a privacy set intersection protocol, the method comprising:
acquiring a first privacy data set corresponding to the sender;
obtaining a key through an oblivious pseudorandom function;
encrypting the first privacy data set by using the key to obtain a first encrypted privacy data set;
performing compression calculation on the first encrypted privacy data set with a set compression function to obtain a first compressed value set, wherein the compression function compresses its input to a length not exceeding a set length, and the set length is determined according to a differential privacy budget and the amount of data in the first privacy data set;
and sending the first compressed value set to a second device corresponding to a receiver that executes the privacy set intersection protocol, so that the second device determines the intersection of a second privacy data set corresponding to the receiver and the first privacy data set according to the first compressed value set and a second compressed value set, wherein the second compressed value set is obtained by the second device applying the compression function to a second encrypted privacy data set, which is obtained by encrypting the second privacy data set with the key.
8. The method of claim 7, wherein the compression function comprises a first hash function, and wherein the compression calculation comprises a hash calculation; the first set of compressed values comprises a first set of hash values and the second set of compressed values comprises a second set of hash values.
9. The method according to claim 7 or 8, wherein encrypting the first privacy data set with the key to obtain the first encrypted privacy data set comprises:
performing hash calculation on the first private data set by adopting a second hash function to obtain a third hash value set;
replacing, with a set probability, hash values in the third hash value set with randomly generated strings, wherein the set probability is determined according to the differential privacy budget;
encrypting the third set of hash values based on the key to obtain the first encrypted private data set;
the second device further performs hash calculation on the second private data set by using the second hash function to obtain a fourth hash value set, and obtains the second encrypted private data set obtained by encrypting the fourth hash value set based on the key.
10. An electronic device, comprising: a memory, a processor, a communication interface; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the intersection data acquisition method of any of claims 7 to 9.
11. A system for obtaining intersection data, comprising:
a terminal device corresponding to a user side and a server corresponding to a server side, which execute a privacy set intersection protocol;
the terminal device is configured to acquire a contact set corresponding to the user side, obtain a key through an oblivious pseudorandom function, and encrypt the contact set with the key to obtain an encrypted contact set; apply a set compression function to the encrypted contact set to obtain a first compressed value set; and send the first compressed value set to the server, wherein the compression function compresses its input to a length not exceeding a set length, and the set length is determined according to a differential privacy budget and the number of contacts in the contact set;
the server is configured to acquire the registered user set it stores and obtain an encrypted registered user set by encrypting the registered user set with the key; apply the compression function to the encrypted registered user set to obtain a second compressed value set; determine the intersection of the registered user set and the contact set according to the first compressed value set and the second compressed value set; and send the registered accounts corresponding to the contacts in the intersection to the terminal device.
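The contact-discovery flow of claim 11 can be sketched as below, again under labelled assumptions: HMAC-SHA256 under a shared key stands in for the OPRF-derived encryption, truncating its output to `set_length_bits` bits stands in for the compression function, and the registered user set is modelled as a mapping from identifier (for example a phone number) to account. All names and sample values are hypothetical.

```python
import hashlib
import hmac


def encrypt_and_compress(key: bytes, identifier: str, set_length_bits: int) -> int:
    """Hypothetical: HMAC-SHA256 stands in for OPRF encryption, truncation for compression."""
    if not 1 <= set_length_bits <= 256:
        raise ValueError("set_length_bits must be between 1 and 256")
    tag = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(tag, "big") >> (256 - set_length_bits)


def terminal_message(key: bytes, contacts: set[str], set_length_bits: int) -> set[int]:
    """Terminal device: the first compressed value set built from the user's contacts."""
    return {encrypt_and_compress(key, c, set_length_bits) for c in contacts}


def server_response(key: bytes, registered: dict[str, str], first_compressed: set[int],
                    set_length_bits: int) -> dict[str, str]:
    """Server: registered accounts whose identifiers fall in the intersection."""
    return {
        identifier: account
        for identifier, account in registered.items()
        if encrypt_and_compress(key, identifier, set_length_bits) in first_compressed
    }


if __name__ == "__main__":
    key = b"hypothetical-oprf-key"
    contacts = {"+86-100", "+86-200", "+86-300"}
    registered = {"+86-200": "account_b", "+86-400": "account_d"}
    msg = terminal_message(key, contacts, set_length_bits=64)
    print(server_response(key, registered, msg, set_length_bits=64))
```

As in the other embodiments, a short `set_length_bits` can let a non-contact's registered account slip into the response through a truncation collision.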
12. A system for obtaining intersection data, comprising:
a first financial server and a second financial server that perform a privacy set intersection protocol;
the first financial server is configured to acquire a first registered user set corresponding to a first financial service provider, obtain a key through an oblivious pseudorandom function, and encrypt the first registered user set with the key to obtain a first encrypted registered user set; apply a set compression function to the first encrypted registered user set to obtain a first compressed value set; and send the first compressed value set to the second financial server, wherein the compression function compresses its input to a length not exceeding a set length, and the set length is determined according to a differential privacy budget and the number of registered users in the first registered user set;
the second financial server is used for acquiring a second registered user set corresponding to a second financial service provider and acquiring a second encrypted registered user set obtained by encrypting the second registered user set by the secret key; and performing compression calculation on the second encrypted registered user set by adopting the compression function to obtain a second compression value set, and determining the intersection of the first registered user set and the second registered user set according to the first compression value set and the second compression value set.
13. The system of claim 12, wherein the compression function comprises a first hash function, and wherein the compression calculation comprises a hash calculation; the first set of compressed values comprises a first set of hash values and the second set of compressed values comprises a second set of hash values.
14. The system of claim 12, wherein:
the first financial server is further configured to hash the first registered user set with a second hash function to obtain a third hash value set; replace, with a set probability, hash values in the third hash value set with randomly generated strings, wherein the set probability is determined according to the differential privacy budget; and encrypt the third hash value set with the key to obtain the first encrypted registered user set;
the second financial server is further configured to perform hash calculation on the second registered user set by using the second hash function to obtain a fourth hash value set, and obtain the second encrypted registered user set obtained by encrypting the fourth hash value set based on the key.
CN202111555247.6A 2021-12-17 2021-12-17 Intersection data acquisition method, equipment and system Pending CN114329527A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111555247.6A CN114329527A (en) 2021-12-17 2021-12-17 Intersection data acquisition method, equipment and system

Publications (1)

Publication Number Publication Date
CN114329527A true CN114329527A (en) 2022-04-12

Family

ID=81051896

Country Status (1)

Country Link
CN (1) CN114329527A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114866312A (en) * 2022-04-24 2022-08-05 支付宝(杭州)信息技术有限公司 Common data determination method and device for protecting data privacy
CN114722049A (en) * 2022-05-18 2022-07-08 华控清交信息科技(北京)有限公司 Multi-party data intersection calculation method and device and electronic equipment
CN114722049B (en) * 2022-05-18 2022-08-12 华控清交信息科技(北京)有限公司 Multi-party data intersection calculation method and device and electronic equipment
CN115277253A (en) * 2022-09-26 2022-11-01 北京融数联智科技有限公司 Three-party privacy set intersection acquisition method and system
CN115277253B (en) * 2022-09-26 2022-12-27 北京融数联智科技有限公司 Three-party privacy set intersection acquisition method and system
CN115936112A (en) * 2023-01-06 2023-04-07 北京国际大数据交易有限公司 Client portrait model training method and system based on federal learning
CN115935438A (en) * 2023-02-03 2023-04-07 杭州金智塔科技有限公司 Data privacy intersection system and method
CN115967491A (en) * 2023-03-07 2023-04-14 华控清交信息科技(北京)有限公司 Privacy intersection method, system and readable storage medium
CN117240619A (en) * 2023-11-13 2023-12-15 杭州金智塔科技有限公司 System and method for solving intersection base number of privacy set
CN117240619B (en) * 2023-11-13 2024-04-16 杭州金智塔科技有限公司 System and method for solving intersection base number of privacy set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination