CN116361649A - Efficient unbalanced PSI (private set intersection) based on bloom filter and hash - Google Patents

Info

Publication number
CN116361649A
Authority
CN
China
Prior art keywords
party
hash
bloom filter
psi
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310254758.7A
Other languages
Chinese (zh)
Inventor
谈扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qianhai Xinxin Digital Technology Co ltd
Original Assignee
Shenzhen Qianhai Xinxin Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qianhai Xinxin Digital Technology Co ltd filed Critical Shenzhen Qianhai Xinxin Digital Technology Co ltd
Priority to CN202310254758.7A priority Critical patent/CN116361649A/en
Publication of CN116361649A publication Critical patent/CN116361649A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an efficient unbalanced PSI (private set intersection) based on a bloom filter and hashing. During the PSI computation, a bloom filter first screens the large data set in one round to reduce the overall complexity. The party holding the large data set sends bloom filter parameters to the party holding the small data set. The small-data-set party computes, from its own data set, the index set of bit positions that must be set to 1 in the bloom filter, and sends this index set to the large-data-set party. The large-data-set party initializes a bloom filter from the bloom filter parameters and the index set received from the small-data-set party. The large-data-set party then screens its data set with the bloom filter to obtain the set of elements that may be in the private intersection, and uses the screened set as its new private data set for the subsequent intersection computation. In the unbalanced scenario, this greatly improves the efficiency of the original PSI based on hashing and a semi-honest third party.

Description

Efficient unbalanced PSI (private set intersection) based on bloom filter and hash
Technical Field
The invention relates to a bloom filter and hash-based efficient unbalanced PSI.
Background
Private set intersection (PSI) is a form of secure multiparty computation and one of its most thoroughly studied scenarios. The earliest PSI algorithms, based on public-key modular exponentiation, were computationally expensive and impractical. As computer performance has increased, the algorithms have been steadily improved, performance has risen substantially, and PSI has been put to practical use. For example, the leaked-password check in the Microsoft Edge browser uses a PSI algorithm based on homomorphic encryption. In social networks, PSI lets two people compare common friends without revealing their complete friend lists to each other. PSI can also be used for botnet discovery, similarity detection, genetic testing and the like, and has great potential in practical applications.
Federated machine learning, a recent development for protecting user privacy, also uses PSI algorithms to align training samples between federated learning participants.
Private set intersection (PSI) involves the following concepts:
1. hash algorithm
A hash function converts an input of arbitrary length into an output of fixed length. Hash algorithms are commonly used to improve storage utilization and data query efficiency. In cryptography they also serve in digital signatures and as unique digests of data to ensure the security of data transmission. A cryptographic hash function is irreversible (one-way): the output is easy to compute from the input, whereas the input cannot feasibly be computed from the output. A hash function is also deterministic: once the input is fixed, the output is fixed.
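The fixed-length and deterministic properties described above can be checked directly. The following is a minimal Python sketch using SHA-256 from the standard library; the helper name `digest_hex` is chosen here for illustration and does not come from the patent.

```python
import hashlib

def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

short_digest = digest_hex(b"abc")
long_digest = digest_hex(b"abc" * 100_000)

# Fixed length: 256 bits = 64 hex characters, whatever the input size.
assert len(short_digest) == len(long_digest) == 64
# Deterministic: the same input always gives the same output.
assert digest_hex(b"abc") == short_digest
# Different inputs give different digests (collisions are infeasible to find).
assert short_digest != long_digest
```

Irreversibility cannot be demonstrated by a short test; it is a hardness assumption about the hash function.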
2. PSI
PSI stands for private set intersection. Two or more entities each hold a private data set and wish to compute the intersection common to all of them without revealing the contents of their respective private sets to one another. PSI is the technology that meets this requirement; it can be implemented in various ways, for example based on public-key encryption or oblivious transfer.
PSI is a form of multiparty computation with promising applications such as private contact discovery and genetic testing. It is used between two or more participants, each holding a private data set, who wish to compute the intersection common to all participants without revealing anything beyond the intersection to the others. PSI can be implemented in many ways, including hashing, public-key encryption, homomorphic encryption, garbled circuits, and oblivious transfer.
Applications of PSI distinguish between balanced PSI and unbalanced PSI:
In balanced PSI, the private data sets of the two parties involved in the computation are of approximately the same size.
In unbalanced PSI, the private data sets of the participants differ in size, often by a factor of hundreds.
3. Bloom filter
The Bloom filter was proposed by Bloom in 1970. It consists of a very long bit vector and a series of random mapping functions, and is used to test whether an element is in a set. Its main advantage is that its space efficiency and query time are far better than those of ordinary algorithms, and it can be used to reduce communication overhead when transmitting data over a network.
The bloom filter of a data set initializes its bit vector from a set of hash functions and the element values of the data set. The same set of hash functions is then used to test whether an element is in the data set: the test checks whether the bits at all index positions generated by hashing the element are 1. If any of those bits is 0 the element is certainly not in the set; if all are 1 the element may be in the set, since false positives are possible.
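As an illustration of the structure just described, the following Python sketch builds a small bloom filter. It is a sketch under assumptions, not the patent's implementation: the class name `BloomFilter`, the salt values other than 36787, the `"salt:element"` encoding, and the reduction of each digest modulo `m` are all choices made here.

```python
import hashlib

class BloomFilter:
    """Bit vector of length m with one salted SHA-256 function per salt (illustrative)."""

    def __init__(self, m: int, salts: list[int]) -> None:
        self.m = m
        self.salts = salts
        self.bits = bytearray(m)  # one byte per bit, for readability

    def indexes(self, element: str) -> list[int]:
        """Map an element to one bit position per hash function."""
        positions = []
        for salt in self.salts:
            digest = hashlib.sha256(f"{salt}:{element}".encode()).digest()
            positions.append(int.from_bytes(digest, "big") % self.m)
        return positions

    def add(self, element: str) -> None:
        for i in self.indexes(element):
            self.bits[i] = 1

    def might_contain(self, element: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[i] for i in self.indexes(element))

bf = BloomFilter(m=1024, salts=[36787, 51, 907])
for name in ["alice", "bob"]:
    bf.add(name)

assert bf.might_contain("alice") and bf.might_contain("bob")
```

An element that was added is always reported as possibly present; an element that was never added is usually, but not always, reported as absent.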
4. Semi-honest security model
The semi-honest model is a common security model in secure multiparty computation. The attacker strictly follows the protocol flow of the multiparty computation but is curious about the private data of the other participants, and attempts to recover that data from the intermediate data it receives.
Existing private set intersection (PSI) techniques:
(1) Hash-based PSI: both parties compute hash values of their original data, exchange them, and compare them to obtain the private intersection. This method is the most direct and also the fastest. However, it has a security problem, because the plaintext space of the items being matched, such as ID card numbers or mobile phone numbers, is often limited. Either party can easily recover the other party's original data from the hash values of its private data by exhaustive (brute-force) enumeration.
The security problem of hash-based PSI can be solved by introducing a semi-honest third party. The PSI participants negotiate a shared key and use it as the HMAC key to compute HMAC hash values; the third party computes the intersection of the hash values and returns it to the participants, who recover the final PSI result. Since the third party does not know the HMAC key, it cannot mount the brute-force enumeration attack.
Under the semi-honest security model, hash-based PSI with a third party solves the security problem while retaining the efficiency of hash-based PSI. In the unbalanced case, however, the computation and communication complexity is linear in the size of the large data set, so the overall efficiency remains low from the standpoint of the participant with the smaller data set.
(2) Public-key-based PSI: early PSI algorithms are generally based on a widely used class of public-key cryptography, such as RSA (Rivest-Shamir-Adleman) or discrete logarithms. Such algorithms involve modular exponentiation of large integers, so their efficiency is generally low and becomes lower still when both parties' data sets are large. In short, a large number of public-key modular exponentiations are required, so the computational cost is high and the efficiency is very low.
(3) OT-based: this class of algorithms is based on OT (oblivious transfer). With a small number of public-key computations plus symmetric encryption computations, it can construct a large batch of efficient OTs and run an efficient PSI algorithm on top of them.
(4) Garbled-circuit-based: the garbled circuit was first proposed by Yao Qizhi (Andrew Yao) to solve the millionaires' problem. It is a general secure multiparty computation framework that converts an arbitrary computation into a Boolean circuit and then evaluates it securely. Its performance has improved markedly in recent years, and a PSI computation can be converted into a secure multiparty computation over a garbled circuit.
(5) Homomorphic-encryption-based: Microsoft uses PSI based on homomorphic encryption for leaked-password checking, where the data set sizes of the two parties differ greatly. This kind of PSI protects the privacy of both parties' data sets by converting the PSI computation into a polynomial computation and encrypting it with a homomorphic scheme.
The latter three approaches reduce the amount of public-key computation and perform far better than public-key-based PSI, and they include efficient schemes for unbalanced private set intersection, but their performance still cannot match that of hash-based PSI.
On the other hand, in the current hash-based PSI with a semi-honest third party, the party with the large data set must compute and transmit the hash values of all elements in the large set, so its overall computation and communication complexity is linear in the size of the large data set. The third party's cost of comparing the two parties' hash values is likewise linear in the size of the large data set.
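The weakness of plain hash-based PSI noted in item (1), and the way a key unknown to the observer removes it, can be illustrated with a toy Python sketch. The four-digit plaintext space, the key bytes and the helper names are hypothetical and chosen only to keep the example small; real identifiers have larger, but still enumerable, spaces.

```python
import hashlib
import hmac

def plain_hash(value: str) -> str:
    """Unkeyed hash, as in plain hash-based PSI."""
    return hashlib.sha256(value.encode()).hexdigest()

def keyed_hash(key: bytes, value: str) -> str:
    """HMAC-SHA256 under a key shared only by the two PSI participants."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

def brute_force(target: str, candidates, hash_fn):
    """Enumerate the plaintext space and return the preimage if one matches."""
    for candidate in candidates:
        if hash_fn(candidate) == target:
            return candidate
    return None

candidates = [f"{n:04d}" for n in range(10_000)]  # toy plaintext space
secret_item = "4821"

# Anyone who sees the unkeyed hash recovers the item by enumeration.
recovered = brute_force(plain_hash(secret_item), candidates, plain_hash)

# An observer without the HMAC key cannot run the same enumeration.
shared_key = b"key negotiated by A and B, unknown to C"
observed = keyed_hash(shared_key, secret_item)
guess_without_key = brute_force(observed, candidates, plain_hash)

assert recovered == "4821"
assert guess_without_key is None
```

The second search fails because the observer cannot evaluate the keyed function; it would have to guess the key as well as the item.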
Disclosure of Invention
To solve the efficiency problem of PSI based on hashing and a semi-honest third party in the unbalanced case, the invention provides an efficient unbalanced PSI based on a bloom filter and hashing.
To achieve this object, the invention provides an efficient unbalanced PSI based on a bloom filter and hashing, involving a party B with a large data set and a party A with a small data set. The PSI computation comprises the following steps:
S1, party B sends bloom filter parameters to party A;
S2, party A computes, from its own data set, the index set of bit positions that must be set to 1 in the bloom filter, and sends the index set to party B;
S3, party B initializes a bloom filter according to the bloom filter parameters and the index set of party A;
S4, party B screens its data set with the bloom filter to obtain the set of elements that may be in the private intersection, and uses the screened set as its new private data set for the subsequent intersection computation;
S5, party A and party B each compute, with a hash algorithm, the hash value of every item in their private data sets, and store the hash values;
S6, party A and party B each send the stored hash values to a semi-honest third party;
S7, the semi-honest third party compares the hash values of the two parties, selects the hash values that are equal in both parties' hash value sets, and sends these equal hash values to the two parties participating in the PSI computation;
S8, after receiving the selected hash values from the third party, party A and party B each compare them with their locally stored hash values, and output the original data corresponding to the local hash values that equal the hash values forwarded by the third party as the final output of the PSI algorithm.
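The screening round in steps S1 to S4 can be sketched as follows in Python. This is an illustrative reading, not the patent's implementation: the function names, the salts, the filter length 4096, the `"salt:element"` encoding and the reduction of each digest modulo `m` are assumptions made here.

```python
import hashlib

def bit_index(element: str, salt: int, m: int) -> int:
    """One salted SHA-256 hash function, reduced to a bit position in [0, m)."""
    digest = hashlib.sha256(f"{salt}:{element}".encode()).digest()
    return int.from_bytes(digest, "big") % m

def index_set_for(elements, salts, m):
    """S2: party A lists the bit positions its elements would set to 1."""
    return {bit_index(x, salt, m) for x in elements for salt in salts}

def screen(elements, index_set, salts, m):
    """S3 and S4: party B rebuilds the filter from A's indices and screens its set."""
    bits = bytearray(m)
    for i in index_set:
        bits[i] = 1
    return [y for y in elements if all(bits[bit_index(y, s, m)] for s in salts)]

# S1: B chooses the parameters and sends them to A.
m, salts = 4096, [36787, 1234]

set_a = ["user5", "user77", "outsider"]     # small data set held by A
set_b = [f"user{i}" for i in range(2000)]   # large data set held by B

set_index = index_set_for(set_a, salts, m)  # sent from A to B
set_c = screen(set_b, set_index, salts, m)  # B's screened set

assert {"user5", "user77"} <= set(set_c)    # no element of the intersection is lost
```

B never sees A's elements, only bit positions, and every element of the true intersection survives the screening; a few non-members may survive as false positives and are removed by the later hash comparison.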
Further, in the efficient unbalanced PSI based on bloom filter and hash described above: the preparation step is also included before the step S1:
party B initializes the bloom filter parameters according to the size of its data set and the bloom filter fault tolerance rate (i.e. its false-positive rate).
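The patent does not give a formula for this preparation step. As a hedged illustration, the standard bloom filter sizing rule can be used: for n elements and target false-positive rate p, the length is m = -n ln p / (ln 2)^2 and the number of hash functions is k = (m / n) ln 2. The function name `bloom_parameters` and the example size are assumptions made here.

```python
import math

def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Return (m, k): filter length in bits and number of hash functions."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k

# Hypothetical example: a data set of one million elements and p = 0.25.
m, k = bloom_parameters(n=1_000_000, p=0.25)

assert k == 2                        # log2(1 / 0.25) = 2 hash functions
assert 2_885_000 < m < 2_886_000     # about 2.89 bits per element
```

A larger p gives a shorter filter and fewer hash functions, so party A has fewer indices to compute and send.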
Further, in the efficient unbalanced PSI based on bloom filter and hash described above:
the fault tolerance of the bloom filter is set to 0.25.
Further, in the efficient unbalanced PSI based on bloom filter and hash described above: the bloom filter parameters include bloom filter length, hash function set.
Further, in the efficient unbalanced PSI based on bloom filter and hash described above: in the step S5, the salt value is generated by the following steps:
party A and party B generate a shared random key through a key exchange protocol, and the shared random key is used as the salt value for the hash computation.
Further, in the efficient unbalanced PSI based on bloom filter and hash described above: party A and party B perform the key exchange using Diffie-Hellman.
Further, in the efficient unbalanced PSI based on bloom filter and hash described above: the parameters used by party A and party B for the key exchange are: a 2048-bit MODP group with a 224-bit prime order subgroup.
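The key exchange that produces the shared salt can be sketched in Python as below. The modulus and generator are toy values chosen here so the example stays short: they are NOT the 2048-bit MODP group named in the text and are not secure. The function names are likewise illustrative.

```python
import secrets

# Toy parameters for illustration only: 2**127 - 1 is a Mersenne prime, far too
# small for real use, and 3 is not a generator of a prime-order subgroup.
P = 2**127 - 1
G = 3

def keypair() -> tuple[int, int]:
    """Pick a 160-bit private exponent and compute the public value G^x mod P."""
    private = secrets.randbits(160)
    return private, pow(G, private, P)

def shared_secret(private: int, peer_public: int) -> bytes:
    """Compute (peer_public)^private mod P and encode it as fixed-length bytes."""
    value = pow(peer_public, private, P)
    return value.to_bytes((P.bit_length() + 7) // 8, "big")

a, g_a = keypair()            # party A; g_a is sent to B
b, g_b = keypair()            # party B; g_b is sent to A

ss_a = shared_secret(a, g_b)  # computed by A
ss_b = shared_secret(b, g_a)  # computed by B

assert ss_a == ss_b           # both sides hold the same salt
```

Only the public values cross the network, so the semi-honest third party never learns the salt that A and B later mix into their hashes.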
Further, in the efficient unbalanced PSI based on bloom filter and hash described above: in the step S5:
if either party A or party B does not require the PSI result, the semi-honest third party may refrain from sending the filtered hash values to that party.
In the invention, a bloom filter first performs one round of screening on the large data set to reduce the overall complexity. After screening, the communication complexity is greatly reduced and the third party's computational complexity is reduced as well; both become linear in the size of the small data set. The party with the larger data set only needs to transmit the bloom filter parameters to the party with the smaller data set, and the party with the smaller data set only needs to transmit the bloom filter index set to the party with the larger data set. In terms of computation, the work of the large-data-set party is concentrated in computing bloom filter indices. The efficiency of PSI based on hashing and a semi-honest third party is thereby greatly improved.
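A rough estimate makes the claimed reduction concrete. The Python sketch below uses the standard bloom filter estimate (1 - e^(-kn/m))^k for the false-positive rate after n insertions. The set sizes, the overlap, and the values m = 2,885,391 and k = 2 are illustrative assumptions (they correspond to sizing a filter for one million elements at p = 0.25); the patent itself gives no such figures.

```python
import math

def effective_false_positive_rate(m: int, k: int, inserted: int) -> float:
    """Standard estimate (1 - e^(-k*n/m))^k for n inserted elements."""
    return (1.0 - math.exp(-k * inserted / m)) ** k

def message_sizes(n_a: int, n_b: int, m: int, k: int, overlap: int) -> dict:
    """Compare what is sent with and without the screening round."""
    fp = effective_false_positive_rate(m, k, n_a)
    screened = overlap + fp * (n_b - overlap)  # expected size of B's screened set
    return {
        "baseline_hashes_from_b": n_b,   # without screening: one hash per element
        "indices_from_a": k * n_a,       # upper bound; duplicate indices collapse
        "expected_hashes_from_b": screened,
        "effective_fp_rate": fp,
    }

sizes = message_sizes(n_a=1_000, n_b=1_000_000, m=2_885_391, k=2, overlap=100)

assert sizes["indices_from_a"] == 2_000
assert sizes["effective_fp_rate"] < 1e-6
assert 100 <= sizes["expected_hashes_from_b"] < 101
```

On this reading, a filter sized for B's large set but filled only with A's few elements is very sparse, so its effective false-positive rate is far below the nominal p, and B forwards roughly the true intersection instead of one million hashes.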
The invention will be further described with reference to the drawings and detailed description.
Drawings
Fig. 1 is a flowchart of efficient unbalanced PSI calculation based on bloom filters and hashing in an embodiment of the present invention.
Description of the embodiments
Embodiment 1 is an efficient unbalanced PSI based on a bloom filter and hashing, involving a party B with the larger data set and a party A with the smaller data set.
As shown in fig. 1, the specific steps are as follows:
(1) Let the two parties taking part in the PSI computation be party A and party B, and let the semi-honest third party be party C. Party B holds the larger data set $SET_B$ and party A holds the smaller data set $SET_A$.
(2) Party A and party B agree to use Diffie-Hellman for key exchange. The parameters of the algorithm are (2048-bit MODP Group with 224-bit Prime Order Subgroup), and the subgroup generator is denoted $g$. Details of the parameter specification can be found in web articles, such as the link: https:// tools. In this embodiment, the key exchange algorithm may be any key exchange algorithm that meets security standards.
(3) Party A generates a 160-bit random number $a$ and party B generates a 160-bit random number $b$. A computes $g^a$ and sends it to B; B computes $g^b$ and sends it to A.
(4) After receiving $g^b$ from B, A computes the shared key $ss = (g^b)^a = g^{ab}$. B performs the corresponding computation to obtain the same shared key $ss = (g^a)^b = g^{ab}$.
(5) According to the size of $SET_B$ and the bloom filter fault tolerance rate $p$ (e.g. $p = 0.25$), party B initializes the bloom filter parameters, which include the bloom filter length $m$ and the hash function set $h_1, h_2, h_3, \ldots$. The hash function set may be generated by combining one secure hash function with different random salt values, e.g. $h_1$ is SHA256 with salt 36787. In practice, $p$ may be set to a larger value, which may reduce the amount of data transmitted.
(6) Party B sends the bloom filter parameters $\{m, h_1, h_2, h_3, \ldots\}$ to party A.
(7) After obtaining the bloom filter parameters, party A computes, over all elements of $SET_A$, the index set $SET_{index}$ of bit positions to be set to 1 in the bloom filter. For each element $x$ in the data set, it computes $i_1 = h_1(x)$, $i_2 = h_2(x)$, $i_3 = h_3(x)$, ..., and adds $i_1, i_2, i_3, \ldots$ to the index set. The final $SET_{index}$ is the bloom filter index set of $SET_A$.
(8) Party A sends the index set $SET_{index}$ to party B.
(9) Party B initializes the bloom filter $BF$ according to the bloom filter parameters $\{m, h_1, h_2, h_3, \ldots\}$ and $SET_{index}$.
(10) Party B screens $SET_B$ with $BF$ to obtain the set $SET_C$ of elements that may be in the private intersection. The screened $SET_C$ serves as B's new private data set for the subsequent intersection computation.
(11) Suppose $SET_A = \{x\}$. A computes the hash value of each element, $hash_x = \text{SHA-256}(x + ss)$, and sends the set $\{hash_x\}$ to C. Note: SHA256 is the hash function chosen here; any hash function that meets security standards may be used.
(12) Suppose B's screened set is $SET_C = \{y\}$. B computes the hash value of each element, $hash_y = \text{SHA-256}(y + ss)$, and sends the set $\{hash_y\}$ to C. Note: the hash function chosen here (SHA256) must be the same as the one chosen by A.
(13) After receiving the hash values sent by A and B, party C compares them and outputs the hash values present in both sets: $\{hash_z\}$, where $hash_z \in \{hash_x\}$ and $hash_z \in \{hash_y\}$. C sends $\{hash_z\}$ to party A and party B. Note: if either A or B does not need the final PSI result, C may refrain from sending $\{hash_z\}$ to that party.
(14) After receiving $\{hash_z\}$, A compares it with the $\{hash_x\}$ computed in (11). For every hash value present in $\{hash_z\}$, i.e. $hash_{x'} = hash_z$, the original $\{x'\}$ is output as the final PSI output.
(15) After receiving $\{hash_z\}$, B compares it with the $\{hash_y\}$ computed in (12). For every hash value present in $\{hash_z\}$, i.e. $hash_{y'} = hash_z$, the original $\{y'\}$ is output as the final PSI output.
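Steps (11) to (15) can be sketched in Python as follows. The sketch assumes that "x + ss" means byte concatenation of the element and the shared key, and it uses a fixed placeholder for ss; the function names and sample data are hypothetical.

```python
import hashlib

def salted_hashes(elements, ss: bytes) -> dict[str, str]:
    """Steps (11)/(12): map SHA-256(x + ss) back to the original element x."""
    return {hashlib.sha256(x.encode() + ss).hexdigest(): x for x in elements}

def third_party_match(hashes_a, hashes_b) -> set[str]:
    """Step (13): party C keeps the hash values present in both sets."""
    return set(hashes_a) & set(hashes_b)

def recover(local_table: dict[str, str], matched: set[str]) -> list[str]:
    """Steps (14)/(15): look up the originals of the matched hash values."""
    return sorted(local_table[h] for h in matched if h in local_table)

ss = b"placeholder for the shared key from the key exchange"
set_a = ["alice", "bob", "carol"]         # A's small data set
set_c = ["bob", "carol", "dave"]          # B's set after bloom filter screening

table_a = salted_hashes(set_a, ss)        # A keeps the table, sends only the keys
table_c = salted_hashes(set_c, ss)        # B does the same

hash_z = third_party_match(table_a.keys(), table_c.keys())  # computed by C

psi_a = recover(table_a, hash_z)
psi_b = recover(table_c, hash_z)

assert psi_a == psi_b == ["bob", "carol"]
```

Party C handles only salted hash values, and the false positive "dave" that survived the screening is dropped here because its hash has no counterpart in A's set.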

Claims (8)

1. An efficient unbalanced PSI based on a bloom filter and hashing, involving a party B with a large data set and a party A with a small data set, characterized in that the PSI computation comprises the following steps:
S1, party B sends bloom filter parameters to party A;
S2, party A computes, from its own data set, the index set of bit positions that must be set to 1 in the bloom filter, and sends the index set to party B;
S3, party B initializes a bloom filter according to the bloom filter parameters and the index set of party A;
S4, party B screens its data set with the bloom filter to obtain the set of elements that may be in the private intersection, and uses the screened set as its new private data set for the subsequent intersection computation;
S5, party A and party B each compute, with a hash algorithm, the hash value of every item in their private data sets, and store the hash values;
S6, party A and party B each send the stored hash values to a semi-honest third party;
S7, the semi-honest third party compares the hash values of the two parties, selects the hash values that are equal in both parties' hash value sets, and sends these equal hash values to the two parties participating in the PSI computation;
S8, after receiving the selected hash values from the third party, party A and party B each compare them with their locally stored hash values, and output the original data corresponding to the local hash values that equal the hash values forwarded by the third party as the final output of the PSI algorithm.
2. The bloom filter and hash-based efficient unbalanced PSI of claim 1, wherein a preparation step is further included before step S1:
party B initializes the bloom filter parameters according to the size of its data set and the bloom filter fault tolerance rate.
3. The bloom filter and hash-based efficient unbalanced PSI of claim 2, wherein:
the bloom filter fault tolerance rate is set to be less than 0.25.
4. The bloom filter and hash-based efficient unbalanced PSI of claim 2, wherein: the bloom filter parameters include the bloom filter length and the hash function set.
5. The bloom filter and hash-based efficient unbalanced PSI of claim 1, wherein: in step S5, the salt value is generated as follows:
party A and party B generate a shared random key through a key exchange protocol, and the shared random key is used as the salt value for the hash computation.
6. The bloom filter and hash-based efficient unbalanced PSI of claim 5, wherein: party A and party B perform the key exchange using Diffie-Hellman.
7. The bloom filter and hash-based efficient unbalanced PSI of claim 6, wherein: the parameters used by party A and party B for the key exchange are: a 2048-bit MODP group with a 224-bit prime order subgroup.
8. The bloom filter and hash-based efficient unbalanced PSI of claim 1, wherein: in step S5:
if either party A or party B does not require the PSI result, the semi-honest third party may refrain from sending the filtered hash values to that party.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310254758.7A CN116361649A (en) 2023-03-16 2023-03-16 Efficient unbalanced PSI (private set intersection) based on bloom filter and hash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310254758.7A CN116361649A (en) 2023-03-16 2023-03-16 Efficient unbalanced PSI (private set intersection) based on bloom filter and hash

Publications (1)

Publication Number Publication Date
CN116361649A true CN116361649A (en) 2023-06-30

Family

ID=86935035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310254758.7A Pending CN116361649A (en) 2023-03-16 2023-03-16 Efficient unbalanced PSI (private set intersection) based on bloom filter and hash

Country Status (1)

Country Link
CN (1) CN116361649A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628758A (en) * 2023-07-21 2023-08-22 北京信安世纪科技股份有限公司 Data processing method, device and system and electronic equipment
CN116628758B (en) * 2023-07-21 2023-09-22 北京信安世纪科技股份有限公司 Data processing method, device and system and electronic equipment
CN116881521A (en) * 2023-08-08 2023-10-13 北京火山引擎科技有限公司 Data acquisition method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination