WO2023069631A1 - Memory and communications efficient protocols for private data intersection - Google Patents

Memory and communications efficient protocols for private data intersection

Info

Publication number
WO2023069631A1
Authority
WO
WIPO (PCT)
Prior art keywords
client
private
server
protocol
message
Prior art date
Application number
PCT/US2022/047294
Other languages
French (fr)
Inventor
Melissa Chase
Sanjam GARG
Mohammad HAJIABADI
Peihan MIAO
Original Assignee
NTT Research, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NTT Research, Inc.
Publication of WO2023069631A1 publication Critical patent/WO2023069631A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing

Definitions

  • PSI: private set intersection
  • PSI and its variants have found many real-world applications including online advertising, password breach alert, mobile private contact discovery, and privacy-preserving contact tracing.
  • Oblivious transfer (OT) is a foundational primitive in cryptography. We are interested in two-message OT protocols between: (i) a receiver with an input bit b who sends the first message otr of the protocol, and (ii) a sender with input two (equal-length) strings m_0, m_1 who sends the second message ots.
  • Rate-1 OT enables powerful applications such as (i) semi-compact homomorphic encryption for branching programs (where the ciphertext grows only with the depth but not the size of the program) as well as (ii) communication-efficient private-information retrieval (PIR) protocols.
  • the rate-1 property is crucial in realizing these applications, allowing a sender to compress a large database for a receiver who is interested only in a small portion of it.
  • m := (m_00, m_01, m_10, m_11).
  • the receiver, on an input uw ∈ {0,1}^2, will send two messages otr and otr′, the first one for choice bit u and the second one for w.
  • the sender will use otr′ once against (m_00, m_01) and once against (m_10, m_11) to get two outgoing messages ots_0 and ots_1.
  • the receiver is only interested in ots_u, but the sender does not know which one it is.
  • the sender compresses (ots_0, ots_1) using otr, allowing the receiver to learn ots_u, and consequently m_uw.
  • the above construction employs a self-eating process, where a pair of ots messages is used as the sender input for the next OT, and so on.
  • Employing a low-rate 1-out-of-2 OT to build 1-out-of-n OT will blow up the communication, falling short of PIR. To see this, suppose n = 2^k.
  • at each level of the composition, the size of the resulting ots message (which either packs two previous ots messages, or two leaf messages) doubles, resulting in a final message of size at least O(2^k · u), where u is the size of each initial individual message of the sender.
  • thus, although the protocol is a 1-out-of-n OT, it is not a sublinear PIR, because the size of the sender's protocol message is not sublinear in its total input size, n·u.
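The blow-up argument above can be made concrete with a toy size model. The `overhead` constant stands in for the fixed poly(λ) additive cost of a single rate-1 OT response and is an illustrative number, not a value from the text:

```python
def final_ots_size(u, k, rate_one=True, overhead=32):
    """Model the sender-message growth when k levels of 1-out-of-2 OT
    are composed into a 1-out-of-n OT (n = 2^k) via the self-eating
    process.  Sizes are in bits; `overhead` stands in for the fixed
    poly(lambda) additive term of a rate-1 OT (illustrative value)."""
    size = u  # size of each leaf message of the sender
    for _ in range(k):
        if rate_one:
            size += overhead      # rate-1: |ots| = |input| + poly(lambda)
        else:
            size *= 2             # low rate (e.g. 1/2): |ots| doubles
    return size

u, k = 128, 20  # n = 2^20 leaf messages of u bits each
print(final_ots_size(u, k, rate_one=True))   # u + k*overhead: sublinear in n*u
print(final_ots_size(u, k, rate_one=False))  # 2^k * u: the O(2^k u) blow-up
```

With a rate-1 base OT the final message grows only additively per level, which is why the composition yields sublinear PIR; with any constant rate below 1 it doubles per level.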
  • Rate-1 OT. In one known DDH-based rate-1 OT construction, for a sender with (m_0 ∈ {0,1}^n, m_1 ∈ {0,1}^n), the receiver should send a linear (O(n)) number of group elements for each bit of the sender, resulting in overall O(n^2) group elements. This incurs high receiver communication in the respective applications.
  • Other solutions have obtained rate-1 OT for which otr consists of only a linear O(n) number of group elements in total, as opposed to O(n 2 ).
  • One limitation of the prior work is that it only improves the communication efficiency of the base rate-1 OT, but still requires the receiver to send a fresh otr message for each new OT execution.
  • Rate-1 OT. Most applications of rate-1 OT require executing it multiple times, resulting in large communication costs for the receiver. This constitutes a prohibitive overhead for the receiver in applications in which the depth of the branching program is large, and the receiver needs to engage with a sender holding a branching program BP on many different inputs x_1, ..., x_n (e.g., PSI). Addressing this communication bottleneck is desired, and herein is described a new primitive that we call receiver-amortized (or amortized, for short) rate-1 OT.

BRIEF SUMMARY OF THE INVENTION

  • We introduce a new technique for amortizing the cost of multiple rate-1 OTs.
  • PSI with unbalanced inputs. We apply our techniques to private set intersection with unbalanced set sizes (where the receiver has a smaller set) and achieve receiver communication of O((m + λ) log N) group elements, where m and N are the sizes of the receiver and sender sets, respectively. Similarly, after a one-time setup (or one PSI instance), any following PSI instance only requires a communication cost of O(m · log N) group elements. All previous sublinear-communication non-FHE-based PSI protocols for the above unbalanced setting were also based on rate-1 OT, but incurred at least O(λ^2 · m log N) group elements. In various embodiments, a computer-implemented method, computing device, and computer-readable storage media are disclosed.
  • the computer-implemented method, computing device, and computer-readable storage media can comprise: storing on a remote server a data set X of N elements; storing on a client a single data element y, wherein all of the elements in X and y are lambda-bit strings; at the client: establishing g as a cryptographically secure hash function; executing the cryptographically secure hash function on the single data element y to generate a hash result b; computing a client message of a private set intersection protocol with the single data element y, and computing a client message of private information retrieval with query b; transmitting the client message of the private set intersection protocol and the client message of private information retrieval to the remote server; at the remote server: computing hashes of all N elements of data set X using the secure hash function g; partitioning the N elements of data set X into multiple sets based on
  • the private set intersection protocol is a two-round secure function evaluation where the client input is x and the server input is X, and at the end of the protocol the client learns whether x is in X.
  • the method makes only a limited use of expensive cryptographic group operations, wherein the number of operations is smaller than the size of X.
  • the private information retrieval is a two-round secure function evaluation where the client input is index i and the server input is X, and at the end of the protocol the client learns the i-th element of X.
  • the method uses limited communication, wherein the total number of bits sent over a channel is smaller than the size of X.
  • the hash function g is selected to have an output size that is optimized for communication efficiency or to minimize use of expensive cryptographic group operations: a larger output size improves communication efficiency, while a smaller output size reduces computation cost at the expense of communication efficiency, wherein efficiency is measured by the number of bits exchanged over a channel.
  • the data set X is provided by the client, or the data set X is provided by a third-party, or the data set belongs to the remote cloud service.
  • the data set X of N elements represents aggregated password data
  • the single data element y represents a client password.
  • the data set X of N elements represents aggregated image information
  • the single data element y represents a client image.
  • the data set X of N elements represents aggregated contact list or personal information
  • the single data element y represents an instance of client contact information.
DETAILED DESCRIPTION

  • We put forth a cryptographic primitive that we call amortized rate-1 OT, and show how to realize it using standard assumptions on bilinear groups. As applications, we obtain significant efficiency improvements, shaving a factor of poly(λ) off the receiver communication in various protocols involving secure branching-program computation (e.g., unbalanced PSI).
  • An amortized rate-1 OT breaks up the computation of a receiver into an offline phase and an online phase. The offline phase is performed by the receiver once and for all, prior to receiving any choice bits.
  • the state str used by OT_1 and OT_3 is the same as the initial state output by PreP.
  • 1. Sender rate-1 communication: the size of ots is n + poly(λ), where poly is a fixed polynomial (e.g., the size of a group element) independent of how large n is.
  • 2. Receiver amortized compactness: each fresh receiver message otr has size poly′(λ), where poly′(λ) is independent of n.
  • 3. Receiver privacy: we specify indistinguishability security for the receiver against adaptive adversaries. That is, an adaptive adversary who is given prm and who sends many pairs of choice bits in an adaptive fashion cannot determine whether his received otr messages (all made relative to str) were built using the first choice bits or the second choice bits of his submitted pairs. Notice that since otr messages are all produced based on the same private state str, we should give the adversary the ability to submit many pairs.
  • 4. Sender privacy: standard indistinguishability security against honest receivers. For applications involving non-oblivious branching programs we need to strengthen sender privacy. For oblivious branching programs, from which all our applications are obtained, the stated requirement suffices.
  • a deterministic k-bit-input branching program BP is a directed acyclic graph, where every leaf node has a label 0 or 1 (reject or accept), and every non-leaf node v has a label lb(v).
  • the root node is labeled with 1. Every non-leaf node has two outgoing edges labeled 0 and 1.
  • BP(x) = b if the underlying computation path ends in a b-labeled leaf node.
  • the size of a branching program is the number of nodes, and the depth is the length of the longest path.
  • the standard definition of oblivious branching programs is more general than what we give here, but we stick to our own definition since it captures our application needs. As an example, consider a client who wants to know whether her input x ∈ {0,1}^λ is in the set D ⊆ {0,1}^λ of a server.
  • the branching program for PSI is constructed as follows: for every string a ∈ {0,1}^(≤λ) such that a is a prefix of a string in D, we put a node v_a in the graph.
  • we designate v_ε (for the empty string ε) as the root node, and all v_a such that a ∈ {0,1}^λ as accept leaf nodes.
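The prefix-trie branching program above can be sketched as follows. This is a simplified model: where the text pads the DAG with dummy nodes so that it is oblivious, the sketch simply rejects as soon as the path falls off the trie, and the function names are ours:

```python
def build_bp(D, lam):
    """Nodes of the prefix-trie branching program: one node v_a for
    every string a (|a| <= lam) that is a prefix of some element of D.
    Elements are lam-bit strings; "" is the root v_epsilon."""
    nodes = set()
    for x in D:
        for i in range(lam + 1):
            nodes.add(x[:i])
    return nodes

def eval_bp(nodes, x):
    """Walk the trie on input x, reading one bit per level (depth lam).
    Accept iff the full path stays on the trie, i.e. iff x is in D.
    (The oblivious BP in the text pads rejecting paths with dummy
    nodes; this sketch just rejects immediately.)"""
    a = ""
    for bit in x:
        a += bit
        if a not in nodes:
            return 0  # reject leaf
    return 1          # accept leaf v_x

nodes = build_bp({"0110", "1011"}, 4)
print(eval_bp(nodes, "0110"))  # 1: member of D
print(eval_bp(nodes, "0111"))  # 0: not a member
```

Note that the depth is exactly λ (one level per input bit) while the number of nodes is at most λ·|D| + 1, which is why the SFE communication later grows with the depth rather than the set size.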
  • a secret key is an n-tuple of exponents sk and the public key is pk.
  • Shrink and ShrinkDec, where Shrink(pk, ct) shrinks ct ∈ G^(n+1) to obtain Shrink(ct) ∈
  • Shrink correctness. Approach.
  • Let G be a group of prime order p with a generator g. We let e_i denote the vector which has g in its ith position, and the identity element 1 everywhere else.
  • the receiver, on a choice bit b, samples exponents and, for every i ∈ [n], samples and sets otr accordingly, where (^) denotes entry-wise exponentiation and (·) denotes entry-wise group multiplication. She sends otr to the sender. Let (m_0, m_1) ∈ {0,1}^(2n) be the vector concatenating the two strings of the sender. Then, overloading the (·) notation, we have (g′, g′_1, ..., g′_n) ∈ Enc(pk, m_b), where Enc denotes n-bit packed ElGamal.
  • each vector is an exponentiation of the base vector, but with a bump on its (nb + i)-th location: namely, we multiply its (nb + i)-th location by g. SXDH.
  • the sender uses the pairing to compute the inner product with all the vectors on the left-hand side, and the inner product with all the vectors on the right-hand side. That is, using the notation above, the sender will compute
  • the sender has now built a pair that satisfies the bump structure explained in the first paragraph. Namely, think of the ith row as e_i in that paragraph.
  • the sender can perform the step explained in the first paragraph to send a rate-1 message ots, and the receiver will be able to use sk to decrypt it to obtain m_b.
  • the protocol has rate-1 sender communication, and otr consists of only 4 group elements in G_2.
  • an adversary A cannot distinguish between a world in which otr always encrypts the bit 0 from a world in which otr encrypts 1; the proof for the case where the adversary can submit adaptively-chosen pairs of choice bits will be similar.
  • the receiver samples a non-reusable message for a choice bit b exactly as in the SXDH case
  • A is a randomized algorithm
  • A(a_1, . . . , a_n) denotes the random variable obtained by sampling random coins r uniformly. Definition 2.1 (Pairings and SXDH hardness).
  • a bilinear map is given by (G_1, G_2, G_T, p, g, h, e), where p is a prime number that is the order of G_1, G_2 and G_T, and g and h are random generators of G_1 and G_2, respectively.
  • the function e is a non-degenerate map, satisfying e(g^a, h^b) = e(g, h)^(ab) for all exponents a and b.
  • the Symmetric External Diffie-Hellman (SXDH) assumption says that G_1 and G_2, sampled as above, are each DDH-hard.
  • the sender will use (prm, otr) to complete an OT transfer for any pair of messages. PreP: takes as input a security parameter and n, denoting the maximum length of each of the sender's messages, and outputs a private state str and a reusable message prm. OT_1: takes as input a security parameter and a choice bit, and outputs a protocol message otr. We refer to otr as a fresh receiver's message, to distinguish it from the reusable message prm. OT_2: takes as input a reusable message prm, a fresh message otr, and a pair of messages, and outputs ots.
  • the challenger samples a bit b and (str, prm) ← PreP, and gives prm to A. Then, A adaptively submits queries (s_0, s_1) ∈ {0,1}^2, and receives OT_1(str, s_b). A has to guess the value of b.
  • the new sender, on a pair of messages (m_0, m_1) ∈ {0,1}^n × {0,1}^n, samples two seeds (r_0, r_1) whose length is a sufficiently large poly(λ) but independent of n.
  • the sender sends (ots′_1, ots′_2) to the receiver, where each is computed from (prm, otr), and Ext is a randomness extractor.
  • the protocol is still sender rate-1.
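The seed-based step above can be sketched as follows, with the base rate-1 OT abstracted away (the short seeds, not the long messages, are what would travel through it) and SHA-256 standing in for both the extractor Ext and the PRG; these stand-ins and all parameter sizes are illustrative assumptions, not the construction's actual components:

```python
import hashlib
import secrets

def prg(seed: bytes, n: int) -> bytes:
    """Extractor/PRG stand-in (SHA-256 in counter mode) -- an
    illustrative choice, not the construction's actual Ext/PRG."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def sender(m0: bytes, m1: bytes):
    """Sample short seeds (r0, r1); these would be the inputs to the
    base (deterministic) rate-1 OT, while both long messages are sent
    masked by PRG outputs derived from the seeds."""
    r0, r1 = secrets.token_bytes(16), secrets.token_bytes(16)
    masked = (xor(m0, prg(r0, len(m0))), xor(m1, prg(r1, len(m1))))
    return (r0, r1), masked

def receiver(b: int, rb: bytes, masked) -> bytes:
    # The receiver learned only r_b from the base OT, so only the
    # b-th mask can be removed.
    return xor(masked[b], prg(rb, len(masked[b])))

m0, m1 = b"A" * 64, b"B" * 64
(r0, r1), masked = sender(m0, m1)
assert receiver(0, r0, masked) == m0
assert receiver(1, r1, masked) == m1
```

Because only the fixed-length seeds pass through the base OT, the protocol stays sender rate-1 while making the overall sender message effectively randomized.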
  • pk := (g, g_1, ..., g_n)
  • Enc(m_1, ..., m_n) as ct :=
  • We have a shrinking procedure for n-bit ElGamal encryption that will shrink a ciphertext into one group element plus n bits, while allowing for efficient decryption.
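A minimal sketch of the n-bit packed ElGamal scheme referenced above, over a toy Schnorr group (parameters far too small to be secure). The shrinking procedure itself is not shown; the sketch only exhibits the (n+1)-group-element ciphertext structure it starts from:

```python
import secrets

# Toy n-bit packed ElGamal over a small Schnorr group (p = 2q + 1).
# These parameters are illustrative only -- far too small to be secure.
p, q, g = 167, 83, 4  # g generates the order-q subgroup of Z_p*

def keygen(n):
    sk = [secrets.randbelow(q) for _ in range(n)]  # n exponents s_i
    pk = [pow(g, s, p) for s in sk]                # g_i = g^{s_i}
    return sk, pk

def enc(pk, bits):
    """ct = (g^r, g_1^r * g^{m_1}, ..., g_n^r * g^{m_n}): n + 1 group
    elements encrypting n plaintext bits under one shared randomness r."""
    r = secrets.randbelow(q)
    return [pow(g, r, p)] + [pow(h, r, p) * pow(g, m, p) % p
                             for h, m in zip(pk, bits)]

def dec(sk, ct):
    c0, body = ct[0], ct[1:]
    bits = []
    for s, c in zip(sk, body):
        v = c * pow(c0, q - s, p) % p  # c / c0^s = g^{m_i} (c0 has order q)
        bits.append(0 if v == 1 else 1)
    return bits

sk, pk = keygen(8)
msg = [1, 0, 1, 1, 0, 0, 1, 0]
assert dec(sk, enc(pk, msg)) == msg
```

The point of the shrinking step is that this n+1-element ciphertext (rate ≈ one group element per bit) compresses to one group element plus n bits, giving the rate-1 property.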
  • Rate-1 sender communication and receiver amortized compactness: we have both properties. Receiver privacy: in the following, we say a vector is non-orthogonal to another if their inner product is exactly 1. This is an abuse of terminology (because non-orthogonality refers to any non-zero inner product), but we stick to it below.
  • for receiver OT security, we should argue that a fresh receiver protocol message otr does not reveal the receiver's underlying choice bit.
  • the main difficulty is that all otr values depend on the vector u.
  • the core of our argument is in showing that the vector u remains hidden in the following sense: given a sequence of such pairs, an adversary cannot determine the order of orthogonality/non-orthogonality in any given pair, with respect to u.
  • any receiver's future fresh message otr may be simulated by the underlying choice bit b and a pair of vectors which are orthogonal/non-orthogonal to u, in a way that if the joint distribution of the pair is pseudorandom, then the entire simulated view will be pseudorandom as well, masking the choice bits.
  • we will then show that the distribution of a random pair of vectors, subject to them being orthogonal/non-orthogonal to a random u, is uniformly random. Taken all together, receiver security will follow. Definition 3.4 (Distribution Dual).
  • m = 3n - 1 suffices. Construction 4.1 (Amortized rate-1 OT: Bilinear Power DDH).
  • OT := (PreP, OT_1, OT_2, OT_3) is built as follows. 3. Return private state str and reusable message prm.

5 Optimization

  • In this section, we discuss some techniques to improve the concrete computational efficiency and lower the communication cost in amortized rate-1 OT. These optimizations work for both the basic amortized rate-1 OT from bilinear SXDH and the sliding-window construction from bilinear power DDH. In Section 6, when we describe the applications of amortized rate-1 OT, we will discuss further optimizations specific to these applications.
  • Delayed pairing. Recall that when the sender computes her response message, she needs to compute the hash-key vector, which requires 4n pairing operations. In addition, she needs to compute the matrix IK, which requires 4n^2 pairing operations in the basic construction and 6n pairing operations in the sliding-window construction. Since pairing operations are orders of magnitude more expensive than the other group operations, we introduce a technique to minimize them.
  • Basic construction. The high-level idea is that we can leverage the bilinear property to delay the pairing operations. Instead of first performing the pairing operations and then computing inner products in the target group, we can first compute the inner products in G_1 and then perform the pairings. In more detail, in the basic construction, let the sender messages be as defined above.
  • given the receiver message otr, each inner product can be computed as follows: first compute inner products for each vector component of M_0, resulting in a vector of two group elements in G_1, then take the inner product in the exponent of the two vectors. The other term is computed in the same way.
  • the same approach can be applied to compute the remaining inner products.
  • the computational cost of each such inner product in the basic construction includes 4n pairing operations and 4n multiplications in G_T. By using the above technique, this cost can be reduced to 4 pairing operations, 4n multiplications in G_1, and 3 multiplications in G_T. The same improvement applies to each inner product.
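The saving comes from bilinearity: e(a_1, c) · e(a_2, c) = e(a_1 · a_2, c), so components that share a right-hand input can be multiplied in G_1 first and paired only once. A purely structural stand-in (additive groups of integers with e(a, b) = a·b mod N, which has no cryptographic hardness) illustrates the identity:

```python
# A purely structural stand-in for a bilinear map: the "groups" are
# (Z_N, +) and the pairing is e(a, b) = a*b mod N.  No cryptographic
# hardness -- it only exhibits the bilinearity that delayed pairing
# exploits: e(a1, c) + e(a2, c) = e(a1 + a2, c) in the target group.
N = 2 ** 61 - 1

def e(a, b):
    return a * b % N  # the "pairing" into the target group

def pair_then_combine(ms, c):
    # naive order: one pairing per component, combined in G_T
    return sum(e(m, c) for m in ms) % N

def combine_then_pair(ms, c):
    # delayed pairing: combine in G_1 first, then pair once
    return e(sum(ms) % N, c)

ms, c = [3, 14, 15, 92, 65], 7
assert pair_then_combine(ms, c) == combine_then_pair(ms, c)
```

Replacing len(ms) pairings with one pairing plus len(ms) cheap G_1 operations is exactly the 4n-pairings-to-4-pairings reduction described above.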
  • the total computational cost of the sender is reduced to 4n pairing operations, 4n^2 multiplications in G_1, and 3n multiplications in G_T.
  • Sliding-window construction. The same technique can be applied to the sliding-window construction, and the improvement is the same as above.
  • the total cost of computing this step in the sliding-window construction includes 6n pairing operations and (2n^2 + 3n) multiplications in G_T. This can be improved to 4n pairing operations, 4n^2 multiplications in G_1, and 3n multiplications in G_T.
  • Increasing vector dimension, reducing hash value size. The hash value currently contains a single group element in G_T.
  • the base hash key M is the same as before except that each p is of dimension 3.
  • the receiver's reusable message is redefined so that all p_i's are random exponents. For a choice bit b, the receiver samples a single random vector and sends a single vector. Next the sender computes the hash key by taking the inner product in the exponent of M and the receiver's vector. The matrix IK can be computed by taking the inner product in the exponent as well. We can use delayed pairing to compute these products. Again, we can reduce the hash value size by sending 3 group elements in the vector and postponing the pairing operations to the receiver side.
  • the receiver's reusable message is increased from (4n^2 + 4n) to (6n^2 + 6n) group elements in G_1, but the non-reusable message is reduced from 4 to 3 group elements in G_2.
  • the hash value in the sender's message is reduced from 1 group element in G_T to 3 group elements in G_1.
  • Sliding-window construction. The same technique can be applied to the sliding-window construction, and the improvement on the communication is the same as above.
  • the receiver's reusable message is increased from 10n to 15n group elements in G 1 , but the non-reusable message is reduced from 4 to 3 group elements in G 2 .
  • the hash value in the sender's message is reduced from 1 group element in G T to 3 group elements in G 1 .
6 Applications

  • In this section, we discuss several applications of our amortized rate-1 OT and focus on the communication improvements over prior work. For certain applications, we will discuss optimizations that further improve the communication and/or computational complexity.
  • Secure function evaluation on branching programs. The work of Ishai and Paskin presents an approach to two-round secure function evaluation (SFE) on (oblivious) branching programs (BP) from rate-1 OT, where the communication complexity only grows with the depth of the branching program instead of its size. In particular, consider a sender holding a private branching program P and a receiver holding a private input x.
  • the client has a single data point and would like to perform a secure inference with the server on the decision tree.
  • the decision tree can be formalized as a branching program and two-round secure inference can be achieved by two-round SFE described above, where the communication only grows with the depth of the tree.
  • PSI and PIR In this section, we illustrate several useful applications that can be viewed as special cases of SFE on oblivious BP, hence they achieve the same improvements over prior work.
  • a dummy node of depth d is connected to two dummy nodes of depth d - 1.
  • the client only needs to perform m instances of SFE on the oblivious BP to learn the intersection X ∩ {y} for every y ∈ Y.
  • PIR-with-Default. Consider a PIR variant where the server holds N binary strings s_1, ..., s_N ∈ {0,1}^t along with N values v_1, ..., v_N ∈ {0,1}^k.
  • the server additionally holds a default value v_d ∈ {0,1}^k.
  • in the i-th instance of PIR-with-Default, the default value v_d^i is sampled at random such that all the default values sum up to 0. All the non-default values in a single instance are set to v_d^i + 1.
  • the client sums up all the values retrieved from the PIR-with-Default instances. Similar to PSI, we should prune the full binary tree to obtain an oblivious BP with depth λ and polynomial size. Optimization for PSI and PSI-Cardinality: we design optimizations for unbalanced PSI and PSI-Cardinality so as to achieve better communication than the above generic approaches. Optimized PSI: note that the aforementioned oblivious BP for PSI has depth λ. To further improve the communication complexity, we replace small subtrees by small instances of two-round PSI (e.g., DDH-based PSI), which we denote by Π_PSI.
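The zero-sum-default trick can be sketched with an idealized PIR-with-Default oracle. The text elides the exact non-default value; setting it to v_d^i + 1, so that the retrieved values sum to the cardinality, is our assumption, consistent with the PSI-Sum variant described later that uses v_d^i + w_j:

```python
import secrets

MOD = 2 ** 32

def make_defaults(m: int):
    """Sample m random defaults that sum to 0 mod MOD: additive
    shares of zero, one default per PIR-with-Default instance."""
    d = [secrets.randbelow(MOD) for _ in range(m - 1)]
    d.append(-sum(d) % MOD)
    return d

def pir_with_default(db: dict, default: int, query):
    """Idealized PIR-with-Default functionality: the value stored
    under `query`, or the default if absent.  (A real instantiation
    would hide `query` from the server.)"""
    return db.get(query, default)

X = {"apple", "pear", "plum"}   # server set
Y = ["pear", "kiwi", "plum"]    # client set (m = 3 instances)
defaults = make_defaults(len(Y))

total = 0
for y, d in zip(Y, defaults):
    # Instance i: every non-default value is d_i + 1 (our assumption),
    # so a match contributes d_i + 1 and a miss contributes d_i.
    db = {x: (d + 1) % MOD for x in X}
    total = (total + pir_with_default(db, d, y)) % MOD

# The defaults cancel (they sum to 0), leaving |X ∩ Y| = 2.
assert total == len(X & set(Y))
```

Each individual retrieved value is uniformly random on its own, so the client learns only the sum, which is the cardinality.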
  • the server ⁇ rst hashes hisN elements intoN random bins. We know that each bin has at most O(logN) elements.
  • the client computes the same hash on y to identify the bin b that could possibly contain an element y. Now the client queries the server with PIR-with-Default on a string b. The client additionally sends the round-1 message of the two-round PSI protocol ⁇ PSI on a single element y. The server then computes a round-2 message of ⁇ PSI for each bin with elements in that bin.
  • the server views his database for PIR-with-Default as all the N indices of the bins along with the associated values being the round-2 messages of ⁇ PSI , and generates the response for PIR-with-Default.
  • the client ⁇ rst recovers the round-2 message of ⁇ PSI from PIR-with-Default, and then recovers the output of ⁇ PSI , namely X ⁇ ⁇ y ⁇ .
  • the receiver's reusable communication is reduced from O(λ^2) to O(λ · log N) group elements in G_1.
  • her online communication is reduced from O(λ) to O(log N) group elements in G_2.
  • PSI-Cardinality. We can optimize the PSI-Cardinality protocol by replacing small subtrees by small instances of two-round PSI-Cardinality (e.g., DDH-based PSI-Cardinality), similarly to the above PSI protocol. However, this would reveal which elements are in the intersection and which are not. Nonetheless, we notice that in our reusable rate-1 OT protocol, any OT response from the sender can be decrypted by the receiver using the same secret state str, and the receiver cannot distinguish between different responses. Therefore, the server can randomly shuffle the responses for all the PIR-with-Default instances so that the client can only learn the cardinality of the intersection.
  • PSI-Sum is obtained from PIR-with-Default similarly to the PSI-Cardinality protocol, except that all the non-default values v_j in a single instance are set to v_d^i + w_j, where w_j is the corresponding weight. Note that this approach additionally hides the PSI-Cardinality and only reveals the PSI-Sum.
  • the client additionally sends Enc(w) to the server (in the online phase) where Enc is an additively homomorphic encryption scheme.
  • the server picks a random value δ as his output of Extended-PIR-with-Default and replaces each value v in a leaf node of the PIR-with-Default tree by Enc(v · w - δ).
  • the client needs to decrypt her output from PIR-with-Default to recover her output for Extended-PIR-with-Default.
  • OT provides strong sender privacy if there exists a PPT algorithm OTSim such that, for any bit b and any pair of messages (m_0, m_1), sampling (str, prm) ← PreP and otr ← OT_1(str, b), the two distributions OT_2((prm, otr), (m_0, m_1)) and OTSim(prm, m_b) are statistically close.
  • Our amortized rate-1 OT constructions do not provide strong sender privacy, because OT 2 is deterministic.
  • a randomized-OT_2 version of these constructions is obtained by using randomness extractors and PRGs, as explained in Section 2.
  • OTSim: the simulation algorithm OTSim, which is only given m_b, should somehow sample from OT_2((prm, otr), (m_0, m_1)).
  • OTSim may, instead, sample from OT_2((prm, otr), (m_b, m_b)).
  • the main challenge in doing so is that OTSim is only given (prm, m_b), and not otr, which in turn is sampled based on str, not known to OTSim.
  • OT_2((prm, otr), (m_b, m_b))
  • cloud computing infrastructure 100 comprises a data store 110 that hosts a corpus repository, typically with access controls.
  • the cloud provider executes a cloud provider Private Set Intersection (PSI) manager 105, in some embodiments, in association with a service that provides responsive actions, such as one or more of: alerting, redaction, tokenization, labelling, sandboxing, and the like.
  • PSI Private Set Intersection
  • the data store 110 stores an entire set of content (information), although the tool 105 itself may operate on just an index of that set of content.
  • the client computing environment 115, which typically is hosted in a client private network, comprises a database 125 of sensitive data (e.g., PII, PHI, or the like), as well as instances of both the PSI manager 120 and a response service.
  • Client-based resources communicate with cloud provider-based resources via client-server based communications, such as described herein.
  • Each side of the communication link is implemented in one or more data processing systems, such as described and depicted herein.
  • the cloud computing infrastructure may be implemented as described herein, and it may utilize one or more additional services.
  • the PSI managers (105 and 120) interoperate with one another to implement a PSI protocol exchange, with the cloud-based manager evaluating the index of the set of content stored in the cloud data store.
  • the response services can execute as software (one or more computer systems, programs, processes, etc.) executing in hardware or virtual machines.
  • the Private Set Intersection protocol, which is a form of secure multi-party computation (MPC)
  • MPC secure multi-party computation
  • the index of an arbitrarily large corpus 110 in the cloud computing environment 100 is examined, preferably in an automated manner, and the response service(s) flag or redact anything that is in the client's full, definitive list of sensitive data stored in the client data store 125 (e.g., patient names and record numbers, or any other piece of information that the client considers sensitive), without revealing to the service provider any new information that is not already present on the cloud.
  • this approach thus provides for a "zero knowledge"-based proof regarding whether sensitive data is or is not present on the cloud (in other words, in the index), all without disclosing such information to facilitate the evaluation process itself.
  • the sensitive data never leaves the client premises 115; rather, the database 125 containing the sensitive data connects to the client-side agent 120, which performs Private Set Intersection (PSI) interactively with the cloud-supported PSI agent 105 (which, as noted above, preferably examines its index of the information stored in the cloud, rather than examining that entire set of information itself), thereby detecting, for example, whether sensitive data fields or any API field that client users populate through a client application (not shown) from the client-side database 125 are present in any document or other object the cloud provider is permitted or allowed to access.
  • the cloud provider PSI manager (agent) 105 connects to the corpus repository 110 containing an indexed corpus.
  • the PSI protocol then is performed on the contents of the index.
  • this operation may include only a corpus specific to a particular client, or a broader corpus to which the client has access for sensitive information detection.
  • this embodiment allows clients to determine whether their sensitive information exists, even in a corpus to which they do not have full (or even any) access, e.g., a provider-owned or curated corpus.
  • the cloud provider-based PSI agent 105 integrates directly with APIs, performing PSI with the client's PSI agent in real-time to detect the passing of sensitive information in text fields as information enters the system.
  • this thus allows the APIs to provide a real-time indication of apparent entry of sensitive data so that the client application can use the response service (or the like) to warn the client or the end user and/or redact the data before it is stored on the cloud.
  • the PSI interaction is carried out between the cloud provider and a trusted third party (e.g., law enforcement, an intelligence agency, a contracted security organization, company auditors, authorized partners, etc.), where the trusted third party has a legitimate interest in detecting the presence of certain sensitive information, e.g., in a cognitive system, typically on behalf of the client.
  • the third party is not granted full access to the corpus or API, but still has a legitimate interest in detecting, for example, certain sensitive data (e.g., the names of persons of interest) in the cognitive system.
  • the access controls on the repository may be varied and will depend on the nature of the access limitation. Access controls may be role-based, user-based, or otherwise.
  • the technique of this disclosure provides significant advantages. As has been described, the approach herein provides a way to detect whether specific sensitive data of a client is present in a cloud computing infrastructure, without requiring that the data be shared with the cloud provider, or that the cloud provider provide the client access to all (or even any) data in the cloud.
  • each side of the communication preferably executes a PSI agent (tool), which is readily implemented in software.
  • a PSI agent typically is implemented in software, e.g., as a set of computer program instructions executed by one or more hardware processors.
  • a particular tool may comprise any number of programs, processes, execution threads, and the like, together with appropriate interfaces and databases to support data used or created by the tool.
  • the tool may be configured or administered with a web-based front-end, via a command line, or the like.
  • the tool may include one or more functions that are implemented programmatically, or that interoperate with other computing entities or software systems via an application programming interface (API), or any convenient request-response protocol.
  • the described approach is preferably web- or cloud-based, thereby avoiding traditional installation and deployment issues that often accompany DLP systems.
  • the techniques provide for lightweight tooling (the client-server based PSI tool) to interact with the corpus (cloud-based) and the database (client-based) to detect potential sensitive data leakage.
  • the approach thus promotes simple and effective cross-organization collaboration with sufficient privacy to alleviate or ameliorate security concerns.
  • This subject matter may be implemented as-a-service.
  • the subject matter may be implemented within or in association with a cloud deployment platform system or appliance, or using any other type of deployment systems, products, devices, programs or processes.
  • the PSI tool and related response system functionality may be provided as a standalone function, or it may leverage functionality from other products and services.
  • a representative cloud application platform with which the technique may be implemented includes, without limitation, any cloud-supported application framework, product or service.
  • the techniques herein may be implemented as a management solution, service, product, appliance, device, process, program, execution thread, or the like.
  • the techniques are implemented in software, as one or more computer programs executed in hardware processing elements, in association with data stored in one or more data sources, such as a problems database.
  • Some or all of the processing steps described may be automated and operate autonomously in association with other systems.
  • the automation may be full or partial, and the operations (in whole or in part) may be synchronous or asynchronous, demand-based, or otherwise.
  • These above-described components typically are each implemented as software, i.e., as a set of computer program instructions executed in one or more hardware processors.
  • the components are shown as distinct, but this is not a requirement, as the components may also be integrated with one another in whole or in part.
  • One or more of the components may execute in a dedicated location, or remote from one another.
  • One or more of the components may have sub-components that execute together to provide the functionality.
  • the applications on the data processing system provide native support for Web and other known services and protocols including, without limitation, support for HTTP, FTP, SMTP, SOAP, XML, WSDL, UDDI, and WSFL, among others.
  • further information regarding SOAP, WSDL, UDDI and WSFL is available from the World Wide Web Consortium (W3C), which is responsible for developing and maintaining these standards; further information regarding HTTP, FTP, SMTP and XML is available from the Internet Engineering Task Force (IETF).
  • the techniques described herein may be implemented in or in conjunction with various server-side architectures including simple n-tier architectures, web portals, federated systems, and the like. Still more generally, the subject matter described herein can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements.
  • the sensitive data detection service (or any component thereof) is implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like.
  • a computer-usable or computer-readable medium can be any apparatus that can contain or store the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or a semiconductor system (or apparatus or device). Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk.
  • the computer-readable medium is a tangible, non-transitory item.
  • the computer program product may be a product having program instructions (or program code) to implement one or more of the described functions. Those instructions or code may be stored in a computer readable storage medium in a data processing system after being downloaded over a network from a remote data processing system. Or, those instructions or code may be stored in a computer readable storage medium in a server data processing system and adapted to be downloaded over a network to a remote data processing system for use in a computer readable storage medium within the remote system.
  • the techniques are implemented in a special purpose computing platform, preferably in software executed by one or more processors.
  • the software is maintained in one or more data stores or memories associated with the one or more processors, and the software may be implemented as one or more computer programs.
  • this special-purpose hardware and software comprises the functionality described above. While the above describes a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary, as alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, or the like.
  • a Private Search protocol may be used.
  • the corpus is indexed on the cloud and a check is performed to determine if one or more terms of interest to a requesting client are in the index.
  • the techniques herein provide for improvements to another technology or technical field, namely, data detection security analysis tools and systems, and cloud-based systems, as well as improvements to the functioning of automated sensitive data detection tools and methods.
  • Fig. 2 illustrates an example optimized two-round PSI protocol with a single element on the client side.
  • Figs. 3 and 4 depict example computer systems useful for implementing various embodiments described in the present disclosure.
  • Various embodiments may be imple- mented, for example, using one or more computer systems, such as computer system 500 shown in Fig. 3.
  • Computer system 500 may be used, for exam- ple, to implement any of the embodiments discussed herein, as well as combinations and sub-combinations thereof.
  • Computer system 500 may include one or more processors (also called central processing units, processing devices, or CPUs), such as a processor 504.
  • Processor 504 may be connected to a communication infrastructure 506 (e.g., such as a bus).
  • Computer system 500 may also include user input/output device(s) 503, such as mon- itors, keyboards, pointing devices, etc., which may communicate with communication infrastructure 506 through user input/output interface(s) 502.
  • one or more of processors 504 may be a graphics processing unit (GPU).
  • a GPU may be a processor that is a specialized electronic circuit designed to process mathematically intensive applications.
  • the GPU may have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, videos, etc.
  • Computer system 500 may also include a main memory 508, such as random-access memory (RAM).
  • Main memory 508 may include one or more levels of cache.
  • Main memory 508 may have stored therein control logic (i.e., computer software, instructions, etc.) and/or data.
  • Computer system 500 may also include one or more secondary storage devices or secondary memory 510.
  • Secondary memory 510 may include, for example, a hard disk drive 512 and/or a removable storage device or removable storage drive 514.
  • Removable storage drive 514 may interact with a removable storage unit 518.
  • Removable storage unit 518 may include a computer-usable or readable storage device having stored thereon computer software (control logic) and/or data.
  • Removable storage drive 514 may read from and/or write to removable storage unit 518.
  • Secondary memory 510 may include other means, devices, components, instrumentalities, or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 500. Such means, devices, components, instrumentalities, or other approaches may include, for example, a removable storage unit 522 and an interface 520.
  • Examples of the removable storage unit 522 and the interface 520 may include a program cartridge and cartridge interface, a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
  • Computer system 500 may further include communications interface 524 (e.g., network interface). Communications interface 524 may enable computer system 500 to communicate and interact with any combination of external devices, external networks, external entities, etc. (individually and collectively referenced as remote device(s), network(s), entity(ies) 528).
  • communications interface 524 may allow computer system 500 to communicate with external or remote device(s), network(s), entity(ies) 528 over communications path 526, which may be wired and/or wireless (or a combination thereof), and which may include any combination of LANs, WANs, the Internet, etc. Control logic and/or data may be transmitted to and from computer system 500 via communications path 526.
  • Computer system 500 may also be any of a personal digital assistant (PDA), desktop workstation, laptop or notebook computer, netbook, tablet, smartphone, smartwatch or other wearable devices, appliance, part of the Internet-of-Things, and/or embedded system, to name a few non-limiting examples, or any combination thereof.
  • Computer system 500 may be a client or server computing device, accessing or hosting any applications and/or data through any delivery paradigm, including but not limited to remote or distributed cloud computing solutions; local or on-premises software ("on-premise" cloud-based solutions); "as a service" models (e.g., content as a service (CaaS), digital content as a service (DCaaS), software as a service (SaaS), managed software as a service (MSaaS), platform as a service (PaaS), desktop as a service (DaaS), framework as a service (FaaS), backend as a service (BaaS), mobile backend as a service (MBaaS), infrastructure as a service (IaaS), etc.); and/or a hybrid model including any combination of the foregoing examples or other services or delivery paradigms.
  • Fig. 4 illustrates an example machine of a computer system 900 within which a set of instructions, for causing the machine to perform any one or more of the operations discussed herein, may be executed.
  • the machine may be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, and/or the Internet.
  • the machine may operate in the capacity of a server or a client machine in a client-server network environment, as a peer machine in a peer-to-peer (or distributed) network environment, or as a server or a client machine in a cloud computing infrastructure or environment.
  • the machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, a switch or bridge, a specialized application or network security appliance or device, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
  • the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
  • the example computer system 900 includes a processing device 902, a main memory 904 (e.g., read-only memory (ROM), flash memory, dynamic random-access memory (DRAM) such as synchronous DRAM (SDRAM), etc.), a static memory 906 (e.g., flash memory, static random-access memory (SRAM), etc.), and a data storage device 918, which communicate with each other via a bus 930.
  • Processing device 902 represents one or more processing devices such as a microprocessor, a central processing unit, or the like.
  • the processing device may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or processor implementing other instruction sets, or processors implementing a combination of instruction sets.
  • Processing device 902 may also be one or more special-purpose processing devices such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like.
  • the processing device 902 is configured to execute instructions 926 for performing the operations and steps discussed herein.
  • the computer system 900 may further include a network interface device 908 to communicate over the network 920.
  • the computer system 900 also may include a video display unit 910, an alphanumeric input device 912 (e.g., a keyboard), a cursor control device 914 (e.g., a mouse), a signal generation device 916 (e.g., a speaker), a graphics processing unit 922, a video processing unit 928, and an audio processing unit 932.
  • the data storage device 918 may include a machine-readable medium 924 (also known as a computer-readable storage medium) on which is stored one or more sets of instructions 926 (e.g., software instructions) embodying any one or more of the operations described herein.
  • the instructions 926 may also reside, completely or at least partially, within the main memory 904 and/or within the processing device 902 during execution thereof by the computer system 900, where the main memory 904 and the processing device 902 also constitute machine-readable storage media.
  • the instructions 926 include instructions to implement operations and functionality corresponding to the disclosed subject matter.
  • While the machine-readable storage medium 924 is shown in an example implementation to be a single medium, the term "machine-readable storage medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions 926.
  • the term "machine-readable storage medium" shall also be taken to include any medium that is capable of storing or encoding a set of instructions 926 for execution by the machine and that cause the machine to perform any one or more of the operations of the present disclosure.
  • the term "machine-readable storage medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.
  • Such a computer program may be stored in a computer-readable storage medium, such as but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
  • the present disclosure is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.
  • the present disclosure may be provided as a computer program product, or software, that may include a machine-readable medium having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure.
  • a machine-readable medium includes any mechanism for storing information in a form readable by a machine (e.g., a computer).
  • a machine-readable (e.g., computer-readable) medium includes a machine (e.g., a computer) readable storage medium such as read-only memory ("ROM"), random access memory ("RAM"), magnetic disk storage media, optical storage media, flash memory devices, etc.
  • a tangible, non-transitory apparatus or article of manufacture comprising a tangible, non-transitory computer useable or readable medium having control logic (software) stored thereon may also be referred to herein as a computer program product or program storage device.
  • embodiments have significant utility to fields and applications beyond the examples described herein.
  • Embodiments have been described herein with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof.
  • the boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined as long as the specified functions and relationships (or equivalents thereof) are appropriately performed.
  • alternative embodiments can perform functional blocks, steps, operations, methods, etc. using orderings different than those described herein.


Abstract

Techniques for amortizing the cost of multiple rate-1 oblivious transfers are disclosed. Specifically, based on standard pairing assumptions, a two-message rate-1 OT protocol is described for which the amortized cost per string-oblivious transfer is asymptotically reduced to only four group elements. The results lead to significant communication improvements in private set intersection and private information retrieval.

Description

Memory and Communications Efficient Protocols for Private Data Intersection

FIELD OF THE INVENTION

The disclosure relates to oblivious transfer protocols with significant reductions in communication and processing overhead, including applications for private set intersection.

BACKGROUND OF THE INVENTION

Private set intersection (PSI) enables two parties, each holding a private set of elements, to compute the intersection of the two sets while revealing nothing beyond the intersection. PSI and its variants have found many real-world applications including online advertising, password breach alert, mobile private contact discovery, and privacy-preserving contact tracing. In recent years, there has been tremendous progress made towards realizing PSI efficiently in various settings, including Diffie-Hellman-based, RSA-based, OT-extension-based, FHE-based, circuit-based, and Vector-OLE-based approaches.

One might wonder about the difference between amortized rate-1 OT and OT extension. The primary goal of OT extension is to minimize the number of public-key operations: performing n := n(λ) OTs at the cost of doing a smaller number, λ, of OTs and some private-key operations. We, on the other hand, are concerned with amortizing receiver communication for rate-1 OT: doing t rate-1 OTs, but in a way that the receiver's total communication is less than the sum of t individual rate-1 OT executions. OT extension techniques do not provide this feature. Moreover, OT extension techniques destroy the rate-1 property of the sender. For example, Beaver's protocol, which is round preserving, results in sender OT protocol messages that are larger than |m0| + |m1|, where (m0, m1) is the sender's initial input pair.
Most of the existing approaches require the communication complexity to grow with the size of the larger set, the only exceptions being an FHE-based protocol (where communication grows linearly in the receiver set and logarithmically in the sender set) and an RSA-based protocol (where the receiver has the bigger set and the communication grows linearly in the smaller, sender set). We consider a dual setting, meaning that in our case the receiver has the smaller set. In many real-world applications, such as password breach alert and mobile private contact discovery, we need to perform unbalanced PSI between a constrained device (e.g., a cellphone) holding a small set and a service provider holding a large set, so having communication grow with the larger set (especially the sender set) is a big concern. Herein is presented unbalanced PSI with communication complexity linear in the size of the receiver set and logarithmic in the sender set. Furthermore, our approach is easily adapted to PSI with advanced functionalities such as PSI-Cardinality, PSI-Sum, PSI-Test, etc., which could previously only be achieved from Diffie-Hellman-based or circuit-based approaches.

Oblivious transfer (OT) is a foundational primitive in cryptography. We are interested in two-message OT protocols between: (i) a receiver with an input bit b who sends the first message otr of the protocol, and (ii) a sender with input two (equal-length) strings m0, m1 who sends the second message ots. Correctness requires that at the end of execution the receiver should learn mb, while security requires that the receiver does not learn m1−b and that the sender does not learn the bit b. Over the years, significant progress has been made in constructing two-message OT protocols, either from general assumptions, or from specific assumptions but with enhanced security/functionality/efficiency, such as OT based on DDH, CDH, factoring-related assumptions, and LWE. We are interested in constructing rate-1 two-message OT protocols.
We say that an OT protocol is rate-1 if the ratio n/|ots|
approaches 1 as n grows. Rate-1 OT enables powerful applications such as (i) semi-compact homomorphic encryption for branching programs (where the ciphertext grows only with the depth but not the size of the program) as well as (ii) communication-efficient private information retrieval (PIR) protocols. The rate-1 property is crucial in realizing these applications, allowing a sender to compress a large database for a receiver who is interested only in a small portion of it.

To give some intuition, suppose we want to use a rate-1 OT to implement a 1-out-of-4 OT for a sender with four elements m := (m00, m01, m10, m11). Thinking about the corresponding binary tree, the receiver on an input uw ∈ {0,1}^2 will send two messages otr1 and otr2, the first one for choice bit u and the second one for w. The sender will use otr2 once against (m00, m01) and once against (m10, m11) to get two outgoing messages ots0 and ots1. The receiver is only interested in otsu, but the sender does not know which one it is; sending both would be costly. So, the sender compresses (ots0, ots1) using otr1, allowing the receiver to learn otsu, and consequently muw.

The above construction employs a self-eating process, where a pair of ots messages is used as the sender input for the next OT, and so on. Employing a low-rate 1-out-of-2 OT to build 1-out-of-n OT will blow up the communication, falling short for PIR. To see this, suppose |ots| ≥ 2|m0|, as is the case with most 1-out-of-2 OT protocols. Then, if n = 2^k, as the sender packs up the tree from the bottom up, in each OT invocation the size of the resulting ots message (which either packs two previous ots messages, or two leaf messages) doubles, resulting in a final message of size at least O(2^k · u), where u is the size of each initial individual message of the sender. While the protocol is a 1-out-of-n OT, it is not a sublinear PIR, because the size of the sender's protocol message is not sublinear in its total input size, nu.
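The doubling argument above can be checked with a toy size calculation. The sketch below is illustrative only: it assumes a low-rate OT with |ots| = 2|m| and a rate-1 OT with |ots| = |m| + c for some fixed additive overhead c (the constant 64 is an arbitrary stand-in, not a parameter of any concrete protocol).

```python
import math

def final_ots_size(n_leaves: int, u: int, rate1: bool, c: int = 64) -> int:
    """Size (in bits) of the sender's final message when packing a binary
    tree over n_leaves leaf messages of u bits each via repeated
    1-out-of-2 OT invocations, bottom-up."""
    k = int(math.log2(n_leaves))  # depth of the binary tree
    size = u
    for _ in range(k):
        if rate1:
            size += c   # rate-1 OT: |ots| = |m| + c, additive overhead per level
        else:
            size *= 2   # low-rate OT: |ots| >= 2|m|, so the size doubles per level
    return size
```

With n = 2^10 leaves of 128-bit messages, the low-rate tree yields a final message of 128 · 2^10 = 131072 bits (the entire sender input), while the rate-1 tree yields only 128 + 10 · 64 = 768 bits.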
Moreover, as we will see later, in some applications involving branching programs, such as private set intersection (PSI) with unbalanced set sizes, the sender will need to pack a tree of depth polynomial in the security parameter (as opposed to logarithmic depth as in PIR), so using low-rate 1-out-of-2 OT will result in an exponential size blow-up.

An overlooked aspect of rate-1 OT is the receiver communication cost. This is an important metric because the self-eating process involves producing many otr messages (proportional to the depth of the tree/program), and hence sending a fresh otr for each depth results in large first-round messages. In one known DDH-based rate-1 OT construction, for a sender with (m0 ∈ {0,1}^n, m1 ∈ {0,1}^n), the receiver should send a linear (O(n)) number of group elements for each bit of the sender, resulting in overall O(n^2) group elements. This incurs high receiver communication in the respective applications. Other solutions have obtained rate-1 OT for which otr consists of only a linear O(n) number of group elements in total, as opposed to O(n^2). One limitation of the prior work is that it only improves the communication efficiency of the base rate-1 OT, but still requires the receiver to send a fresh otr message for each new OT execution. Most applications of rate-1 OT require executing it multiple times, resulting in large communication costs for the receiver. This constitutes a prohibitive overhead for the receiver in applications in which the depth of the branching program is large, and the receiver needs to engage with a sender holding a branching program BP on many different inputs x1, . . . , xn (e.g., PSI). Addressing this communication bottleneck is desirable, and herein is described a new primitive called receiver-amortized (or amortized, for short) rate-1 OT.

BRIEF SUMMARY OF THE INVENTION

We introduce a new technique for amortizing the cost of multiple rate-1 OTs.
Specifically, based on standard pairing assumptions, we obtain a two-message rate-1 OT protocol for which the amortized cost per string-OT is asymptotically reduced to only four group elements. Our results lead to significant communication improvements in PSI and PIR, special cases of SFE for branching programs.

1. PIR: We obtain a rate-1 PIR scheme with client communication cost of O(λ · log N) group elements for security parameter λ and database size N. Notably, after a one-time setup (or one PIR instance), any following PIR instance only requires communication cost of O(log N) group elements.

2. PSI with unbalanced inputs: We apply our techniques to private set intersection with unbalanced set sizes (where the receiver has a smaller set) and achieve receiver communication of O((m + λ) log N) group elements, where m, N are the sizes of the receiver and sender sets, respectively. Similarly, after a one-time setup (or one PSI instance), any following PSI instance only requires communication cost of O(m · log N) group elements.

All previous sublinear-communication non-FHE-based PSI protocols for the above unbalanced setting were also based on rate-1 OT, but incurred at least O(λ^2 · m log N) group elements.

In various embodiments, a computer-implemented method, computing device, and computer-readable storage media are disclosed.
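To make the asymptotic comparison above concrete, the following sketch evaluates the two receiver-communication bounds with all hidden constants set to 1 (an oversimplification for illustration only; these functions are not part of the protocol).

```python
import math

def receiver_comm_prior(lam: int, m: int, N: int) -> int:
    """Receiver communication, in group elements, of prior sublinear
    non-FHE unbalanced-PSI protocols: O(lambda^2 * m * log N),
    with hidden constants set to 1."""
    return lam * lam * m * math.ceil(math.log2(N))

def receiver_comm_amortized(lam: int, m: int, N: int) -> int:
    """Receiver communication of the amortized approach:
    O((m + lambda) * log N), with hidden constants set to 1."""
    return (m + lam) * math.ceil(math.log2(N))
```

For λ = 128, a receiver set of m = 1000, and a sender set of N = 2^20, the prior bound evaluates to 128^2 · 1000 · 20 = 327,680,000 group elements, against (1000 + 128) · 20 = 22,560 for the amortized bound.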
In one example embodiment for determining if a remote cloud service contains a certain data element without exposing the data element to the cloud service, the computer-implemented method, computing device, and computer-readable storage media can comprise: storing on a remote server a data set X of N elements; storing on a client a single data element y, wherein all of the elements in X and y are lambda-bit strings; at the client: establishing g as a cryptographically secure hash function; executing the cryptographically secure hash function on the single data element y to generate a hash result b; computing a client message of a private set intersection protocol with the single data element y, and computing a client message of private information retrieval with query b; transmitting the client message of the private set intersection protocol and the client message of private information retrieval to the remote server; at the remote server: computing hashes of all N elements of data set X using the secure hash function g; partitioning the N elements of data set X into multiple sets based on the computed hashes, such that each partition represents a unique hash value; adding dummy elements to each partition to make them the same size; for each partition, generating a server response for the private set intersection protocol using the corresponding partition as the server input; computing a server message of private information retrieval using the generated server responses for the private set intersection protocol as input; and transmitting the server message of private information retrieval to the client; at the client: computing a client output of the private information retrieval protocol to compute output z; computing a client output of the private set intersection protocol on input z to determine whether y is in X; and outputting the result of the determination of whether y is in X to a user at the client.
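The claimed flow above can be sketched structurally as follows. This is an insecure mock: the PSI and PIR sub-protocols are replaced by transparent stand-ins so the partition-pad-select logic can be followed end to end, and the hash truncation, bucket count, and helper names are illustrative assumptions, not the cryptographic construction.

```python
import hashlib

NUM_BUCKETS = 8  # 2^t partitions for a t-bit hash output; illustrative choice

def g(elem: str) -> int:
    """Cryptographically secure hash g, truncated to a bucket index."""
    return int.from_bytes(hashlib.sha256(elem.encode()).digest(), "big") % NUM_BUCKETS

def server_partition(X):
    """Server side: partition X by hash value, then pad every partition
    with dummy elements so that all partitions have equal size."""
    parts = [[] for _ in range(NUM_BUCKETS)]
    for x in X:
        parts[g(x)].append(x)
    size = max(len(p) for p in parts)
    for i, p in enumerate(parts):
        p.extend(f"<dummy-{i}-{j}>" for j in range(size - len(p)))
    return parts

def membership(X, y) -> bool:
    """End-to-end flow of the claimed method.  In a real execution the
    client PSI message would hide y and the PIR query would hide b; here
    both sub-protocols are insecure, transparent stand-ins."""
    b = g(y)                                 # client: hash result b for y
    parts = server_partition(X)              # server: partition + pad
    psi_responses = [set(p) for p in parts]  # server: one PSI response per partition
    z = psi_responses[b]                     # PIR: client retrieves response b only
    return y in z                            # client: PSI output on z decides y in X
```

Running `membership` against a toy server set returns True exactly when y falls in the partition that contains it, mirroring the client's final determination step.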
In further embodiments, the private set intersection protocol is a two-round secure function evaluation where the client input is x and the server input is X, and at the end of the protocol the client learns whether x is in X. In further embodiments, the method makes only limited use of expensive cryptographic group operations, wherein the number of operations is smaller than the size of X. In further embodiments, the private information retrieval is a two-round secure function evaluation where the client input is an index i and the server input is X, and at the end of the protocol the client learns the i-th element of X. In further embodiments, the method uses limited communication, wherein the total number of bits sent over a channel is smaller than the size of X. In further embodiments, the hash function g is selected to have an output size that is optimized for communication efficiency or to minimize use of expensive cryptographic group operations, such that a larger output size has improved communication efficiency and a smaller output size has reduced computation cost and less communication efficiency, wherein the efficiency is measured by the number of bits exchanged over a channel. In further embodiments, the data set X is provided by the client, or the data set X is provided by a third party, or the data set belongs to the remote cloud service. In further embodiments, the data set X of N elements represents aggregated password data, and the single data element y represents a client password. In further embodiments, the data set X of N elements represents aggregated image information, and the single data element y represents a client image. In further embodiments, the data set X of N elements represents aggregated contact list or personal information, and the single data element y represents an instance of client contact information.
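The output-size tradeoff described in the embodiment above can be read, under one stylized cost model (an assumption for illustration, not the protocol's exact accounting), as follows: a t-bit hash output yields 2^t partitions, communication tracks the size of a single padded partition plus a PIR query, and per-partition public-key work tracks the number of partitions.

```python
import math

def tradeoff(N: int, t: int):
    """Stylized accounting for a t-bit hash output over a server set of
    N elements; constants and lower-order terms are omitted.
    - comm: dominated by one PSI response over a single padded partition
      of about N / 2^t elements, plus a PIR query over 2^t partitions.
    - group_ops: one unit of per-partition public-key work, so it grows
      with the number of partitions 2^t."""
    buckets = 2 ** t
    part = math.ceil(N / buckets)  # elements per partition before padding
    comm = part + t                # PSI response + PIR query (stylized)
    group_ops = buckets            # per-partition public-key work (stylized)
    return comm, group_ops
```

Under this model, moving from t = 4 to t = 10 over N = 2^20 elements shrinks the communication term while growing the group-operation term, matching the direction of the tradeoff stated in the embodiment.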
BRIEF DESCRIPTION OF THE DRAWINGS The accompanying drawings, which are included to provide further understanding and are incorporated in and constitute a part of this specification, illustrate disclosed embodiments, and together with the description, serve to explain the principles of the disclosed embodiments. In the drawings: Fig. 1 illustrates an example embodiment of the present invention for a distributed computing environment for performing private data intersection. Fig. 2 illustrates an example optimized two-round PSI protocol with a single element on the client side. Fig. 3 illustrates an example computer system architecture for implementing the claimed systems and methods. Fig. 4 illustrates further details of an example computer system architecture for implementing the claimed systems and methods. DETAILED DESCRIPTION We put forth a cryptographic primitive that we call amortized rate-1 OT, and show how to realize it using standard assumptions on bilinear groups. As applications we obtain significant efficiency improvements, shaving a factor of poly(λ) off the receiver communication in various protocols involving secure branching program computation (e.g., unbalanced PSI). An amortized rate-1 OT breaks up the computation of a receiver into an offline and online phase. The offline phase is performed by the receiver once and for all, prior to receiving any choice bits. Specifically, we have an algorithm PreP(1λ), run by the receiver, which outputs a private state str for the receiver, and a reusable parameter prm. Next, we have an algorithm OT1 run by the receiver on a choice bit b to obtain a protocol message otr.
A sender with messages m := (m0, m1) ∈ {0,1}^n × {0,1}^n runs OT2((prm, otr), m) to obtain ots. Finally, the receiver can recover m_b by running OT3(str, ots). One notable aspect is that the state str used by OT1 and OT3 is the same as the initial state output by PreP; the state is not updated as a result of OT1 executions. This property is in fact exploited in some of our applications, such as PSI cardinality. Also, the message prm is reused across all communications, so the receiver may send it only once. We specify the following properties: 1. Sender rate-1 communication: |ots| = n + poly(λ), where poly is a fixed polynomial (e.g., the size of a group element) independent of how large n is. 2. Receiver non-reusable compactness: |otr| = poly(λ), where poly(λ) is independent of n. 3. Receiver privacy: We specify indistinguishability security for the receiver against adaptive adversaries. If (str, prm) ← PreP(1λ), an adaptive adversary who is given prm and who sends many pairs of choice bits in an adaptive fashion cannot determine whether his received otr messages (all made relative to str) were built using the first choice bits or the second choice bits of his submitted pairs. Notice that since otr messages are all produced based on the same private state str, we should give the adversary the ability to submit many pairs. 4. Sender privacy: Standard indistinguishability security against honest receivers. For applications involving non-oblivious branching programs we need to strengthen sender privacy. For oblivious branching programs, from which all our applications are obtained, the stated requirement suffices. Assuming an SXDH-hard bilinear map e : G1 × G2 → GT
on prime-order groups, we give a construction of amortized rate-1 OT in which prm consists of O(n²) group elements in G1 and otr consists of 4 group elements in G2. Recall that the SXDH assumption (Symmetric External Diffie-Hellman) states that both G1 and G2 are DDH-hard. Our construction is based on a new re-randomization trick that allows us to obtain a structured matrix, as required for rate-1 OT, from a reusable initial matrix and a re-randomizing term involving four group elements. The above reusable parameter prm is still quite large, even though it can be amortized among many OT executions. We show that by relying on a stronger assumption on G1, called 2n-power-DDH, we can make prm consist of only O(n) group elements in G1. We achieve this by relying on a sliding-window technique that implicitly builds a Toeplitz matrix in the exponent using a linear number of group elements. The t-power DDH assumption says that the distribution (g, g^a, g^{a²}, ..., g^{a^t}) is pseudorandom. For performing t rate-1 OTs where the size of each message of the sender is n, our receiver communication consists of O(n²) reusable group elements in G1 and 4t group elements in G2, relying on SXDH. Assuming power DDH on G1, the receiver communication becomes O(n) group elements in G1 and 4t group elements in G2. In comparison, the most receiver-compact bilinear SXDH-based rate-1 OT involves sending O(√n) group elements in both G1 and G2. As described herein, in many applications of rate-1 OT, we have t ≫ n,
allowing us to cut off large multiplicative polynomial factors from the receiver communication. We only include receiver communication, since the sender communication in all these protocols is the same (rate-1 for each instance of the OT). Our results allow us to realize SFE for branching programs with significantly lower receiver communication. To illustrate our improvements, we first review the concept of branching programs. A deterministic k-bit input branching program BP is a directed acyclic graph, where every leaf node has a label 0 or 1 (reject or accept), and every non-leaf node v has a label lb(v) ∈ [k]. The root node is labeled with 1. Every non-leaf node has two outgoing edges labeled 0 and 1. An input x ∈ {0,1}^k
induces a unique computation path from the root to a leaf node, where the computation from a node v will branch out to one of its two children depending on the value of x_i, where i = lb(v). We say BP(x) = b if the underlying computation path ends in a b-labeled leaf node. The size of a branching program is the number of nodes, and the depth, ℓ, is the length of the longest path. A branching program is oblivious if k = ℓ and if all nodes at level i (where the root is considered level 1) are labeled i. The standard definition of oblivious branching programs is more general than what we give here, but we stick to our own definition since it captures our application needs. As an example, consider a client who wants to know whether her input x ∈ {0,1}^λ is in the set D ⊂ {0,1}^λ of a server. This reduces to evaluating an oblivious branching program PSI on x, where PSI is constructed as follows: for every string a ∈ {ε} ∪ {0,1} ∪ ··· ∪ {0,1}^λ such that a is a prefix of a string in D, we put a node v_a in the graph. We designate v_ε as the root node, and all v_a such that a ∈ {0,1}^λ as accept leaf nodes. The label of a node v_a for |a| < λ is lb(v_a) = |a| + 1. For a node v_a, for |a| < λ, and for b ∈ {0,1}, if a node v_ab exists, we put a b-labeled edge from v_a to v_ab; otherwise, we create a new reject leaf node and put a b-labeled edge from v_a to this node. The depth of PSI is λ and its size is O(λ|D|). Now if a client wants to learn the intersection of her set S = {x1, ..., xm} with D, she needs to learn the values of all PSI(xi) for i ∈ [m], leading to m evaluations of PSI. Shorter client communication for PSI. Prior works have considered a construction of SFE for branching programs from rate-1 OT, where, for an oblivious branching program BP of depth d, the receiver sends d otr messages, each prepared for a sender whose input messages are of size O(dλ).
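The prefix-trie branching program just described can be sketched in Python. A plain set of prefixes stands in for the node set v_a (names here are illustrative only); walking one level per input bit mirrors the oblivious BP evaluation, with depth λ and at most (λ + 1)·|D| nodes:

```python
LAM = 8  # λ, bit length of elements (small for illustration)

def build_psi_bp(D):
    """Return the set of prefixes of strings in D (the nodes v_a)."""
    nodes = set()
    for s in D:
        for i in range(LAM + 1):
            nodes.add(s[:i])  # prefix a of length i, i.e. node v_a
    return nodes

def eval_psi_bp(nodes, x):
    """Walk from the root v_ε; at level i, branch on bit x_i."""
    a = ""
    for bit in x:
        a += bit
        if a not in nodes:
            return 0  # fell off the trie: reject leaf
    return 1  # reached an accept leaf: x is in D

D = {"10110001", "10010111", "00101100"}
nodes = build_psi_bp(D)
print(eval_psi_bp(nodes, "10010111"))  # → 1
print(eval_psi_bp(nodes, "11111111"))  # → 0
```

Evaluating membership for a client set S then amounts to m independent walks of this trie, matching the m evaluations of PSI noted above.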
Returning to the PSI problem for a client with set S = {x1, ..., xm} and a server with set D, we need to evaluate the oblivious branching program PSI m times. Recall that the depth of PSI is λ. Hence, setting t = mλ and n = λ², our PSI-client communication consists of O(mλ) non-reusable group elements in G2 (in either SXDH or power-DDH cases) and O(λ⁴) reusable group elements in G1 (in the case of SXDH), and O(λ²) reusable group elements in G1 (in the case of bilinear power DDH). In contrast, other work results in O(mλ⁴) group elements in both G1 and G2. Thus, we drop a multiplicative factor of m by relying on the same SXDH assumption, and a factor of mλ² by relying on bilinear power DDH. The results of prior work give O(mλ³) group elements for the receiver using (pairing-free) power DDH. This is again significantly larger than what we achieve. Herein, we describe some PSI optimization techniques that further reduce the client communication, replacing a multiplicative factor of λ with log N, where N = |D|. These techniques may be of independent interest. Also disclosed are more applications, involving PSI/PIR. SFE for non-oblivious branching programs. Prior work shows how to realize SFE for non-oblivious branching programs (in which at any given level the program might branch over several variables, not known to the receiver) by relying on a stronger sender privacy notion for the underlying rate-1 OT. Informally, the stronger property requires that a sender's response message should hide the previous protocol message of the receiver, even for the receiver herself. Herein, we show that simple variants of our amortized rate-1 OT satisfy the stronger sender security requirement, without affecting the efficiency parameters. All our applications are obtained based on oblivious branching programs, however. 1 Technical Overview One tool used in our constructions is a compressed version of n-bit packed ElGamal encryption.
A secret key is an n-bit tuple of exponents sk := (x1, ..., xn) and the public key is pk := (g, g1, ..., gn), where gi = g^{xi}. Given pk we can encrypt an n-bit message m := (m1, ..., mn) as ct := (g^r, g1^r · g^{m1}, ..., gn^r · g^{mn}). We have two additional algorithms Shrink and ShrinkDec, where Shrink(pk, ct) shrinks ct ∈ G^{n+1} to obtain a compressed ciphertext consisting of one group element plus n bits. We have shrinking correctness: ShrinkDec(sk, Shrink(pk, ct)) = m.
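A toy Python rendering of packed ElGamal with bit plaintexts may help fix the structure; the parameters are insecurely small, and the Shrink compression is omitted (only the slot-wise encryption/decryption is shown):

```python
import random

# Toy n-bit packed ElGamal. P = 2q + 1 is a tiny safe prime and g generates
# the order-q subgroup of Z_P*; these sizes are for illustration only.
P = 2039
q = 1019
g = 4
n = 8  # message length in bits

sk = [random.randrange(q) for _ in range(n)]   # (x1, ..., xn)
pk = [pow(g, x, P) for x in sk]                # (g^{x1}, ..., g^{xn})

def enc(pk, m_bits):
    r = random.randrange(1, q)
    c0 = pow(g, r, P)
    return c0, [pow(pk_i, r, P) * pow(g, b, P) % P
                for pk_i, b in zip(pk, m_bits)]

def dec(sk, ct):
    c0, cs = ct
    out = []
    for x_i, c_i in zip(sk, cs):
        # c_i / c0^{x_i} is 1 for a 0-bit and g for a 1-bit
        val = c_i * pow(c0, q - x_i, P) % P  # q - x_i inverts in the exponent
        out.append(0 if val == 1 else 1)
    return out

m = [1, 0, 1, 1, 0, 0, 1, 0]
assert dec(sk, enc(pk, m)) == m
```

A full Shrink procedure would compress the n slot ciphertexts down to one group element plus n bits, which is exactly what makes the OT below sender rate-1.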
Approach. Let G be a group of prime order p with a generator g. We let e_i denote a vector which has g in its ith position, and the identity element 1 everywhere else. The receiver on a choice bit b samples ~w ← Z_p^{2n}, and for every i ∈ [n] samples p_i ← Z_p and sets ~v_i := (g^{~w})^{p_i} ∘ e_{bn+i}, where (·)^{p_i} denotes entry-wise exponentiation, and ∘ denotes entry-wise group multiplication. She sends otr := (g^{~w}, ~v_1, ..., ~v_n) to the sender. Let m := (m0, m1) ∈ {0,1}^{2n} be a vector concatenating the two strings of the sender. Let g_0 := (g^{~w})^{m} and g'_i := ~v_i^{m}, where we overload the (·)^{m} notation to define ~z^{m} := Π_j z_j^{m_j}. Setting pk := (g_0, g_0^{p_1}, ..., g_0^{p_n}), we have ct := (g_0, g'_1, ..., g'_n) ∈ Enc(pk, m_b), where Enc denotes n-bit packed ElGamal. With this in mind, the sender sends ots = Shrink(ct) to the receiver, and the receiver, who has sk := (p_1, ..., p_n), can recover m_b as ShrinkDec(sk, ots). We have ots ∈ G × {0,1}^n, so the OT is sender rate-1. In the above, each vector
~v_i is the p_i-th entry-wise exponentiation of g^{~w}, but with a bump on its (bn + i)th location: namely, we multiply its (bn + i)th location by g. SXDH. We now give a new technique based on pairings that allows us to produce many bumpy vectors ~v_i in the target group, using only 4 group elements and a reusable initial parameter in the source groups. The receiver samples 2n vectors, together with a secret vector ~u ← Z_p^2, and lets the matrix M contain all these vectors in the exponent in G1; the reusable parameter is prm := [M]_1.
Receiver's non-reusable messages. To send a short otr message for a choice bit b, the receiver samples two random vectors (~f, ~h) ∈ Z_p^2 × Z_p^2, one orthogonal to ~u and the other having inner product one with ~u, in an order determined by b. The receiver sends otr := ([~f]_2, [~h]_2).
Sender's protocol messages. Given prm and otr = ([~f]_2, [~h]_2), the sender uses the pairing to compute the inner product of [~f]_2 with all the vectors in the left-hand side of [M]_1, and the inner product of [~h]_2 with all the vectors in the right-hand side of [M]_1. That is, using the notation above, letting IK denote the resulting matrix of target-group elements, the sender computes IK from prm and otr using pairing operations alone. The sender has now built a matrix IK that satisfies the bump structure explained in the first paragraph. Namely, think of the ith row of IK
as the bumpy vector ~v_i in that paragraph. Moreover, the receiver knows all the underlying exponent values sk := (p_1, ..., p_n). Now the sender can perform the step explained in the first paragraph to send a rate-1 message ots, and the receiver will be able to use sk to decrypt it to obtain m_b. Notice that the protocol has rate-1 sender communication, and that otr consists of only 4 group elements in G2. To argue about receiver privacy, let us, for simplicity, argue that an adversary A cannot distinguish between a world in which otr always encrypts the bit 0 from a world in which otr always encrypts 1; the proof for the case where the adversary can submit adaptively-chosen pairs of choice bits will be similar. We should show that A, for a random pair ([~f]_2, [~h]_2) of vectors, cannot tell which one is orthogonal to ~u and which one has inner product one. This should be argued in the presence of prm, known to A. We will first remove the presence of ~u from prm, relying on DDH for G1. Let prm′ be the same as prm but with ~u removed. By DDH, (~u, prm) ≈_c (~u, prm′). If we want to replace prm with prm′ for A, we should be able to reply to A's subsequent OT1 queries. The reason this can be done is because OT1 responses are produced based on only ~u and the underlying choice bit, and ~u is included in both distributions. Thus, we can remove ~u from the prm view of A. Once this is done, we will then show that the entire otr view of A can be simulated without knowing ~u, but by knowing a pair of vectors (~v, ~w), where ~v is orthogonal to ~u and ~w has inner product one with ~u. In particular, to sample from OT1(str, b), we return (k1~v + (1 − b)~w, k2~v + b~w), where k1 and k2 are random exponents. Next we show that the distribution of (~v, ~w) is identical to uniformly random vectors. This can be argued because information about ~u has been already removed from prm.
Finally, we rely on DDH for G2 to show that by using a random (~v, ~w) in the above simulation, the entire otr view of A will be pseudorandom, masking the value of the choice bit b. Bilinear Power DDH. We sketch how to adapt our cancellation technique to a sliding-window setting to reduce the size of prm to a linear number of group elements. The receiver samples a random exponent a and a vector ~r ←$ Z_p^2 and sets M := ([a~r]_1, [a²~r]_1, ..., [a^{2n}~r]_1). The receiver samples a non-reusable message for a choice bit b exactly as in the SXDH case, by sampling it based on ~u. A sender given (prm, otr) builds n vectors as follows: for i ∈ [n], the ith vector is formed from the window M[i .. i + n], where M[i .. j] denotes the elements in positions i all the way up to j. Once the vectors are formed, the sender will proceed exactly like the SXDH case. Correctness will then follow. The proof of receiver privacy follows similarly to the SXDH case, but we should replace DDH with power DDH in the appropriate places. We omit the details. 2 Preliminaries and Definitions We use λ for the security parameter. We
use ≈_c and ≈_s for computational and statistical indistinguishability, respectively. We let ≡ denote that two distributions are identical. For a distribution S we use x ← S to mean x is sampled according to S, and use y ∈ S to mean y ∈ sup(S), where sup denotes the support of a distribution. For a set S we overload the notation to use x ←$ S to indicate that x is chosen uniformly at random from S. If A is a randomized algorithm, then A(a1, ..., an), for deterministic inputs a1, ..., an, denotes the random variable obtained by sampling random coins r uniformly at random and returning A(a1, ..., an; r).
Definition 2.1 (Pairings and SXDH hardness). A bilinear map is given by (G1, G2, GT, p, g, h, e), where p is a prime number and is the order of G1, G2 and GT, and g and h are random generators of G1 and G2, respectively. The function e : G1 × G2 → GT is a non-degenerate map, satisfying e(g^a, h^b) = e(g, h)^{ab} for all exponents a and b. The Symmetric External Diffie-Hellman (SXDH) assumption says G1 and G2, sampled as above, are DDH-hard.
Inner product with integer vectors. Given a vector of group elements [~v] := (g^{v_1}, ..., g^{v_k}) and an integer vector ~m := (m_1, ..., m_k), we write [~v]^{~m} := Π_{j∈[k]} (g^{v_j})^{m_j} = g^{⟨~v, ~m⟩}.
Amortized Rate-1 OT: Definition. We define our new notion of amortized rate-1 OT, which allows a receiver to reuse part of her protocol message across many independent OT executions. In the definition below, think of n as the maximum size of each input message of a sender. The receiver will generate a reusable parameter prm, based on n, which will allow her later to send a short protocol message otr whenever she wants to perform a new OT. The sender will use (prm, otr) to complete an OT transfer for any pair of messages (m0, m1) of length at most n. Definition 2.2 (Amortized rate-1 OT). The scheme consists of algorithms (PreP, OT1, OT2, OT3) with the following syntax. • PreP(1^λ, n) → (str, prm): Takes as input a security parameter λ and n, denoting the maximum length of each of the sender's messages, and outputs a private state str and a reusable message prm. • OT1(str, b) → otr: Takes as input the private state str and a choice bit b ∈ {0,1}, and outputs a protocol message otr. We refer to otr as a fresh receiver's message, to distinguish it from the reusable message prm. • OT2((prm, otr), (m0, m1)) → ots: Takes as input a reusable message prm, a fresh message otr and a pair of messages (m0, m1), and outputs ots. • OT3(str, ots) → m: Takes as input a private state str and ots and outputs m ∈ {0,1}^n.
We require: • Correctness: For all n, b, and (m0, m1), if (str, prm) ← PreP(1^λ, n), otr ← OT1(str, b), and ots ← OT2((prm, otr), (m0, m1)), then OT3(str, ots) = m_b. • Rate-1 sender communication: There exists a fixed polynomial poly such that for all n and (m0, m1), |ots| ≤ n + poly(λ), where ots is formed as above. • Receiver amortized compactness: The length of otr is independent of n. There exists a fixed polynomial poly such that for all polynomials n, |otr| ≤ poly(λ), where otr is formed as above.
• Receiver privacy: An adaptive sender cannot determine the choice bits of a receiver. Any PPT adversary A has at most 1/2 + negl(λ) advantage in the following game. The challenger samples b ←$ {0,1} and (str, prm) ← PreP(1^λ, n) and gives prm to A. Then, A adaptively submits queries (s0, s1) ∈ {0,1}², and receives OT1(str, s_b). A has to guess the value of b. Sender privacy. Notice that Definition 2.2 does not impose any sender security requirements. The reason for this is that sender security can be generically realized for rate-1 OT using known techniques, as described below. Let poly be the polynomial defined in the rate-1 sender property of Definition 2.2. The new sender on a pair of messages (m0, m1) ∈ {0,1}^n × {0,1}^n samples two seeds (r0, r1) whose length is sufficiently larger than poly(λ) but independent of n. The sender sends (ots1, ots2) to the receiver, where ots1 ← OT2((prm, otr), (r0, r1)) and ots2 masks (m0, m1) using Ext(r0) and Ext(r1), and Ext is a randomness extractor. The protocol is still sender rate-1. It now provides computational sender privacy against honest receivers: this is because given ots1 the value of Ext(r_{1−b}) is statistically close to uniform, where b is the receiver's choice bit. Finally, we mention that we may modify our constructions so that they achieve sender privacy for free, without using the above generic randomness extraction method. 3 Amortized Rate-1 OT from SXDH Our amortized rate-1 OT protocol makes use of a shrinking algorithm that allows one to shrink ciphertexts of ElGamal encryption, as long as the underlying plaintexts are coming from a small space, say, {0,1}. An n-bit packed ElGamal encryption has a secret key sk := (x1, ..., xn) and a public key pk := (g, g1, ..., gn), where gi = g^{xi}. Given pk we can encrypt an n-bit message m := (m1, ..., mn) as ct := (g^r, g1^r · g^{m1}, ..., gn^r · g^{mn}). We have
a shrinking procedure for n-bit ElGamal encryption that will shrink a ciphertext into one group element plus n bits, while allowing for efficient decryption. The procedure below enables perfect decryption correctness, improving upon the previous procedures that had a decryption error. Lemma 3.1. There exists a pair of (expected) PPT algorithms (Shrink, ShrinkDec) such that if (pk, sk) is as above and ct is a packed ElGamal ciphertext encrypting a message m ∈ {0,1}^n, then: (1) Shrink(pk, ct) consists of one group element plus n bits, and
(2) Pr[ShrinkDec(sk, Shrink(ct)) = m] = 1. Our amortized rate-1 OT makes use of the following procedure OrthSam that, given a vector ~u ∈ Z_p^2 and a bit b, samples two random vectors ~v0 and ~v1 such that ⟨~v0, ~u⟩ = 0 and ⟨~v1, ~u⟩ = 1, and outputs these two vectors in a shuffled order based on the value of b. Definition 3.2. The algorithm OrthSam(~u, b) works as follows. It samples random vectors ~v0, ~v1 ←$ Z_p^2 such that ⟨~v0, ~u⟩ = 0 and ⟨~v1, ~u⟩ = 1, and returns (~f, ~h) := (~v_{1−b}, ~v_b).
We now present our construction. For notational clarity, we assume the size of each message of the sender is exactly n, as opposed to an arbitrary value n1 ≤ n. Adapting the construction to work with respect to varying lengths for the sender messages will be immediate. Construction 3.3 (Amortized rate-1 OT: SXDH). Build OT := (PreP;OT1;OT2;OT3) as follows.
PreP, OT1, OT2 and OT3 follow the blueprint described in the technical overview above; in particular, PreP returns the private state str and the reusable message prm := [M]_1.
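As a sanity check of the template underlying this construction, here is an insecure Python toy of the warmup (pairing-free) rate-1 OT from the technical overview; the parameters are tiny, and the bilinear compression of otr is deliberately not modeled:

```python
import random

P, q, g = 2039, 1019, 4   # tiny safe-prime group of order q (insecure)
n = 4                      # sender message length

def vexp(vec, e):          # entry-wise exponentiation
    return [pow(v, e, P) for v in vec]

# Receiver, choice bit b: base vector g^w plus "bumpy" vectors v_i.
b = 1
w = [random.randrange(q) for _ in range(2 * n)]
gw = [pow(g, wj, P) for wj in w]
p_exp = [random.randrange(q) for _ in range(n)]   # sk = (p_1, ..., p_n)
vs = []
for i in range(n):
    vi = vexp(gw, p_exp[i])
    vi[b * n + i] = vi[b * n + i] * g % P          # the "bump" at bn + i
    vs.append(vi)

# Sender with m = m0 || m1 computes inner products in the exponent.
m0, m1 = [0, 1, 1, 0], [1, 1, 0, 1]
m = m0 + m1
def inner(vec):            # Π_j vec_j^{m_j}
    out = 1
    for vj, mj in zip(vec, m):
        out = out * pow(vj, mj, P) % P
    return out
g0 = inner(gw)
ots = (g0, [inner(vi) for vi in vs])   # packed-ElGamal ct of m_b under sk

# Receiver decrypts: slot i is g0^{p_i} (bit 0) or g0^{p_i} * g (bit 1).
recovered = [0 if c == pow(g0, pi, P) else 1
             for c, pi in zip(ots[1], p_exp)]
assert recovered == (m1 if b else m0)
```

In the real construction the bumpy vectors are not sent; they are reconstructed by the sender in GT from prm and a 4-element otr, which is what makes the scheme amortized.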
Rate-1 sender communication and receiver amortized compactness. We have |ots| = n + poly(λ), and otr consists of 4 group elements in G2, so |otr| = poly(λ), independent of n.
Receiver Privacy In the following we say a vector is non-orthogonal to ~u if their inner product is 1. This is an abuse of terminology (because non-orthogonality refers to any non-zero inner product), but we stick to it below. To prove receiver OT security, we should argue that a fresh receiver protocol message otr does not reveal the receiver's underlying choice bit. The main difficulty is that all otr values depend on the vector ~u. The core of our argument is in showing that ~u remains hidden in the following sense: given a sequence of pairs of vectors, an adversary cannot determine the order of orthogonality/non-orthogonality in any given pair, with respect to ~u. To this end, we will first remove ~u from all the vectors given in Equation 6. Once ~u is removed from the reusable message prm, we will then show that any receiver's future fresh message otr may be simulated from the underlying choice bit b and a pair of vectors (~v, ~w) which are orthogonal/non-orthogonal to ~u, in a way that if the joint distribution of (~v, ~w) is pseudorandom, then the entire simulated view will be pseudorandom as well, masking the choice bits. We will then show that the distribution of a random (~v, ~w), subject to them being orthogonal/non-orthogonal to a random ~u, is uniformly random. Taken all together, receiver security will follow. Definition 3.4 (Distribution Dual). For ~u ←$ Z_p^2, the distribution Dual(~u) returns (~v, ~w), where ~v and ~w are sampled uniformly subject to ⟨~v, ~u⟩ = 0 and ⟨~w, ~u⟩ = 1. We now describe a way of simulating messages otr, for a given choice bit b, without knowing ~u, but by knowing a pair (~v, ~w) sampled according to Dual(~u). Definition 3.5 (Simulator Sim). The algorithm Sim((~v, ~w), b) samples k1, k2 ←$ Z_p and returns ([k1~v + (1 − b)~w]_2, [k2~v + b~w]_2).
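Definitions 3.4 and 3.5 can be checked numerically with toy exponent arithmetic modulo a small prime; the `dual` sampler below is a hypothetical stand-in for Dual that fixes one valid (~v, ~w) pair rather than sampling uniformly:

```python
import random

q = 1019  # toy prime modulus (insecure, for arithmetic only)

def inner(a, c):
    return (a[0] * c[0] + a[1] * c[1]) % q

def dual(u):
    # v orthogonal to u; w with <w, u> = 1 (assumes u[1] is invertible)
    v = [u[1], (-u[0]) % q]
    s = random.randrange(q)
    w = [s, (1 - s * u[0]) * pow(u[1], q - 2, q) % q]
    return v, w

u = [123, 456]
v, w = dual(u)
assert inner(v, u) == 0 and inner(w, u) == 1

def sim(v, w, b):
    k1, k2 = random.randrange(1, q), random.randrange(1, q)
    return ([(k1 * v[j] + (1 - b) * w[j]) % q for j in range(2)],
            [(k2 * v[j] + b * w[j]) % q for j in range(2)])

# Honest-looking pattern: inner products with u are (1, 0) for b = 0
# and (0, 1) for b = 1, matching OrthSam's output order.
f0, h0 = sim(v, w, 0)
f1, h1 = sim(v, w, 1)
print(inner(f0, u), inner(h0, u))  # → 1 0
print(inner(f1, u), inner(h1, u))  # → 0 1
```

The point of the proof is that once ~u is scrubbed from prm, the pair (~v, ~w) looks uniformly random, so the simulated otr values computationally hide b.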
4 Amortized Rate-1 OT from Bilinear Power DDH We show how to shorten the reusable parameter using the circulant structure imposed by power-DDH assumptions. We assume G2 is DDH-hard, and G1 is m-power-DDH hard, meaning that (g, g^a, g^{a²}, ..., g^{a^m}) is pseudorandom. We will need to set m = O(n), where n is the bit length of each of the sender's messages. Concretely, m = 3n − 1 suffices. Construction 4.1 (Amortized rate-1 OT: Bilinear Power DDH). OT := (PreP, OT1, OT2, OT3) is built as follows.
PreP, OT1, OT2 and OT3 follow the sliding-window blueprint described above; in particular, PreP returns the private state str and the reusable message prm := M.
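The Toeplitz/sliding-window structure exploited by Construction 4.1 can be illustrated at the exponent level (toy arithmetic modulo a small prime; the real construction stores these values in the exponent in G1 and the window length is illustrative):

```python
import random

q, n = 1019, 4
a = random.randrange(2, q)
r = random.randrange(1, q)
m = 3 * n - 1                                # number of powers, as in the text
seq = [pow(a, i, q) * r % q for i in range(1, m + 1)]   # a*r, a^2*r, ...

# Window i is the n consecutive entries starting at position i: each
# window is the a-multiple of the previous one, so n structured rows
# come from a single linear-length sequence (a Toeplitz-like matrix).
windows = [seq[i:i + n] for i in range(n)]
for i in range(n - 1):
    assert windows[i + 1] == [a * x % q for x in windows[i]]
```

This is exactly why a linear number of group elements suffices for prm under power DDH, compared with the quadratic-size matrix needed under SXDH alone.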
5 Optimization In this section, we discuss some techniques to improve the concrete computational efficiency and lower the communication cost in amortized rate-1 OT. These optimizations work for both the basic amortized rate-1 OT from bilinear SXDH and the sliding-window construction from bilinear power DDH. In Section 6, when we describe the applications of amortized rate-1 OT, we will discuss further optimizations specific to these applications. Delayed Pairing. Recall that when the sender computes her response message, she needs to compute the hash-key vector, which requires 4n pairing operations. In addition, she needs to compute the matrix IK, which requires 4n² pairing operations in the basic construction and 6n pairing operations in the sliding-window construction. Since pairing operations are orders of magnitude more expensive than the other group operations, we introduce a technique to minimize them. Basic Construction. The high-level idea is that we can leverage the bilinear property to delay the pairing operations. Instead of first performing the pairing operations and then computing inner products in the target group, we can first compute the inner products in G1 and then perform the pairings. In more detail, in the basic construction, let M0 and M1 denote the left-hand and right-hand sides of the base hash key M.
Let (m0, m1) be the sender messages. With receiver message otr = ([~f]_2, [~h]_2), the inner product of the sender's message vector with M0 can be computed by first computing inner products for each vector component of M0, which results in a vector of two group elements in G1, and then taking the inner product on the exponent of the two vectors via the pairing with [~f]_2. The corresponding product with M1 and [~h]_2 is computed in the same way. The same approach can be applied to compute the rows of IK. The computational cost of the hash-key vector
in the basic construction includes 4n pairing operations and 4n multiplications in GT. By using the above technique, this cost can be reduced to 4 pairing operations, 4n multiplications in G1, and 3 multiplications in GT. The same improvement applies to each inner product in the rows of IK. Therefore, the total computational cost of the sender is reduced to 4n pairing operations, 4n² multiplications in G1, and 3n multiplications in GT. Sliding-Window Construction. The same technique can be applied to the sliding-window construction, and the improvement on the hash-key vector
is the same as above. The total cost of computing IK in the sliding-window construction includes 6n pairing operations and (2n² + 3n) multiplications in GT. This can be improved to 4n pairing operations, 4n² multiplications in G1, and 3n multiplications in GT. Increasing Vector Dimension. Reducing Hash Value Size. The hash value
currently contains a single group element in GT. Since the bit representation of group elements in GT is much longer than that of group elements in G1, we can reduce this by sending 4 group elements in G1 instead, and then letting the receiver perform the remaining pairing operations. In applications such as PIR and PSI, the sender message grows with the tree depth, and this saving in communication accumulates throughout all the levels of the tree. Another benefit of this optimization is that it pushes the pairing operations in computing hashes to the receiver side, which significantly reduces the computational cost of computing hashes, because the sender had to compute hashes in every node of the tree while the receiver only needs to compute hashes along a single path of the tree. Next we discuss another technique to further reduce the cost to 3 group elements in G1. Basic Construction. At a high level, we will unify [~f]_2 and [~h]_2
to a single vector by increasing the vector dimension from 2 to 3. In more detail, the base hash key M is the same as before except that each of its vectors over Z_p is of dimension 3. The receiver's reusable message is redefined accordingly, where all the pi's are random exponents. For a choice bit b, the receiver samples a single random vector, and sends a single vector of group elements in G2. Next the sender computes the hash-key vector by taking the inner product in the exponent of M and this vector. The matrix IK can be computed by taking the inner product in the exponent of the rows of M and this vector. We can use delayed pairing to compute these inner products as before.
Again, we can reduce the hash value size by sending the 3 group elements of the vector in G1 and postponing the pairing operations to the receiver side. It also reduces the receiver's non-reusable message from 4 group elements in G2 to 3. To summarize, the receiver's reusable message is increased from (4n² + 4n) to (6n² + 6n) group elements in G1, but the non-reusable message is reduced from 4 to 3 group elements in G2. The hash value in the sender's message is reduced from 1 group element in GT to 3 group elements in G1. Sliding-Window Construction. The same technique can be applied to the sliding-window construction, and the improvements on the communication are the same as above. In particular, the receiver's reusable message is increased from 10n to 15n group elements in G1, but the non-reusable message is reduced from 4 to 3 group elements in G2. The hash value in the sender's message is reduced from 1 group element in GT to 3 group elements in G1. 6 Applications In this section, we discuss several applications of our amortized rate-1 OT and focus on the communication improvements over prior work. For certain applications, we will discuss optimizations that further improve the communication and/or computational complexity. Secure Function Evaluation on Branching Programs. The work of Ishai and Paskin presents an approach to two-round secure function evaluation (SFE) on (oblivious) branching programs (BP) from rate-1 OT where the communication complexity only grows with the depth of the branching program instead of its size. In particular, consider a sender holding a private branching program P and a receiver holding a private input x. They can jointly compute P(x) in two rounds of communication; that is, the receiver first sends an encryption c of the input x to the sender, and the sender can compute a succinct ciphertext which allows the receiver to decrypt P(x) without revealing any further information about P except its depth.
The size of the sender's succinct ciphertext depends polynomially on the size of x and the depth of P, but does not further depend on the size of P. In terms of concrete communication complexity, let ℓ be the depth of the oblivious BP and h be the bit length of the output. The recent work of Garg et al. achieves receiver's and sender's communication complexity that grow only with ℓ, h, and λ, where the group elements are from a pairing-free group where the power DDH assumption holds. This improves upon prior work of Döttling et al. based on DDH. Herein, we consider the problem in the reusable setting where the receiver first sends a one-time reusable message to the sender consisting of reusable
group elements in G1. Afterwards, for any oblivious BP with depth ℓ and output length h and any input x, the receiver's communication complexity is O(ℓ) group elements in G2 and the sender's communication complexity is O(h + λℓ) bits. Note that the one-time messages can be reused an arbitrary polynomial number of times. Example: Secure Inference of Decision Trees. As an example, we consider a server holding a machine learning model of a decision tree, which takes as input a data point with multiple features. Starting from the root, each node of the tree is a function on some feature (e.g., testing whether x < 10) that determines whether to go to the left or right child. The client has a single data point and would like to perform a secure inference with the server on the decision tree. The decision tree can be formalized as a branching program, and two-round secure inference can be achieved by the two-round SFE described above, where the communication only grows with the depth of the tree. PSI and PIR. In this section, we illustrate several useful applications that can be viewed as special cases of SFE on oblivious BP; hence they achieve the same improvements over prior work. Unbalanced Private Set Intersection (PSI). Consider the PSI problem between a server holding a private set X = {x1, ..., xN} and a client holding a private set Y = {y1, ..., ym}. They want to jointly compute the set intersection X ∩ Y without revealing any other information. Without loss of generality we assume all the set elements xi, yj ∈ {0,1}^λ. We focus on the case with unbalanced set sizes, namely N ≫ m, and present a solution for two-round PSI. To learn the intersection X ∩ {y} for any y ∈ Y, we can construct an oblivious BP with depth λ and size λ·N. To construct the oblivious BP, we can first think of it as a full binary tree of depth λ where each leaf node indicates whether the root-to-leaf path is an element in X. However, this branching program has exponential size.
We can prune the full binary tree by replacing each subtree consisting of only 0's with a "dummy node" of the same depth. A dummy node of depth d is connected to two dummy nodes of depth d − 1. Following this approach, the client only needs to perform m instances of SFE on the oblivious BP to learn the intersection X ∩ {y} for every y ∈ Y. The oblivious BP has depth ℓ = λ, size λ · N, and single-bit output. (The set elements can be of arbitrary length, but the parties can first apply a collision-resistant hash function to the elements to make them all have length λ.) Private Information Retrieval (PIR) Consider a server (sender) holding a large database D ∈ {0, 1}^N and a client (receiver) who wants to retrieve D[i] for i ∈ [N] without revealing i to the server. As is known, single-server two-round PIR can be viewed as two-round SFE on an oblivious BP with depth ℓ = log N and single-bit output. PIR-with-Default Consider a PIR variant where the server holds N binary strings s1, . . . , sN ∈ {0, 1}^t along with N values v1, . . . , vN ∈ {0, 1}^k. The server additionally holds a default value v_dflt ∈ {0, 1}^k. The client holds a binary string w ∈ {0, 1}^t and wants to learn a value v such that if w = sj for some j ∈ [N], then v = vj; otherwise v = v_dflt, without revealing any information about w to the server. This problem is formalized by Lepoint et al. Two-round PIR-with-Default can be viewed as two-round SFE on a k-bit-output oblivious BP with depth t and polynomial size. Hence the receiver and sender communication follow generically from oblivious BPs with many-bit outputs. We mention this PIR variant because it will be used to construct PSI-Cardinality. PSI-Cardinality Consider a PSI variant where a server holding a private set X = {x1, . . . , xN} and a client holding a private set Y = {y1, . . . , ym} want to learn the cardinality of the intersection |X ∩ Y| instead of the intersection itself.
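All of the constructions above share the pruned-tree structure introduced for unbalanced PSI. It can be sketched concretely as follows (an illustrative Python toy with hypothetical names; the actual protocol evaluates this structure obliviously via rate-1 OT, which is not shown here):

```python
# Illustrative sketch: build the pruned tree as a trie over the bits of the
# server's set X, then evaluate membership of a query y in the clear.
LAMBDA = 8  # toy bit length of set elements (stands in for the parameter λ)

def build_pruned_bp(X):
    """Return the pruned tree as nested dicts: a present child is a real
    subtree, an absent child plays the role of a 'dummy' all-zeros subtree."""
    root = {}
    for x in X:
        node = root
        for bit in format(x, f"0{LAMBDA}b"):
            node = node.setdefault(bit, {})
    return root

def evaluate(bp, y):
    """Walk the bits of y; falling off the trie means y is not in X."""
    node = bp
    for bit in format(y, f"0{LAMBDA}b"):
        if bit not in node:
            return 0  # entered a dummy subtree
        node = node[bit]
    return 1  # reached an accepting leaf

X = {3, 77, 200}
bp = build_pruned_bp(X)
print([evaluate(bp, y) for y in (3, 4, 200)])  # → [1, 0, 1]
```

Because only root-to-leaf paths of elements of X are materialized, the trie has O(λ · N) nodes rather than the 2^λ nodes of the full binary tree, while every evaluation still reads exactly λ levels.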
We can achieve PSI-Cardinality by the client querying PIR-with-Default on every element in her set, where in each PIR-with-Default instance i, the default value v_dflt^i is sampled at random such that all the default values sum up to 0, namely ∑_{i∈[m]} v_dflt^i = 0.
All the non-default values in a single instance i are set to v_dflt^i + 1. At the end, the client sums up all the values retrieved from the PIR-with-Default instances; since the defaults cancel, the sum equals the cardinality |X ∩ Y|. Similar to PSI, we should prune the full binary tree to obtain an oblivious BP with depth λ and polynomial size. Optimization for PSI and PSI-Cardinality We design optimizations for unbalanced PSI and PSI-Cardinality so as to achieve better communication than the above generic approaches. Optimized PSI Note that the aforementioned oblivious BP for PSI has depth ℓ = λ. To further improve the communication complexity, we replace small subtrees by small instances of two-round PSI (e.g., DDH-based PSI), which we denote by πPSI. In particular, to compute X ∩ {y}, the server first hashes his N elements into N random bins. We know that each bin has at most O(log N) elements. The client computes the same hash on y to identify the bin b that could possibly contain the element y. Now the client queries the server with PIR-with-Default on the string b. The client additionally sends the round-1 message of the two-round PSI protocol πPSI on the single element y. The server then computes a round-2 message of πPSI for each bin, with the elements in that bin. The server views his database for PIR-with-Default as the N indices of the bins, with the associated values being the round-2 messages of πPSI, and generates the response for PIR-with-Default. Finally, the client first recovers the round-2 message of πPSI from PIR-with-Default, and then recovers the output of πPSI, namely X ∩ {y}. The receiver's reusable communication is reduced from O(λ²) to O(λ · log N) group elements in G1. Then for each X ∩ {y} query, her online communication is reduced from O(λ) to O(log N) group elements in G2. The sender's communication is reduced from O(λ²) to O(λ · log N). PSI-Cardinality We can optimize the PSI-Cardinality protocol by replacing small subtrees by small instances of two-round PSI-Cardinality (e.g., DDH-based PSI-Cardinality), similarly to the above PSI protocol. However, this would reveal which elements are in the intersection and which are not. Nonetheless, we notice that in our reusable rate-1 OT protocol, any OT response from the sender can be decrypted by the receiver using the same secret state str, and the receiver cannot distinguish between different responses. Therefore, the server can randomly shuffle the responses for all the PIR-with-Default instances so that the client can only learn the cardinality of the intersection.
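The default-value trick and the shuffling step can be checked with a plain, non-private simulation of the m PIR-with-Default instances. The code below is an illustrative sketch with hypothetical names, not the cryptographic protocol itself:

```python
import random
import secrets

P = 2**61 - 1  # toy modulus for the additive masking

X = {10, 20, 30}        # server's set
Y = [10, 30, 99, 42]    # client's set; m = 4 PIR-with-Default instances

# Defaults are uniformly random subject to summing to 0 (mod P).
defaults = [secrets.randbelow(P) for _ in range(len(Y) - 1)]
defaults.append((-sum(defaults)) % P)

# Instance i returns v_dflt^i + 1 on a match and v_dflt^i otherwise.
responses = [(d + 1) % P if y in X else d for y, d in zip(Y, defaults)]

# The server may shuffle the responses: the client's sum is unaffected,
# so she learns the cardinality but not which of her elements matched.
random.shuffle(responses)
cardinality = sum(responses) % P
print(cardinality)  # → 2  (elements 10 and 30 are in the intersection)
```

Each response on its own is uniformly random, so the client learns nothing from any individual instance; only the sum carries information.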
This achieves the same improvement as in the above PSI protocol. Other Variants of PSI and PIR In this section, we discuss a few more useful variants of the PSI and PIR problems. PIR-by-Keywords Consider a PIR variant where the server holds N binary strings s1, . . . , sN ∈ {0, 1}^t. The client holds a binary string w ∈ {0, 1}^t and wants to learn whether w = sj for some j ∈ [N], without revealing any information about w to the server. This problem was introduced by Chor et al. As is known, two-round PIR-by-Keywords can be viewed as two-round SFE on a branching program with depth ℓ = t and single-bit output. PSI-Sum Consider a server holding a set with weights (X, W) = {(x1, w1), . . . , (xN, wN)} and a client holding a set Y = {y1, . . . , ym}. They want to jointly compute the PSI-Cardinality along with the sum of the weights associated with the elements in the intersection, namely ∑_{j: xj ∈ X∩Y} wj.
This functionality is a generalization of PSI-Cardinality. We can achieve PSI-Sum from PIR-with-Default similarly to the PSI-Cardinality protocol, except that all the non-default values vj in a single instance i are set to v_dflt^i + wj, where wj is the corresponding weight. Note that this approach additionally hides the PSI-Cardinality and only reveals the PSI-Sum. PSI-Test Consider a PSI variant where a server holding a private set X = {x1, . . . , xN} and a client holding a private set Y = {y1, . . . , ym} want to learn whether the two sets intersect or not, namely whether X ∩ Y = ∅. We can achieve this from PIR-with-Default similarly to PSI-Cardinality, but all the non-default values in a single instance i are set to v_dflt^i + ri for some random ri. At the end, the client checks if all the values obtained from the PIR-with-Default instances sum up to 0. The sum equals 0 if and only if X ∩ Y = ∅, except with negligible probability. Extended-PIR-with-Default An extension to PIR-with-Default enables the two parties to learn random shares of the PIR-with-Default answer multiplied with a weight w supplied by the client. By using known techniques, we can achieve the same complexity as PIR-with-Default with additively homomorphic encryption. In particular, we make the following changes to the PIR-with-Default protocol. The client additionally sends Enc(w) to the server (in the online phase), where Enc is an additively homomorphic encryption scheme. The server picks a random value r as his output of Extended-PIR-with-Default and replaces each value v in a leaf node of the PIR-with-Default tree by Enc(v · w − r). Finally, the client needs to decrypt her output from PIR-with-Default to recover her output for Extended-PIR-with-Default. We mention this PIR variant because it will be useful in the following application. Private Join and Compute (PJC) for Inner Product Consider a server holding a set with weights (X, W) = {(x1, w1), . . . , (xN, wN)} and a client also holding a set with weights Y = {(y1, v1), . . . , (ym, vm)}. They want to jointly compute the inner product over the intersection, namely ∑_{(i,j): xi = yj} wi · vj.
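Combining Extended-PIR-with-Default instances as just described yields this inner product. The share arithmetic can be checked in the clear with the following sketch (illustrative names and modulus; in the actual protocol the client's share is delivered under additively homomorphic encryption, which is omitted here):

```python
import secrets

P = 2**61 - 1  # illustrative modulus

server_set = {10: 4, 20: 6, 30: 2}   # server's (x_i, w_i) pairs
client_set = {10: 3, 30: 5, 40: 7}   # client's (y_j, v_j) pairs

client_shares, server_shares = [], []
for y, v in client_set.items():
    r = secrets.randbelow(P)       # the server's random output share
    server_shares.append(r)
    w = server_set.get(y, 0)       # default value 0 when y is not in X
    # In the real protocol the client obtains this as the decryption of
    # Enc(v * w - r), computed homomorphically; here we compute it directly.
    client_shares.append((v * w - r) % P)

# Each party sums its own shares; the server then sends his sum over,
# letting the client reconstruct the inner product and nothing else.
inner_product = (sum(client_shares) + sum(server_shares)) % P
print(inner_product)  # → 22  (10: 4*3 = 12, 30: 2*5 = 10)
```

Non-matching elements contribute shares of 0, so neither party learns which elements matched or how many, only the final inner product.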
This functionality, introduced by Lepoint et al., is a generalization of PSI-Sum. We can achieve this by the client querying Extended-PIR-with-Default on every element in her set, where in each Extended-PIR-with-Default instance, the default values are set to 0 and the two parties learn a secret share of wi · vj if X ∩ {yj} ≠ ∅. From this, the two parties can sum up their own shares to obtain a secret sharing of the inner product result. The server only needs to additionally send the sum of his shares to the client, from which the client can recover the output. Note that this approach additionally hides the PSI-Cardinality and only reveals the result of the inner product. 7 Amortized Rate-1 OT with Strong Sender Privacy We will now show that variants of our amortized rate-1 OT constructions satisfy a stronger sender privacy requirement, essential for secure computation on non-oblivious branching programs. Definition 7.1 (Strong sender privacy). Let OT := (PreP, OT1, OT2, OT3) be as in Definition 2.2. We say OT provides strong sender privacy if there exists a PPT algorithm OTSim such that for any bit b and any pair of messages (m0, m1), sampling (prm, str) ← PreP and otr ← OT1(str, b), the two distributions OT2((prm, otr), (m0, m1)) and OTSim(prm, mb) are statistically close. Our amortized rate-1 OT constructions, as presented in Sections 3 and 4, do not provide strong sender privacy, because OT2 is deterministic. Thus, we will consider a randomized-OT2 version of these constructions, obtained by using randomness extractors and PRGs, as explained in Section 2. Under these new OT2 algorithms of our constructions, the following holds: for any choice b, any two pairs (m0, m1) and (m′0, m′1) such that mb = m′b, and any otr ∈ OT1(str, b), the two distributions OT2((prm, otr), (m0, m1)) and OT2((prm, otr), (m′0, m′1)) are statistically close. The simulation algorithm OTSim, which is only given mb, should somehow sample from OT2((prm, otr), (m0, m1)). By what was just mentioned, OTSim may, instead, sample from OT2((prm, otr), (mb, mb)). The main challenge in doing so is that OTSim is only given (prm, mb), and not otr, which in turn is sampled based on str, not known to OTSim. Luckily, in our proofs we showed an oblivious way of sampling from OT1 without knowing str
. In particular, assuming OTSim is given (v, w) sampled as in Definition 3.4, then S(v, w, b) (Definition 3.5) samples an output statistically close to the output of OT1(str, b). We may include (v, w) in prm without harming security, as argued in the security of the constructions. Once (v, w) is included as part of prm, the output of OTSim(prm, mb) is formed as follows: sample otr ← S(v, w, b)
and return OT2((prm, otr), (mb, mb)). In terms of efficiency, the size of otr remains the same, and the size of prm is increased by four group elements in G1. System Implementations Fig. 1 depicts the basic technique of this disclosure. In this example embodiment, cloud computing infrastructure 100 comprises a data store 110 that hosts a corpus repository, typically with access controls. The cloud provider executes a cloud provider Private Set Intersection (PSI) manager 105, in some embodiments in association with a service that provides responsive actions, such as one or more of: alerting, redaction, tokenization, labelling, sandboxing, and the like. Preferably, and as will be described, the data store 110 stores an entire set of content (information), although the tool 105 itself may operate on just an index of that set of content. The client computing environment 115, which typically is hosted in a client private network, comprises a database 125 of sensitive data (e.g., PII, PHI, or the like), as well as instances of both the PSI manager 120 and a response service. Client-based resources communicate with cloud provider-based resources via client-server based communications, such as described herein. Each side of the communication link is implemented in one or more data processing systems, such as described and depicted herein. The cloud computing infrastructure may be implemented as described herein, and it may utilize one or more additional services. The PSI managers (105 and 120) interoperate with one another to implement a PSI protocol exchange, with the cloud-based manager evaluating the index of the set of content stored in the cloud data store. The response services can execute as software (one or more computer systems, programs, processes, etc.) executing in hardware or virtual machines.
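By way of illustration only, the following toy sketch shows the shape of a DDH-style PSI exchange between the client-side PSI manager 120 and the cloud-side PSI manager 105. The parameters are far too small to be secure, the element-to-group hash is a stand-in, and the optimized two-round protocol described earlier differs in its details:

```python
import hashlib
import secrets

# Toy parameter: a Mersenne prime modulus, NOT a secure group size.
P = (1 << 127) - 1

def H(element: bytes) -> int:
    """Illustrative stand-in for hashing an element into the group mod P."""
    h = int.from_bytes(hashlib.sha256(element).digest(), "big")
    return pow(3, h, P)

client_set = [b"alice@example.com", b"bob@example.com"]   # agent 120
cloud_index = [b"bob@example.com", b"carol@example.com"]  # agent 105

# Round 1 (client): blind every element with a secret exponent a.
a = secrets.randbelow(P - 2) + 1
round1 = [pow(H(y), a, P) for y in client_set]

# Round 2 (cloud): re-blind the client's values with a secret exponent b,
# and send blinded versions of the indexed corpus entries.
b = secrets.randbelow(P - 2) + 1
doubly_blinded = [pow(t, b, P) for t in round1]          # H(y)^(a*b)
server_blinded = [pow(H(x), b, P) for x in cloud_index]  # H(x)^b

# Client finishes: raise each H(x)^b to a and test for equality.
rerandomized = {pow(s, a, P) for s in server_blinded}
matches = [y for y, t in zip(client_set, doubly_blinded) if t in rerandomized]
print(matches)  # → [b'bob@example.com']
```

Neither side ever transmits an element in the clear; only blinded group elements cross the link, which is the property the response services rely on when flagging or redacting matches.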
As described herein, the Private Set Intersection protocol, which is a form of secure multi-party computation (MPC), enables the two parties (the cloud provider, on the one hand, and the client, on the other hand) to learn if they have a piece of information in common, without either party having to reveal the compared information to the other party. With this approach, the index of an arbitrarily large corpus 110 in the cloud computing environment 100 is examined, preferably in an automated manner, and the response service(s) flag or redact anything that is in the client's full, definitive list of sensitive data stored in the client data store 125 (e.g., patient names and record numbers, or any other piece of information that the client considers sensitive), without revealing to the service provider any new information that is not already present on the cloud. In effect, this approach thus provides for a "zero knowledge"-based proof regarding whether sensitive data is or is not present on the cloud (in other words, in the index), all without disclosing such information to facilitate the evaluation process itself. In this approach, preferably the sensitive data never leaves the client premises 115; rather, the database 125 containing the sensitive data connects to the client-side agent 120, which performs Private Set Intersection (PSI) interactively with the cloud-supported PSI agent 105 (which, as noted above, preferably examines its index of the information stored in the cloud, rather than examining that entire set of information itself), thereby detecting, for example, whether sensitive data fields or any API field that client users populate through a client application (not shown) from the client-side database 125 are present in any document or other object the cloud provider is permitted or allowed to access. In a preferred embodiment, the cloud provider PSI manager (agent) 105 connects to the corpus repository 110 containing an indexed corpus.
The PSI protocol then is performed on the contents of the index. This operation may include only a corpus specific to a particular client, or a broader corpus to which the client has access for sensitive information detection. This embodiment allows clients to determine whether their sensitive information exists, even in a corpus to which they do not have full (or even any) access, such as a provider-owned or curated corpus. In this embodiment, preferably the cloud provider-based PSI agent 105 integrates directly with APIs, performing PSI with the client's PSI agent in real time to detect the passing of sensitive information in text fields as information enters the system. This embodiment thus allows the APIs to provide a real-time indication of apparent entry of sensitive data so that the client application can use the response service (or the like) to warn the client or the end user and/or redact the data before it is stored on the cloud. In an alternative embodiment, the PSI interaction is carried out between the cloud provider and a trusted third party (e.g., law enforcement, an intelligence agency, a contracted security organization, company auditors, authorized partners, etc.), where the trusted third party has a legitimate interest in detecting the presence of certain sensitive information, e.g., in a cognitive system, typically on behalf of the client. In this scenario, preferably the third party is not granted full access to the corpus or API, but still has a legitimate interest in detecting, for example, certain sensitive data (e.g., the names of persons of interest) in the cognitive system. Thus, as used herein, the access controls on the repository may be varied and will depend on the nature of the access limitation. Access controls may be role-based, user-based, or otherwise. The technique of this disclosure provides significant advantages.
As has been described, the approach herein provides a way to detect whether specific sensitive data of a client is present in a cloud computing infrastructure without requiring that the data be shared with the cloud provider, or that the cloud provider provide the client access to all (or even any) data in the cloud. The approach enables sensitive data detection that does not require DLP or other complex systems to be supported at the client, nor the training of a statistical classifier. The PSI-based approach is highly secure, computationally efficient, and ensures that sensitive data detection is facilitated with respect to those entities that have authorized rights to access the client database for the data detection. To this end, and as has been described, each side of the communication preferably executes a PSI agent (tool), which is readily implemented in software. As used herein, a PSI agent typically is implemented in software, e.g., as a set of computer program instructions executed by one or more hardware processors. A particular tool may comprise any number of programs, processes, execution threads, and the like, together with appropriate interfaces and databases to support data used or created by the tool. The tool may be configured or administered with a web-based front-end, via a command line, or the like. The tool may include one or more functions that are implemented programmatically, or that interoperate with other computing entities or software systems via an application programming interface (API), or any convenient request-response protocol. The described approach is preferably web- or cloud-based, thereby avoiding traditional installation and deployment issues that often accompany DLP systems. The techniques provide for lightweight tooling (the client-server based PSI tool) to interact with the corpus (cloud-based) and the database (client-based) to detect potential sensitive data leakage.
The approach thus promotes simple and effective cross-organization collaboration with sufficient privacy to alleviate or ameliorate security concerns. This subject matter may be implemented as-a-service. As previously noted, and without limitation, the subject matter may be implemented within or in association with a cloud deployment platform system or appliance, or using any other type of deployment systems, products, devices, programs or processes. As has been described, the PSI tool and related response system functionality may be provided as a standalone function, or it may leverage functionality from other products and services. A representative cloud application platform with which the technique may be implemented includes, without limitation, any cloud-supported application framework, product or service. Generalizing, the techniques herein may be implemented as a management solution, service, product, appliance, device, process, program, execution thread, or the like. Typically, the techniques are implemented in software, as one or more computer programs executed in hardware processing elements, in association with data stored in one or more data sources, such as a problems database. Some or all of the processing steps described may be automated and operate autonomously in association with other systems. The automation may be full or partial, and the operations (in whole or in part) may be synchronous or asynchronous, demand-based, or otherwise. These above-described components typically are each implemented as software, i.e., as a set of computer program instructions executed in one or more hardware processors. The components are shown as distinct, but this is not a requirement, as the components may also be integrated with one another in whole or in part. One or more of the components may execute in a dedicated location, or remote from one another.
One or more of the components may have sub-components that execute together to provide the functionality. There is no requirement that particular functions of the service be executed by a particular component as named above, as the functionality herein (or any aspect thereof) may be implemented in other components or systems. The tool and response functionality can interact or interoperate with security analytics systems or services. As has been described, the functionality described above may be implemented as a standalone approach, e.g., one or more software-based functions executed by one or more hardware processors, or it may be available as a managed service (including as a web service via a SOAP/XML interface). The particular hardware and software implementation details described herein are merely for illustrative purposes and are not meant to limit the scope of the described subject matter. More generally, computing devices within the context of the disclosed subject matter are each a data processing system (such as shown in Figs. 3 and 4) comprising hardware and software, and these entities communicate with one another over a network, such as the Internet, an intranet, an extranet, a private network, or any other communications medium or link. The applications on the data processing system provide native support for Web and other known services and protocols including, without limitation, support for HTTP, FTP, SMTP, SOAP, XML, WSDL, UDDI, and WSFL, among others. Information regarding SOAP, WSDL, UDDI and WSFL is available from the World Wide Web Consortium (W3C), which is responsible for developing and maintaining these standards; further information regarding HTTP, FTP, SMTP and XML is available from the Internet Engineering Task Force (IETF).
As noted, and in addition to the cloud-based environment, the techniques described herein may be implemented in or in conjunction with various server-side architectures including simple n-tier architectures, web portals, federated systems, and the like. Still more generally, the subject matter described herein can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the sensitive data detection service (or any component thereof) is implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like. Furthermore, the download and delete interfaces and functionality can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain or store the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device). Examples of a computer-readable medium include a semiconductor or solid-state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD. The computer-readable medium is a tangible, non-transitory item. The computer program product may be a product having program instructions (or program code) to implement one or more of the described functions.
Those instructions or code may be stored in a computer-readable storage medium in a data processing system after being downloaded over a network from a remote data processing system. Or, those instructions or code may be stored in a computer-readable storage medium in a server data processing system and adapted to be downloaded over a network to a remote data processing system for use in a computer-readable storage medium within the remote system. In a representative embodiment, the techniques are implemented in a special-purpose computing platform, preferably in software executed by one or more processors. The software is maintained in one or more data stores or memories associated with the one or more processors, and the software may be implemented as one or more computer programs. Collectively, this special-purpose hardware and software comprises the functionality described above. While the above describes a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary, as alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, or the like. References in the specification to a given embodiment indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Finally, while given components of the system have been described separately, one of ordinary skill will appreciate that some of the functions may be combined or shared in given instructions, program sequences, code portions, and the like. The garbled circuit protocol (e.g., using oblivious transfer) as described herein is not intended to be limiting.
Any cryptographic protocol that enables two-party secure computation, in which two potentially mistrusting parties can jointly evaluate a function over their private inputs without the presence of a trusted third party, may be used. Further, Private Set Intersection is just a representative cryptographic protocol. As an alternative, a Private Search protocol may be used. In this embodiment, the corpus is indexed on the cloud and a check is performed to determine if one or more terms of interest to a requesting client are in the index. The techniques herein provide for improvements to another technology or technical field, namely, data detection security analysis tools and systems, and cloud-based systems, as well as improvements to the functioning of automated sensitive data detection tools and methods. Fig. 2 illustrates an example optimized two-round PSI protocol with a single element on the client side. Figs. 3 and 4 depict example computer systems useful for implementing various embodiments described in the present disclosure. Various embodiments may be implemented, for example, using one or more computer systems, such as computer system 500 shown in Fig. 3. One or more computer system(s) 500 may be used, for example, to implement any of the embodiments discussed herein, as well as combinations and sub-combinations thereof. Computer system 500 may include one or more processors (also called central processing units, processing devices, or CPUs), such as a processor 504. Processor 504 may be connected to a communication infrastructure 506 (e.g., such as a bus). Computer system 500 may also include user input/output device(s) 503, such as monitors, keyboards, pointing devices, etc., which may communicate with communication infrastructure 506 through user input/output interface(s) 502. One or more of processors 504 may be a graphics processing unit (GPU).
In an embodiment, a GPU may be a processor that is a specialized electronic circuit designed to process mathematically intensive applications. The GPU may have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, videos, etc. Computer system 500 may also include a main memory 508, such as random-access memory (RAM). Main memory 508 may include one or more levels of cache. Main memory 508 may have stored therein control logic (i.e., computer software, instructions, etc.) and/or data. Computer system 500 may also include one or more secondary storage devices or secondary memory 510. Secondary memory 510 may include, for example, a hard disk drive 512 and/or a removable storage device or removable storage drive 514. Removable storage drive 514 may interact with a removable storage unit 518. Removable storage unit 518 may include a computer-usable or readable storage device having stored thereon computer software (control logic) and/or data. Removable storage drive 514 may read from and/or write to removable storage unit 518. Secondary memory 510 may include other means, devices, components, instrumentalities, or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 500. Such means, devices, components, instrumentalities, or other approaches may include, for example, a removable storage unit 522 and an interface 520. Examples of the removable storage unit 522 and the interface 520 may include a program cartridge and cartridge interface, a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface. Computer system 500 may further include communications interface 524 (e.g., a network interface).
Communications interface 524 may enable computer system 500 to communicate and interact with any combination of external devices, external networks, external entities, etc. (individually and collectively referenced as remote device(s), network(s), entity(ies) 528). For example, communications interface 524 may allow computer system 500 to communicate with external or remote device(s), network(s), entity(ies) 528 over communications path 526, which may be wired and/or wireless (or a combination thereof), and which may include any combination of LANs, WANs, the Internet, etc. Control logic and/or data may be transmitted to and from computer system 500 via communications path 526. Computer system 500 may also be any of a personal digital assistant (PDA), desktop workstation, laptop or notebook computer, netbook, tablet, smartphone, smartwatch or other wearable device, appliance, part of the Internet-of-Things, and/or embedded system, to name a few non-limiting examples, or any combination thereof. Computer system 500 may be a client or server computing device, accessing or hosting any applications and/or data through any delivery paradigm, including but not limited to remote or distributed cloud computing solutions; local or on-premises software ("on-premise" cloud-based solutions); "as a service" models (e.g., content as a service (CaaS), digital content as a service (DCaaS), software as a service (SaaS), managed software as a service (MSaaS), platform as a service (PaaS), desktop as a service (DaaS), framework as a service (FaaS), backend as a service (BaaS), mobile backend as a service (MBaaS), infrastructure as a service (IaaS), etc.); and/or a hybrid model including any combination of the foregoing examples or other services or delivery paradigms. Fig. 4 illustrates an example machine of a computer system 900 within which a set of instructions, for causing the machine to perform any one or more of the operations discussed herein, may be executed.
In alternative implementations, the machine may be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, and/or the Internet. The machine may operate in the capacity of a server or a client machine in a client-server network environment, as a peer machine in a peer-to-peer (or distributed) network environment, or as a server or a client machine in a cloud computing infrastructure or environment. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, a switch or bridge, a specialized application or network security appliance or device, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein. The example computer system 900 includes a processing device 902, a main memory 904 (e.g., read-only memory (ROM), flash memory, dynamic random-access memory (DRAM) such as synchronous DRAM (SDRAM), etc.), a static memory 906 (e.g., flash memory, static random-access memory (SRAM), etc.), and a data storage device 918, which communicate with each other via a bus 930. Processing device 902 represents one or more processing devices such as a microprocessor, a central processing unit, or the like. More particularly, the processing device may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or a processor implementing a combination of instruction sets.
Processing device 902 may also be one or more special-purpose processing devices such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), a network processor, or the like. The processing device 902 is configured to execute instructions 926 for performing the operations and steps discussed herein. The computer system 900 may further include a network interface device 908 to communicate over the network 920. The computer system 900 also may include a video display unit 910, an alphanumeric input device 912 (e.g., a keyboard), a cursor control device 914 (e.g., a mouse), a graphics processing unit 922, a signal generation device 916 (e.g., a speaker), a video processing unit 928, and an audio processing unit 932. The data storage device 918 may include a machine-readable medium 924 (also known as a computer-readable storage medium) on which is stored one or more sets of instructions 926 (e.g., software instructions) embodying any one or more of the operations described herein. The instructions 926 may also reside, completely or at least partially, within the main memory 904 and/or within the processing device 902 during execution thereof by the computer system 900, where the main memory 904 and the processing device 902 also constitute machine-readable storage media. In an example, the instructions 926 include instructions to implement operations and functionality corresponding to the disclosed subject matter. While the machine-readable storage medium 924 is shown in an example implementation to be a single medium, the term "machine-readable storage medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions 926. 
The term "machine-readable storage medium" shall also be taken to include any medium that is capable of storing or encoding a set of instructions 926 for execution by the machine and that cause the machine to perform any one or more of the operations of the present disclosure. The term "machine-readable storage medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media. Some portions of the detailed description have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the ways used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. 
Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as "identifying" or "determining" or "executing" or "performing" or "collecting" or "creating" or "sending" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage devices. The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the intended purposes, or it may comprise a computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer-readable storage medium, such as but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus. The operations and illustrations presented herein are not inherently related to any particular computer or other apparatus. Various types of systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the operations. The structure for a variety of these systems will appear as set forth in the description herein. In addition, the present disclosure is not described with reference to any particular programming language. 
It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein. The present disclosure may be provided as a computer program product, or software, that may include a machine-readable medium having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure. A machine-readable medium includes any mechanism for storing information in a form readable by a machine (e.g., a computer). For example, a machine-readable (e.g., computer-readable) medium includes a machine (e.g., a computer) readable storage medium such as read-only memory ("ROM"), random access memory ("RAM"), magnetic disk storage media, optical storage media, flash memory devices, etc. In some embodiments, a tangible, non-transitory apparatus or article of manufacture comprising a tangible, non-transitory computer useable or readable medium having control logic (software) stored thereon may also be referred to herein as a computer program product or program storage device. This includes, but is not limited to, computer system 500, main memory 508, secondary memory 510, and removable storage units 518 and 522, as well as tangible articles of manufacture embodying any combination of the foregoing. Such control logic, when executed by one or more data processing devices (such as computer system 500), may cause such data processing devices to operate as described herein. Based on the teachings contained in this disclosure, it will be apparent to persons skilled in the relevant art(s) how to make and use embodiments of this disclosure using data processing devices, computer systems, and/or computer architectures other than that shown in Figs. 3 and 4. In particular, embodiments can operate with software, hardware, and/or operating system implementations other than those described herein. 
It is to be appreciated that the Detailed Description section, and not any other section, is intended to be used to interpret the claims. Other sections can set forth one or more but not all exemplary embodiments as contemplated by the inventor(s), and thus, are not intended to limit this disclosure or the appended claims in any way. While this disclosure describes exemplary embodiments for exemplary fields and applications, it should be understood that the disclosure is not limited thereto. Other embodiments and modifications thereto are possible and are within the scope and spirit of this disclosure. For example, and without limiting the generality of this paragraph, embodiments are not limited to the software, hardware, firmware, and/or entities illustrated in the figures described herein. Further, embodiments (whether or not explicitly described herein) have significant utility to fields and applications beyond the examples described herein. Embodiments have been described herein with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined as long as the specified functions and relationships (or equivalents thereof) are appropriately performed. Also, alternative embodiments can perform functional blocks, steps, operations, methods, etc. using orderings different than those described herein. References herein to "one embodiment," "an embodiment," "an example embodiment," or similar phrases, indicate that the embodiment described can include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. 
Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it would be within the knowledge of persons skilled in the relevant art(s) to incorporate such feature, structure, or characteristic into other embodiments whether or not explicitly mentioned or described herein. Additionally, some embodiments can be described using the expression "coupled" and "connected" along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, some embodiments can be described using the terms "connected" and/or "coupled" to indicate that two or more elements are in direct physical or electrical contact with each other. The term "coupled," however, can also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The breadth and scope of this disclosure should not be limited by any of the above-described exemplary embodiments but should be defined only in accordance with the following claims and their equivalents. In the foregoing specification, implementations of the disclosure have been described with reference to specific example implementations thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of implementations of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.

Claims

CLAIMS

1. A computerized method for determining if a remote cloud service contains a certain data element without exposing the data element to the cloud service, the method comprising:
storing on a remote server a data set X of N elements;
storing on a client a single data element y, wherein all of the elements in X and y are lambda-bit strings;
at the client:
establishing g as a cryptographically secure hash function;
executing the cryptographically secure hash function on the single data element y to generate a hash result b;
computing a client message of a private set intersection protocol with the single data element y, and computing a client message of private information retrieval with query b;
transmitting the client message of the private set intersection protocol and the client message of private information retrieval to the remote server;
at the remote server:
computing hashes of all N elements of the data set X using the secure hash function g;
partitioning the N elements of the data set X into multiple sets based on the computed hashes, such that each partition represents a unique hash value;
adding dummy elements to each partition to make the partitions the same size;
for each partition, generating a server response for the private set intersection protocol using the corresponding partition as the server input;
computing a server message of private information retrieval using the generated server responses for the private set intersection protocol as input; and
transmitting the server message of private information retrieval to the client;
at the client:
computing a client output of the private information retrieval protocol to compute output z;
computing a client output of the private set intersection protocol on input z to determine whether y is in X; and
outputting the result of the determination of whether y is in X to a user at the client.
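The flow of claim 1 can be sketched end to end as follows. This is an illustrative stand-in, not the claimed protocol: the server-side bucketing and dummy padding are concrete, but the PSI and PIR message exchanges are replaced by plaintext placeholders (a real deployment would never reveal y or the query index b to the server). The parameters LAMBDA_BYTES and HASH_BITS, and the dummy value, are assumptions made for the sketch.

```python
import hashlib

# Illustrative parameters -- assumptions for this sketch, not values from the claims.
LAMBDA_BYTES = 16     # elements are lambda-bit strings; lambda = 128 here
HASH_BITS = 3         # output size of g, giving 2**HASH_BITS partitions

def g(element: bytes) -> int:
    """The secure hash g of claim 1, truncated to HASH_BITS bits."""
    digest = hashlib.sha256(element).digest()
    return int.from_bytes(digest, "big") % (2 ** HASH_BITS)

def partition_and_pad(X):
    """Server side: bucket X by g, then pad every bucket with dummy
    elements so all buckets have the same size (hiding bucket loads)."""
    buckets = {b: [] for b in range(2 ** HASH_BITS)}
    for x in X:
        buckets[g(x)].append(x)
    size = max(len(v) for v in buckets.values())
    dummy = b"\xff" * LAMBDA_BYTES   # assumed to lie outside the real element domain
    for v in buckets.values():
        v.extend([dummy] * (size - len(v)))
    return buckets

def query(X, y):
    """End-to-end flow of claim 1 with plaintext placeholders: b stands in
    for the PIR query, buckets[b] for the PIR response z, and the membership
    test for the PSI output. A real protocol never shows the server y or b."""
    buckets = partition_and_pad(X)
    b = g(y)              # client: hash result b, used as the PIR query
    z = buckets[b]        # client output z of the (placeholder) PIR step
    return y in z         # client output of the (placeholder) PSI step

X = [i.to_bytes(LAMBDA_BYTES, "big") for i in range(10)]
print(query(X, (3).to_bytes(LAMBDA_BYTES, "big")))    # True: 3 is in X
print(query(X, (99).to_bytes(LAMBDA_BYTES, "big")))   # False: 99 is not in X
```

Equal-size padding matters because the PIR response must look identical regardless of which bucket is fetched; without it, bucket sizes would leak the distribution of X.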
2. The method of claim 1, wherein the private set intersection protocol is a two-round safe function evaluation where the client input is x and server input is X, and at the end of the protocol the client learns whether x is in X.
3. The method of claim 2, wherein the method makes only limited use of expensive cryptographic group operations, wherein the number of operations is smaller than the size of X.
4. The method of claim 1, wherein the private information retrieval is a two-round safe function evaluation where the client input is index i and server input is X and at the end of the protocol the client learns the i-th element of X.
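The two-round shape of claim 4 (client query, server answer over database X, client output equal to the i-th element) can be written as a minimal interface. The instantiation below is a trivial, non-private placeholder introduced only for illustration: the index travels in the clear, whereas a real PIR scheme would hide it cryptographically. Only the message flow, not the privacy, is shown.

```python
from dataclasses import dataclass

@dataclass
class PIRQuery:
    payload: int   # placeholder: plaintext index (a real PIR query hides i)

def pir_client_query(i: int) -> PIRQuery:
    """Round 1: client -> server message encoding index i."""
    return PIRQuery(payload=i)

def pir_server_answer(X: list, q: PIRQuery):
    """Round 2: server -> client message; the server input is the database X."""
    return X[q.payload]

def pir_client_output(answer):
    """Client output: the i-th element of X (and, in a real scheme, nothing else)."""
    return answer

X = ["a", "b", "c"]
print(pir_client_output(pir_server_answer(X, pir_client_query(1))))  # prints "b"
```

Claim 5's communication bound is about this exchange: both messages together must total fewer bits than the size of X, which rules out the trivial "send the whole database" construction.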
5. The method of claim 4, wherein the method uses limited communication, wherein the total number of bits sent over a channel is smaller than the size of X.
6. The method of claim 1, wherein the hash function g is selected to have an output size that is optimized for communication efficiency or to minimize use of expensive cryptographic group operations, such that a larger output size has improved communication efficiency and a smaller output size has reduced computation cost and less communication efficiency, wherein the efficiency is measured by the number of bits exchanged over a channel.
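The tradeoff in claim 6 comes from simple counting: an output size of b bits yields 2**b partitions of roughly N / 2**b elements each, so b trades per-partition work against partition count. The sketch below only does this counting; treating partition size and partition count as proxies for the two costs is a rough assumption, not the patent's exact accounting.

```python
import math

def tradeoff(N: int, b: int):
    """For a b-bit hash output over N elements, return
    (number of partitions, padded size of each partition)."""
    partitions = 2 ** b
    partition_size = math.ceil(N / partitions)
    return partitions, partition_size

# Incrementing b doubles the partition count and roughly halves each partition.
print(tradeoff(1_000_000, 10))   # (1024, 977)
print(tradeoff(1_000_000, 11))   # (2048, 489)
```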
7. The method of claim 1, wherein the data set X is provided by the client, or the data set X is provided by a third-party, or the data set belongs to the remote cloud service.
8. The method of claim 1, wherein the data set X of N elements represents aggregated password data, and the single data element y represents a client password.
9. The method of claim 1, wherein the data set X of N elements represents aggregated image information, and the single data element y represents a client image.
10. The method of claim 1, wherein the data set X of N elements represents aggregated contact list or personal information, and the single data element y represents an instance of client contact information.
11. A system for determining if a remote cloud service contains a certain data element without exposing the data element to the cloud service, the system comprising:
a cloud computing infrastructure configured for:
storing on a remote server a data set X of N elements;
storing on a client a single data element y, wherein all of the elements in X and y are lambda-bit strings;
a client environment configured for, at the client:
establishing g as a cryptographically secure hash function;
executing the cryptographically secure hash function on the single data element y to generate a hash result b;
computing a client message of a private set intersection protocol with the single data element y, and computing a client message of private information retrieval with query b;
transmitting the client message of the private set intersection protocol and the client message of private information retrieval to the remote server;
the remote server being further configured for:
computing hashes of all N elements of the data set X using the secure hash function g;
partitioning the N elements of the data set X into multiple sets based on the computed hashes, such that each partition represents a unique hash value;
adding dummy elements to each partition to make the partitions the same size;
for each partition, generating a server response for the private set intersection protocol using the corresponding partition as the server input;
computing a server message of private information retrieval using the generated server responses for the private set intersection protocol as input; and
transmitting the server message of private information retrieval to the client;
the client being further configured for:
computing a client output of the private information retrieval protocol to compute output z;
computing a client output of the private set intersection protocol on input z to determine whether y is in X; and 
outputting the result of the determination of whether y is in X to a user at the client.
12. The system of claim 11, wherein the private set intersection protocol is a two-round safe function evaluation where the client input is x and server input is X, and at the end of the protocol the client learns whether x is in X.
13. The system of claim 12, wherein the private set intersection protocol makes only limited use of expensive cryptographic group operations, wherein the number of operations is smaller than the size of X.
14. One or more tangible, non-transitory, machine-readable media comprising instructions configured to cause a processor to encrypt for a private set intersection scheme, wherein processing the private set intersection scheme comprises:
storing on a remote server a data set X of N elements;
storing on a client a single data element y, wherein all of the elements in X and y are lambda-bit strings;
at the client:
establishing g as a cryptographically secure hash function;
executing the cryptographically secure hash function on the single data element y to generate a hash result b;
computing a client message of a private set intersection protocol with the single data element y, and computing a client message of private information retrieval with query b;
transmitting the client message of the private set intersection protocol and the client message of private information retrieval to the remote server;
at the remote server:
computing hashes of all N elements of the data set X using the secure hash function g;
partitioning the N elements of the data set X into multiple sets based on the computed hashes, such that each partition represents a unique hash value;
adding dummy elements to each partition to make the partitions the same size;
for each partition, generating a server response for the private set intersection protocol using the corresponding partition as the server input;
computing a server message of private information retrieval using the generated server responses for the private set intersection protocol as input; and
transmitting the server message of private information retrieval to the client;
at the client:
computing a client output of the private information retrieval protocol to compute output z;
computing a client output of the private set intersection protocol on input z to determine whether y is in X; and
outputting the result of the determination of whether y is in X to a user at the client.
15. The one or more machine-readable media of claim 14, wherein the private set intersection protocol is a two-round safe function evaluation where the client input is x and server input is X, and at the end of the protocol the client learns whether x is in X.
PCT/US2022/047294 2021-10-21 2022-10-20 Memory and communications efficient protocols for private data intersection WO2023069631A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163270443P 2021-10-21 2021-10-21
US63/270,443 2021-10-21

Publications (1)

Publication Number Publication Date
WO2023069631A1 true WO2023069631A1 (en) 2023-04-27

Family

ID=86058541

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/047294 WO2023069631A1 (en) 2021-10-21 2022-10-20 Memory and communications efficient protocols for private data intersection

Country Status (1)

Country Link
WO (1) WO2023069631A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117728965A (en) * 2023-06-30 2024-03-19 荣耀终端有限公司 Method and server for obtaining information security degree

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140081980A1 (en) * 2012-09-17 2014-03-20 Nokia Corporation Method and apparatus for accessing and displaying private user information
US20190325082A1 (en) * 2018-04-19 2019-10-24 Microsoft Technology Licensing, Llc Private information retrieval with probabilistic batch codes
US20190342270A1 (en) * 2018-05-07 2019-11-07 Microsoft Technology Licensing, Llc Computing a private set intersection
US20200250296A1 (en) * 2019-02-05 2020-08-06 Shape Security, Inc. Detecting compromised credentials by improved private set intersection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PINKAS ET AL.: "PSI from PaXoS: Fast, Malicious Private Set Intersection", 39TH ANNUAL INTERNATIONAL CONFERENCE ON THE THEORY AND APPLICATIONS OF CRYPTOGRAPHIC TECHNIQUES, May 2020 (2020-05-01), pages 739 - 767, XP047659268, Retrieved from the Internet <URL:https://eprint.iacr.org/2020/193.pdf> [retrieved on 20230206], DOI: 10.1007/978-3-030-45724-2_25 *

Similar Documents

Publication Publication Date Title
Demmler et al. PIR-PSI: scaling private contact discovery
Döttling et al. Trapdoor hash functions and their applications
CN108200063B (en) Searchable public key encryption method, system and server adopting same
Aloufi et al. Blindfolded evaluation of random forests with multi-key homomorphic encryption
Gajek Dynamic symmetric searchable encryption from constrained functional encryption
Dong et al. Fuzzy keyword search over encrypted data in the public key setting
Albrecht et al. Tightly secure ring-LWE based key encapsulation with short ciphertexts
Kissel et al. Verifiable phrase search over encrypted data secure against a semi-honest-but-curious adversary
Garimella et al. Structure-aware private set intersection, with applications to fuzzy matching
Chase et al. Amortizing rate-1 OT and applications to PIR and PSI
Rauthan et al. Homomorphic encryption approach for exploration of sensitive information retrieval
Moran et al. Incompressible encodings
Chen et al. Password-authenticated searchable encryption
WO2023069631A1 (en) Memory and communications efficient protocols for private data intersection
Rong et al. Privacy‐Preserving k‐Means Clustering under Multiowner Setting in Distributed Cloud Environments
Yang et al. Secure and efficient parallel hash function construction and its application on cloud audit
KR100951034B1 (en) Method of producing searchable keyword encryption based on public key for minimizing data size of searchable keyword encryption and method of searching data based on public key through that
Wang et al. Verifiable single-server private information retrieval
Samadani et al. Secure pattern matching based on bit parallelism: Non-interactive protocols for non-deterministic string matching automata evaluation
Guo et al. Order‐Revealing Encryption Scheme with Comparison Token for Cloud Computing
Mihailescu et al. Software engineering and applied cryptography in cloud computing and big data
Ben-Sasson et al. On public key encryption from noisy codewords
Hou et al. Public-key searchable encryption from lattices
Sun et al. Confidentiality‐Preserving Publicly Verifiable Computation Schemes for Polynomial Evaluation and Matrix‐Vector Multiplication
Prihandoko et al. Stream-keys generation based on graph labeling for strengthening Vigenere encryption.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22884482

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE