EP3031165A2 - A method and system for privacy preserving matrix factorization - Google Patents

A method and system for privacy preserving matrix factorization

Info

Publication number
EP3031165A2
Authority
EP
European Patent Office
Prior art keywords
records
recsys
csp
processor
matrix factorization
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP14731436.3A
Inventor
Efstratios Ioannidis
Ehud WEINSBERG
Nina Anne TAFT
Marc Joye
Valeria NIKOLAENKO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thomson Licensing SAS
Original Assignee
Thomson Licensing SAS
Priority claimed from PCT/US2013/076353 (WO2014137449A2)
Application filed by Thomson Licensing SAS
Publication of EP3031165A2
Legal status: Withdrawn (current)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/46 Secure multiparty computation, e.g. millionaire problem
    • H04L2209/50 Oblivious transfer

Definitions

  • the present principles relate to privacy-preserving recommendation systems and secure multi-party computation, and in particular, to performing a collaborative filtering technique known as matrix factorization securely, in a privacy-preserving fashion in order to profile items.
  • Figure 1 illustrates the components of a general recommendation system 100: a number of users 110 representing a Source and a Recommender System (RecSys) 130 which processes the users' inputs 120 and outputs recommendations 140.
  • RecSys Recommender System
  • users supply substantial personal information about their preferences (users' inputs), trusting that the recommender will manage this data appropriately.
  • the present principles propose a method for performing a collaborative filtering technique known as matrix factorization securely, in a privacy-preserving fashion in order to profile items.
  • the method receives as inputs the ratings users gave to items (e.g., movies, books) and creates a profile for each item that can be subsequently used to predict what rating a user would give to each item.
  • the present principles allow a recommender system based on matrix factorization to perform this task without ever learning the ratings of a user, or even which items the user has rated.
  • a method for securely profiling items through matrix factorization including: receiving a set of records (220) from a Source, wherein a record contains a set of tokens and a set of items, and wherein each record is kept secret from parties other than said Source; receiving at least one separate item (360); and evaluating the set of records and the at least one separate item in a Recommender (RecSys) (230) by using a garbled circuit (395) based on matrix factorization, wherein the garbled circuit outputs the item profiles for the at least one separate item.
  • the method can further include: designing the garbled circuit in a Crypto-Service Provider (CSP) to perform matrix factorization on the set of records (380) and the at least one separate item (360), wherein the garbled circuit outputs the item profiles of the at least one separate item; and transferring the garbled circuit to the RecSys (385).
  • the step of designing in the method can include: designing a matrix factorization operation as a Boolean circuit (382).
  • the step of designing a matrix factorization circuit in the method can include: constructing an array of the set of records (410); and performing the operations of sorting (420, 440, 470, 490), copying (430, 450), updating (470, 480), comparing (480) and computing gradient contributions (460) on the array.
  • the method can further include: receiving a set of parameters for the design of the garbled circuit by said CSP, wherein the parameters were sent by the RecSys (330).
  • the method can further include: encrypting the set of records to create encrypted records (330), wherein the step of encrypting is performed prior to the step of receiving a set of records.
  • the method can be such that the public encryption keys are generated in the CSP and sent to the Source (320).
  • the method can further include: generating public encryption keys in the CSP; and sending the keys to the Source (320).
  • the encryption scheme can be a partially homomorphic encryption (330), and the method can further include: masking the encrypted records in the RecSys to create masked records (340); and decrypting the masked records in the CSP to create decrypted-masked records (350).
  • the step of designing (380) in the method can include: unmasking the decrypted-masked records inside the garbled circuit prior to processing them.
  • the method can further include: performing oblivious transfers (390) between the CSP and the RecSys (392), wherein the RecSys receives the garbled values of the decrypted-masked records and the records are kept private from the RecSys and the CSP.
  • the method can further include: receiving the number of tokens and items of each record (220, 310). Furthermore, the method can include: padding each record with null entries when the number of tokens of each record is smaller than a value representing a maximum value, in order to create records with a number of tokens equal to said value (312).
  • the Source of the set of records in the method can be one of a database and a set of users (210), wherein each user is a source of one record and each record is kept secret from parties other than its corresponding user.
  • a system for securely profiling items through matrix factorization including a Source which will provide a set of records, a Crypto-Service Provider (CSP) which will provide a secure matrix factorization circuit and a RecSys which will evaluate the records, such that the records are kept private from parties other than the Source, wherein the Source, the CSP and the RecSys each include a processor (602), for receiving at least one input/output (604); and at least one memory (606, 608) in signal communication with the processor, and wherein the RecSys processor is configured to: receive a set of records, wherein each record comprises a set of tokens and a set of items, and wherein each record is kept secret; receive at least one separate item; and evaluate the set of records and the at least one separate item with a garbled circuit based on matrix factorization, wherein the garbled circuit outputs the item profiles for the at least one separate item.
  • CSP Crypto-Service Provider
  • the CSP processor in the system can be configured to: design the garbled circuit to perform matrix factorization of the set of records and the at least one separate item, wherein the garbled circuit outputs the item profiles for the at least one separate item; and transfer the garbled circuit to the RecSys.
  • the CSP processor in the system can be configured to design the garbled circuit by being configured to: design a matrix factorization operation as a Boolean circuit.
  • the CSP processor in the system can be configured to design the matrix factorization circuit by being configured to: construct an array of said set of records; and perform the operations of sorting, copying, updating, comparing and computing gradient contributions on the array.
  • the CSP processor in the system can be further configured to: receive a set of parameters for the design of the garbled circuit, wherein the parameters were sent by said RecSys.
  • the Source processor in the system can be configured to: encrypt the set of records to create encrypted records prior to providing said set of records.
  • the CSP processor in the system can be further configured to: generate public encryption keys; and send the keys to the Source.
  • the encryption scheme can be a partially homomorphic encryption, and the RecSys processor can be further configured to: mask the encrypted records to create masked records; and the CSP processor can be further configured to: decrypt the masked records to create decrypted-masked records.
  • the CSP processor in the system can be configured to design the garbled circuit by being further configured to: unmask the decrypted-masked records inside the garbled circuit prior to processing them.
  • the RecSys processor and the CSP processor can be further configured to perform oblivious transfers, wherein said RecSys receives the garbled values of the decrypted-masked records and the records are kept private from the RecSys and the CSP.
  • the RecSys processor in the system can be further configured to: receive the number of tokens of each record, wherein the number of tokens was sent by said Source.
  • the Source processor in the system can be configured to: pad each record with null entries when the number of tokens of each record is smaller than a value representing a maximum value, in order to create records with a number of tokens equal to said value.
  • the Source of the set of records can be one of a database and a set of users, wherein if the Source is a set of users, each user comprises a processor (602), for receiving at least one input/output (604); and at least one memory (606, 608), and each user is a source of one record, wherein each record is kept secret from parties other than its corresponding user.
  • Figure 1 illustrates the components of a prior art recommendation system
  • Figure 2 illustrates the components of a recommendation system according to the present principles
  • Figure 3 illustrates a flowchart of a privacy-preserving method for profiling items through matrix factorization according to the present principles
  • Figure 4 illustrates a flowchart of the matrix factorization algorithm according to the present principles
  • Figure 5 illustrates the data structure S constructed by the matrix factorization algorithm according to the present principles
  • Figure 6 illustrates a block diagram of a computing environment utilized to implement the present principles.
  • a method for performing a collaborative filtering technique known as matrix factorization securely, in a privacy-preserving fashion in order to profile items.
  • the method of the present principles can serve as a service to profile at least one item in a corpus of records, each record comprising a set of tokens and items.
  • the set of records includes more than one record and the set of tokens includes at least one token.
  • a record could represent a user; the tokens could be a user's ratings to the corresponding items in the record.
  • the tokens can also represent ranks, weights or measures associated with items, and the items can represent persons, tasks or jobs. For example, the ranks, weights or measures can be associated with the health of an individual, and a researcher is trying to correlate the health measures of a population.
  • the service wishes to do so without learning the contents of each record or any information extracted from the records other than the item profiles.
  • the service should not learn (a) in which records each token/item appeared or, a fortiori, (b) what tokens/items appear in each record and (c) the values of the tokens.
  • terms and words like "privacy-preserving", “private” and “secure” are used interchangeably to indicate that the information regarded as private by a user (record) is only known by the user.
  • matrix factorization should be performed without the recommender ever learning the users' ratings, or even which items they have rated. The latter requirement is key: earlier studies show that even knowing which movie a user has rated can be used to infer, e.g., their gender. Second, such a privacy-preserving algorithm ought to be efficient, and scale gracefully (e.g., linearly) with the number of ratings submitted by users. The privacy requirements imply that the matrix factorization algorithm ought to be data-oblivious: its execution ought not to depend on the user input.
  • n users rate a subset of m possible items (e.g., movies).
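  • denote by $[n] \equiv \{1, \dots, n\}$ the set of users, by $[m] \equiv \{1, \dots, m\}$ the set of items, and by $\mathcal{M} \subseteq [n] \times [m]$ the set of user/item pairs for which a rating has been generated, with $M = |\mathcal{M}|$ the total number of ratings.
  • for $(i, j) \in \mathcal{M}$, denote by $r_{ij} \in \mathbb{R}$ the rating generated by user $i$ for item $j$.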
  • both n and m are large numbers, typically ranging between 10 and 10 .
  • a recommender system wishes to predict the ratings for user/item pairs in $([n] \times [m]) \setminus \mathcal{M}$.
  • Matrix factorization performs this task by fitting a bi-linear model on the existing ratings.
  • for some small dimension $d \in \mathbb{N}$, it is assumed that there exist vectors $u_i \in \mathbb{R}^d$, $i \in [n]$, and $v_j \in \mathbb{R}^d$, $j \in [m]$, such that $r_{ij} = \langle u_i, v_j \rangle + \varepsilon_{ij}$ for all $(i, j) \in \mathcal{M}$ (1), where the $\varepsilon_{ij}$ are i.i.d. (independent and identically distributed) Gaussian random variables.
  • the vectors $u_i$ and $v_j$ are called the user and item profiles, respectively, and $\langle u_i, v_j \rangle$ is the inner product of the vectors.
  • the minimization in (2) corresponds to maximum likelihood estimation of U and V.
  • the regularized mean square error in (2) is not a convex function of $(U, V)$; several methods for performing this minimization have been proposed in the literature.
  • the present principles focus on gradient descent, a popular method used in practice, which is described as follows. Denoting by F(U,V) the regularized mean square error in (2), gradient descent operates by iteratively adapting the profiles U and V through the adaptation rule:
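The regularized mean square error referred to as (2) and the adaptation rule announced above did not survive in this copy of the text. One standard form consistent with the surrounding description is given below as a sketch; the regularization weights $\lambda, \mu > 0$ and the step size $\gamma > 0$ are assumed symbols, not taken from the source, and the original equation numbering is not reproduced.

```latex
F(U,V) = \sum_{(i,j)\in\mathcal{M}} \bigl(r_{ij} - \langle u_i, v_j\rangle\bigr)^2
       + \lambda \sum_{i\in[n]} \|u_i\|_2^2 + \mu \sum_{j\in[m]} \|v_j\|_2^2

u_i(t) = u_i(t-1) - \gamma \, \nabla_{u_i} F\bigl(U(t-1), V(t-1)\bigr), \qquad i \in [n]
v_j(t) = v_j(t-1) - \gamma \, \nabla_{v_j} F\bigl(U(t-1), V(t-1)\bigr), \qquad j \in [m]

\nabla_{u_i} F = -2 \sum_{j:(i,j)\in\mathcal{M}} v_j \bigl(r_{ij} - \langle u_i, v_j\rangle\bigr) + 2\lambda u_i
\nabla_{v_j} F = -2 \sum_{i:(i,j)\in\mathcal{M}} u_i \bigl(r_{ij} - \langle u_i, v_j\rangle\bigr) + 2\mu v_j
```

Both profile sets are updated from the values at step $t-1$, so each iteration is one synchronous gradient step.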
  • $U(0)$ and $V(0)$ consist of uniformly random rows of norm 1 (i.e., profiles are selected u.a.r. (uniformly at random) from the unit sphere).
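A minimal sketch of this gradient descent performed in the clear (no privacy protection), assuming the objective $F(U,V) = \sum_{(i,j)} (r_{ij} - \langle u_i, v_j\rangle)^2 + \lambda \sum_i \|u_i\|^2 + \mu \sum_j \|v_j\|^2$ with hypothetical parameters `gamma`, `lam` and `mu`. Ratings are held in a dictionary keyed by `(i, j)`; the initial profiles are normalized Gaussian vectors, which are uniformly distributed on the unit sphere.

```python
import math
import random


def dot(a, b):
    """Inner product <a, b> of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def random_unit_vector(d, rng):
    """Uniformly random point on the unit sphere in R^d (normalized Gaussian)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(x, x))
    return [xi / norm for xi in x]


def objective(U, V, ratings, lam, mu):
    """Regularized squared error F(U, V) over the rated pairs."""
    err = sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in ratings.items())
    return err + lam * sum(dot(u, u) for u in U) + mu * sum(dot(v, v) for v in V)


def gradient_descent_mf(n, m, ratings, d, K, gamma, lam, mu, seed=0):
    """K synchronous gradient steps; ratings maps (i, j) -> r_ij."""
    rng = random.Random(seed)
    U = [random_unit_vector(d, rng) for _ in range(n)]
    V = [random_unit_vector(d, rng) for _ in range(m)]
    for _ in range(K):
        # Start from the regularization terms 2*lam*u_i and 2*mu*v_j.
        gU = [[2 * lam * x for x in u] for u in U]
        gV = [[2 * mu * x for x in v] for v in V]
        # Add the data terms -2 * err * v_j and -2 * err * u_i of each rating.
        for (i, j), r in ratings.items():
            err = r - dot(U[i], V[j])
            for k in range(d):
                gU[i][k] -= 2 * err * V[j][k]
                gV[j][k] -= 2 * err * U[i][k]
        # Both profile sets move from the values of the previous step.
        U = [[x - gamma * g for x, g in zip(u, gu)] for u, gu in zip(U, gU)]
        V = [[x - gamma * g for x, g in zip(v, gv)] for v, gv in zip(V, gV)]
    return U, V
```

For example, running `gradient_descent_mf(3, 2, {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (2, 1): 1.0}, 2, 100, 0.01, 0.1, 0.1)` yields profiles whose objective is lower than that of the random initialization. Note that the inner loop touches `U[i]` and `V[j]` at positions chosen by the data, which is exactly the data-dependent access pattern a secure version must avoid.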
  • Another aspect of the present principles is proposing a secure multi-party computation (MPC) algorithm for matrix factorization based on sorting networks and Yao's garbled circuits.
  • MPC secure multi-party computation
  • Yao's protocol a.k.a. garbled circuits
  • Yao's protocol is a generic method for secure multi-party computation.
  • the protocol is run between a set of $n$ input owners, where $a_i$ denotes the private input of user $i$, $1 \le i \le n$; an Evaluator that wishes to evaluate $f(a_1, \dots, a_n)$; and a third party, the Crypto-Service Provider (CSP).
  • the Evaluator learns the value of $f(a_1, \dots, a_n)$ but no party learns more than what is revealed from this output value.
  • the protocol requires that the function $f$ can be expressed as a Boolean circuit, e.g., as a graph of OR, AND, NOT and XOR gates, and that the Evaluator and the CSP do not collude.
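To make the garbling step concrete, a toy garbled AND gate is sketched below. It is a textbook construction chosen for illustration, not FastGC's: each wire gets two random 16-byte labels, each table row encrypts the output label under SHA-256 of the two input labels, and a block of zero bytes lets the evaluator recognize the one row it can open. Optimizations such as point-and-permute or free XOR are omitted, and the oblivious transfer by which the Evaluator would obtain its input labels is replaced by handing the labels over directly.

```python
import hashlib
import os
import random

LABEL_LEN = 16  # bytes per wire label


def _h(ka, kb):
    """Row key derived from the two input labels (32 bytes of SHA-256)."""
    return hashlib.sha256(ka + kb).digest()


def _xor(x, y):
    return bytes(p ^ q for p, q in zip(x, y))


def garble_and():
    """CSP side: two labels per wire and a shuffled four-row table."""
    A = [os.urandom(LABEL_LEN) for _ in range(2)]  # A[bit] encodes input a = bit
    B = [os.urandom(LABEL_LEN) for _ in range(2)]  # B[bit] encodes input b = bit
    C = [os.urandom(LABEL_LEN) for _ in range(2)]  # output wire labels
    table = []
    for a in (0, 1):
        for b in (0, 1):
            # Output label followed by zero bytes, so the evaluator can
            # recognize the single row its labels actually open.
            plaintext = C[a & b] + bytes(LABEL_LEN)
            table.append(_xor(_h(A[a], B[b]), plaintext))
    random.SystemRandom().shuffle(table)  # hide which row is which
    return A, B, C, table


def evaluate(table, la, lb):
    """Evaluator side: one label per input wire in, one output label out."""
    for row in table:
        pt = _xor(_h(la, lb), row)
        if pt[LABEL_LEN:] == bytes(LABEL_LEN):
            return pt[:LABEL_LEN]
    raise ValueError("no row opened with these labels")
```

The Evaluator ends up with `C[a & b]`, a random string that reveals nothing about the bit it encodes; only the CSP knows the mapping from labels to bits.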
  • any RAM program executable in bounded time $T$ can be converted to an $O(T^3)$-time Turing machine (TM), a theoretical computing machine invented by Alan Turing to serve as an idealized model for mathematical calculation; here $O(T^3)$ means that the running time grows at most proportionally to $T^3$.
  • TM Turing machine
  • any bounded $T$-time TM can be converted to a circuit of size $O(T \log T)$, which is data-oblivious.
  • Sorting networks were originally developed to enable sorting parallelization as well as an efficient hardware implementation. These networks are circuits that sort an input sequence $(a_1, a_2, \dots, a_n)$ into a monotonically increasing sequence $(a'_1, a'_2, \dots, a'_n)$. They are constructed by wiring together compare-and-swap circuits, their main building block.
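Batcher's odd-even mergesort, which the text later cites for the cost of sorting, yields such a network. The sketch below generates its comparator list for a power-of-two number of inputs (a restriction of this construction). The list depends only on the input length, never on the values, which is the data-obliviousness the secure computation relies on. In a circuit, `compare_and_swap` would be a comparator feeding two multiplexers rather than a Python branch.

```python
def oddeven_merge(lo, hi, r):
    """Comparators merging two sorted halves of positions lo..hi (inclusive), stride r."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)
        yield from oddeven_merge(lo + r, hi, step)
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)


def oddeven_merge_sort_range(lo, hi):
    """Comparators that sort positions lo..hi (inclusive)."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort_range(lo, mid)
        yield from oddeven_merge_sort_range(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


def batcher_network(n):
    """Fixed comparator list for n inputs; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return list(oddeven_merge_sort_range(0, n - 1))


def compare_and_swap(x, a, b):
    """The network's only building block: order positions a < b."""
    if x[a] > x[b]:
        x[a], x[b] = x[b], x[a]


def run_network(values, network):
    x = list(values)
    for a, b in network:
        compare_and_swap(x, a, b)
    return x
```

`batcher_network(8)` contains 19 comparators, and running it on any eight values returns them in increasing order.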
  • Several works exploit the data-obliviousness of sorting networks for cryptographic purposes. However, encryption is not always enough to ensure privacy. If an adversary can observe your access patterns to encrypted storage, they can still learn sensitive information about what your applications are doing.
  • Oblivious RAM solves this problem by continuously shuffling memory as it is being accessed; thereby completely hiding what data is being accessed or even when it was previously accessed.
  • sorting is used as a means of generating data-oblivious random permutations. More recently, it has been used to perform data-oblivious computations of a convex hull, all-nearest neighbors, and weighted set intersection.
  • the present principles propose a method based on secure multi-party sorting which is close to weighted set intersection but which incorporates garbled circuits.
  • Figure 2 depicts the actors or parties in the privacy-preserving matrix factorization system, according to the present principles. They are as follows:
  • the Recommender System (RecSys) 230: an entity that performs the privacy-preserving matrix factorization operation.
  • the RecSys wishes to learn the item profiles V 240, as extracted from matrix factorization on user ratings, without learning anything useful about the users or anything extracted from user data other than the item profiles.
  • a Source consisting of one or more users 210, each having a set of ratings to a set of items 220.
  • Each user $i \in [n]$ consents to the profiling of items based on her ratings $r_{ij}$, $(i, j) \in \mathcal{M}$, through matrix factorization, but does not wish to reveal to the recommender her ratings or even which items she has rated.
  • the Source may represent a database containing the data of one or more users.
  • a protocol is proposed that allows the RecSys to execute matrix factorization to provide item profiles while neither the RecSys nor the CSP learn anything other than the item profiles, i.e., V, which is the sole output of RecSys in Figure 2. In particular, neither should learn a user's ratings, or even which items the user has actually rated.
  • if the recommender also learned the user profiles, it could trivially infer a user's ratings from the inner product in (3).
  • the present principles propose a privacy-preserving protocol in which the recommender learns only the item profiles.
  • the item profile can be seen as a metric which defines an item as a function of the ratings of a set of users/records.
  • a user profile can be seen as a metric which defines a user as a function of the ratings of a set of users/records.
  • an item profile is a measure of approval/disapproval of an item, that is, a reflection of the features or characteristics of an item.
  • a user profile is a measure of the likes/dislikes of a user, that is, a reflection of the user's personality. If calculated based on a large set of users/records, an item or user profile can be seen as an independent measure of the item or user, respectively.
  • the embedding of items in $\mathbb{R}^d$ through matrix factorization allows the recommender to infer (and encode) similarity: items whose profiles have small Euclidean distance are items that are rated similarly by users.
  • the task of learning the item profiles is of interest to the recommender beyond the actual task of recommendations.
  • the users may not need or wish to receive recommendations, as may be the case if the Source is a database.
  • the recommender can use them to provide relevant recommendations without any additional data revelation by users.
  • the recommender can send V to a user (or release it publicly); knowing her ratings per item, user $i$ can infer her (private) profile $u_i$ by solving (2) with respect to $u_i$ for the given V (this is a separable problem): each user can obtain her profile by performing a ridge regression over her ratings. Having $u_i$ and V, the user can predict all her ratings to other items locally through (4).
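A minimal sketch of this local ridge regression, assuming the per-user part of the objective is $\sum_j (r_{ij} - \langle u_i, v_j\rangle)^2 + \lambda \|u_i\|^2$ over the items $j$ she rated, with $\lambda$ an assumed regularization weight. Setting the gradient to zero gives the normal equations $(\sum_j v_j v_j^\top + \lambda I)\, u_i = \sum_j r_{ij} v_j$, a $d \times d$ system that is cheap to solve for small $d$. The function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def user_profile(V, my_ratings, lam):
    """Ridge regression (sum_j v_j v_j^T + lam I) u = sum_j r_ij v_j.

    V is the list of item profiles; my_ratings maps item index j -> r_ij.
    """
    d = len(V[0])
    A = [[lam if a == b else 0.0 for b in range(d)] for a in range(d)]
    rhs = [0.0] * d
    for j, r in my_ratings.items():
        for a in range(d):
            rhs[a] += r * V[j][a]
            for b in range(d):
                A[a][b] += V[j][a] * V[j][b]
    return solve(A, rhs)


def predict(u, V):
    """Predicted ratings <u, v_j> for every item j, computed locally."""
    return [sum(x * y for x, y in zip(u, v)) for v in V]
```

With item profiles $(1, 0)$, $(0, 1)$, $(1, 1)$, ratings $r_{i0} = 2$, $r_{i1} = 3$ and $\lambda = 0$, the user obtains $u_i = (2, 3)$ and predicts 5 for item 2, all without sending anything back to the recommender.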
  • the preferred embodiment of the present principles comprises a protocol satisfying the flowchart 300 in Figure 3 and described by the following steps:
  • the Source reports to the RecSys how many pairs of tokens (ratings) and items are going to be submitted for each participating record 310.
  • the set of records includes more than one record and the set of tokens per record includes at least one token.
  • the CSP generates a public encryption key for a partially homomorphic scheme and sends it to all users (Source) 320.
  • homomorphic encryption is a form of encryption which allows specific types of computations to be carried out on ciphertexts, producing an encrypted result which, when decrypted, matches the result of the same operations performed on the plaintexts. For instance, one person could add two encrypted numbers and then another person could decrypt the result, without either of them being able to find the value of the individual numbers.
  • a partially homomorphic encryption is homomorphic with respect to one operation (addition or multiplication) on plaintexts.
  • a partially homomorphic encryption may be homomorphic with respect to addition and multiplication by a scalar.
  • Each user encrypts her data using the CSP's public key and sends her encrypted data to the RecSys 330.
  • specifically, for each (item, rating) pair, the user encrypts this pair using the public encryption key.
  • the RecSys adds a mask $\eta$ to the encrypted data and sends the masked and encrypted data to the CSP 340.
  • a mask is a form of data obfuscation, and could be as simple as adding a random number or shuffling by a random number.
  • the CSP decrypts the masked data 350.
  • the RecSys receives or determines a separate set of items 360, on which to compute the matrix factorization.
  • This set of items may comprise all the items in the corpus, a subset of all the items, or even items not present in the records.
  • the RecSys sends to the CSP the complete specifications needed to build a garbled circuit 370, including the dimension of the user and item profiles (i.e., parameter d) 372, the total number of ratings (i.e., parameter M) 374, the total number of users and of items 376 and the number of bits used to represent the integer and fractional parts of a real number in the garbled circuit 378.
  • if the separate set of items does not comprise all the items present in the records, it will be included in the parameters.
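The parameter giving the number of integer and fractional bits fixes a fixed-point encoding of the real numbers manipulated inside the circuit. The sketch below assumes a two's-complement encoding in which the integer bits include the sign bit; `to_fixed`, `from_fixed` and `fixed_mul` are hypothetical helper names. The rescaling shift after a multiplication costs no gates in a circuit, since a constant shift is just a rewiring of bits.

```python
def to_fixed(x, int_bits, frac_bits):
    """Encode real x as an (int_bits + frac_bits)-bit two's-complement word."""
    width = int_bits + frac_bits
    q = round(x * (1 << frac_bits))
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= q <= hi:
        raise OverflowError(f"{x} does not fit in {width} bits")
    return q & ((1 << width) - 1)  # the bit pattern carried on the wires


def _signed(bits, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    return bits - (1 << width) if bits >> (width - 1) else bits


def from_fixed(bits, int_bits, frac_bits):
    return _signed(bits, int_bits + frac_bits) / (1 << frac_bits)


def fixed_mul(a, b, int_bits, frac_bits):
    """Multiply two encodings. The exact product has 2*frac_bits fractional
    bits, so it is shifted right by frac_bits (rounding toward -infinity)."""
    width = int_bits + frac_bits
    p = (_signed(a, width) * _signed(b, width)) >> frac_bits
    return p & ((1 << width) - 1)  # overflow wraps, as in a fixed-width circuit
```

With 8 integer and 8 fractional bits, 1.5 and -2.25 are encoded exactly and their product decodes to -3.375; values outside $[-128, 128)$ raise `OverflowError` on encoding.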
  • the CSP prepares what is known to the skilled artisan as a garbled circuit that performs matrix factorization 380 on the records with respect to the separate set of items.
  • a circuit is first written as a Boolean circuit 382.
  • the input to the circuit comprises the masks that the RecSys used to mask the user data. Inside the circuit, the mask is used to unmask the data, and then perform matrix factorization.
  • the output of the circuit is V, the item profiles. No knowledge is gained about the contents of any individual record and of any information extracted from the records other than the item profiles.
  • the CSP sends the garbled circuit for matrix factorization to the RecSys 385.
  • the CSP processes gates into garbled tables and transmits them to the RecSys in the order defined by the circuit structure.
  • an oblivious transfer is a type of transfer in which a sender transfers one of potentially many pieces of information to a receiver, which remains oblivious as to what piece (if any) has been transferred.
  • the RecSys evaluates the garbled circuit that calculates the item profiles V and outputs the item profiles V 395.
  • beyond V, this protocol also leaks the number of tokens provided by each user. This can be rectified through a simple protocol modification, e.g., by "padding" submitted records with appropriate "null" entries until reaching a pre-set maximum number 312. For simplicity, the protocol was described without this "padding" operation.
  • a proxy oblivious transfer is an oblivious transfer in which 3 or more parties are involved. For this reason, the protocol of the present principles adopted the hybrid approach, combining public-key encryption with garbled circuits.
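The padding modification can be sketched as follows. `NULL_ENTRY`, `pad_record` and `real_entries` are hypothetical names, and the sketch works on plaintext; in the actual protocol the null entries would be encrypted and masked exactly like real ones, so that only the circuit, after unmasking, can tell them apart.

```python
NULL_ENTRY = None  # hypothetical marker for a padded ("null") entry


def pad_record(pairs, max_tokens):
    """Pad a list of (item, rating) pairs with null entries up to max_tokens.

    Every padded record has the same length, so the number of submitted
    entries no longer reveals how many items a user actually rated.
    """
    if len(pairs) > max_tokens:
        raise ValueError("record exceeds the pre-set maximum number of tokens")
    return list(pairs) + [NULL_ENTRY] * (max_tokens - len(pairs))


def real_entries(record):
    """What the circuit keeps after unmasking: the non-null entries only."""
    return [e for e in record if e is not NULL_ENTRY]
```

Records of different sizes, e.g. two ratings and four ratings, both become five entries long when padded with `max_tokens=5`.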
  • the CSP public-key encryption algorithm is partially homomorphic: a constant can be applied to an encrypted message without the knowledge of the corresponding decryption key.
  • an additively homomorphic scheme such as Paillier or Regev can also be used to add a constant, but hash-ElGamal, which is only partially homomorphic, suffices and can be implemented more efficiently in this case.
  • the RecSys sends them to the CSP together with the complete specifications needed to build a garbled circuit.
  • the RecSys specifies the dimension of the user and item profiles (i.e., parameter d), the total number of ratings (i.e., parameter M), and the total number of users and of items, as well as the number of bits used to represent the integer and fractional parts of a real number in the garbled circuit.
  • the CSP may provide the RecSys with a garbled circuit that (a) decrypts the inputs and then (b) performs matrix factorization.
  • decryption within the circuit is avoided by using masks and homomorphic encryption.
  • the present principles apply this idea to matrix factorization, but only require a partially homomorphic encryption scheme.
  • Upon receiving the encryptions, the CSP decrypts them and gets the masked values $(i, (j, r_{ij}) \oplus \eta)$. Then, using the matrix factorization as a blueprint, the CSP prepares a Yao's garbled circuit that takes the masks as input, unmasks the data inside the circuit, performs matrix factorization, and outputs the item profiles V.
  • the computation of matrix factorization by the gradient descent operations outlined in (4) and (5) involves additions, subtractions and multiplications of real numbers. These operations can be efficiently implemented in a circuit.
  • the K iterations of gradient descent (4) correspond to K circuit "layers", each computing the new values of profiles from values in the preceding layer.
  • the outputs of the circuit are the item profiles V, while the user profiles are discarded.
  • a circuit implementation is provided based on sorting networks whose complexity is $O((n + m + M)\log^2(n + m + M))$, i.e., within a polylogarithmic factor of the implementation in the clear.
  • the inputs $r_{ij}$ and placeholders $\perp$ for both the user and item profiles are stored together in an array.
  • user or item profiles can be placed close to the input with which they share an identifier. Linear passes through the data allow the computation of gradients, as well as updates of the profiles.
  • the placeholder $\perp$ is treated as $+\infty$, i.e., larger than any other number.
  • the first n and m tuples of S serve as placeholders for the user and item profiles, respectively, while the remaining M tuples store the inputs $r_{ij}$. More specifically, for each user $i \in [n]$, the algorithm constructs a tuple $(i, \perp, \ldots$
  • the algorithm constructs the tuple $(\perp, j, 0, \perp, \ldots$
  • the resulting array is as shown in Figure 5(A). Denoting by $s_{l,k}$ the $l$-th element of the $k$-th tuple, these elements serve the following roles:
  • Sort tuples in increasing order with respect to the user ids (with respect to rows 1 and 3), 420. If two ids are equal, break ties by comparing tuple flags, i.e., the 3rd elements in each tuple. Hence, after sorting, each "user profile" tuple is succeeded by "input" tuples with the same id:
  • the above operations are to be repeated K times, that is, the number of desirable iterations of gradient descent.
  • the array is sorted with respect to the flags (i.e., $s_{3,k}$) as a primary index, and the item ids (i.e., $s_{2,k}$) as a secondary index. This brings all item profile tuples to the first m positions in the array, from which the item profiles can be outputted.
  • the array is sorted with respect to the flags (i.e., $s_{3,k}$) as a primary index, and the user ids (i.e., $s_{1,k}$) as a secondary index. This brings all user profile tuples to the first n positions in the array, from which the user profiles can be outputted.
  • each of the above operations is data-oblivious, and can be implemented as a circuit.
  • Copying and updating profiles requires $O(n + m + M)$ gates, so the overall complexity is determined by sorting which, e.g., using Batcher's circuit, yields an $O((n + m + M)\log^2(n + m + M))$ cost.
  • Sorting and the gradient computation in step C6 of the algorithm are the most computationally intensive operations; fortunately, both are highly parallelizable.
  • sorting can be further optimized by reusing previously computed comparisons at each iteration.
  • this circuit can be implemented as a Boolean circuit (e.g., as a graph of OR, AND, NOT and XOR gates), which allows the implementation to be garbled, as previously explained.
  • the implementation of the matrix factorization algorithm described above together with the protocol previously described provides a novel method for matrix factorization, in a privacy-preserving fashion.
  • this solution yields a circuit with a complexity within a polylogarithmic factor of matrix factorization performed in the clear by using sorting networks.
  • an additional advantage of this implementation is that the garbling and the execution of this circuit are highly parallelizable.
  • the garbled circuit construction was based on FastGC, a publicly available garbled circuit framework.
  • FastGC is a Java-based open-source framework, which enables circuit definition using elementary XOR, OR and AND gates. Once the circuits are constructed, the framework handles garbling, oblivious transfer and the complete evaluation of the garbled circuit.
  • FastGC represents the entire ungarbled circuit in memory as a set of Java objects. These objects incur a significant memory overhead relative to the memory footprint that the ungarbled circuit should introduce, as only a subset of the gates is garbled and/or executed at any point in time.
  • the framework was modified to address these issues, reducing the memory footprint of FastGC and also enabling parallelized garbling and computation across multiple processors.
  • a layer is created in memory only when all its inputs are ready. Once it is garbled and evaluated, the entire layer is removed from memory, and the following layer can be constructed, thus limiting the memory footprint to the size of the largest layer.
  • the execution of a layer is performed using a scheduler that assigns its slices to threads, enabling them to run in parallel.
  • although parallelization was implemented on a single machine with multiple cores, the implementation can be extended to run across different machines in a straightforward manner, since no shared state between slices is assumed.
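The layer-by-layer strategy can be sketched as below. Gates here are plaintext Boolean gates standing in for garbled tables, and the gate tuple format, `eval_layered` and the slice size are hypothetical; this is not FastGC's API. Each layer is built only when it is about to run, its slices are dispatched to a thread pool, and it is discarded afterwards, so the gates held in memory never exceed the largest layer. In CPython the global interpreter lock prevents a real speedup for this pure-Python work; the sketch shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical gate format: (op, input wire, input wire, output wire).
OPS = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b, "XOR": lambda a, b: a ^ b}


def eval_slice(gates, wires):
    """Evaluate one slice; it reads only wires produced by earlier layers."""
    return [(out, OPS[op](wires[a], wires[b])) for op, a, b, out in gates]


def eval_layered(layer_builders, inputs, n_threads=4, slice_size=2):
    """Build one layer at a time, evaluate its slices in parallel, then drop it."""
    wires = dict(inputs)
    peak = 0
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for build in layer_builders:
            layer = build()                  # materialize only this layer
            peak = max(peak, len(layer))
            slices = [layer[k:k + slice_size] for k in range(0, len(layer), slice_size)]
            # Slices of one layer are independent, so they can run concurrently.
            results = list(pool.map(eval_slice, slices, [wires] * len(slices)))
            for part in results:
                wires.update(part)
            del layer, slices, results       # release the finished layer
    return wires, peak


# A one-bit full adder as three layers: sum s and carry-out cout.
FULL_ADDER = [
    lambda: [("XOR", "a", "b", "t1"), ("AND", "a", "b", "t2")],
    lambda: [("XOR", "t1", "c", "s"), ("AND", "t1", "c", "t3")],
    lambda: [("OR", "t2", "t3", "cout")],
]
```

For example, `eval_layered(FULL_ADDER, {"a": 1, "b": 1, "c": 0})` sets wire `s` to 0 and `cout` to 1, with at most two gates materialized at any time.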
  • the basic building block of a sorting network is a compare-and-swap circuit, that compares two items and swaps them if necessary, so that the output pair is ordered.
  • the sorting operations (lines C4 and C8) of the matrix factorization algorithm perform identical comparisons between tuples at each of the K gradient descent iterations, using exactly the same inputs per iteration. In fact, each sorting permutes the tuples in array S in exactly the same manner, at each iteration. This property is exploited by performing the comparison operations for each of these sortings only once.
  • sortings of tuples of the form (i, j, flag, rating) are performed once at the beginning of the computation (without the payload of user or item profiles): first with respect to i and the flag, then with respect to j and the flag, and finally back with respect to i and the flag.
  • the outputs of the comparison circuits are reused in each of these sortings as input to the swap circuits used during gradient descent.
  • the "sorting" network applied at each iteration does not perform any comparisons, but simply permutes tuples (i.e., it is a "permutation" network).
  • Precomputing all comparisons also allows the size of the tuples in S to be drastically reduced.
  • the rows corresponding to user or item ids are used in the matrix factorization algorithm only as input to comparisons during sorting.
  • Flags and ratings are used during copy and update phases, but their relative positions are identical at each iteration.
  • these positions can be computed as outputs of the sorting of the tuples (i, j, flag, rating) at the beginning of the computation.
  • the "permutation" operations performed at each iteration need only be applied to the user and item profiles; all other rows can be removed from array S.
  • One more improvement reduces the cost of permutations by an additional factor of 2: one set of profiles, e.g., the user profiles, is fixed and only the item profiles are permuted. The item profiles then rotate between two states, each reachable from the other through a permutation: one in which they are aligned with the user profiles and partial gradients are computed, and one in which the item profiles are updated and copied.
  • Sorting and gradient computations constitute the bulk of the computation in the matrix factorization circuit (copying and updating contribute no more than 3% of the execution time and 0.4% of the non-xor gates); these operations are parallelized through this extension of FastGC.
  • Gradient computations are clearly parallelizable; sorting networks are also highly parallelizable (parallelization is the main motivation behind their development).
  • because the parallel slices in each sort are identical, the same FastGC objects defining the circuit slices are reused with different inputs, significantly reducing the need to repeatedly create and destroy objects in memory.
  • the present principles may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof.
  • the present principles are implemented as a combination of hardware and software.
  • the software is preferably implemented as an application program tangibly embodied on a program storage device.
  • the application program may be uploaded to, and executed by, a machine comprising any suitable architecture.
  • the machine is implemented on a computer platform having hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interface(s).
  • the computer platform also includes an operating system and microinstruction code.
  • various processes and functions described herein may either be part of the microinstruction code or part of the application program (or a combination thereof), which is executed via the operating system.
  • various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.
  • FIG. 6 shows a block diagram of a minimum computing environment 600 used to implement the present principles.
  • the computing environment 600 includes a processor 610, and at least one (and preferably more than one) I/O interface 620.
  • the I/O interface can be wired or wireless and, in the wireless implementation is pre-configured with the appropriate wireless communication protocols to allow the computing environment 600 to operate on a global network (e.g., internet) and communicate with other computers or servers (e.g., cloud based computing or storage servers) so as to enable the present principles to be provided, for example, as a Software as a Service (SAAS) feature remotely provided to end users.
  • One or more memories 630 and/or storage devices (HDD) 640 are also provided within the computing environment 600.
  • the computing environment 600 or a plurality of computing environments 600 may implement the protocol P1-P11 (Figure 3) for the matrix factorization C1-C12 (Figure 4) according to one embodiment of the present principles.
  • a computing environment 600 may implement the RecSys 230; a separate computing environment 600 may implement the CSP 250; and a Source may contain one or a plurality of computing environments 600, each associated with a distinct user 210, including but not limited to desktop computers, cellular phones, smart phones, phone watches, tablet computers, personal digital assistants (PDAs), netbooks and laptop computers, used to communicate with the RecSys 230 and the CSP 250.
  • the CSP 250 can be included in the Source, or equivalently, included in the computer environment of each User 210 of the Source.
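The precomputation of comparisons described in the bullets above can be illustrated in cleartext Python. This is a sketch under stated assumptions, not the FastGC-based implementation: it uses a simple odd-even transposition sorting network (chosen for brevity; the text does not say which network is used), runs the comparisons once on hypothetical sort keys such as (i, flag), records every compare-and-swap decision, and then replays those decisions as a pure permutation network on a payload (standing in for profiles) at each iteration. In a garbled circuit the recorded decisions would be wire values driving oblivious swaps rather than Python booleans; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Network = List[Tuple[int, int]]  # comparator positions (lo, hi)


def odd_even_transposition_network(n: int) -> Network:
    """Comparators of an odd-even transposition sorting network on n wires."""
    network = []
    for rnd in range(n):  # n rounds suffice to sort n items
        for lo in range(rnd % 2, n - 1, 2):
            network.append((lo, lo + 1))
    return network


def precompute_swaps(keys: Sequence, network: Network) -> List[bool]:
    """Perform all comparisons once on the keys and record each swap decision."""
    work = list(keys)
    swaps = []
    for lo, hi in network:
        swap = work[lo] > work[hi]
        if swap:
            work[lo], work[hi] = work[hi], work[lo]
        swaps.append(swap)
    return swaps


def replay_permutation(payload: Sequence, network: Network, swaps: List[bool]) -> list:
    """Apply the recorded decisions to a payload; no comparison is evaluated here."""
    work = list(payload)
    for (lo, hi), swap in zip(network, swaps):
        if swap:  # in a circuit: an oblivious swap driven by the precomputed bit
            work[lo], work[hi] = work[hi], work[lo]
    return work


if __name__ == "__main__":
    keys = [(3, 1), (1, 0), (2, 1), (1, 1)]  # hypothetical (i, flag) sort keys
    net = odd_even_transposition_network(len(keys))
    swaps = precompute_swaps(keys, net)      # all comparisons happen here, once
    for it in range(3):                      # hypothetical K = 3 iterations
        payload = [f"{name}@{it}" for name in ("p3", "p1a", "p2", "p1b")]
        print(replay_permutation(payload, net, swaps))
```

Only the payload changes between iterations; the network and the swap bits are computed once, which is why the per-iteration work reduces to a permutation.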


Abstract

A method and a system for securely profiling items through matrix factorization for use in recommendation systems commence by receiving as input a set of records including tokens and items, without learning the content of any individual record, and then designing and evaluating a garbled circuit that performs matrix factorization on the set of records to generate item profiles for at least one item in a privacy-preserving way, without learning the content of any individual record or any information extracted from the records other than the item profiles. The system includes three parties: a plurality of users or a database representing a Source for the records; a Crypto-Service Provider, which designs the garbled circuit; and a Recommender System, which evaluates the circuit, such that the records and any information extracted from the records other than the item profiles are kept secret from parties other than their Source.

Description

A METHOD AND SYSTEM FOR PRIVACY PRESERVING MATRIX FACTORIZATION
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of and priority to the U.S. Provisional Patent Applications filed on August 9, 2013: Serial No. 61/864088 and titled "A METHOD AND SYSTEM FOR PRIVACY PRESERVING MATRIX FACTORIZATION"; Serial No. 61/864085 and titled "A METHOD AND SYSTEM FOR PRIVACY PRESERVING COUNTING"; Serial No. 61/864094 and titled "A METHOD AND SYSTEM FOR PRIVACY-PRESERVING RECOMMENDATION TO RATING CONTRIBUTING USERS BASED ON MATRIX FACTORIZATION"; and Serial No. 61/864098 and titled "A METHOD AND SYSTEM FOR PRIVACY-PRESERVING RECOMMENDATION BASED ON MATRIX FACTORIZATION AND RIDGE REGRESSION". In addition, this application claims the benefit of and priority to the PCT Patent Application filed on December 19, 2013, Serial No. PCT/US 13/76353 and titled "A METHOD AND SYSTEM FOR PRIVACY PRESERVING COUNTING" and to the U.S. Provisional Patent Application filed on March 4, 2013: Serial No. 61/772404 and titled "PRIVACY-PRESERVING LINEAR AND RIDGE REGRESSION". The provisional and PCT applications are expressly incorporated by reference herein in their entirety for all purposes.
TECHNICAL FIELD
[0001] The present principles relate to privacy-preserving recommendation systems and secure multi-party computation, and in particular, to performing a collaborative filtering technique known as matrix factorization securely, in a privacy-preserving fashion in order to profile items.
BACKGROUND
[0002] A great deal of research and commercial activity in the last decade has led to the widespread use of recommendation systems. Such systems offer users personalized recommendations for many kinds of items, such as movies, TV shows, music, books, hotels, restaurants, and more. Figure 1 illustrates the components of a general recommendation system 100: a number of users 110 representing a Source and a Recommender System (RecSys) 130, which processes the users' inputs 120 and outputs recommendations 140. To receive useful recommendations, users supply substantial personal information about their preferences (users' inputs), trusting that the recommender will manage this data appropriately.
[0003] Nevertheless, earlier studies, such as those by B. Mobasher, R. Burke, R. Bhaumik, and C. Williams: "Toward trustworthy recommender systems: An analysis of attack models and algorithm robustness", ACM Trans. Internet Techn., 7(4), 2007, and by E. Aïmeur, G. Brassard, J. M. Fernandez, and F. S. M. Onana: "ALAMBIC: A privacy-preserving recommender system for electronic commerce", Int. Journal Inf. Sec., 7(5), 2008, have identified multiple ways in which recommenders can abuse such information or expose the user to privacy threats. Recommenders are often motivated to resell data for a profit, but also to extract information beyond what is intentionally revealed by the user. For example, even records of user preferences typically not perceived as sensitive, such as movie ratings or a person's TV viewing history, can be used to infer a user's political affiliation, gender, etc. The private information that can be inferred from the data in a recommendation system is constantly evolving as new data mining and inference methods are developed, for either malicious or benign purposes. In the extreme, records of user preferences can even be used to uniquely identify a user: A. Narayanan and V. Shmatikov strikingly demonstrated this by de-anonymizing the Netflix dataset in "Robust de-anonymization of large sparse datasets", in IEEE S&P, 2008. As such, even if the recommender is not malicious, an unintentional leakage of such data makes users susceptible to linkage attacks, that is, attacks which use one database as auxiliary information to compromise privacy in a different database.
[0004] Because one cannot always foresee future inference threats, accidental information leakage, or insider threats (purposeful leakage), it is of interest to build a recommendation system in which users do not reveal their personal data in the clear. There are no practical recommendation systems today that operate on encrypted data. In addition, it is of interest to build a recommender which can profile items without ever learning the ratings that users provide, or even which items the users have rated. The present principles propose such a secure recommendation system.
SUMMARY
[0005] The present principles propose a method for performing a collaborative filtering technique known as matrix factorization securely, in a privacy-preserving fashion in order to profile items. In particular, the method receives as inputs the ratings users gave to items (e.g., movies, books) and creates a profile for each item that can be subsequently used to predict what rating a user can give to each item. The present principles allow a recommender system based on matrix factorization to perform this task without ever learning the ratings of a user, or even which item the user has rated.
[0006] According to one aspect of the present principles, a method for securely profiling items through matrix factorization is provided, the method including: receiving a set of records (220) from a Source, wherein a record contains a set of tokens and a set of items, and wherein each record is kept secret from parties other than said Source; receiving at least one separate item (360); and evaluating the set of records and the at least one separate item in a Recommender (RecSys) (230) by using a garbled circuit (395) based on matrix factorization, wherein the outputs of the garbled circuit are the item profiles for the at least one separate item. The method can further include: designing the garbled circuit in a Crypto-Service Provider (CSP) to perform matrix factorization on the set of records (380) and the at least one separate item (360), wherein the garbled circuit outputs the item profiles of the at least one separate item; and transferring the garbled circuit to the RecSys (385). The step of designing in the method can include: designing a matrix factorization operation as a Boolean circuit (382). The step of designing a matrix factorization circuit in the method can include: constructing an array of the set of records (410); and performing the operations of sorting (420, 440, 470, 490), copying (430, 450), updating (470, 480), comparing (480) and computing gradient contributions (460) on the array. The method can further include: receiving a set of parameters for the design of the garbled circuit by said CSP, wherein the parameters were sent by the RecSys (330).
[0007] According to one aspect of the present principles, the method can further include: encrypting the set of records to create encrypted records (330), wherein the step of encrypting is performed prior to the step of receiving a set of records. The method can be such that the public encryption keys are generated in the CSP and sent to the Source (320). The method can further include: generating public encryption keys in the CSP; and sending the keys to the Source (320). The encryption scheme can be a partially homomorphic encryption (330), and the method can further include: masking the encrypted records in the RecSys to create masked records (340); and decrypting the masked records in the CSP to create decrypted-masked records (350). The step of designing (380) in the method can include: unmasking the decrypted-masked records inside the garbled circuit prior to processing them. The method can further include: performing oblivious transfers (390) between the CSP and the RecSys (392), wherein the RecSys receives the garbled values of the decrypted-masked records and the records are kept private from the RecSys and the CSP.
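The encrypt, mask, decrypt and unmask sequence of [0007] can be sketched with Paillier encryption, one common additively homomorphic scheme; this choice is an assumption, since the text only says "partially homomorphic". The primes below are insecurely small, Python's `random` module is not a cryptographic source, and `keygen`, `encrypt`, `decrypt` and `mask` are hypothetical helper names. The Source encrypts a value under the CSP's public key, the RecSys multiplies the ciphertext by an encryption of a random mask eta (which adds eta to the hidden plaintext), and the CSP decrypts only the masked value. The final unmasking, which the protocol performs inside the garbled circuit, is shown here in the clear.

```python
import math
import random
from typing import Tuple


def keygen(p: int, q: int) -> Tuple[int, Tuple[int, int, int]]:
    """Toy Paillier keys from two distinct primes: public n, secret (lam, mu, n)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with generator g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu, n)


def encrypt(n: int, m: int) -> int:
    """Enc(m) = (n + 1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)  # toy randomness only
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(sk: Tuple[int, int, int], c: int) -> int:
    lam, mu, n = sk
    n2 = n * n
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n


def mask(n: int, c: int, eta: int) -> int:
    """RecSys side: multiplying ciphertexts adds plaintexts, giving Enc(m + eta mod n)."""
    return (c * encrypt(n, eta)) % (n * n)


if __name__ == "__main__":
    n, sk = keygen(1009, 1013)             # CSP: toy key pair
    c = encrypt(n, 4)                      # Source: encrypts a rating of 4
    eta = random.randrange(n)              # RecSys: uniformly random mask
    masked = decrypt(sk, mask(n, c, eta))  # CSP: learns only 4 + eta mod n
    print((masked - eta) % n)              # unmasking (inside the garbled circuit in the protocol)
```

Because eta is uniform modulo n, the value the CSP decrypts is itself uniform and carries no information about the rating.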
[0008] According to one aspect of the present principles, the method can further include: receiving the number of tokens and items of each record (220, 310). Furthermore, the method can include: padding each record with null entries when the number of tokens of each record is smaller than a value representing a maximum value, in order to create records with a number of tokens equal to said value (312). The Source of the set of records in the method can be one of a database and a set of users (210), wherein each user is a source of one record and each record is kept secret from parties other than its corresponding user.
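The padding step of [0008] can be sketched as follows, assuming a record is a list of (item, rating) pairs and that a reserved pair, here the hypothetical `NULL_ENTRY = (0, 0.0)` with item id 0 never used for a real item, marks an empty slot. The maximum value (`max_tokens`) is a parameter the text leaves unspecified. After padding, every record has the same length, so the length alone reveals nothing about how many items a user rated.

```python
from typing import List, Tuple

Entry = Tuple[int, float]                # (item id, rating)
NULL_ENTRY: Entry = (0, 0.0)             # hypothetical null entry; assumes item id 0 is reserved


def pad_record(record: List[Entry], max_tokens: int) -> List[Entry]:
    """Pad a record with null entries so that it holds exactly max_tokens entries."""
    if len(record) > max_tokens:
        raise ValueError("record has more tokens than the agreed maximum")
    return record + [NULL_ENTRY] * (max_tokens - len(record))


if __name__ == "__main__":
    print(pad_record([(12, 4.0), (7, 2.5)], 4))
```

The circuit must then treat null entries as non-ratings; how it does so is outside this sketch.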
[0009] According to one aspect of the present principles, a system for securely profiling items through matrix factorization is provided, including a Source which will provide a set of records, a Crypto-Service Provider (CSP) which will provide a secure matrix factorization circuit and a RecSys which will evaluate the records, such that the records are kept private from parties other than the Source, wherein the Source, the CSP and the RecSys each include a processor (602), for receiving at least one input/output (604); and at least one memory (606, 608) in signal communication with the processor, and wherein the RecSys processor is configured to: receive a set of records, wherein each record comprises a set of tokens and a set of items, and wherein each record is kept secret; receive at least one separate item; and evaluate the set of records and the at least one separate item with a garbled circuit based on matrix factorization, wherein the outputs of the garbled circuit are the item profiles for the at least one separate item. The CSP processor in the system can be configured to: design the garbled circuit to perform matrix factorization of the set of records and the at least one separate item, wherein the garbled circuit outputs the item profiles for the at least one separate item; and transfer the garbled circuit to the RecSys. The CSP processor in the system can be configured to design the garbled circuit by being configured to: design a matrix factorization operation as a Boolean circuit. The CSP processor in the system can be configured to design the matrix factorization circuit by being configured to: construct an array of said set of records; and perform the operations of sorting, copying, updating, comparing and computing gradient contributions on the array. The CSP processor in the system can be further configured to: receive a set of parameters for the design of the garbled circuit, wherein the parameters were sent by said RecSys.
[0010] According to one aspect of the present principles, the Source processor in the system can be configured to: encrypt the set of records to create encrypted records prior to providing said set of records. The CSP processor in the system can be further configured to: generate public encryption keys; and send the keys to the Source. The encryption scheme can be a partially homomorphic encryption, and the RecSys processor can be further configured to: mask the encrypted records to create masked records; and the CSP processor can be further configured to: decrypt the masked records to create decrypted-masked records. The CSP processor in the system can be configured to design the garbled circuit by being further configured to: unmask the decrypted-masked records inside the garbled circuit prior to processing them. The RecSys processor and the CSP processor can be further configured to perform oblivious transfers, wherein said RecSys receives the garbled values of the decrypted-masked records and the records are kept private from the RecSys and the CSP.
[0011] According to one aspect of the present principles, the RecSys processor in the system can be further configured to: receive the number of tokens of each record, wherein the number of tokens was sent by said Source. The Source processor in the system can be configured to: pad each record with null entries when the number of tokens of each record is smaller than a value representing a maximum value, in order to create records with a number of tokens equal to said value. The Source of the set of records can be one of a database and a set of users, wherein if the Source is a set of users, each user comprises a processor (602), for receiving at least one input/output (604); and at least one memory (606, 608), and each user is a source of one record, wherein each record is kept secret from parties other than its corresponding user.
[0012] Additional features and advantages of the present principles will be made apparent from the following detailed description of illustrative embodiments which proceeds with reference to the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] The present principles may be better understood in accordance with the following exemplary figures briefly described below:
Figure 1 illustrates the components of a prior art recommendation system;
Figure 2 illustrates the components of a recommendation system according to the present principles;
Figure 3 (A, B and C) illustrates a flowchart of a privacy-preserving method for profiling items through matrix factorization according to the present principles;
Figure 4 (A, B and C) illustrates a flowchart of the matrix factorization algorithm according to the present principles;
Figure 5 (A, B) illustrates the data structure S constructed by the matrix factorization algorithm according to the present principles;
Figure 6 illustrates a block diagram of a computing environment utilized to implement the present principles.
DETAILED DISCUSSION OF THE EMBODIMENTS
[0014] In accordance with the present principles, a method is provided for performing a collaborative filtering technique known as matrix factorization securely, in a privacy-preserving fashion in order to profile items.
[0015] The method of the present principles can serve as a service to profile at least one item in a corpus of records, each record comprising a set of tokens and items. The set of records includes more than one record and the set of tokens includes at least one token. A skilled artisan will recognize in the example above that a record could represent a user; the tokens could be a user's ratings of the corresponding items in the record. The tokens can also represent ranks, weights or measures associated with items, and the items can represent persons, tasks or jobs. For example, the ranks, weights or measures can be associated with the health of an individual, and a researcher is trying to correlate the health measures of a population. Or they can be associated with the productivity of an individual and a company is trying to predict schedules for certain jobs, based on prior history. However, to ensure the privacy of the individuals involved, the service wishes to do so without learning the contents of each record or any information extracted from the records other than the item profiles. In particular, the service should not learn (a) in which records each token/item appeared or, a fortiori, (b) what tokens/items appear in each record and (c) the values of the tokens. In the following, terms and words like "privacy-preserving", "private" and "secure" are used interchangeably to indicate that the information regarded as private by a user (record) is only known by the user.
[0016] There are several challenges associated with performing matrix factorization in a privacy-preserving way. First, to address the privacy concerns, matrix factorization should be performed without the recommender ever learning the users' ratings, or even which items they have rated. The latter requirement is key: earlier studies show that even knowing which movie a user has rated can be used to infer, e.g., their gender. Second, such a privacy-preserving algorithm ought to be efficient and scale gracefully (e.g., linearly) with the number of ratings submitted by users. The privacy requirements imply that the matrix factorization algorithm ought to be data-oblivious: its execution ought not to depend on the user input. Moreover, the operations performed by matrix factorization are non-linear; thus it is not a priori clear how to implement matrix factorization efficiently under both of these constraints. Finally, in a practical, real-world scenario, users have limited communication and computation resources, and should not be expected to remain online after they have supplied their data. Instead, it is desirable to have a "send and forget" type solution that can operate in the presence of users that move back and forth between being online and offline from the recommendation service.
[0017] As an overview of matrix factorization, in the standard "collaborative filtering" setting, $n$ users rate a subset of $m$ possible items (e.g., movies). For $[n] := \{1, \ldots, n\}$ the set of users and $[m] := \{1, \ldots, m\}$ the set of items, denote by $\mathcal{M} \subseteq [n] \times [m]$ the user/item pairs for which a rating has been generated, and by $M = |\mathcal{M}|$ the total number of ratings. Finally, for $(i,j) \in \mathcal{M}$, denote by $r_{ij} \in \mathbb{R}$ the rating generated by user $i$ for item $j$. In a practical setting, both $n$ and $m$ are large numbers, typically ranging between $10^4$ and $10^6$. In addition, the ratings provided are sparse, that is, $M = O(n + m)$, which is much smaller than the total number of potential ratings $n \cdot m$. This is consistent with typical user behavior, as each user may rate only a finite number of items (not depending on $m$, the "catalogue" size).
[0018] Given the ratings in $\mathcal{M}$, a recommender system wishes to predict the ratings for user/item pairs in $([n] \times [m]) \setminus \mathcal{M}$. Matrix factorization performs this task by fitting a bi-linear model on the existing ratings. In particular, for some small dimension $d \in \mathbb{N}$, it is assumed that there exist vectors $u_i \in \mathbb{R}^d$, $i \in [n]$, and $v_j \in \mathbb{R}^d$, $j \in [m]$, such that

$$r_{ij} = \langle u_i, v_j \rangle + \varepsilon_{ij}, \quad (i,j) \in \mathcal{M}, \tag{1}$$

where the $\varepsilon_{ij}$ are i.i.d. (independent and identically distributed) Gaussian random variables. The vectors $u_i$ and $v_j$ are called the user and item profiles, respectively, and $\langle u_i, v_j \rangle$ is the inner product of the vectors. The notation used is $U = [u_i^T]_{i \in [n]} \in \mathbb{R}^{n \times d}$ for the $n \times d$ matrix whose $i$-th row comprises the profile of user $i$, and $V = [v_j^T]_{j \in [m]} \in \mathbb{R}^{m \times d}$ for the $m \times d$ matrix whose $j$-th row comprises the profile of item $j$.
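[0019] Given the ratings $R = [r_{ij} : (i,j) \in \mathcal{M}]$, the recommender typically computes the profiles $U$ and $V$ by performing the following regularized least squares minimization:

$$\min_{U,V} \; \frac{1}{M} \sum_{(i,j) \in \mathcal{M}} \left(r_{ij} - \langle u_i, v_j \rangle\right)^2 + \lambda \sum_{i \in [n]} \|u_i\|_2^2 + \mu \sum_{j \in [m]} \|v_j\|_2^2 \tag{2}$$

for some $\lambda, \mu > 0$. One skilled in the art will recognize that, assuming Gaussian priors on the profiles $U$ and $V$, the minimization in (2) corresponds to maximum a posteriori (MAP) estimation of $U$ and $V$. Note that, having the user and item profiles, the recommender can subsequently predict the ratings $\hat{R} = [\hat{r}_{ij} : i \in [n], j \in [m]]$ such that, for user $i$ and item $j$:

$$\hat{r}_{ij} = \langle u_i, v_j \rangle, \quad i \in [n],\; j \in [m]. \tag{3}$$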
[0020] The regularized mean square error in (2) is not jointly convex in $U$ and $V$; several methods for performing this minimization have been proposed in the literature. The present principles focus on gradient descent, a popular method used in practice, which is described as follows. Denoting by $F(U,V)$ the regularized mean square error in (2), gradient descent operates by iteratively adapting the profiles $U$ and $V$ through the adaptation rule:
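$$\begin{aligned} u_i(t) &= u_i(t-1) - \gamma \nabla_{u_i} F\big(U(t-1), V(t-1)\big), & i \in [n], \\ v_j(t) &= v_j(t-1) - \gamma \nabla_{v_j} F\big(U(t-1), V(t-1)\big), & j \in [m], \end{aligned} \tag{4}$$

where $\gamma > 0$ is a small gain factor and

$$\begin{aligned} \nabla_{u_i} F(U,V) &= -\frac{2}{M} \sum_{j : (i,j) \in \mathcal{M}} v_j \left(r_{ij} - \langle u_i, v_j \rangle\right) + 2\lambda u_i, \\ \nabla_{v_j} F(U,V) &= -\frac{2}{M} \sum_{i : (i,j) \in \mathcal{M}} u_i \left(r_{ij} - \langle u_i, v_j \rangle\right) + 2\mu v_j, \end{aligned} \tag{5}$$

where $U(0)$ and $V(0)$ consist of uniformly random norm-1 rows (i.e., profiles are selected u.a.r. (uniformly at random) from the norm-1 ball).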
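The adaptation rule can be made concrete with a small cleartext sketch. It assumes the objective $F$ is the regularized mean square error $\frac{1}{M}\sum_{(i,j)} (r_{ij} - \langle u_i, v_j\rangle)^2 + \lambda\sum_i\|u_i\|^2 + \mu\sum_j\|v_j\|^2$, stores ratings in a dictionary keyed by (i, j), and uses hypothetical toy values for $d$, $\gamma$, $\lambda$ and $\mu$. It is deliberately not data-oblivious: it loops only over the observed ratings, and that access pattern is exactly what the garbled-circuit version must avoid revealing.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Ratings = Dict[Tuple[int, int], float]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def random_unit_rows(rows: int, d: int, rng: random.Random) -> Matrix:
    """Rows drawn uniformly at random from the unit sphere (normalized Gaussians)."""
    out = []
    for _ in range(rows):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(dot(v, v))
        out.append([x / norm for x in v])
    return out


def objective(R: Ratings, U: Matrix, V: Matrix, lam: float, mu: float) -> float:
    mse = sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in R.items()) / len(R)
    return mse + lam * sum(dot(u, u) for u in U) + mu * sum(dot(v, v) for v in V)


def gradient_step(R: Ratings, U: Matrix, V: Matrix, lam: float, mu: float,
                  gamma: float) -> Tuple[Matrix, Matrix]:
    """One step of rule (4): both gradients are evaluated at U(t-1), V(t-1)."""
    M = len(R)
    gU = [[2 * lam * x for x in u] for u in U]  # regularization terms
    gV = [[2 * mu * x for x in v] for v in V]
    for (i, j), r in R.items():
        err = r - dot(U[i], V[j])
        for k in range(len(U[i])):
            gU[i][k] -= 2 * err * V[j][k] / M
            gV[j][k] -= 2 * err * U[i][k] / M
    U_new = [[x - gamma * g for x, g in zip(u, gu)] for u, gu in zip(U, gU)]
    V_new = [[x - gamma * g for x, g in zip(v, gv)] for v, gv in zip(V, gV)]
    return U_new, V_new


if __name__ == "__main__":
    R = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (2, 1): 1.0}  # toy ratings
    rng = random.Random(0)
    U, V = random_unit_rows(3, 2, rng), random_unit_rows(2, 2, rng)
    before = objective(R, U, V, 0.01, 0.01)
    for _ in range(500):
        U, V = gradient_step(R, U, V, lam=0.01, mu=0.01, gamma=0.05)
    print(before, objective(R, U, V, 0.01, 0.01))
    print("predicted r_11:", dot(U[1], V[1]))  # prediction (3) for an unrated pair
```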
[0021] Another aspect of the present principles is a secure multi-party computation (MPC) algorithm for matrix factorization based on sorting networks and Yao's garbled circuits. Secure multi-party computation (MPC) was initially proposed by A. Chi-Chih Yao in the 1980s. Yao's protocol (a.k.a. garbled circuits) is a generic method for secure multi-party computation. In a variant thereof, adapted from "Privacy-preserving Ridge Regression on Hundreds of millions of records", in IEEE S&P, 2013, by V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft, the protocol is run between a set of $n$ input owners, where $a_i$ denotes the private input of user $i$, $1 \le i \le n$; an Evaluator, which wishes to evaluate $f(a_1, \ldots, a_n)$; and a third party, the Crypto-Service Provider (CSP). At the end of the protocol, the Evaluator learns the value of $f(a_1, \ldots, a_n)$, but no party learns more than what is revealed by this output value. The protocol requires that the function $f$ can be expressed as a Boolean circuit, e.g., as a graph of OR, AND, NOT and XOR gates, and that the Evaluator and the CSP do not collude.
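To make the idea of a garbled circuit concrete, the sketch below garbles a single AND gate in the textbook style. The garbler (the CSP's role here) assigns two random 128-bit labels to each wire, encrypts each output label under a hash of the matching pair of input labels, and shuffles the four rows; the evaluator, holding one label per input wire, can open exactly one row and obtains an output label without learning the input bits. This is an illustrative assumption, not the construction used by FastGC: SHA-256 as the key-derivation function and zero padding to recognize the correct row are choices made here, optimizations such as point-and-permute or free-XOR are omitted, and the oblivious transfer through which the evaluator would obtain its input labels is not shown. Function names are hypothetical.

```python
import hashlib
import os
import random
from typing import List, Tuple

LABEL_LEN = 16                 # 128-bit wire labels
ZERO_PAD = bytes(LABEL_LEN)    # redundancy that marks a correctly decrypted row


def kdf(a: bytes, b: bytes) -> bytes:
    """32-byte key stream derived from one label of each input wire."""
    return hashlib.sha256(a + b).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def garble_and() -> Tuple[List[bytes], List[bytes], List[bytes], List[bytes]]:
    """Garbler side: labels[bit] for wires A, B, C and a shuffled 4-row table."""
    A = [os.urandom(LABEL_LEN) for _ in range(2)]
    B = [os.urandom(LABEL_LEN) for _ in range(2)]
    C = [os.urandom(LABEL_LEN) for _ in range(2)]
    table = [xor(kdf(A[x], B[y]), C[x & y] + ZERO_PAD) for x in (0, 1) for y in (0, 1)]
    random.SystemRandom().shuffle(table)  # row order must not reveal the inputs
    return A, B, C, table


def evaluate(table: List[bytes], a_label: bytes, b_label: bytes) -> bytes:
    """Evaluator side: try every row; only the matching one ends in the zero padding."""
    key = kdf(a_label, b_label)
    for row in table:
        plain = xor(key, row)
        if plain[LABEL_LEN:] == ZERO_PAD:
            return plain[:LABEL_LEN]
    raise ValueError("no row decrypted correctly")


if __name__ == "__main__":
    A, B, C, table = garble_and()
    decode = {C[0]: 0, C[1]: 1}  # output decoding information released by the garbler
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, decode[evaluate(table, A[x], B[y])])
```

A wrong row decrypts to random bytes, so it ends in the zero padding only with probability about $2^{-128}$.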
[0022] Many frameworks implementing Yao's garbled circuits have recently become available. A different approach to general-purpose MPC is based on secret-sharing schemes, and another is based on fully homomorphic encryption (FHE). Secret-sharing schemes have been proposed for a variety of linear algebra operations, such as solving a linear system, linear regression, and auctions. Secret sharing requires at least three non-colluding online authorities that equally share the workload of the computation and communicate over multiple rounds; the computation is secure as long as no two of them collude. Garbled circuits assume only two non-colluding authorities and require far less communication, which is better suited to the scenario where the Evaluator is a cloud service and the Crypto-Service Provider (CSP) is implemented in a trusted hardware component.
[0023] Regardless of the cryptographic primitive used, the main challenge in building an efficient algorithm for secure multi-party computation is implementing the algorithm in a data-oblivious fashion, i.e., so that the execution path does not depend on the input. In general, any RAM program executable in bounded time $T$ can be converted to an $O(T^3)$ Turing machine (TM), a theoretical computing machine invented by Alan Turing to serve as an idealized model for mathematical calculation; here $O(T^3)$ means that the running time grows at most proportionally to $T^3$. In addition, any bounded $T$-time TM can be converted to a circuit of size $O(T \log T)$, which is data-oblivious. This implies that any bounded $T$-time executable RAM program can be converted to a data-oblivious circuit with $O(T^3 \log T)$ complexity. Such complexity is too high and is prohibitive in most applications. A survey of algorithms for which efficient data-oblivious implementations are unknown can be found in "Secure multi-party computation problems and their applications: A review and open problems", in New Security Paradigms Workshop, 2001, by W. Du and M. J. Atallah; the matrix factorization problem broadly falls into the category of Data Mining summarization problems.
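The $O(T^3 \log T)$ bound follows by composing the two conversions. Writing $T' = O(T^3)$ for the running time of the Turing machine, and using that $x \log x$ is increasing and that $\log T^3 = 3 \log T$, the circuit size $|C|$ satisfies:

```latex
\begin{aligned}
|C| &= O(T' \log T') \\
    &= O\!\left(T^3 \log T^3\right) \\
    &= O\!\left(3\,T^3 \log T\right) \\
    &= O\!\left(T^3 \log T\right).
\end{aligned}
```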
[0024] Sorting networks were originally developed to enable sorting parallelization as well as an efficient hardware implementation. These networks are circuits that sort an input sequence $(a_1, a_2, \ldots, a_n)$ into a monotonically increasing sequence $(a'_1, a'_2, \ldots, a'_n)$. They are constructed by wiring together compare-and-swap circuits, their main building block. Several works exploit the data-obliviousness of sorting networks for cryptographic purposes. However, encryption is not always enough to ensure privacy. If an adversary can observe the access patterns to encrypted storage, they can still learn sensitive information about what the applications are doing. Oblivious RAM solves this problem by continuously shuffling memory as it is being accessed, thereby completely hiding what data is being accessed or even when it was previously accessed. In oblivious RAM, sorting is used as a means of generating a data-oblivious random permutation. More recently, it has been used to perform data-oblivious computations of a convex hull, all-nearest neighbors, and weighted set intersection.
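As an illustration of a data-oblivious sorting network, the sketch below builds Batcher's bitonic sorter, one well-known construction (the text does not specify which network is used), as a fixed list of compare-and-swap operations that depends only on the input length, assumed here to be a power of two. The helper names are hypothetical, and the `if` inside `compare_and_swap` stands in for what a circuit would implement with a comparator and multiplexers.

```python
from typing import List, Tuple

Comparator = Tuple[int, int, bool]  # (lo, hi, ascending)


def bitonic_network(n: int) -> List[Comparator]:
    """Comparators of Batcher's bitonic sorter; the list depends only on n."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    network = []
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:       # comparator distance within one merge stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    network.append((i, partner, (i & k) == 0))
            j //= 2
        k *= 2
    return network


def compare_and_swap(a: list, lo: int, hi: int, ascending: bool) -> None:
    """Put a[lo], a[hi] in the requested order (a circuit would use a MUX, not a branch)."""
    out_of_order = a[lo] > a[hi] if ascending else a[lo] < a[hi]
    if out_of_order:
        a[lo], a[hi] = a[hi], a[lo]


def sort_with_network(values: list) -> list:
    a = list(values)
    for lo, hi, ascending in bitonic_network(len(a)):
        compare_and_swap(a, lo, hi, ascending)
    return a


if __name__ == "__main__":
    print(sort_with_network([5, 1, 4, 2, 8, 7, 3, 6]))
```

The same comparators are applied regardless of the values, which is the data-obliviousness property; the price is $O(n \log^2 n)$ comparators instead of the $O(n \log n)$ comparisons of a data-dependent sort.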
[0025] The present principles propose a method based on secure multi-party sorting which is close to weighted set intersection but which incorporates garbled circuits. Figure 2 depicts the actors or parties in the privacy-preserving matrix factorization system, according to the present principles. They are as follows:
I. The Recommender System (RecSys) 230, an entity that performs the privacy-preserving matrix factorization operation. In particular, the RecSys wishes to learn the item profiles V 240, as extracted from matrix factorization on user ratings, without learning anything useful about the users or any information extracted from user data other than the item profiles.
II. A Crypto-Service Provider (CSP) 250, which enables the secure computation without learning anything useful about the users or any information extracted from user data.
III. A Source, consisting of one or more users 210, each having a set of ratings to a set of items 220. Each user $i \in [n]$ consents to the profiling of items based on their ratings $r_{ij}$, $(i,j) \in \mathcal{M}$, through matrix factorization, but does not wish to reveal to the recommender their ratings or even which items they have rated. Equivalently, the Source may represent a database containing the data of one or more users.
[0026] According to the present principles, a protocol is proposed that allows the RecSys to execute matrix factorization to provide item profiles while neither the RecSys nor the CSP learns anything other than the item profiles, i.e., V, which is the sole output of the RecSys in Figure 2. In particular, neither should learn a user's ratings, or even which items the user has actually rated. A skilled artisan will clearly recognize that a protocol that allows the recommender to learn both user and item profiles reveals too much: in such a design, the recommender can trivially infer a user's ratings from the inner product in (3). As such, the present principles propose a privacy-preserving protocol in which the recommender learns only the item profiles.
[0027] An item profile can be seen as a metric that defines an item as a function of the ratings of a set of users/records. Similarly, a user profile can be seen as a metric that defines a user as a function of the ratings of a set of users/records. In this sense, an item profile is a measure of approval/disapproval of an item, that is, a reflection of the features or characteristics of the item, and a user profile is a measure of the likes/dislikes of a user, that is, a reflection of the user's personality. If calculated over a large set of users/records, an item or user profile can be seen as an independent measure of the item or user, respectively. One with skill in the art will realize that learning the item profiles alone is useful. First, the embedding of items in $\mathbb{R}^d$ through matrix factorization allows the recommender to infer (and encode) similarity: items whose profiles have a small Euclidean distance are items that are rated similarly by users. As such, learning the item profiles is of interest to the recommender beyond the actual task of recommendation. In particular, the users may not need or wish to receive recommendations, as may be the case if the Source is a database. Second, having obtained the item profiles, the recommender can use them to provide relevant recommendations without any additional data revelation by users. The recommender can send V to a user (or release it publicly); knowing her ratings per item, user i can infer her (private) profile $u_i$ by solving (2) with respect to $u_i$ for the given V. This is a separable problem: each user obtains her profile by performing a ridge regression over her own ratings. Having $u_i$ and V, the user can predict all her ratings for other items locally through the inner product (3).
This is the subject of a co-pending application by the inventors filed on the same date as this application and titled "A METHOD AND SYSTEM FOR PRIVACY-PRESERVING RECOMMENDATION BASED ON MATRIX FACTORIZATION AND RIDGE REGRESSION".
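The user-side computation described above can be sketched in Python as follows, assuming the per-user problem is the ridge regression $\min_{u} \sum_{j} (r_{ij} - \langle u, v_j \rangle)^2 + \lambda \|u\|^2$ over the items j that user i rated. The exact scaling of λ depends on how (2) is normalized, and all names here are hypothetical. The user solves the d × d normal equations locally, so her ratings never leave her device.

```python
from typing import Dict, List

Vector = List[float]

def solve(A: List[List[float]], b: Vector) -> Vector:
    """Gaussian elimination with partial pivoting for a small d x d system."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def user_profile(V: Dict[int, Vector], ratings: Dict[int, float], lam: float) -> Vector:
    """Ridge regression: (sum_j v_j v_j^T + lam * I) u = sum_j r_ij v_j."""
    d = len(next(iter(V.values())))
    A = [[lam if a == b else 0.0 for b in range(d)] for a in range(d)]
    rhs = [0.0] * d
    for j, r in ratings.items():  # only the items this user rated
        v = V[j]
        for a in range(d):
            rhs[a] += r * v[a]
            for b in range(d):
                A[a][b] += v[a] * v[b]
    return solve(A, rhs)

def predict(u: Vector, v: Vector) -> float:
    """Predicted rating as the inner product <u_i, v_j>."""
    return sum(x * y for x, y in zip(u, v))
```

For instance, with item profiles `V = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}`, ratings `{0: 2.0, 1: 3.0}` and `lam=1.0`, the user obtains `u = [1.0, 1.5]` and predicts a rating of 2.5 for item 2.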
[0028] Both of the scenarios discussed above presume that neither the recommender nor the users object to the public release of V. For the sake of simplicity, as well as on account of the utility of such a protocol to the recommender, the present principles allow the recommender to learn the item profiles. However, this design can also be extended so that users learn their predicted ratings while the recommender learns nothing about the users or any information extracted from user data, not even V, as described in co-pending applications by the inventors filed on the same date as this application and titled "A METHOD AND SYSTEM FOR PRIVACY-PRESERVING RECOMMENDATION TO RATING CONTRIBUTING USERS BASED ON MATRIX FACTORIZATION" and "A METHOD AND SYSTEM FOR PRIVACY-PRESERVING RECOMMENDATION BASED ON MATRIX FACTORIZATION AND RIDGE REGRESSION".
[0029] One skilled in the art will understand that, in general, either the output of the profiles V or the rating predictions for a user may reveal something about other users' ratings. In pathological cases where there are, e.g., only two users, both revelations may let the users discover each other's ratings. The present principles do not focus on such cases. When the privacy implications of revealing either item profiles or individual ratings are not tolerable, techniques such as differential privacy can be used to add noise to these outputs and protect against such leaks.
[0030] According to the present principles, it is assumed that the security guarantees will hold under the honest-but-curious threat model. In other words, the RecSys and the CSP follow the protocols as prescribed; however, they may elect to analyze protocol transcripts, even off-line, in order to infer additional information. It is further assumed that the recommender and the CSP do not collude.
[0031] The preferred embodiment of the present principles comprises a protocol satisfying the flowchart 300 in Figure 3 and described by the following steps:
P1. The Source reports to the RecSys how many pairs of tokens (ratings) and items are going to be submitted for each participating record 310. The set of records includes more than one record, and the set of tokens per record includes at least one token.
P2. The CSP generates a public encryption key for a partially homomorphic scheme, ξ, and sends it to all users (the Source) 320. A skilled artisan will appreciate that homomorphic encryption is a form of encryption that allows specific types of computations to be carried out on ciphertexts, producing an encrypted result which, when decrypted, matches the result of the same operations performed on the plaintexts. For instance, one person could add two encrypted numbers and another person could then decrypt the result, without either of them being able to find the values of the individual numbers. A partially homomorphic encryption scheme is homomorphic with respect to only one operation (addition or multiplication) on plaintexts; for example, it may be homomorphic with respect to addition and to multiplication by a scalar.
P3. Each user encrypts her data using the public key and sends the encrypted data to the RecSys 330. In particular, for every pair $(j, r_{ij})$, where j is the item id and $r_{ij}$ is the rating user i gave to item j, the user encrypts this pair using the public encryption key.
P4. The RecSys adds a mask η to the encrypted data and sends the masked and encrypted data to the CSP 340. One skilled in the art will understand that a mask is a form of data obfuscation, and could be as simple as adding a random number (e.g., one produced by a random number generator) or shuffling according to a random number.
P5. The CSP decrypts the masked data 350.
P6. The RecSys receives or determines a separate set of items 360, on which to compute the matrix factorization. This set of items may comprise all the items in the corpus, a subset of all the items, or even items not present in the records.
P7. The RecSys sends to the CSP the complete specifications needed to build a garbled circuit 370, including the dimension of the user and item profiles (i.e., parameter d) 372, the total number of ratings (i.e., parameter M) 374, the total number of users and of items 376, and the number of bits used to represent the integer and fractional parts of a real number in the garbled circuit 378. The separate set of items, if not all the items present in the records, will be included in the parameters.
P8. The CSP prepares what is known to the skilled artisan as a garbled circuit that performs matrix factorization 380 on the records with respect to the separate set of items. In order to be garbled, a circuit is first written as a Boolean circuit 382. The input to the circuit comprises the masks that the RecSys used to mask the user data. Inside the circuit, the mask is used to unmask the data, and then perform matrix factorization. The output of the circuit is V, the item profiles. No knowledge is gained about the contents of any individual record and of any information extracted from the records other than the item profiles.
P9. The CSP sends the garbled circuit for matrix factorization to the RecSys 385.
Specifically, the CSP processes gates into garbled tables and transmits them to the RecSys in the order defined by the circuit structure.
P10. Through oblivious transfer 390 between the RecSys and the CSP 392, the RecSys learns the garbled values of the decrypted and masked records, without either itself or the CSP learning the actual values. A skilled artisan will understand that an oblivious transfer is a type of transfer in which a sender transfers one of potentially many pieces of information to a receiver, which remains oblivious as to what piece (if any) has been transferred.
P11. The RecSys evaluates the garbled circuit that calculates the item profiles V and outputs the item profiles V 395.
[0032] Technically, beyond V, this protocol also leaks the number of tokens provided by each user. This can be rectified through a simple protocol modification, e.g., by "padding" the submitted records with appropriate "null" entries until a pre-set maximum number is reached 312. For simplicity, the protocol was described without this "padding" operation.
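The padding modification can be sketched in Python as below. How a "null" entry is represented and ignored inside the circuit is not specified here; `None` is a hypothetical stand-in, and in practice each null entry would presumably be encrypted like a real pair so the RecSys cannot distinguish it.

```python
from typing import List, Optional, Tuple

Pair = Tuple[int, float]          # (item id, rating)
NULL: Optional[Pair] = None       # hypothetical "null" entry, ignored by the circuit

def pad_record(record: List[Pair], max_tokens: int) -> List[Optional[Pair]]:
    """Pad one user's (item, rating) pairs with null entries up to max_tokens."""
    if len(record) > max_tokens:
        raise ValueError("record exceeds the pre-set maximum number of tokens")
    return list(record) + [NULL] * (max_tokens - len(record))
```

After padding, every user submits exactly `max_tokens` entries, so the count reported in step P1 no longer depends on how many items the user actually rated.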
[0033] As garbled circuits can only be used once, any future computation on the same ratings would require the users to re-submit their data through proxy oblivious transfer, i.e., an oblivious transfer in which three or more parties are involved. For this reason, the protocol of the present principles adopts a hybrid approach, combining public-key encryption with garbled circuits.
[0034] In the present principles, public-key encryption is used as follows: each user i encrypts her respective inputs $(j, r_{ij})$ under the public key $pk_{CSP}$ provided by the CSP, using a semantically secure encryption algorithm $\mathcal{E}_{pk_{CSP}}$, and, for each item j rated, the user submits a pair $(i, c)$ with $c = \mathcal{E}_{pk_{CSP}}(j, r_{ij})$ to the RecSys, where M ratings are submitted in total. A user that has submitted her ratings can go off-line.
[0035] The CSP public-key encryption algorithm is partially homomorphic: a constant can be applied to an encrypted message without knowledge of the corresponding decryption key. An additively homomorphic scheme such as Paillier or Regev could also be used to add a constant, but hash-ElGamal, which is only partially homomorphic, suffices and can be implemented more efficiently in this case.
[0036] Upon receiving M ratings from users (recalling that the encryption is partially homomorphic), the RecSys obscures them with random masks, $c' = c \oplus \eta$, where η is a random or pseudo-random variable and ⊕ is the XOR operation. The RecSys sends the masked ciphertexts to the CSP together with the complete specifications needed to build a garbled circuit. In particular, the RecSys specifies the dimension of the user and item profiles (i.e., parameter d), the total number of ratings (i.e., parameter M), and the total number of users and of items, as well as the number of bits used to represent the integer and fractional parts of a real number in the garbled circuit.
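The XOR-masking step can be illustrated with a toy hash-ElGamal sketch in Python. The group parameters, the 64-bit message encoding and all function names are hypothetical and deliberately insecure; the sketch only shows that the RecSys can XOR a mask η into a ciphertext without any key, so that the CSP decrypts $(j, r_{ij}) \oplus \eta$ rather than the rating itself.

```python
import hashlib
import secrets

# Toy parameters only: NOT a secure choice of group or generator.
P = 2**127 - 1   # a Mersenne prime modulus
G = 3

def H(x: int) -> int:
    """Hash a group element to a 64-bit pad (truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(x.to_bytes(16, "big")).digest()[:8], "big")

def keygen():
    """CSP: secret key x and public key h = G^x mod P."""
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)

def encrypt(pk: int, m: int):
    """User: hash-ElGamal ciphertext (G^r, H(pk^r) XOR m) for a 64-bit message m."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), H(pow(pk, r, P)) ^ m

def mask(c, eta: int):
    """RecSys: XOR a 64-bit mask into the second component; no key is needed."""
    c1, c2 = c
    return c1, c2 ^ eta

def decrypt(sk: int, c) -> int:
    """CSP: recover the (possibly masked) plaintext, since c1^sk = pk^r."""
    c1, c2 = c
    return H(pow(c1, sk, P)) ^ c2

def encode(item: int, rating: int) -> int:
    """Hypothetical packing of (j, r_ij) into 64 bits: 56-bit item id, 8-bit rating."""
    return (item << 8) | rating

def decode(m: int):
    return m >> 8, m & 0xFF
```

The CSP obtains only `encode(j, r_ij) ^ eta`; removing η is left to the garbled circuit, which receives the garbled values of η from the RecSys.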
[0037] Whenever the RecSys wishes to perform matrix factorization over M accumulated ratings, it reports M to the CSP. The CSP could provide the RecSys with a garbled circuit that (a) decrypts the inputs and then (b) performs matrix factorization. In "Privacy-preserving ridge regression on hundreds of millions of records", in IEEE S&P, 2013, by V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft, decryption within the circuit is avoided by using masks and homomorphic encryption. The present principles apply this idea to matrix factorization, but only require a partially homomorphic encryption scheme.
[0038] Upon receiving the encryptions, the CSP decrypts them and obtains the masked values $(i, (j, r_{ij}) \oplus \eta)$. Then, using the matrix factorization as a blueprint, the CSP prepares a Yao's garbled circuit that:
(a) Takes as input the garbled values corresponding to the masks η;
(b) Removes the masks η to recover the corresponding tuples $(i, j, r_{ij})$;
(c) Performs matrix factorization; and
(d) Outputs the item profiles V.
[0039] The computation of matrix factorization by the gradient descent operations outlined in (4) and (5) involves additions, subtractions and multiplications of real numbers. These operations can be efficiently implemented in a circuit. The K iterations of gradient descent (4) correspond to K circuit "layers", each computing the new values of the profiles from the values in the preceding layer. The outputs of the circuit are the item profiles V, while the user profiles are discarded.
[0040] One with skill in the art will observe that the time complexity of computing each iteration of gradient descent is $O(M)$ when operations are performed in the clear, e.g., in the RAM model. The computation of each gradient (5) involves adding 2M terms, and the profile updates (4) can be performed in $O(n + m) = O(M)$, since every user and item appears in at least one rating and hence $n + m \le 2M$.
[0041] The main challenge in implementing gradient descent as a circuit lies in doing so efficiently. To illustrate this, one may consider the following naive implementation:
Q1. For each pair $(i, j) \in [n] \times [m]$, generate a circuit that computes from the input the indicator $\delta_{ij} = \mathbb{1}_{(i,j) \in \mathcal{M}}$, which is 1 if i rated j and 0 otherwise.
Q2. At each iteration, using the outputs of these circuits, compute each user and item gradient as a summation over m and n products, respectively, where:

$$\nabla_{u_i} F(U, V) = -2 \sum_{j \in [m]} \delta_{ij}\, v_j \left( r_{ij} - \langle u_i, v_j \rangle \right) + 2\lambda u_i$$
$$\nabla_{v_j} F(U, V) = -2 \sum_{i \in [n]} \delta_{ij}\, u_i \left( r_{ij} - \langle u_i, v_j \rangle \right) + 2\mu v_j \qquad (6)$$
[0042] Unfortunately, this implementation is inefficient: every iteration of the gradient descent algorithm has a circuit complexity of $O(n \times m)$. When $M \ll n \times m$, as is usually the case in practice, the above circuit is drastically less efficient than gradient descent in the clear. In fact, the quadratic cost $O(n \times m)$ is prohibitive for most datasets. The inefficiency of the naive implementation arises from the inability to identify, at the time of circuit design, which users rate an item and which items are rated by a user, which prevents the circuit from leveraging the inherent sparsity in the data.
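The cost gap can be made concrete with a small Python sketch, run in the clear and with hypothetical function names: the naive variant mirrors Q1/Q2 by visiting all n × m pairs and multiplying by $\delta_{ij}$, as a circuit must, while the sparse variant touches only the M observed ratings. Both compute the user gradients of (6).

```python
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]   # (i, j) -> r_ij

def dot(x: List[float], y: List[float]) -> float:
    return sum(a * b for a, b in zip(x, y))

def user_gradients_naive(U, V, R: Ratings, lam: float):
    """Gradient (6) w.r.t. every u_i, visiting all n*m pairs (nothing can be skipped)."""
    grads = [[2 * lam * x for x in u] for u in U]
    visited = 0
    for i in range(len(U)):
        for j in range(len(V)):
            visited += 1
            delta = 1.0 if (i, j) in R else 0.0          # indicator delta_ij
            err = R.get((i, j), 0.0) - dot(U[i], V[j])
            grads[i] = [g - 2 * delta * v * err for g, v in zip(grads[i], V[j])]
    return grads, visited

def user_gradients_sparse(U, V, R: Ratings, lam: float):
    """The same gradient in the clear, touching only the M observed ratings."""
    grads = [[2 * lam * x for x in u] for u in U]
    for (i, j), r in R.items():
        err = r - dot(U[i], V[j])
        grads[i] = [g - 2 * v * err for g, v in zip(grads[i], V[j])]
    return grads, len(R)
```

With 3 users, 4 items and 3 ratings, both return the same gradients, but the naive version visits 12 pairs against 3; the ratio $n m / M$ grows with the sparsity of real datasets.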
[0043] Conversely, according to the preferred embodiment of the present principles, a circuit implementation is provided based on sorting networks whose complexity is $O((n + m + M)\log^2(n + m + M))$, i.e., within a polylogarithmic factor of the implementation in the clear. In summary, both the input data, corresponding to the tuples $(i, j, r_{ij})$, and placeholders ⊥ for the user and item profiles are stored together in an array. Through appropriate sorting operations, user or item profiles can be placed close to the inputs with which they share an identifier. Linear passes through the data allow the computation of gradients, as well as updates of the profiles. When sorting, the placeholder ⊥ is treated as +∞, i.e., larger than any other number.
[0044] The matrix factorization algorithm according to a preferred embodiment of the present principles and satisfying the flowchart 400 in Figure 4 can be described by the following steps:
C1. Initialize matrix S, 410
The algorithm receives as input the sets $\mathcal{L}_i = \{(j, r_{ij}) : (i, j) \in \mathcal{M}\}$, or equivalently, the tuples $\{(i, j, r_{ij}) : (i, j) \in \mathcal{M}\}$, and constructs an array S of $n + m + M$ tuples. The first n and m tuples of S serve as placeholders for the user and item profiles, respectively, while the remaining M tuples store the inputs $\mathcal{L}_i$. More specifically, for each user $i \in [n]$, the algorithm constructs a tuple $(i, \bot, 0, \bot, u_i, \bot)$, where $u_i \in \mathbb{R}^d$ is the initial profile of user i, selected at random. For each item $j \in [m]$, the algorithm constructs the tuple $(\bot, j, 0, \bot, \bot, v_j)$, where $v_j \in \mathbb{R}^d$ is the initial profile of item j, also selected at random. Finally, for each pair $(i, j) \in \mathcal{M}$, the algorithm constructs the corresponding tuple $(i, j, 1, r_{ij}, \bot, \bot)$, where $r_{ij}$ is the rating of user i for item j. The resulting array is as shown in Figure 5(A). Denoting by $s_{l,k}$ the l-th element of the k-th tuple, these elements serve the following roles:
(a) $s_{1,k}$: user identifiers in [n];
(b) $s_{2,k}$: item identifiers in [m];
(c) $s_{3,k}$: a binary flag indicating whether the tuple is a "profile" or an "input" tuple;
(d) $s_{4,k}$: ratings in "input" tuples;
(e) $s_{5,k}$: user profiles in $\mathbb{R}^d$;
(f) $s_{6,k}$: item profiles in $\mathbb{R}^d$.
C2. Sort tuples in increasing order with respect to the user ids (with respect to rows 1 and 3), 420. If two ids are equal, break ties by comparing tuple flags, i.e., the 3rd elements in each tuple. Hence, after sorting, each "user profile" tuple is succeeded by "input" tuples with the same id:
C3. Copy user profiles (left pass), 430:
$$s_{5,k} \leftarrow s_{3,k} \cdot s_{5,k-1} + (1 - s_{3,k}) \cdot s_{5,k}, \quad \text{for } k = 2, \ldots, M + n$$
C4. Sort tuples in increasing order with respect to item ids (with respect to rows 2 and 3) 440. If two ids are equal, break ties by comparing tuple flags, i.e., the 3rd elements in each tuple.
C5. Copy item profiles (left pass), 450:
$$s_{6,k} \leftarrow s_{3,k} \cdot s_{6,k-1} + (1 - s_{3,k}) \cdot s_{6,k}, \quad \text{for } k = 2, \ldots, M + m$$
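C6. Compute the gradient contributions 460, for all $k \le M + m$ (both assignments use the values of $s_{5,k}$ and $s_{6,k}$ from before this step):
$$s_{5,k} \leftarrow s_{3,k} \cdot 2\gamma\, s_{6,k} \left( s_{4,k} - \langle s_{5,k}, s_{6,k} \rangle \right) + (1 - s_{3,k}) \cdot s_{5,k}$$
$$s_{6,k} \leftarrow s_{3,k} \cdot 2\gamma\, s_{5,k} \left( s_{4,k} - \langle s_{5,k}, s_{6,k} \rangle \right) + (1 - s_{3,k}) \cdot s_{6,k}$$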
C7. Update item profiles (right pass), 470:
$$s_{6,k} \leftarrow s_{6,k} + s_{3,k+1} \cdot s_{6,k+1} - (1 - s_{3,k}) \cdot 2\gamma\mu\, s_{6,k}, \quad \text{for } k = M + m - 1, \ldots, 1$$
C8. Sort tuples with respect to rows 1 and 3, 475
C9. Update user profiles (right pass), 480:
$$s_{5,k} \leftarrow s_{5,k} + s_{3,k+1} \cdot s_{5,k+1} - (1 - s_{3,k}) \cdot 2\gamma\lambda\, s_{5,k}, \quad \text{for } k = M + n - 1, \ldots, 1$$
C10. If the number of iterations is less than K, go to C3, 485
C11. Sort tuples with respect to rows 3 and 2, 490
C12. Output item profiles $s_{6,k}$ for $k = 1, \ldots, m$, 495, wherein the output may be restricted to at least one item profile.
[0045] The gradient descent iterations comprise the following three major steps:
A. Copy profiles: At each iteration, the profiles $u_i$ and $v_j$ of each user i and each item j are copied to the corresponding elements $s_{5,k}$ and $s_{6,k}$ of each "input" tuple in which i and j appear. This is implemented in steps C2 to C5 of the algorithm. To copy, e.g., the user profiles, S is sorted using the user id (i.e., $s_{1,k}$) as a primary index and the flag (i.e., $s_{3,k}$) as a secondary index. An example of such a sorting applied to the initial state of S can be found in Figure 5(B). Subsequently, the user profiles are copied by traversing the array from left to right (a "left pass"), as described formally in step C3 of the algorithm. This copies $s_{5,k}$ from each "profile" tuple to its adjacent "input" tuples; item profiles are copied similarly.
B. Compute gradient contributions: After profiles are copied, each "input" tuple corresponding to, e.g., $(i, j)$ stores the rating $r_{ij}$ (in $s_{4,k}$) as well as the profiles $u_i$ and $v_j$ (in $s_{5,k}$ and $s_{6,k}$, respectively), as computed in the last iteration. From these, the quantities $v_j (r_{ij} - \langle u_i, v_j \rangle)$ and $u_i (r_{ij} - \langle u_i, v_j \rangle)$ are computed, which can be seen as the "contributions" of the tuple to the gradients with respect to $u_i$ and $v_j$, as given by (5). Scaled by $2\gamma$, these replace the $s_{5,k}$ and $s_{6,k}$ elements of the tuple, as indicated by step C6 of the algorithm. Through appropriate use of flags, this operation only affects "input" tuples and leaves "profile" tuples unchanged.
C. Update profiles: Finally, the user and item profiles are updated, as shown in steps C7 to C9 of the algorithm. Through appropriate sorting, "profile" tuples are again made adjacent to the "input" tuples with which they share ids. The updated profiles are computed through a right-to-left traversal of the array (a "right pass"). This operation adds up the gradient contributions as it traverses "input" tuples. Upon encountering a "profile" tuple, the summed gradient contributions are added to the profile, scaled appropriately. After passing a profile, the summation of gradient contributions restarts from zero, through appropriate use of the flags $s_{3,k}$ and $s_{3,k+1}$.
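To make steps C1 to C12 concrete, the following Python sketch runs them in the clear. It is an illustration under stated assumptions, not the circuit: it uses Python's built-in (stable) sort in place of Batcher's network, zero vectors in place of ⊥ in the profile slots, subtracts the regularization terms as gradient descent on (6) requires, and, like the step ranges above, assumes every user and item has at least one rating. All names are hypothetical; `mf_direct` is plain gradient descent included only for comparison.

```python
BOT = float("inf")  # placeholder for an absent id: compares larger than any real id

def lin(a, x, b, y):
    """Return a*x + b*y for vectors x, y and scalars a, b."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def mf_by_sorting(n, m, ratings, U0, V0, gamma, lam, mu, K):
    """Steps C1-C12 in the clear; ratings maps (i, j) -> r_ij. Returns item profiles."""
    M, zero = len(ratings), [0.0] * len(U0[0])
    # C1: tuples (s1..s6) = (user id, item id, flag, rating, user profile, item profile)
    S = [[i, BOT, 0, 0.0, list(U0[i]), zero] for i in range(n)]
    S += [[BOT, j, 0, 0.0, zero, list(V0[j])] for j in range(m)]
    S += [[i, j, 1, float(r), zero, zero] for (i, j), r in ratings.items()]
    S.sort(key=lambda t: (t[0], t[2]))                    # C2: rows 1 and 3
    for _ in range(K):                                    # C10: K iterations
        for k in range(1, M + n):                         # C3: copy user profiles
            f = S[k][2]
            S[k][4] = lin(f, S[k - 1][4], 1 - f, S[k][4])
        S.sort(key=lambda t: (t[1], t[2]))                # C4: rows 2 and 3
        for k in range(1, M + m):                         # C5: copy item profiles
            f = S[k][2]
            S[k][5] = lin(f, S[k - 1][5], 1 - f, S[k][5])
        for k in range(M + m):                            # C6: gradient contributions
            t, f = S[k], S[k][2]
            g = 2 * gamma * (t[3] - dot(t[4], t[5]))
            t[4], t[5] = lin(f * g, t[5], 1 - f, t[4]), lin(f * g, t[4], 1 - f, t[5])
        for k in range(M + m - 2, -1, -1):                # C7: update item profiles
            f, fn = S[k][2], S[k + 1][2]
            S[k][5] = lin(1 - (1 - f) * 2 * gamma * mu, S[k][5], fn, S[k + 1][5])
        S.sort(key=lambda t: (t[0], t[2]))                # C8: rows 1 and 3
        for k in range(M + n - 2, -1, -1):                # C9: update user profiles
            f, fn = S[k][2], S[k + 1][2]
            S[k][4] = lin(1 - (1 - f) * 2 * gamma * lam, S[k][4], fn, S[k + 1][4])
    S.sort(key=lambda t: (t[2], t[1]))                    # C11: rows 3 and 2
    return [S[k][5] for k in range(m)]                    # C12: item profiles

def mf_direct(n, m, ratings, U0, V0, gamma, lam, mu, K):
    """Reference: plain simultaneous gradient descent on (6) with step size gamma."""
    U, V = [list(u) for u in U0], [list(v) for v in V0]
    for _ in range(K):
        gU = [[0.0] * len(u) for u in U]
        gV = [[0.0] * len(v) for v in V]
        for (i, j), r in ratings.items():
            e = 2 * gamma * (r - dot(U[i], V[j]))
            gU[i] = lin(1, gU[i], e, V[j])
            gV[j] = lin(1, gV[j], e, U[i])
        U = [lin(1 - 2 * gamma * lam, U[i], 1, gU[i]) for i in range(n)]
        V = [lin(1 - 2 * gamma * mu, V[j], 1, gV[j]) for j in range(m)]
    return V
```

Both functions return the same item profiles (up to floating-point summation order), which shows that the sort-and-pass formulation computes ordinary gradient descent while touching the data only through sorts and linear passes.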
[0046] The above operations are repeated K times, that is, the desired number of iterations of gradient descent. Finally, at the termination of the last iteration, the array is sorted with respect to the flags (i.e., $s_{3,k}$) as a primary index, and the item ids (i.e., $s_{2,k}$) as a secondary index. This brings all item profile tuples to the first m positions in the array, from which the item profiles can be output. Furthermore, in order to obtain the user profiles, at the termination of the last iteration, the array is sorted with respect to the flags (i.e., $s_{3,k}$) as a primary index, and the user ids (i.e., $s_{1,k}$) as a secondary index. This brings all user profile tuples to the first n positions in the array, from which the user profiles can be output.
[0047] One with skill in the art will recognize that each of the above operations is data-oblivious, and can be implemented as a circuit. Copying and updating profiles requires $O(n + m + M)$ gates, so the overall complexity is determined by sorting, which, e.g., using Batcher's circuit, yields an $O((n + m + M)\log^2(n + m + M))$ cost. Sorting and the gradient computation in step C6 of the algorithm are the most computationally intensive operations; fortunately, both are highly parallelizable. In addition, sorting can be further optimized by reusing previously computed comparisons at each iteration. In particular, this circuit can be implemented as a Boolean circuit (e.g., as a graph of OR, AND, NOT and XOR gates), which allows the implementation to be garbled, as previously explained.
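For a rough sense of the sorting cost, the standard comparator count $C$ and depth $D$ of Batcher's odd-even merge sort on $N = 2^k$ inputs are given below (padding $n + m + M$ up to a power of two is assumed, and $N = 2^{20}$ is a hypothetical size chosen only for the arithmetic). Each of the two sorts per iteration (C4 and C8) costs this many compare-and-swap circuits on whole tuples.

$$C(2^k) = (k^2 - k + 4)\, 2^{k-2} - 1, \qquad D(2^k) = \frac{k(k+1)}{2}$$

$$C(2^{20}) = 384 \cdot 2^{18} - 1 = 100{,}663{,}295, \qquad D(2^{20}) = 210$$

The $k^2 = \log_2^2 N$ factor in $C$ is the polylogarithmic overhead over the $O(N)$ linear passes, and the small depth $D$ is what makes the network highly parallelizable.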
[0048] According to the present principles, the implementation of the matrix factorization algorithm described above together with the protocol previously described provides a novel method for matrix factorization, in a privacy-preserving fashion. In addition, this solution yields a circuit with a complexity within a polylogarithmic factor of matrix factorization performed in the clear by using sorting networks. Furthermore, an additional advantage of this implementation is that the garbling and the execution of this circuit are highly parallelizable.
[0049] In an implementation of a system according to the present principles, the garbled circuit construction was based on FastGC, a publicly available garbled circuit framework. FastGC is a Java-based open-source framework, which enables circuit definition using elementary XOR, OR and AND gates. Once the circuits are constructed, the framework handles garbling, oblivious transfer and the complete evaluation of the garbled circuit. However, before garbling and executing the circuit, FastGC represents the entire ungarbled circuit in memory as a set of Java objects. These objects incur a significant memory overhead relative to the memory footprint that the ungarbled circuit should introduce, as only a subset of the gates is garbled and/or executed at any point in time. Moreover, although FastGC performs garbling in parallel to the execution process as described above, both operations occur in a sequential fashion: gates are processed one at a time, once their inputs are ready. A skilled artisan will clearly recognize that this implementation is not amenable to parallelization.
[0050] As a result, the framework was modified to address these two issues, reducing the memory footprint of FastGC but also enabling parallelized garbling and computation across multiple processors. In particular, we introduced the ability to partition a circuit horizontally into sequential "layers", each one comprising a set of vertical "slices" that can be executed in parallel. A layer is created in memory only when all its inputs are ready. Once it is garbled and evaluated, the entire layer is removed from memory, and the following layer can be constructed, thus limiting the memory footprint to the size of the largest layer. The execution of a layer is performed using a scheduler that assigns its slices to threads, enabling them to run in parallel. Although parallelization was implemented on a single machine with multiple cores, the implementation can be extended to run across different machines in a straightforward manner since no shared state between slices is assumed.
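The layer/slice scheduling can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`. This is a hypothetical illustration of the structure only: it evaluates plain bits rather than garbled gates, and in CPython threads show the scheduling pattern without giving real speed-up for pure-Python gate evaluation. Layers run in sequence, the slices of a layer run concurrently, and each layer's outputs replace the previous wires so that only one layer is held in memory at a time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# A slice maps the wire values it reads to the wire values it produces.
SliceFn = Callable[[Sequence[int]], List[int]]
Layer = List[Tuple[SliceFn, List[int]]]   # (slice function, indices of its input wires)

def run_layers(layers: List[Layer], inputs: List[int], workers: int = 4) -> List[int]:
    """Evaluate layers in sequence; the slices within a layer run concurrently."""
    wires = list(inputs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for layer in layers:
            # The layer is only scheduled once all of its inputs (`wires`) are ready.
            futures = [pool.submit(fn, [wires[i] for i in idx]) for fn, idx in layer]
            # Its outputs replace the previous wires, which can then be freed.
            wires = [w for fut in futures for w in fut.result()]
    return wires

# Example: two XOR slices in parallel, followed by one AND slice.
def xor2(w: Sequence[int]) -> List[int]:
    return [w[0] ^ w[1]]

def and2(w: Sequence[int]) -> List[int]:
    return [w[0] & w[1]]

EXAMPLE_LAYERS: List[Layer] = [[(xor2, [0, 1]), (xor2, [2, 3])], [(and2, [0, 1])]]
```

Since slices share no state, the same pattern could in principle dispatch slices to other machines instead of threads.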
[0051] Finally, to implement the numerical operations outlined in the algorithm, FastGC was extended to support addition and multiplication over the reals with fixed-point number representation, as well as sorting. For sorting, Batcher's sorting network was used. Fixed-point representation introduces a tradeoff between the accuracy loss resulting from truncation and the size of the circuit.
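The truncation/size tradeoff can be seen in a small fixed-point sketch in Python. The bit widths and function names are hypothetical, since the split between integer and fractional bits is a parameter sent in step P7; a circuit would operate on fixed-width two's-complement words, emulated here by `wrap`.

```python
FRAC_BITS = 16  # hypothetical number of fractional bits

def to_fixed(x: float, frac_bits: int = FRAC_BITS) -> int:
    """Encode a real number as an integer scaled by 2^frac_bits (rounded)."""
    return int(round(x * (1 << frac_bits)))

def from_fixed(a: int, frac_bits: int = FRAC_BITS) -> float:
    return a / (1 << frac_bits)

def fx_mul(a: int, b: int, frac_bits: int = FRAC_BITS) -> int:
    # The product carries 2*frac_bits fractional bits; the arithmetic shift
    # truncates back to frac_bits, which is where accuracy is lost.
    return (a * b) >> frac_bits

def wrap(a: int, total_bits: int) -> int:
    """Emulate a total_bits-wide two's-complement register (overflow wraps)."""
    a &= (1 << total_bits) - 1
    return a - (1 << total_bits) if a >> (total_bits - 1) else a
```

With 16 fractional bits, `0.1 * 0.3` is reproduced to within about $2^{-16}$, whereas with 4 fractional bits it truncates to 0. More bits reduce this error, but every adder and multiplier in the circuit grows with the word width (roughly quadratically for a schoolbook multiplier), and too few integer bits make values wrap around.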
[0052] Furthermore, the implementation of the algorithm was optimized in multiple ways, in particular:
(a) It reduced the cost of sorting by reusing comparisons computed in the beginning of the circuit's execution:
The basic building block of a sorting network is a compare-and-swap circuit, that compares two items and swaps them if necessary, so that the output pair is ordered.
The sorting operations (lines C4 and C8) of the matrix factorization algorithm perform identical comparisons between tuples at each of the K gradient descent iterations, using exactly the same inputs per iteration. In fact, each sorting permutes the tuples in array S in exactly the same manner, at each iteration. This property is exploited by performing the comparison operations for each of these sortings only once. In particular, sortings of tuples of the form (i, j, flag, rating) are performed in the beginning of the computation (without the payload of user or item profiles), e.g., with respect to i and the flag first, j and the flag, and back to i and the flag. Subsequently, the outputs of the comparison circuits are reused in each of these sortings as input to the swap circuits used during gradient descent. As a result, the "sorting" network applied at each iteration does not perform any comparisons, but simply permutes tuples (i.e., it is a "permutation" network);
(b) It reduced the size of array S:
Precomputing all comparisons also allows the size of the tuples in S to be drastically reduced. To begin with, one with skill in the art can observe that the rows corresponding to user or item ids are only used in the matrix factorization algorithm as input to comparisons during sorting. Flags and ratings are used during the copy and update phases, but their relative positions are identical at each iteration. Moreover, these positions can be computed as outputs of the sorting of the tuples (i, j, flag, rating) at the beginning of the computation. As such, the "permutation" operations performed at each iteration need only be applied to the user and item profiles; all other rows can be removed from array S. One more improvement reduces the cost of permutations by an additional factor of 2: fixing one set of profiles, e.g., the user profiles, and permuting only the item profiles. The item profiles then rotate between two states, each reachable from the other through a permutation: one in which they are aligned with the user profiles and partial gradients are computed, and one in which the item profiles are updated and copied.
(c) It optimized swap operations by using XORs:
Given that XOR operations can be executed for "free", the comparison, swap, update and copying operations are optimized by using XORs wherever possible. One skilled in the art will appreciate that free-XOR gates can be garbled without the associated garbled tables and the corresponding hashing or symmetric-key operations, representing a marked improvement in computation and communication.
(d) It parallelized computations: Sorting and gradient computations constitute the bulk of the computation in the matrix factorization circuit (copying and updating contribute no more than 3% of the execution time and 0.4% of the non-XOR gates); these operations are parallelized through this extension of FastGC. Gradient computations are clearly parallelizable; sorting networks are also highly parallelizable (parallelization is the main motivation behind their development). Moreover, since many of the parallel slices in each sort are identical, the same FastGC objects defining the circuit slices are reused with different inputs, significantly reducing the need to repeatedly create and destroy objects in memory.
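Optimizations (a) and (c) can be combined in a short Python sketch, under stated assumptions: it uses odd-even transposition sort, a simpler sorting network, in place of Batcher's network; the Python `if` in `record_swaps` stands for a comparator whose output bit the circuit keeps; and payloads are non-negative `width`-bit integers (e.g., fixed-point profile coordinates in an unsigned encoding). Comparisons run once on the keys; afterwards every iteration only permutes payloads, using an XOR-based conditional swap that needs one AND gate per bit, with all XORs free.

```python
from typing import List, Tuple

def transposition_network(n: int) -> List[Tuple[int, int]]:
    """Odd-even transposition sort: n rounds of adjacent comparators."""
    return [(i, i + 1) for rnd in range(n) for i in range(rnd % 2, n - 1, 2)]

def record_swaps(keys: list, network) -> List[int]:
    """Run the comparisons once, on the keys only, keeping each comparator's swap bit."""
    a, bits = list(keys), []
    for i, j in network:
        c = int(a[i] > a[j])
        bits.append(c)
        if c:
            a[i], a[j] = a[j], a[i]
    return bits

def xor_swap(a: int, b: int, c: int, width: int) -> Tuple[int, int]:
    """Conditional swap via t = c AND (a XOR b): `width` AND gates, XORs are free."""
    mask = (1 << width) - 1 if c else 0   # the bit c fanned out to every AND gate
    t = mask & (a ^ b)
    return a ^ t, b ^ t

def apply_swaps(payload: List[int], network, bits: List[int], width: int) -> List[int]:
    """Permutation network: no comparisons, only swaps driven by precomputed bits."""
    a = list(payload)
    for (i, j), c in zip(network, bits):
        a[i], a[j] = xor_swap(a[i], a[j], c, width)
    return a
```

For example, the swap bits recorded from the keys `[3, 1, 2, 0]` move the payload `[30, 10, 20, 0]` to `[0, 10, 20, 30]`, and the same bits can be reused for any other payload in later iterations.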
[0053] It is to be understood that the present principles may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. Preferably, the present principles are implemented as a combination of hardware and software. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may either be part of the microinstruction code or part of the application program (or a combination thereof), which is executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.
[0054] Figure 6 shows a block diagram of a minimum computing environment 600 used to implement the present principles. The computing environment 600 includes a processor 610, and at least one (and preferably more than one) I/O interface 620. The I/O interface can be wired or wireless and, in the wireless implementation, is pre-configured with the appropriate wireless communication protocols to allow the computing environment 600 to operate on a global network (e.g., internet) and communicate with other computers or servers (e.g., cloud based computing or storage servers) so as to enable the present principles to be provided, for example, as a Software as a Service (SAAS) feature remotely provided to end users. One or more memories 630 and/or storage devices (HDD) 640 are also provided within the computing environment 600. The computing environment 600 or a plurality of computer environments 600 may implement the protocol P1-P11 (Figure 3), for the matrix factorization C1-C12 (Figure 4) according to one embodiment of the present principles. In particular, in an embodiment of the present principles, a computing environment 600 may implement the RecSys 230; a separate computing environment 600 may implement the CSP 250 and a Source may contain one or a plurality of computer environments 600, each associated with a distinct user 210, including but not limited to desktop computers, cellular phones, smart phones, phone watches, tablet computers, personal digital assistant (PDA), netbooks and laptop computers, used to communicate with the RecSys 230 and the CSP 250. In addition, the CSP 250 can be included in the Source, or equivalently, included in the computer environment of each User 210 of the Source.
[0055] It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying figures are preferably implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present principles are programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present principles.
[0056] Although the illustrative embodiments have been described herein with reference to the accompanying figures, it is to be understood that the present principles are not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present principles. All such changes and modifications are intended to be included within the scope of the present principles as set forth in the appended claims.

Claims

1. A method for securely profiling items through matrix factorization, said method
comprising:
receiving a set of records (220) from a Source, wherein a record comprises a set of tokens and a set of items, and wherein each record is kept secret from parties other than said Source;
receiving at least one separate item (360); and
evaluating said set of records and said at least one separate item in a Recommender
(RecSys) (230) by using a garbled circuit (395) based on matrix factorization, wherein the output of the garbled circuit comprises item profiles for said at least one separate item.
2. The method according to claim 1, further comprising:
designing the garbled circuit in a Crypto-System Provider (CSP) to perform matrix factorization on said set of records (380) and said at least one separate item (360), wherein the garbled circuit output comprises the item profiles for said at least one separate item; and
transferring the garbled circuit to the RecSys (385).
3. The method according to claim 2, wherein the step of designing comprises:
designing a matrix factorization operation as a Boolean circuit (382).
4. The method according to claim 3, wherein the step of designing a matrix factorization circuit comprises:
constructing an array of said set of records (410); and
performing the operations of sorting (420, 440, 470, 490), copying (430, 450), updating (470, 480), comparing (480) and computing gradient contributions (460) on the array.
5. The method according to claim 2, further comprising: encrypting the set of records to create encrypted records (330), wherein the step of encrypting is performed prior to the step of receiving a set of records.
6. The method according to claim 5, further comprising:
generating public encryption keys in the CSP; and
sending said keys to the Source (320).
7. The method according to claim 5, wherein the encryption is a partially homomorphic encryption (320), said method further comprising:
masking the encrypted records in the RecSys to create masked records (340); and
decrypting the masked records in the CSP to create decrypted-masked records (350).
8. The method according to claim 7, wherein the step of designing (380) comprises:
unmasking the decrypted-masked records inside the garbled circuit prior to processing them.
9. The method according to claim 7, further comprising:
performing oblivious transfers (390) between the CSP and the RecSys (392), wherein the RecSys receives the garbled values of the decrypted-masked records and the records are kept private from the RecSys and the CSP.
10. The method according to claim 1, further comprising:
receiving the number of tokens and items of each record (220, 310).
11. The method according to claim 1, further comprising:
padding each record with null entries when the number of tokens of each record is smaller than a value representing a maximum value, in order to create records with a number of tokens equal to said value (312).
12. The method according to claim 1, wherein the Source of the set of records is one of a database and a set of users (210), wherein each user is a source of one record and said one record is kept secret from parties other than said each user.
13. The method according to claim 2, further comprising:
receiving a set of parameters for the design of the garbled circuit by said CSP, wherein the parameters were sent by said RecSys (370).
14. A system for securely profiling items through matrix factorization, said system comprising a Source which will provide a set of records, a Crypto-Service Provider (CSP) which will provide a secure matrix factorization circuit and a RecSys which will evaluate the records, such that the records are kept private from parties other than said Source, wherein said Source, said CSP and said RecSys each comprise:
a processor (602), for receiving at least one input/output (604); and
at least one memory (606, 608) in signal communication with said processor, and wherein the RecSys processor is configured to:
receive a set of records, wherein each record comprises a set of tokens and a set of items, and wherein each record is kept secret;
receive at least one separate item; and
evaluate said set of records and said at least one separate item with a garbled circuit based on matrix factorization, wherein the output of the garbled circuit comprises item profiles for said at least one separate item.
15. The system according to claim 14, wherein the CSP processor is configured to:
design the garbled circuit to perform matrix factorization of said set of records and said at least one separate item, wherein the garbled circuit output comprises the item profiles for said at least one separate item; and
transfer the garbled circuit to the RecSys.
16. The system according to claim 15, wherein the CSP processor is configured to design the garbled circuit by being configured to:
design a matrix factorization operation as a Boolean circuit.
17. The system according to claim 16, wherein the CSP processor is configured to design the matrix factorization circuit by being configured to:
construct an array of said set of records; and
perform the operations of sorting, copying, updating, comparing and computing gradient contributions on the array.
18. The system according to claim 15, wherein the Source processor is configured to:
encrypt the set of records to create encrypted records prior to providing said set of records.
19. The system according to claim 18, wherein the CSP processor is further configured to:
generate public encryption keys; and
send said keys to the Source.
20. The system according to claim 18, wherein the encryption is a partially homomorphic encryption, and wherein the RecSys processor is further configured to:
mask the encrypted records to create masked records; and the CSP processor is further configured to:
decrypt the masked records to create decrypted-masked records.
21. The system according to claim 20, wherein the CSP processor is configured to design the garbled circuit by being further configured to:
unmask the decrypted-masked records inside the garbled circuit prior to processing them.
22. The system according to claim 20, wherein the RecSys processor and the CSP processor are further configured to perform oblivious transfers, wherein said RecSys receives the garbled values of the decrypted-masked records and the records are kept private from the RecSys and the CSP.
23. The system according to claim 14, wherein the RecSys processor is further configured to: receive the number of tokens of each record, wherein the number of tokens was sent by said Source.
24. The system according to claim 14, wherein the Source processor is configured to: pad each record with null entries when the number of tokens of each record is smaller than a value representing a maximum value, in order to create records with a number of tokens equal to said value.
25. The system according to claim 14, wherein the Source of the set of records is one of a database and a set of users, and wherein if the Source is a set of users, each user comprises a processor (602), for receiving at least one input/output (604); and at least one memory (606, 608), and each user is a source of one record, wherein said one record is kept secret from parties other than said each user.
26. The system according to claim 15, wherein the CSP processor is further configured to: receive a set of parameters for the design of the garbled circuit, wherein the parameters were sent by said RecSys.
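Claims 4, 11, 17 and 24 recite padding records with null entries up to a common number of tokens and computing gradient contributions over an array built from the records, with item profiles as the output (claims 1 and 14). The plaintext sketch below shows what such a circuit might compute, under stated assumptions: it minimizes the regularized squared error sum over rated pairs (i, j) of (r_ij - <u_i, v_j>)^2 + lam*|U|^2 + mu*|V|^2 with full-batch gradient descent, an objective the claims do not spell out; the helper names and hyperparameters are hypothetical; and it uses ordinary dictionaries instead of the data-oblivious sorting and copying passes a garbled circuit would require.

```python
import random
from typing import Dict, List, Optional, Tuple

# A token is (item id, rating); an item id of None marks a null padding entry.
Token = Tuple[Optional[int], float]
Vec = List[float]


def pad_record(record: List[Token], max_tokens: int) -> List[Token]:
    """Pad a record with null entries so that it holds exactly max_tokens tokens."""
    if len(record) > max_tokens:
        raise ValueError("record has more tokens than max_tokens")
    return record + [(None, 0.0)] * (max_tokens - len(record))


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def flatten(records: List[List[Token]]) -> List[Tuple[int, int, float]]:
    """Build the (user, item, rating) array; null padding entries are dropped here."""
    return [(i, j, r) for i, rec in enumerate(records) for j, r in rec if j is not None]


def loss(array, U: Dict[int, Vec], V: Dict[int, Vec], lam: float, mu: float) -> float:
    fit = sum((r - dot(U[i], V[j])) ** 2 for i, j, r in array)
    reg = lam * sum(dot(u, u) for u in U.values()) + mu * sum(dot(v, v) for v in V.values())
    return fit + reg


def gd_step(array, U, V, lr: float, lam: float, mu: float) -> None:
    """One full-batch gradient step on loss(); updates U and V in place."""
    gU = {i: [2 * lam * x for x in u] for i, u in U.items()}
    gV = {j: [2 * mu * x for x in v] for j, v in V.items()}
    for i, j, r in array:  # per-rating gradient contributions
        err = r - dot(U[i], V[j])
        for k in range(len(U[i])):
            gU[i][k] -= 2 * err * V[j][k]
            gV[j][k] -= 2 * err * U[i][k]
    for i in U:
        U[i] = [x - lr * g for x, g in zip(U[i], gU[i])]
    for j in V:
        V[j] = [x - lr * g for x, g in zip(V[j], gV[j])]


def item_profiles(records, n_items, dim=2, steps=500, lr=0.02, lam=0.01, mu=0.01, seed=0):
    """Return (item profiles V, loss before, loss after); user profiles stay internal."""
    rng = random.Random(seed)
    array = flatten(records)
    U = {i: [rng.uniform(0.1, 0.5) for _ in range(dim)] for i in range(len(records))}
    V = {j: [rng.uniform(0.1, 0.5) for _ in range(dim)] for j in range(n_items)}
    start = loss(array, U, V, lam, mu)
    for _ in range(steps):
        gd_step(array, U, V, lr, lam, mu)
    return V, start, loss(array, U, V, lam, mu)


raw = [[(0, 5.0), (1, 3.0)], [(0, 4.0), (2, 1.0)], [(1, 2.0), (2, 5.0), (0, 1.0)]]
max_tokens = max(len(r) for r in raw)
records = [pad_record(r, max_tokens) for r in raw]
V, before, after = item_profiles(records, n_items=3)
```

Padding gives every record the same length, so the number of items a user rated is not revealed by record size. In this plaintext sketch the null entries are simply dropped when the array is built; inside a garbled circuit they would presumably have to be processed like real entries and contribute nothing, so that the circuit's structure does not depend on them.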
EP14731436.3A 2013-08-09 2014-05-01 A method and system for privacy preserving matrix factorization Withdrawn EP3031165A2 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201361864088P 2013-08-09 2013-08-09
US201361864085P 2013-08-09 2013-08-09
US201361864098P 2013-08-09 2013-08-09
US201361864094P 2013-08-09 2013-08-09
PCT/US2013/076353 WO2014137449A2 (en) 2013-03-04 2013-12-19 A method and system for privacy preserving counting
PCT/US2014/036357 WO2014138752A2 (en) 2013-03-04 2014-05-01 A method and system for privacy preserving matrix factorization

Publications (1)

Publication Number Publication Date
EP3031165A2 true EP3031165A2 (en) 2016-06-15

Family

ID=49955504

Family Applications (1)

Application Number Title Priority Date Filing Date
EP14731436.3A Withdrawn EP3031165A2 (en) 2013-08-09 2014-05-01 A method and system for privacy preserving matrix factorization

Country Status (4)

Country Link
EP (1) EP3031165A2 (en)
JP (3) JP2016510913A (en)
KR (1) KR20160041028A (en)
CN (3) CN105009505A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11625752B2 (en) 2018-11-15 2023-04-11 Ravel Technologies SARL Cryptographic anonymization for zero-knowledge advertising methods, apparatus, and system

Families Citing this family (37)

Publication number Priority date Publication date Assignee Title
US10437953B2 (en) * 2016-07-08 2019-10-08 efabless corporation Systems for engineering integrated circuit design and development
EP3319001A1 (en) * 2016-11-02 2018-05-09 Skeyecode Method for securely transmitting a secret data to a user of a terminal
CN106548207B (en) * 2016-11-03 2018-11-30 北京图森未来科技有限公司 A kind of image processing method neural network based and device
CN107135061B (en) * 2017-04-17 2019-10-22 北京科技大学 A kind of distributed secret protection machine learning method under 5g communication standard
CN107302498B (en) * 2017-06-21 2019-08-27 安徽大学 The multiple domain QoS path calculation method of secret protection is supported in a kind of SDN network
SG11202001001WA (en) 2017-08-31 2020-03-30 Visa Int Service Ass Single node multi-party encryption
JP6759168B2 (en) * 2017-09-11 2020-09-23 日本電信電話株式会社 Obfuscation circuit generator, obfuscation circuit calculator, obfuscation circuit generation method, obfuscation circuit calculation method, program
CN109756442B (en) * 2017-11-01 2020-04-24 清华大学 Data statistics method, device and equipment based on garbled circuit
EP3729340A4 (en) * 2017-12-18 2021-12-29 Mythic, Inc. Systems and methods for mapping matrix calculations to a matrix multiply accelerator
US11461435B2 (en) * 2017-12-18 2022-10-04 University Of Central Florida Research Foundation, Inc. Techniques for securely executing code that operates on encrypted data on a public computer
CN110909356B (en) 2018-09-18 2022-02-01 百度在线网络技术(北京)有限公司 Secure multiparty computing method, apparatus, device and computer readable medium
CN109992979B (en) * 2019-03-15 2020-12-11 暨南大学 Ridge regression training method, computing device and medium
CN110209994B (en) * 2019-04-25 2022-12-23 广西师范大学 Matrix decomposition recommendation method based on homomorphic encryption
CN110086717B (en) * 2019-04-30 2021-06-22 创新先进技术有限公司 Method, device and system for data security matching
CN110196944B (en) * 2019-05-07 2021-06-01 深圳前海微众银行股份有限公司 Method and device for recommending serialized information
CN110363000B (en) * 2019-07-10 2023-11-17 深圳市腾讯网域计算机网络有限公司 Method, device, electronic equipment and storage medium for identifying malicious files
CN110795631B (en) * 2019-10-29 2022-09-06 支付宝(杭州)信息技术有限公司 Push model optimization and prediction method and device based on factorization machine
CN110990871B (en) * 2019-11-29 2023-04-07 腾讯云计算(北京)有限责任公司 Machine learning model training method, prediction method and device based on artificial intelligence
CN111125517B (en) * 2019-12-06 2023-03-14 陕西师范大学 Implicit matrix decomposition recommendation method based on differential privacy and time perception
CN111259260B (en) * 2020-03-30 2023-06-02 九江学院 Privacy protection method in personalized recommendation based on sorting classification
CN111552852B (en) * 2020-04-27 2021-09-28 北京交通大学 Article recommendation method based on semi-discrete matrix decomposition
CN111553126B (en) * 2020-05-08 2022-05-24 北京华大九天科技股份有限公司 Method for obtaining matrix decomposition time based on machine learning training model
CN111857649B (en) * 2020-06-22 2022-04-12 复旦大学 Fixed point number coding and operation system for privacy protection machine learning
EP4014427B1 (en) * 2020-08-14 2023-05-03 Google LLC Online privacy preserving techniques
CN112528303B (en) * 2020-12-11 2024-01-26 重庆交通大学 Multi-user privacy recommendation method based on NTRU encryption algorithm
IL279406B1 (en) 2020-12-13 2024-09-01 Google Llc Privacy-preserving techniques for content selection and distribution
CN112311546B (en) * 2020-12-25 2021-04-09 鹏城实验室 Data security judgment method, device, equipment and computer readable storage medium
IL280056A (en) * 2021-01-10 2022-08-01 Google Llc Using secure mpc and vector computations to protect access to information in content distribution
US11113707B1 (en) 2021-01-22 2021-09-07 Isolation Network, Inc. Artificial intelligence identification of high-value audiences for marketing campaigns
IL281328A (en) 2021-03-08 2022-10-01 Google Llc Flexible content selection processes using secure multi-party computation
CN113051587B (en) * 2021-03-10 2024-02-02 中国人民大学 Privacy protection intelligent transaction recommendation method, system and readable medium
WO2022216293A1 (en) * 2021-04-09 2022-10-13 Google Llc Processing of machine learning modeling data to improve accuracy of categorization
IL283674B2 (en) 2021-06-03 2024-09-01 Google Llc Privacy-preserving cross-domain experimental group partitioning and monitoring
EP4099609A1 (en) * 2021-06-04 2022-12-07 Zama SAS Computational network conversion for fully homomorphic evaluation
CN113779500B (en) * 2021-08-23 2024-01-30 华控清交信息科技(北京)有限公司 Data processing method and device for data processing
CN114564742B (en) * 2022-02-18 2024-05-14 北京交通大学 Hash learning-based lightweight federal recommendation method
CN114817999B (en) * 2022-06-28 2022-09-02 北京金睛云华科技有限公司 Outsourcing privacy protection method and device based on multi-key homomorphic encryption

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
KR20060069452A (en) * 2003-08-08 2006-06-21 코닌클리케 필립스 일렉트로닉스 엔.브이. System for processing data and method thereof
WO2005043808A1 (en) * 2003-11-03 2005-05-12 Koninklijke Philips Electronics N.V. Method and device for efficient multiparty multiplication
US8131732B2 (en) * 2008-06-03 2012-03-06 Nec Laboratories America, Inc. Recommender system with fast matrix factorization using infinite dimensions
US8972742B2 (en) * 2009-09-04 2015-03-03 Gradiant System for secure image recognition
US8676736B2 (en) * 2010-07-30 2014-03-18 Gravity Research And Development Kft. Recommender systems and methods using modified alternating least squares algorithm
CN102129463A (en) * 2011-03-11 2011-07-20 北京航空航天大学 Project correlation fused and probabilistic matrix factorization (PMF)-based collaborative filtering recommendation system
CN102129462B (en) * 2011-03-11 2014-06-18 北京航空航天大学 Method for optimizing collaborative filtering recommendation system by aggregation
US20140180760A1 (en) * 2011-03-18 2014-06-26 Telefonica, S.A. Method for context-aware recommendations based on implicit user feedback
US10102546B2 (en) * 2011-09-15 2018-10-16 Stephan HEATH System and method for tracking, utilizing predicting, and implementing online consumer browsing behavior, buying patterns, social networking communications, advertisements and communications, for online coupons, products, goods and services, auctions, and service providers using geospatial mapping technology, and social networking
US8478768B1 (en) * 2011-12-08 2013-07-02 Palo Alto Research Center Incorporated Privacy-preserving collaborative filtering
US8880439B2 (en) * 2012-02-27 2014-11-04 Xerox Corporation Robust Bayesian matrix factorization and recommender systems using same
CN102982107B (en) * 2012-11-08 2015-09-16 北京航空航天大学 A kind of commending system optimization method merging user, project and context property information

Non-Patent Citations (1)

Title
See references of WO2014138752A2 *

Also Published As

Publication number Publication date
JP2016517069A (en) 2016-06-09
CN105144625A (en) 2015-12-09
CN105103487A (en) 2015-11-25
JP2016510913A (en) 2016-04-11
KR20160041028A (en) 2016-04-15
JP2016510912A (en) 2016-04-11
CN105009505A (en) 2015-10-28

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20150917

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20171201