WO2015157020A1 - Method and apparatus for sparse privacy preserving mapping - Google Patents


Info

Publication number
WO2015157020A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
user
values
privacy
privacy preserving
Application number
PCT/US2015/023336
Other languages
French (fr)
Inventor
Branislav Kveton
Salman SALAMATIAN
Nadia FAWAZ
Nina Taft
Original Assignee
Thomson Licensing
Application filed by Thomson Licensing
Publication of WO2015157020A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Definitions

  • Census Dataset The Census dataset is a sample of the United States population from 1994, and contains both categorical and numerical features. Each entry in the dataset contains attributes such as age, workclass, education, gender, and native country, as well as income category (smaller or larger than 50k per year). For our purposes, we consider the information to be released publicly to be the seven attributes shown in TABLE 1, while the income category is the private information to be protected. In this dataset, roughly 76% of the people have an income smaller than 50k.
  • we reduce the computational complexity of the search problem significantly, from searching about n ≈ 300 points to |V|/n ≈ 3 points for each user.
  • Movie Dataset Our second dataset is the well-known MovieLens data.
  • the dataset consists of 1M ratings of 6K users on 4K movies.
  • Each movie in the MovieLens dataset comes annotated with metadata indicating its genre. In MovieLens, there are 19 genres.
  • We define the preference for genre i as the probability that the user chooses a movie from the genre times the reciprocal of the number of movies in that genre.
  • We choose the six highest preferences and generate a binary vector of length 40 that indicates these preferences.
  • the user profile is a binary vector of genres.
  • We treat the preference vector as public but the gender of the user as private. The fact that this profile can be a threat to gender is illustrated in FIGs. 4A-4C, which show the ROC curves for a classifier that tries to guess gender when there is no privacy protection, for Δ = 0.04, 0.13 and 0.22, respectively.
  • n = 3717 points to |V|/n ≈ 3 points for each user.
  • the number of points to search per user is reduced to around 3.
  • the number of points to search per user according to the present principles is on the order of magnitude of 10, regardless of the alphabet size n, while our designed privacy preserving mappings are the same or very close to the optimal mappings.
  • the input distribution is specified in Example 1, namely the private data is a binary variable A ∈ {0,1}, and the public data B is perfectly correlated with A.
  • By varying the parameter m as defined in the example, we modify the size of the alphabet ℬ, which allows us to assess the scalability.
  • Optimal mapping is the solution to optimization problem (1), computed by a CVX solver (i.e., software designed for convex optimization) for smaller scale problems that CVX can handle without running out of memory. On our server, we could solve optimization problem (1) with alphabet size up to
  • Exponential Mechanism The differential privacy metric is most commonly used in a database privacy setting, in which an analyst asks a query on a private database of size n containing data from n users.
  • the privacy preserving mechanism which computes and releases the answer to the query, is designed to satisfy differential privacy under a given notion of neighboring databases.
  • users do not trust the analyst collecting the data in a database, thus each user holds his data locally, and passes it through a differentially private mechanism before releasing it to the untrusted analyst.
  • This local differential privacy setting, based on input perturbation at the user end, is comparable to our local privacy setting, where user data is distorted before its release, but it differs from our setting by the privacy metric that the privacy mechanism is required to satisfy. More precisely, the local differential privacy setting considers a database of size 1 which contains the vector b of a user. The local differentially private mechanism p_DP satisfies
  • This exponential mechanism satisfies (2εd_max)-local differential privacy.
  • the distance d(b, b̂) will be the same as the distance used in the distortion constraint (1).
  • d(b, b̂) is set to be the Hamming distance for experiments on the census and the movie dataset, and the squared l2 distance for experiments on synthetic datasets. [59] In Fawaz, it was shown that, in general, differential privacy with some neighboring database notion does not guarantee low information leakage I(A; B̂) for all priors p_{A,B}.
  • SPPM needs 0.2 distortion to achieve perfect privacy, while ExpMec needs twice as much. Note that for a given level of distortion, e.g., 0.1, we see that SPPM achieves much better privacy than the exponential mechanism as the mutual information is significantly lower.
  • Another metric to gauge the success of our privacy mapping is to consider its impact on a classifier attempting to infer the private attribute.
  • the goal of our mapping is to weaken the classifier.
  • a simple Naive Bayes classifier that analyzes the Census data to infer each user's income category.
  • FIG. 5 illustrates how these mappings work.
  • the probability of mapping one point to another decreases exponentially with distance and the same mapping is applied to all points (in other words the standard deviation is null).
  • ExpMec wastes some distortion on those points which are mapped to close neighbours.
  • the standard error shows that SPPM has quite some variance in its mappings. This indicates that the mappings are tailored to each point.
  • TABLE 2 and TABLE 3 indicate the decreases in mutual information between a single attribute of the public data and an attribute of the private data we wish to hide. This allows us to determine which public attributes are the most correlated with the private attribute, but also to understand the mappings by observing which mutual information values are decreased the most (i.e., with the highest decrease in I(A; F)). As such, on the Census dataset, Education, Marital status and Occupation are the best individual attributes to infer income. We also notice that these are the attributes for which the mutual information has decreased the most, meaning that in the privacy-utility region represented by this table, it was favorable to spend more distortion on tackling the biggest threats.
  • Method 700 starts at 705.
  • it performs initialization, for example, determines possible values for public data or private data (i.e., determines A and B), and sets up a utility constraint.
  • it collects statistical information about public or private data, for example, from the users who are not concerned about privacy of their public data or private data. We denote these users as “public users,” and denote the users who wish to distort public data to be released as "private users.”
  • the statistics may be collected by crawling the web, accessing different databases, or may be provided by a data aggregator, for example, by bluekai.com. Which statistical information can be gathered depends on what the public users release. For example, if the public users release both private data and public data, an estimate of the joint distribution p_{A,B} can be obtained. In another example, if the public users only release public data, an estimate of the marginal probability measure p_B can be obtained, but not the joint distribution p_{A,B}. In another example, we may only be able to get the mean and variance of the public data. In the worst case, we may be unable to get any information about the public data or private data.
  • step 730 it determines a sparse privacy preserving mapping based on the statistical information given the utility constraint.
  • Using the sparsity property of the privacy mapping, we design sparse privacy mappings where each value in the public data is mapped to a limited selection of values, thus enabling a fast design of the privacy preserving mapping.
  • the mapping can be obtained using Algorithm 1 that provides a fast solution to optimization problem (1).
  • the sparsity property of the privacy mapping may also be used.
  • step 740 the public data of a current private user is distorted, according to the determined privacy preserving mapping, before it is released to, for example, a service provider or a data collecting agency, at step 750.
  • Method 700 ends at step 799.
  • a privacy agent is an entity that provides privacy service to a user.
  • a privacy agent may perform any of the following:
  • FIG. 8 depicts a block diagram of an exemplary system 800 where a privacy agent can be used.
  • Public users 810 release their private data (A) and/or public data (B).
  • the information released by the public users becomes statistical information useful for a privacy agent.
  • a privacy agent 880 includes statistics collecting module 820, privacy preserving mapping decision module 830, and privacy preserving module 840.
  • Statistics collecting module 820 may be used to collect the joint distribution p_{A,B}, the marginal probability measure p_B, and/or the mean and covariance of public data. Statistics collecting module 820 may also receive statistics from data aggregators, such as bluekai.com.
  • privacy preserving mapping decision module 830 designs a privacy preserving mapping mechanism p_{B̂|B}, for example, based on the optimization problem formulated as Eq. (1)-(3), using SPPM described in Algorithm 1.
  • Privacy preserving module 840 distorts public data of private user 860 before it is released, according to the conditional probability p_{B̂|B}.
  • statistics collecting module 820, privacy preserving mapping decision module 830, and privacy preserving module 840 can be used to perform steps 720, 730, and 740 in method 700, respectively.
  • the privacy agent needs only the statistics to work without the knowledge of the entire data that was collected in the data collection module.
  • the data collection module could be a standalone module that collects data and then computes statistics, and needs not be part of the privacy agent.
  • the data collection module shares the statistics with the privacy agent.
  • a privacy agent sits between a user and a receiver of the user data (for example, a service provider).
  • a privacy agent may be located at a user device, for example, a computer, or a set-top box (STB).
  • a privacy agent may be a separate entity.
  • All the modules of a privacy agent may be located at one device, or may be distributed over different devices. For example, statistics collecting module 820 may be located at a data aggregator who only releases statistics to module 830; the privacy preserving mapping decision module 830 may be located at a "privacy service provider" or at the user end on the user device connected to module 820; and the privacy preserving module 840 may be located at a privacy service provider, who then acts as an intermediary between the user and the service provider to whom the user would like to release data, or at the user end on the user device.
  • the privacy agent may provide released data to a service provider 850, for example, Comcast or Netflix, in order for private user 860 to improve received service based on the released data, for example, a recommendation system provides movie recommendations to a user based on its released movie rankings.
  • a service provider 850 for example, Comcast or Netflix
  • In FIG. 9, we show that there are multiple privacy agents in the system. In different variations, there need not be privacy agents everywhere, as it is not a requirement for the privacy system to work. For example, there could be a privacy agent only at the user device, or at the service provider, or at both. In FIG. 9, we show the same privacy agent "C" for both Netflix and Facebook. In another embodiment, the privacy agents at Facebook and Netflix can, but need not, be the same.
  • the implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program).
  • An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
  • the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device.
  • Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
  • the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation", as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
  • Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
  • Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • Receiving is, as with “accessing”, intended to be a broad term.
  • Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory).
  • “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
  • the information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • a signal may be formatted to carry the bitstream of a described embodiment.
  • Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • the formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • the information that the signal carries may be, for example, analog or digital information.
  • the signal may be transmitted over a variety of different wired or wireless links, as is known.
  • the signal may be stored on a processor-readable medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A user may wish to release some public data, which is correlated with his private data, to an analyst in the hope of getting some utility. The public data can be distorted before its release according to a probabilistic privacy preserving mapping mechanism, which limits information leakage under utility constraints. The present principles provide a solution to speed up the computation of privacy preserving mappings. In particular, we recognize that privacy preserving mappings are sparse, that is, public data of a user may only be mapped to a limited selection of data points with non-zero probabilities. Subsequently, we generate sparse privacy preserving mappings by recasting the optimization problem as a sequence of linear programs and solving each of these incrementally using an adaptation of Dantzig-Wolfe decomposition.

Description

Method and Apparatus for Sparse Privacy Preserving Mapping
CROSS-REFERENCE TO RELATED APPLICATIONS
[1] This application is related to (1) U.S. Provisional Patent Application Serial No.
61/691,090 filed on August 20, 2012, and titled "A Framework for Privacy against Statistical Inference" (hereinafter "Fawaz"); (2) U.S. Provisional Patent Application Serial No.
61/867,543 filed on August 19, 2013, and titled "Method and Apparatus for Utility-Aware Privacy Preserving Mapping against Inference Attacks" (hereinafter "Fawaz2"); and (3) U.S. Provisional Patent Application Serial No. 61/867,546 filed on August 19, 2013, and titled "Method and Apparatus for Utility-Aware Privacy Preserving Mapping through Additive Noise" (hereinafter "Fawaz3"). The provisional applications are expressly incorporated by reference herein in their entirety.
TECHNICAL FIELD
[2] This invention relates to a method and an apparatus for preserving privacy, and more particularly, to a method and an apparatus for quickly generating a privacy preserving mapping.
BACKGROUND
[3] Finding the right balance between privacy risks and big data rewards is a big challenge facing society today. Big data creates tremendous opportunity, especially for all services that offer personalized advice. Recommendation services are rampant today and offer advice on everything including movies, TV shows, restaurants, music, sleep, exercise, vacation, entertainment, shopping, and even friends. On one hand, people are willing to part with some of their personal data (e.g., movie watching history) for the sake of these services. The service, or other benefit that the user derives from allowing access to the user's data may be referred to as utility. On the other hand, many users have some data about themselves they would prefer to keep private (e.g., their political affiliation, salary, pregnancy status, religion). Most individuals have both public and private data and hence they need to maintain a boundary between these different elements of their personal information. This is an enormous challenge because inference analysis on publicly released data can often uncover private data.
SUMMARY
[4] The present principles provide a method for processing user data for a user, comprising: accessing the user data, which includes private data and public data; determining a set of values that the public data of the user can map to, wherein size of the set of values is small; determining a privacy preserving mapping that maps the public data to released data, wherein the public data of the user only maps to values within the determined set of values; modifying the public data of the user based on the privacy preserving mapping; and releasing the modified data as the released data to at least one of a service provider and a data collecting agency. The present principles also provide an apparatus for performing these steps.
[5] The present principles also provide a method for processing user data for a first user and a second user, comprising: accessing the user data, which includes private data and public data; determining a first set of values that the public data of the first user can map to, wherein size of the first set of values is on the order of magnitude of ten; determining a second set of values that the public data of the second user can map to, wherein size of the determined second set of values is on the order of magnitude of ten, and the determined first set of values is different from the determined second set of values; determining a privacy preserving mapping that maps the public data to released data, wherein the public data of the first user only maps to values within the determined first set of values and the public data of the second user only maps to values within the determined second set of values; modifying the public data of the first user and the second user based on the privacy preserving mapping; and releasing the modified data as the released data to at least one of a service provider and a data collecting agency. The present principles also provide an apparatus for performing these steps.
[6] The present principles also provide a computer readable storage medium having stored thereon instructions for processing user data according to the methods described above.
BRIEF DESCRIPTION OF THE DRAWINGS
[7] FIG. 1A and FIG. 1B are pictorial examples illustrating exemplary privacy preserving mappings under small and large distortion constraints, respectively.
[8] FIGs. 2A, 2B and 2C are pictorial examples illustrating effect of parameters on privacy-distortion tradeoff using synthetic data, census data and movie data, respectively.
[9] FIGs. 3A, 3B and 3C are pictorial examples illustrating ROC Curves for Naive Bayes Classifier with distortion constraint Δ = 0.02, 0.14 and 0.44, respectively, using census data.
[10] FIGs. 4A, 4B and 4C are pictorial examples illustrating ROC Curves for Logistic Regression with distortion constraint Δ = 0.04, 0.13 and 0.22, respectively, using movie data. [11] FIG. 5 is a pictorial example illustrating behavior of Sparse Privacy Preserving Mappings (SPPM) and exponential mechanism (ExpMec) using synthetic data.
[12] FIG. 6 is a pictorial example illustrating time complexity and parameter sensitivity using synthetic data.
[13] FIG. 7 is a flow diagram depicting an exemplary method for preserving privacy, in accordance with an embodiment of the present principles.
[14] FIG. 8 is a block diagram depicting an exemplary privacy agent, in accordance with an embodiment of the present principles.
[15] FIG. 9 is a block diagram depicting an exemplary system that has multiple privacy agents, in accordance with an embodiment of the present principles.
DETAILED DESCRIPTION
[16] A number of research efforts have explored the idea of distorting the public data before they are released to preserve user's privacy. In some of these prior efforts, distortion aims at creating some confusion around user data, by making its value hard to distinguish from other possible values; in other efforts, distortion is designed to counter a particular inference threat (i.e., a specific classifier or analysis). Recently, Fawaz proposed a new framework for data distortion, based on information theory, that captures privacy leakage in terms of mutual information, where mutual information is a measure of mutual dependence between two random variables. Minimizing the mutual information between a user's private data and released data is attractive because it reduces the correlation between the private data and the publicly released data, and thus any inference analysis that tries to learn the private data from the publicly released data is rendered weak, if not useless. In other words, this approach is agnostic to the type of inference analysis used in any given threat.
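By way of illustration only, the mutual information leakage measure discussed above can be computed directly from a joint distribution; the short Python sketch below does so for a toy prior. The function name and array values are illustrative assumptions and are not part of the described framework.

    import numpy as np

    def mutual_information(p_ab):
        """I(A;B) in bits for a joint distribution given as a 2-D array."""
        p_ab = np.asarray(p_ab, dtype=float)
        p_a = p_ab.sum(axis=1, keepdims=True)    # marginal of A
        p_b = p_ab.sum(axis=0, keepdims=True)    # marginal of B
        mask = p_ab > 0                          # 0 log 0 = 0 convention
        return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

    # Example: A and B perfectly correlated -> I(A;B) = H(A) = 1 bit
    p_ab = np.array([[0.5, 0.0],
                     [0.0, 0.5]])
    print(mutual_information(p_ab))   # 1.0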
[17] In the present application, we refer to the data a user would like to remain private as "private data," the data the user is willing to release as "public data," and the data the user actually releases as "released data." For example, a user may want to keep his political opinion private, and is willing to release his TV ratings with modification (for example, the user's actual rating of a program is 4, but he releases the rating as 3). In this case, the user's political opinion is considered to be private data for this user, the TV ratings are considered to be public data, and the released modified TV ratings are considered to be the released data. Note that another user may be willing to release both political opinion and TV ratings without modifications, and thus, for this other user, there is no distinction between private data, public data and released data when only political opinion and TV ratings are considered. If many people release political opinions and TV ratings, an analyst may be able to derive the correlation between political opinions and TV ratings, and thus, may be able to infer the political opinion of the user who wants to keep it private.
[18] The term analyst, which for example may be a part of a service provider's system, as used in the present application, refers to a receiver of the released data, who ostensibly uses the data in order to provide utility to the user. Often the analyst is a legitimate receiver of the released data. However, an analyst could also illegitimately exploit the released data and infer some information about private data of the user. This creates a tension between privacy and utility requirements. To reduce the inference threat while maintaining utility, the user may release a "distorted version" of data, generated according to a conditional probabilistic mapping, called "privacy preserving mapping" or "privacy mapping," designed under a utility constraint. [19] Regarding private data, this refers to data that the user not only indicates should not be publicly released, but also does not want to be inferred from other data that he would release. Public data is data that the user would allow the privacy agent to release, possibly in a distorted way to prevent the inference of the private data.
[20] In one embodiment, public data is the data that the service provider requests from the user in order to provide him with the service. The user however will distort (i.e., modify) it before releasing it to the service provider. In another embodiment, public data is the data that the user indicates as being "public" in the sense that he would not mind releasing it as long as the release takes a form that protects against inference of the private data.
[21] As discussed above, whether a specific category of data is considered as private data or public data is based on the point of view of a specific user. For ease of notation, we refer to a specific category of data as private data or public data from the perspective of the current user. For example, when trying to design a privacy preserving mapping for a current user who wants to keep his political opinion private, we call the political opinion private data for both the current user and for another user who is willing to release his political opinion.
[22] In the present principles, we use the distortion between the released data and public data as a measure of utility. When the distortion is larger, the released data is more different from the public data, and more privacy is preserved, but the utility derived from the distorted data may be lower for the user. On the other hand, when the distortion is smaller, the released data is a more accurate representation of the public data and the user may receive more utility, for example, receive more accurate content recommendations.
[23] Distorting data or modifying data, in the context of recommendation systems, means altering a user's profile. The framework of Fawaz casts the privacy problem as a convex optimization problem with linear constraints, where the number of variables grows quadratically with the size of the underlying alphabet that describes user profiles. When the alphabet size is huge, the enormous number of options for distorting user profiles presents a scalability challenge.
[24] There have been existing works on protecting privacy against statistical inference. In a work by Salman Salamatian et al., titled "How to hide the elephant- or the donkey- in the room: Practical privacy against statistical inference for large data," IEEE GlobalSIP 2013, a method based on quantization was proposed to reduce the number of optimization variables. It was shown that the reduction in complexity does not affect the privacy levels that can be achieved, but comes at the expense of additional distortion. In Fawaz3, privacy mappings in the class of additive noise were considered. The parametric additive noise model makes it possible to reduce the number of optimization variables to the number of noise parameters. However, this suboptimal solution is not suitable for perfect privacy, as it requires a high distortion.
[25] The use of the information theoretic framework of Fawaz relies on a local privacy setting, where users do not trust the analyst collecting data, thus each user holds his data locally, and passes it through a privacy preserving mechanism before releasing it to the untrusted analyst. Local privacy dates back to randomized response in surveys, and has been considered in privacy for data mining and statistics. Information theoretic privacy metrics have also been considered. Finally, differential privacy is currently the prevalent notion of privacy in privacy research. In the present application, we use the exponential mechanism (ExpMec) to compare our privacy mapping.
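For reference, one common instantiation of the exponential mechanism in such a local setting releases b̂ with probability proportional to exp(−ε·d(b, b̂)); the Python sketch below implements that generic form. It is an assumed, illustrative variant for comparison purposes only and not necessarily the exact scaling used in the experiments reported later.

    import numpy as np

    def exponential_mechanism(b_index, D, eps, rng=None):
        """Sample a released symbol with P(b_hat | b) proportional to exp(-eps * d(b, b_hat)).

        D is an n x n distance matrix over the alphabet; b_index indexes the
        true symbol b. This is a generic local ExpMec sketch (assumed form).
        """
        rng = np.random.default_rng() if rng is None else rng
        scores = np.exp(-eps * D[b_index])      # unnormalized probabilities
        probs = scores / scores.sum()
        return rng.choice(len(probs), p=probs)

    # Toy alphabet of 4 symbols with squared-l2 distances
    vals = np.arange(4.0)
    D = (vals[:, None] - vals[None, :]) ** 2
    print(exponential_mechanism(2, D, eps=1.0))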
[26] The general problem of minimizing a convex function under convex constraints has been studied extensively, and is of main importance in many machine learning tasks. The idea of sparse approximate solutions to those problems has also been studied in the literature and has often been called Sparse Greedy Approximation. This type of algorithm has been used with success in many applications such as Neural Networks, Matrix Factorization, SVM, Boosting, and others. Perhaps the most basic form of Sparse Greedy Approximation arises when using the Frank-Wolfe algorithm for a convex problem over the simplex.
[27] In the present application, we consider the setting described in Fawaz, where a user has two kinds of data: a vector of personal data A ∈ 𝒜 that he would like to remain private, e.g., his income level, his political views, and a vector of data B ∈ ℬ that he is willing to release publicly and from which he will derive some utility, e.g., the release of his media preferences to a service provider would allow the user to receive content recommendations. 𝒜 and ℬ are the sets from which A and B can assume values. We assume that the user's private data A are linked to his public data B by the joint probability distribution p_{A,B}. Thus, an analyst who would observe B could infer some information about A. To reduce this inference threat, instead of releasing B, the user releases a distorted version of B, denoted as B̂ ∈ ℬ̂. ℬ̂ is a set from which B̂ can assume values, and B̂ is generated according to a conditional probabilistic mapping p_{B̂|B}, called the privacy preserving mapping. Note that the set ℬ̂ may differ from ℬ. This setting is reminiscent of the local privacy setting (e.g., randomized response, input perturbation), where users do not trust the analyst collecting data, thus each user holds his data locally, and passes it through a privacy preserving mechanism before releasing it.
[28] The privacy mapping p_{B̂|B} is designed to render any statistical inference of A based on the observation of B̂ harder, while preserving some utility to the released data B̂, by limiting the distortion caused by the mapping. Following the framework for privacy-utility against statistical inference in Fawaz, the inference threat is modeled by the mutual information I(A; B̂) between the private data A and the publicly released data B̂, while the utility requirement is modeled by a constraint on the average distortion E_{B,B̂}[d(B, B̂)] ≤ Δ, for some distortion metric d: ℬ × ℬ̂ → ℝ+, and Δ > 0. Note that this general framework does not assume a particular inference algorithm. In the case of perfect privacy (I(A; B̂) = 0), the privacy mapping p_{B̂|B} renders the released data B̂ statistically independent from the private data A. Both the mutual information I(A; B̂) and the average distortion E_{B,B̂}[d(B, B̂)] depend on both the prior distribution p_{A,B} and the privacy mapping p_{B̂|B}, since A → B → B̂ form a Markov chain. To stress that I(A; B̂) depends on both p_{A,B} and p_{B̂|B}, we will write I(A; B̂) as J(p_{A,B}, p_{B̂|B}). Consequently, given a prior p_{A,B} linking the private data A and the public data B, the privacy mapping p_{B̂|B} minimizing the inference threat subject to a distortion constraint is obtained as the solution to the following convex optimization problem:

    minimize_{p_{B̂|B}}   J(p_{A,B}, p_{B̂|B})                                 (1)
    subject to            E_{B,B̂}[d(B, B̂)] ≤ Δ
                          p_{B̂|B}(·|b) ∈ Simplex   ∀ b ∈ ℬ

where Simplex denotes the probability simplex (Σ_x p(x) = 1, p(x) ≥ 0 ∀x).
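By way of a non-limiting illustration, for very small alphabets problem (1) can be solved directly with a general-purpose nonlinear solver, without any of the sparsity machinery developed below. The following Python sketch does so with SciPy's SLSQP on a toy 2-symbol example; the toy prior, the Hamming distortion and the helper names are illustrative assumptions, and the approach does not scale, which motivates the methods that follow.

    import numpy as np
    from scipy.optimize import minimize

    # Toy prior p_{A,B}: rows index a, columns index b (n = 2)
    p_ab = np.array([[0.5, 0.0],
                     [0.0, 0.5]])
    p_b = p_ab.sum(axis=0)
    n = p_ab.shape[1]
    D = 1.0 - np.eye(n)          # Hamming distortion d(b_hat, b)
    delta = 0.3                  # distortion budget

    def leakage(x):
        X = x.reshape(n, n)                    # X[i, j] = p(b_hat_i | b_j)
        p_a_bhat = p_ab @ X.T                  # joint of (A, B_hat)
        p_a = p_a_bhat.sum(axis=1, keepdims=True)
        p_bhat = p_a_bhat.sum(axis=0, keepdims=True)
        m = p_a_bhat > 1e-12
        return float(np.sum(p_a_bhat[m] * np.log2(p_a_bhat[m] / (p_a @ p_bhat)[m])))

    def distortion(x):
        X = x.reshape(n, n)
        return float(np.sum(p_b * np.sum(D * X, axis=0)))   # E[d(B, B_hat)]

    cons = [{"type": "ineq", "fun": lambda x: delta - distortion(x)}]
    cons += [{"type": "eq", "fun": (lambda x, j=j: x.reshape(n, n)[:, j].sum() - 1.0)}
             for j in range(n)]
    x0 = np.eye(n).ravel()                     # start from the identity mapping
    res = minimize(leakage, x0, bounds=[(0, 1)] * n * n, constraints=cons,
                   method="SLSQP")
    print(res.fun, res.x.reshape(n, n).round(3))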
[30] The present principles propose methods to reduce computation complexity when designing privacy preserving mappings. In particular, we propose to exploit the structure of the optimization problem to achieve computational speedup that will allow scalability. To be more precise, we solve the above problem efficiently when the sets ℬ and ℬ̂ are large.
In one embodiment, by studying smaller scale problems, both analytically and empirically, we identify that mappings to distort profiles are in fact naturally sparse. We leverage this observation to develop sparse privacy preserving mappings (SPPM) that handle scalability. Although the underlying optimization problem has linear constraints, its objective function is non-linear. We use the Frank-Wolfe algorithm that approximates the objective via a sequence of linear approximations, and this allows solving the problem as a sequence of linear programs, which can be solved quickly. In addition, we limit the number of alternate user profiles to a small number; this can be practical as long as the alternates are selected smartly. To do this, we adapt the Dantzig-Wolfe decomposition to the structure of our problem. Overall we reduce the number of variables from quadratic to linear in the number of user profiles. To the best of our knowledge, this work is the first to apply large scale linear programming optimization techniques to privacy problems.
[31] We also provide a detailed evaluation on three datasets, in which we compare our solution to an optimal one (when feasible) and to a state-of-the-art solution based on differential privacy (called the Exponential Mechanism). We find that our solutions are close to optimal, and consistently outperform the exponential mechanism (ExpMec) approach in that we achieve more privacy with less distortion. We show that our methods scale well with respect to the number of user profiles and their underlying alphabet.
[32] In the following, the sparsity property of the privacy preserving mappings, the proposed sparse privacy preserving mappings and evaluation results will be discussed in great detail.
[33] Sparsity of the Privacy Preserving Mappings
[34] When applying the aforementioned privacy-utility framework to large data, we encounter a challenge of scalability. Designing the privacy mapping requires characterizing the value of p_{B̂|B}(b̂|b) for all possible pairs (b, b̂) ∈ ℬ × ℬ̂, i.e., solving the convex optimization problem over |ℬ||ℬ̂| variables. When ℬ̂ = ℬ, and the size of the alphabet |ℬ| is large, solving the optimization over |ℬ|² variables may become intractable.
[35] A natural question is whether ℬ̂ = ℬ is a necessary assumption to achieve the optimal privacy-utility tradeoff. In other words, does the optimization problem (1) need to be solved over |ℬ|² variables to achieve optimality? We use the following theoretical example to motivate a sparser approach to the design of the privacy mapping.
[36] Example 1: Let A ∈ {0,1}, and B ∈ {1, 2, ..., 2^m}, and define the joint distribution p_{A,B} such that p(A = 0) = p(A = 1) = 1/2, and for i ∈ {1, 2, ..., 2^m}, let p(B = i|A = 0) = 1/2^{m−1} if i ≤ 2^{m−1}, 0 otherwise; and let p(B = i|A = 1) = 1/2^{m−1} if i > 2^{m−1}, 0 otherwise. For this example, the privacy threat is the worst it could be, as observing B determines deterministically the value of the private random variable A (equivalently, I(A; B) = H(A) = 1). In FIG. 1, we consider an l2 distortion measure d(B, B̂) = (B − B̂)² and illustrate the optimal mappings solving Problem (1) for different distortion values. Darker colors mean that the probability of mapping of the corresponding points to the other one is larger. For small distortions, the blackest diagonal in FIG. 1A shows that most points B = b are only mapped to themselves B̂ = b (i.e., with a mapping probability of 100%), and only a small number of points in B (around B = 65) may be mapped to different points in B̂. That is, for a given data point in ℬ, the privacy preserving mapping only chooses from a small set of values from ℬ̂, rather than from the entire set ℬ̂. As we increase the distortion level, more points in ℬ get mapped to a larger number of points in ℬ̂. In both FIG. 1A and FIG. 1B, we notice that the optimal mappings (i.e., the points shown on the curves) only occupy a very small portion of the 2-D space ℬ × ℬ̂. Thus, we consider the optimal privacy preserving mapping as sparse.
[37] This theoretical example, as well as experiments on other datasets such as the census data, have shown that the optimal privacy preserving mapping may be sparse, in the sense that the support (i.e., the set of points B̂ = b̂ to which B = b can be mapped with a non-zero probability) of p_{B̂|B}(b̂|b) may be of much smaller size than ℬ̂, and may differ for different values of B = b. We propose to exploit sparsity properties of the privacy mapping to speed up the computation, by choosing the support of p_{B̂|B}(b̂|b) in an iterative greedy way.
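By way of a non-limiting illustration, the joint distribution of Example 1 can be generated programmatically; the Python sketch below builds it for a given m (helper names are illustrative) and confirms that, before any mapping is applied, observing B reveals A completely, i.e., I(A; B) = H(A) = 1 bit.

    import numpy as np

    def example1_joint(m):
        """Joint p_{A,B} of Example 1: A in {0,1}, B in {1, ..., 2**m}.

        Given A = 0, B is uniform over the first half of the alphabet;
        given A = 1, B is uniform over the second half.
        """
        n = 2 ** m
        p_ab = np.zeros((2, n))
        p_ab[0, : n // 2] = 0.5 / (n // 2)
        p_ab[1, n // 2:] = 0.5 / (n // 2)
        return p_ab

    def mutual_information(p_ab):
        p_a = p_ab.sum(axis=1, keepdims=True)
        p_b = p_ab.sum(axis=0, keepdims=True)
        nz = p_ab > 0
        return float(np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz])))

    print(mutual_information(example1_joint(7)))   # 1.0 bit: B fully reveals A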
[38] Using an example, we explain some of the notations used in the present application. We assume a group of ten people who consider age and income as private data, and consider gender and education as public data. We call age, income, gender, or education an attribute of a user, wherein the group of people may have values for the attributes as:
- age: {20-30, 30-40, 40-50};
- income: {<50K, >50K};
- gender: {male, female};
- education: {high school, bachelor, postgraduate}.
In this example, private data A is a vector of random variables {age, income}, and public data B is a vector of random variables {gender, education}. For a particular person in this group, who is a 29-year-old woman with a bachelor's degree and makes more than 50K, her private data a = (age: 20-30, income: >50K), and public data b = (gender: female, education: bachelor). Her user profile can be set to (gender: female, education: bachelor). The set ℬ̂ may comprise {(male, high school), (male, bachelor), (male, postgraduate), (female, high school), (female, bachelor), (female, postgraduate)}. The set ℬ̂ may also be smaller, for example, ℬ̂ may comprise {(male, high school), (male, bachelor), (male, postgraduate), (female, bachelor), (female, postgraduate)} if every woman in this group of people has a bachelor or postgraduate degree. Each element (for example, (male, bachelor)) of ℬ̂ is also referred to as a data point or possible value of ℬ̂. The set ℬ̂ may be identical to ℬ. In order to protect his or her age/income information, a person may modify the gender and/or education information, such that he or she appears as another person.
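In this example, the public-data alphabet ℬ is simply the Cartesian product of the public attributes; a short illustrative Python sketch of how such profiles can be enumerated and indexed is given below (attribute values taken from the example above; variable names are illustrative).

    from itertools import product

    genders = ["male", "female"]
    educations = ["high school", "bachelor", "postgraduate"]

    # Enumerate the public-data alphabet B = gender x education
    alphabet = list(product(genders, educations))
    index = {profile: i for i, profile in enumerate(alphabet)}

    b = ("female", "bachelor")           # the example user's public data
    print(len(alphabet), index[b])       # 6 possible profiles; index of b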
[39] Sparse Privacy Preserving Mapping (SPPM)
[40] Before we describe our algorithm, we rewrite the optimization problem (1). Let X be an n × n matrix of variables to be optimized, whose entries are defined as x_{i,j} = p_{B̂|B}(b̂_i|b_j), and let X_j be the j-th column of X, where n is the cardinality of ℬ̂ (i.e., n = |ℬ̂|, also referred to as the alphabet size). Then the objective function J(p_{A,B}, p_{B̂|B}) can be written as f(X), some function of X, and the distortion constraint can be written as Σ_{j=1}^n d_j^T X_j ≤ Δ, where each

    d_j = p_B(b_j) [d(b̂_1, b_j), d(b̂_2, b_j), ..., d(b̂_n, b_j)]^T

is a vector of length n that represents the distortion metric scaled by the probability of the corresponding symbol b_j. The marginal of B is computed as p_B(b_j) = Σ_a p_{A,B}(a, b_j). Finally, the simplex constraint can be written as 1_n^T X_j = 1 for all j, where 1_n is an all-ones vector of length n. Given the new notation, optimization problem (1) can be written as:

    minimize_X   f(X)                                                         (2)
    subject to   Σ_{j=1}^n d_j^T X_j ≤ Δ
                 1_n^T X_j = 1   ∀ j = 1, ..., n
                 X ≥ 0

where X ≥ 0 is an entry-wise inequality.
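In this matrix form, both the distortion constraint and the simplex constraints are straightforward to evaluate for a candidate mapping X; the numpy sketch below builds the scaled distortion vectors d_j and checks both constraints for the identity mapping. The toy prior and variable names are illustrative assumptions.

    import numpy as np

    n = 4
    rng = np.random.default_rng(0)
    p_ab = rng.random((2, n)); p_ab /= p_ab.sum()     # toy prior p_{A,B}
    p_b = p_ab.sum(axis=0)                            # marginal p_B(b_j)
    D = 1.0 - np.eye(n)                               # d(b_hat_i, b_j), Hamming

    # d_j = p_B(b_j) * [d(b_hat_1, b_j), ..., d(b_hat_n, b_j)]^T
    d = D * p_b[None, :]                              # column j holds d_j

    X = np.eye(n)                                     # candidate mapping (identity)
    distortion = float(np.sum(d * X))                 # sum_j d_j^T X_j
    simplex_ok = np.allclose(X.sum(axis=0), 1.0) and np.all(X >= 0)
    print(distortion, simplex_ok)                     # 0.0 True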
[41] The optimization problem (1) has linear constraints, but its objective function is non-linear because of the way mutual information is computed. In one embodiment, we solve the problem as a sequence of linear programs, also known as the Frank-Wolfe method. Each iteration ℓ of the method consists of three major steps. First, we compute the gradient ∇_X f(X_{ℓ−1}) at the solution from the previous step X_{ℓ−1}. The gradient is an n × n matrix C, where

    c_{i,j} = ∂f(X)/∂x_{i,j}

is the partial derivative of the objective function with respect to the variable x_{i,j}. Second, we find a feasible solution X′ in the direction of the gradient. This problem is solved as a linear program with the same constraints as the optimization problem (1):

    minimize_X   Σ_{j=1}^n c_j^T X_j                                          (3)
    subject to   Σ_{j=1}^n d_j^T X_j ≤ Δ
                 1_n^T X_j = 1   ∀ j = 1, ..., n
                 X ≥ 0

where c_j is the j-th column of C. Finally, we find the minimum X_ℓ of f between X_{ℓ−1} and X′, and make it the current solution. Since f is convex, this minimum can be found efficiently by ternary search. The minimum is also feasible because the feasible region is convex, and both X′ and X_{ℓ−1} are feasible.
Algorithm 1 SPPM: Sparse privacy preserving mappings
Input: Initial feasible point X_0
       Number of linearization steps L
for all ℓ = 1, 2, ..., L do
    C ← ∇_X f(X_{ℓ−1})
    V ← DWD
    Find a feasible solution X′ in the direction of the gradient C:

        minimize_X   Σ_{j=1}^n c_j^T X_j                                      (4)
        subject to   Σ_{j=1}^n d_j^T X_j ≤ Δ
                     1_n^T X_j = 1   ∀ j
                     X ≥ 0
                     x_{i,j} = 0   ∀ (i, j) ∉ V

    Find the minimum of f between X_{ℓ−1} and X′:

        γ* ← arg min_{γ ∈ [0,1]} f((1 − γ) X_{ℓ−1} + γ X′)                    (5)
        X_ℓ ← (1 − γ*) X_{ℓ−1} + γ* X′

end for
Output: Feasible solution X_L

Algorithm 2 DWD: Dantzig-Wolfe decomposition
Initialize the set of active variables: V ← {(1,1), (2,2), ..., (n, n)}
while the set V grows do
    Solve the master problem for λ* and μ*:

        maximize_{λ,μ}   λΔ + Σ_{j=1}^n μ_j                                   (6)
        subject to       λ ≤ 0
                         λ d_{i,j} + μ_j ≤ c_{i,j}   ∀ (i, j) ∈ V

    for all j = 1, 2, ..., n do
        Find the most violated constraint in the master problem for fixed j:
            i* = arg min_i [c_{i,j} − λ* d_{i,j} − μ*_j]
        if c_{i*,j} − λ* d_{i*,j} − μ*_j < 0 then V ← V ∪ {(i*, j)} end if
    end for
end while
Output: Active variables V
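As a non-limiting illustration of a single linearization step of the kind used in Algorithm 1, the Python sketch below computes the gradient matrix C at the current mapping by finite differences and finds the feasible direction X′ by solving the full linear program (3) with scipy.optimize.linprog. It is a simplified, assumed implementation on a toy instance; unlike SPPM, it does not restrict the LP to active variables.

    import numpy as np
    from scipy.optimize import linprog

    n = 4
    rng = np.random.default_rng(1)
    p_ab = rng.random((2, n)); p_ab /= p_ab.sum()     # toy prior p_{A,B}
    p_b = p_ab.sum(axis=0)
    D = 1.0 - np.eye(n)                               # d(b_hat_i, b_j)
    d = D * p_b[None, :]                              # scaled distortion vectors d_j
    delta = 0.3

    def f(X):                                         # objective: I(A; B_hat) in nats
        p_a_bhat = p_ab @ X.T
        p_a = p_a_bhat.sum(axis=1, keepdims=True)
        p_bh = p_a_bhat.sum(axis=0, keepdims=True)
        m = p_a_bhat > 1e-12
        return float(np.sum(p_a_bhat[m] * np.log(p_a_bhat[m] / (p_a @ p_bh)[m])))

    def gradient(X, h=1e-6):                          # finite-difference gradient
        C = np.zeros((n, n))
        base = f(X)
        for i in range(n):
            for j in range(n):
                Xp = X.copy(); Xp[i, j] += h
                C[i, j] = (f(Xp) - base) / h
        return C

    X = np.eye(n)                                     # current (feasible) mapping
    C = gradient(X)

    # LP (3): minimize <C, X'> s.t. distortion and per-column simplex constraints
    A_ub = d.ravel(order="F")[None, :]                # one row: sum_j d_j^T X'_j <= delta
    A_eq = np.kron(np.eye(n), np.ones((1, n)))        # each column of X' sums to 1
    res = linprog(C.ravel(order="F"), A_ub=A_ub, b_ub=[delta],
                  A_eq=A_eq, b_eq=np.ones(n), bounds=(0, 1), method="highs")
    X_prime = res.x.reshape(n, n, order="F")
    print(res.status, f(X), f(X_prime))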
[42] The linear program (3) has n² variables and therefore is hard to solve when n is large. In one embodiment, we propose an incremental solution to this problem, which is defined only on a subset of active variables V ⊆ {1, 2, ..., n} × {1, 2, ..., n}. This is why we refer to our approach as sparse privacy mappings. The active variables are the indices of the non-zero variables in the solution to the problem (3). Each active variable includes a pair of indices (i, j), wherein the j-th data point in ℬ is mapped to the i-th data point in ℬ̂ with a non-zero probability. Therefore, solving (3) on active variables V is equivalent to restricting all inactive variables to zero:

    minimize_X   Σ_{j=1}^n c_j^T X_j                                          (8)
    subject to   Σ_{j=1}^n d_j^T X_j ≤ Δ
                 1_n^T X_j = 1   ∀ j = 1, ..., n
                 X ≥ 0
                 x_{i,j} = 0   ∀ (i, j) ∉ V

The above program has only |V| variables. Now the challenge is in finding a good set of active variables V. This set should be small, and the solutions of (3) and (8) should be close.
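Restricting the linear program to an active set V amounts to dropping the inactive variables before calling the solver, so the LP has only |V| variables instead of n². The Python sketch below solves (8) over a small active set with scipy.optimize.linprog on toy data; the active set, the gradient values and the variable names are illustrative assumptions.

    import numpy as np
    from scipy.optimize import linprog

    n = 4
    rng = np.random.default_rng(2)
    C = rng.standard_normal((n, n))          # gradient matrix (stand-in values)
    p_b = np.full(n, 1.0 / n)
    d = (1.0 - np.eye(n)) * p_b[None, :]     # scaled distortion d_{i,j}
    delta = 0.25

    # Active set V: the diagonal plus one extra pair
    V = [(i, i) for i in range(n)] + [(0, 1)]

    c_v = np.array([C[i, j] for (i, j) in V])            # objective over V
    A_ub = np.array([[d[i, j] for (i, j) in V]])         # distortion row
    A_eq = np.zeros((n, len(V)))                         # simplex: one row per column j
    for k, (i, j) in enumerate(V):
        A_eq[j, k] = 1.0
    res = linprog(c_v, A_ub=A_ub, b_ub=[delta], A_eq=A_eq, b_eq=np.ones(n),
                  bounds=(0, 1), method="highs")

    X = np.zeros((n, n))                                 # lift back to an n x n mapping
    for k, (i, j) in enumerate(V):
        X[i, j] = res.x[k]
    print(np.round(X, 3))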
[43] We grow the set V greedily using the dual linear program of (8). In particular, we incrementally solve the dual by adding the most violated constraints, which corresponds to adding the most beneficial variables in the primal. The dual of (8) is:

    maximize_{λ,μ}   λΔ + Σ_{j=1}^n μ_j                                       (9)
    subject to       λ ≤ 0
                     λ d_{i,j} + μ_j ≤ c_{i,j}   ∀ (i, j) ∈ V

where λ ∈ ℝ is a variable associated with the distortion constraint and μ ∈ ℝ^n is a vector of n variables associated with the simplex constraints. Given a solution (λ*, μ*) to the dual, the most violated constraint for a given j is the one that minimizes:

    c_{i,j} − λ* d_{i,j} − μ*_j.

This quantity, also called the reduced cost, has an intuitive interpretation. We choose a data point i in the direction of the steepest gradient of f(X), so c_{i,j} is as small as possible, and data point i is close to data point j, so d_{i,j} is close to zero (as λ* ≤ 0). This approach is also known as Dantzig-Wolfe decomposition (DWD).
[44] The pseudocode of our search procedure for computing sparse privacy preserving mappings is in Algorithm 2. This is an iterative algorithm, where each iteration consists of three steps. First, we solve the reduced dual linear program (9) on active variables.
Second, for each point j, we identify a point i* that minimizes the reduced cost. Finally, if the pair (i*, j) corresponds to a violated constraint, we add it to the set of active variables V.
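This pricing step can be written compactly as in the sketch below, assuming the gradient C and the scaled distances d are available as n × n arrays and (lam, mu) is the current solution of the master problem; the names are illustrative.

import numpy as np

def grow_active_set(C, d, active, lam, mu):
    # For each column j, find i* minimizing the reduced cost
    # c_{i,j} - lam * d_{i,j} - mu_j, and activate (i*, j) when the
    # corresponding dual constraint is violated (reduced cost < 0).
    n = C.shape[1]
    active = set(active)
    grew = False
    for j in range(n):
        reduced = C[:, j] - lam * d[:, j] - mu[j]
        i_star = int(np.argmin(reduced))
        if reduced[i_star] < 0 and (i_star, j) not in active:
            active.add((i_star, j))
            grew = True
    return active, grew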
[45] The pseudocode of our final solution is in Algorithm 1. We refer to Algorithm 1 as Sparse Privacy Preserving Mappings (SPPM), because the mappings learned by the algorithm are sparse. Algorithm 2 is a subroutine of Algorithm 1, which identifies the set of active variables V. As we observe in FIG. 1A, many points map to themselves; thus, we initialize the set of active variables V to {(1,1), (2,2), ..., (n,n)}. The set of active variables V can also be initialized to other values, for example, to an all-zero vector. Algorithm 1 is parameterized by the number of iterations L. The value of L may be determined based on the required speed, the available computation resources and the distortion constraint. The initial feasible point X0 can be any matrix that does not violate any of the constraints; for example, X0 can be the identity matrix (i.e., each data point maps to itself with probability 1).
[46] Algorithm SPPM is a gradient descent method. In each iteration ℓ, we find a solution X' in the direction of the gradient at the current solution X_{ℓ-1}. Then we find the minimum of f between X_{ℓ-1} and X', and make it the next solution X_ℓ. By assumption, the initial solution X0 is feasible in the optimization problem (1). The solution X' to the LP (linear program) (8) is always feasible in (1), because it satisfies all constraints in (1), and some additional constraints x_{i,j} = 0 on inactive variables. After the first iteration of SPPM, X1 is a convex combination of X0 and X'. Since the feasible region is convex, and both X0 and X' are feasible, X1 is also feasible. By induction, all solutions X_ℓ are feasible. When the optimization problem is formulated differently from Eq. (1), we may still apply the sparsity property to reduce computational complexity.
[47] The value of f(X_ℓ) is guaranteed to monotonically decrease with ℓ. When the method converges, f(X_ℓ) = f(X_{ℓ-1}). The convergence rate of the Frank-Wolfe algorithm is O(1/L) in the worst case.
[48] The time complexity of our method is O(n²), because we search for n² violated constraints in each iteration of Algorithm 2. To search efficiently, we implemented the following speedup in the computation of the gradient entries c_{i,j}. We precompute the marginal and conditional distributions:

p_{B̂}(b̂) = ∑_{a,b} p_{A,B}(a, b) p_{B̂|B}(b̂|b),    p_{B̂|A}(b̂|a) = ∑_b p_{B|A}(b|a) p_{B̂|B}(b̂|b).

These marginals are common for all elements of C. Indeed, c_{i,j} can be written as:

c_{i,j} = ∂f(X) / ∂p(b̂_i|b_j) = ∑_a p_{A,B}(a, b_j) log ( p_{B̂|A}(b̂_i|a) / p_{B̂}(b̂_i) ).

As the precomputation can be completed with complexity O(n²), the amortized cost of computing each c_{i,j} is O(1).
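The same computation can be carried out with a few matrix products, as in the sketch below; it assumes p_AB is the |A| × n joint distribution and X the current mapping, the logarithmic expression is the one given just above, and the small constant added before taking logarithms is only a numerical guard.

import numpy as np

def gradient(p_AB, X, eps=1e-12):
    # C[i, j] = sum_a p_AB(a, b_j) * log( p(b_hat_i | a) / p(b_hat_i) ).
    # The marginal p(b_hat) and conditional p(b_hat | a) are computed once
    # and shared by all entries of C.
    p_A = p_AB.sum(axis=1, keepdims=True)        # p_A(a)
    p_B = p_AB.sum(axis=0)                       # p_B(b_j)
    p_Bhat_given_A = (p_AB / p_A) @ X.T          # entry (a, i): sum_j p(b_j | a) * X[i, j]
    p_Bhat = p_B @ X.T                           # entry i: sum_j p_B(b_j) * X[i, j]
    log_ratio = np.log(p_Bhat_given_A + eps) - np.log(p_Bhat + eps)
    return log_ratio.T @ p_AB                    # entry (i, j): c_{i,j}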
[49] Performance Evaluation
[50] We now evaluate the performance of SPPM on three datasets (described below) and compare it against two benchmarks: the optimal mapping (i.e., the solution of Eq. (1)), and a local differentially private exponential mechanism.
[51] Census Dataset: The Census dataset is a sample of the United States population from 1994, and contains both categorical and numerical features. Each entry in the dataset contains attributes such as age, workclass, education, gender, and native country, as well as an income category (smaller or larger than 50k per year). For our purposes, the information to be released publicly consists of the seven attributes shown in TABLE 1, while the income category is the private information to be protected. In this dataset, roughly 76% of the people have an income smaller than 50k.
[52] In this example, n = 300 and n² = 90,000. With our proposed algorithm, the number of active variables is limited to |V| = 757. Thus, we reduce the computational complexity of the search problem significantly, from searching about n = 300 points to |V|/n ≈ 3 points for each user.
TABLE 1
Attribute         Example values               Number of possible values
Education         Bachelor, Doctorate, ...     16
Age               10-20, 20-30, ...            9
Occupation        Scientist, Manager, ...      15
Gender            Male, Female                 2
Marital Status    Married, Divorced, ...       7
Race              White, Black, Asian, ...     5
Country           USA, Mexico, ...             42
[53] Movie Dataset: Our second dataset is the well-known MovieLens data. The dataset consists of 1M ratings of 6K users on 4K movies. Each movie in the MovieLens dataset comes annotated with metadata indicating its genre. In MovieLens, there are 19 genres.
We extended the list of genres to 300 by using the more extended genre tags from Netflix.
From these genres, we select those that appear in at least 5% of movies, yielding 40 genres.
For user j, we compute the preference for genre i as the probability that the user chooses a movie from that genre, times the reciprocal of the number of movies in the genre. For each user, we choose the six highest preferences and generate a binary vector of length 40 that indicates these preferences. Thus the user profile is a binary vector of genres. We treat the preference vector as public but the gender of the user as private. The fact that this profile can be a threat to gender is illustrated in FIGs. 4A-4C, which show the ROC curves for a classifier that tries to guess gender when there is no privacy protection, for Δ = 0.04, 0.13 and 0.22, respectively.
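A simplified version of this profile construction is sketched below; it assumes each movie carries a single genre index (the actual data associates richer tags with each title), and the function and argument names are illustrative.

import numpy as np

def genre_profile(user_movie_ids, movie_genre, n_genres=40, top_k=6):
    # Preference for genre i: fraction of the user's movies in genre i, scaled by
    # the reciprocal of the number of movies in that genre; the profile is a binary
    # vector marking the top_k highest preferences.
    counts = np.zeros(n_genres)
    for m in user_movie_ids:
        counts[movie_genre[m]] += 1.0
    genre_sizes = np.bincount(list(movie_genre.values()), minlength=n_genres)
    pref = (counts / max(len(user_movie_ids), 1)) / np.maximum(genre_sizes, 1)
    profile = np.zeros(n_genres, dtype=int)
    profile[np.argsort(pref)[-top_k:]] = 1
    return profile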
[54] In this example, n = 3717 and n² = 13,816,089. With our proposed algorithm, the number of active variables is limited to |V| = 10,140. Thus, we reduce the computational complexity of the search problem significantly, from searching about n = 3717 points to |V|/n ≈ 3 points for each user. We notice from both examples (using the Census dataset and the Movie dataset) that the number of points to search per user is reduced to around 3. In general, we observe that the number of points to search per user according to the present principles is in the order of magnitude of 10, regardless of the alphabet size n, while our designed privacy preserving mappings are the same as, or very close to, the optimal mappings.

[55] Synthetic Dataset: We want to consider synthetic data as well, since this allows us to freely vary the problem size. The input distribution is specified in example 1, namely the private data is a binary variable A ∈ {0,1}, and the public data B is perfectly correlated with A. By varying the parameter m as defined in the example, we modify the size of the alphabet ℬ, which allows us to assess the scalability.

[56] Optimal mapping: The optimal mapping is the solution to optimization problem (1), computed by a CVX solver (e.g., software designed for convex optimization) for smaller scale problems that CVX can handle without running out of memory. On our server, we could solve optimization problem (1) with alphabet size up to |ℬ| = 2^12 = 4096.
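For such small alphabets, problem (1) can be posed directly in a disciplined convex programming framework; the sketch below uses CVXPY, relies on writing the mutual information as a sum of relative-entropy terms of affine expressions, and assumes a solver supporting the exponential cone is installed. It illustrates the benchmark rather than the solver configuration actually used.

import numpy as np
import cvxpy as cp

def optimal_mapping(p_AB, D, delta):
    # Minimize I(A; B_hat) over the mapping X, whose columns are conditional pmfs,
    # subject to the expected-distortion budget and the simplex constraints.
    n = p_AB.shape[1]
    p_A = p_AB.sum(axis=1, keepdims=True)
    p_B = p_AB.sum(axis=0)
    X = cp.Variable((n, n), nonneg=True)                 # X[i, j] = p(b_hat_i | b_j)
    J = p_AB @ X.T                                       # affine: p(a, b_hat_i)
    M = cp.reshape(p_B @ X.T, (1, n))                    # affine: p(b_hat_i)
    mutual_info = cp.sum(cp.rel_entr(J, p_A @ M))        # I(A; B_hat), convex in X
    distortion = cp.sum(cp.multiply(D * p_B[np.newaxis, :], X))
    problem = cp.Problem(cp.Minimize(mutual_info),
                         [cp.sum(X, axis=0) == 1, distortion <= delta])
    problem.solve()
    return X.value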
[57] Exponential Mechanism: The differential privacy metric is most commonly used in a database privacy setting, in which an analyst asks a query on a private database of size n containing data from n users. The privacy preserving mechanism, which computes and releases the answer to the query, is designed to satisfy differential privacy under a given notion of neighboring databases. In the strong setting of local differential privacy, users do not trust the analyst collecting the data in a database, thus each user holds his data locally, and passes it through a differentially private mechanism before releasing it to the untrusted analyst. In this case, the privacy preserving mechanism works on a database of size n = 1, and all possible databases are considered to be neighbors. This local differential privacy setting, based on input perturbation at the user end, is comparable to our local privacy setting, where user data is distorted before its release, but it differs from our setting by the privacy metric that the privacy mechanism is required to satisfy. More precisely, the local differential privacy setting considers a database of size 1 which contains the vector b of a user. The local differentially private mechanism pDP satisfies
pDP(b̂|b) ≤ e^ε pDP(b̂|b'),  ∀b, b' ∈ ℬ and ∀b̂ ∈ ℬ.

[58] As the non-private data in our three datasets is categorical, we focus on the exponential mechanism, a well-known mechanism that preserves differential privacy for non-numeric valued queries. More precisely, in our experiments, we use the exponential mechanism pDP(b̂|b) that maps b to b̂ with a probability that decreases exponentially with the distance d(b, b̂), that is, pDP(b̂|b) ∝ exp(-β d(b, b̂)), where β ≥ 0. Let dmax = sup_{b, b̂ ∈ ℬ} d(b, b̂). This exponential mechanism satisfies (2β dmax)-local differential privacy. The distance d(b, b̂) is the same as the distance used in the distortion constraint (1). In one embodiment, d(b, b̂) is set to be the Hamming distance for the experiments on the Census and the Movie datasets, and the squared l2 distance for the experiments on the synthetic datasets.
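For reference, the sampling step of this baseline can be written in a few lines, as in the sketch below; it assumes the alphabet is indexed 0..n-1 and that D holds the pairwise distances d(b_i, b_j), and the function name is illustrative.

import numpy as np

def exponential_mechanism_release(b_index, D, beta, rng=None):
    # Sample b_hat with probability proportional to exp(-beta * d(b, b_hat)).
    rng = np.random.default_rng() if rng is None else rng
    scores = -beta * D[:, b_index]
    probs = np.exp(scores - scores.max())    # subtract the max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))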
[59] In Fawaz, it was shown that, in general, differential privacy with some neighboring database notion does not guarantee low information leakage I(A; B̂) for all priors p_{A,B}. However, it was also shown in Fawaz2 that strong ε-differential privacy, i.e., ε-differential privacy under the neighboring notion that all databases are neighbors, implies that I(A; B̂) ≤ ε. Local differential privacy is a particular case of strong differential privacy. Consequently, the mutual information between the private A and the distorted B̂_DP resulting from the exponential mechanism pDP is upper bounded as I(A; B̂_DP) ≤ 2β dmax. It is known that differential privacy was not defined with the goal of minimizing mutual information. However, regardless of the mechanism, mutual information is a relevant privacy metric, thus we can compare these two algorithms with respect to this metric.

[60] Parameter Choices: We first select values for the number of linearization steps L. Using synthetic data, we explore the privacy-distortion tradeoff curve (in FIG. 2A) for different sets of parameter values. First, we observe that for small values of distortion, the difference between the various curves is insignificant, which verifies that in this region the sparse optimal mapping assumption is valid. Second, as we increase the distortion we see that L does affect accuracy; the result using L = 100 is fairly close to optimal, while the result using L = 500 nearly matches the optimal. Since the gain of using L = 500 compared to L = 100 seems small, and using 100 rather than 500 approximations is clearly much faster, we elect to use L = 100 for further experiments.

[61] Privacy Performance: We start by illustrating the privacy versus distortion tradeoff for SPPM and our benchmarks on two datasets. First, we consider the Census dataset, in which each user is represented by a vector of 7 attributes. We do not consider all possible values of this vector, as it would be prohibitive for an exact solver and prevent us from comparing to an optimal solution. Instead, we restrict the alphabet ℬ to the 300 most probable vectors of 7 attributes, since we are mainly focused on a relative comparison. In FIG. 2B we see that SPPM is generally close to the optimal solution, whereas the exponential mechanism (ExpMec) is much further away. With an optimal solution, the mutual information can be brought to zero (perfect privacy) with a distortion of roughly 0.08.
SPPM needs 0.2 distortion to achieve perfect privacy, while ExpMec needs twice as much. Note that for a given level of distortion, e.g., 0.1, SPPM achieves much better privacy than the exponential mechanism, as the mutual information is significantly lower.
[62] Next we consider the MovieLens dataset which is one order of magnitude larger than the Census dataset. The results, in FIG. 2C, mirror what we just observed with the Census data; namely that a given level of privacy can be achieved with less distortion using SPPM as opposed to ExpMec. For example, to reduce our privacy leakage metric from 0.11 to 0.02, SPPM requires roughly 0.03 distortion whereas the exponential mechanism needs 0.13, about 4 times as much.
[63] Another metric to gauge the success of our privacy mapping is to consider its impact on a classifier attempting to infer the private attribute. The goal of our mapping is to weaken the classifier. First, we consider a simple Naive Bayes classifier that analyzes the Census data to infer each user's income category. We quantify the classifier's success, in terms of true positives and false negatives (in an ROC curve), in FIG. 3. Recall that in an ROC curve, the y = x line corresponds to a blind classifier that is no better than an uninformed guess. We consider three bounds on distortion that allow us to explore the extremes of nearly no distortion (Δ = 0.02 in FIG. 3A), a large amount of distortion (Δ = 0.44 in FIG. 3C), and something in between (Δ = 0.14 in FIG. 3B). In the case of small distortion, all algorithms make modest improvements over the no-privacy case.
However, even in this scenario, SPPM performs close to optimal, unlike ExpMec, which only slightly outperforms the no-privacy case. With only a very small amount of distortion, not even the optimal solution can render the classifier completely useless. On the other hand, when a large distortion is permitted, it is natural that all algorithms do well (FIG. 3C). For a value of Δ in between these extremes, SPPM is close to optimal, while ExpMec can only weaken the classifier a little.

[64] In a second scenario, we study a logistic regression classifier that analyzes the Movie dataset to infer gender. We focus on logistic regression for the movie data because it has been shown to be an effective classifier for inferring gender. Again we see in FIGs. 4A-4C that the findings essentially mimic those shown in FIGs. 3A-3C. We thus conclude that SPPM can weaken a classifier more than the exponential mechanism for a fixed constraint on the allowed distortion.
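This classifier-based evaluation can be reproduced along the lines of the sketch below, which trains a logistic-regression attacker on released profiles and returns its ROC curve; the scikit-learn calls and the train/test split ratio are implementation choices, not part of the described evaluation.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

def inference_roc(released_profiles, private_labels):
    # Fit an attacker on released (distorted) profiles and measure how well it
    # recovers the private attribute; a flatter ROC curve means better privacy.
    X_tr, X_te, y_tr, y_te = train_test_split(released_profiles, private_labels, test_size=0.3)
    attacker = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores = attacker.predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, scores)
    return fpr, tpr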
[65] In the following, we explain why SPPM consistently outperforms ExpMec. FIG. 5 illustrates how these mappings work. We plot the average probability of being mapped to the k-th closest point, together with the standard error (represented by the two dashed lines in FIG. 5) that measures the variability among points. In ExpMec, the probability of mapping one point to another decreases exponentially with distance, and the same mapping is applied to all points (in other words, the standard deviation is null). With SPPM, many points are mapped to themselves, as illustrated by a high peak around distance = 0; these are points for which we cannot provide privacy under the allowed distortion constraint. ExpMec, however, wastes some distortion on those points, which are mapped to close neighbours. Overall, the standard error shows that SPPM has considerable variance in its mappings. This indicates that the mappings are tailored to each point.
[66] In order to gain some insight into the mappings SPPM proposes, TABLE 2 and TABLE 3 list the decrease in mutual information between a single attribute of the public data and the attribute of the private data we wish to hide. This allows us to determine which public attributes are the most correlated with the private attribute, but also to understand the mappings by observing which mutual information terms are decreased the most (i.e., with the highest decrease in I(A; F)). As such, on the Census dataset, Education, Marital status and Occupation are the best individual attributes to infer income. We also notice that these are the attributes for which the mutual information has decreased the most, meaning that in the privacy-utility region represented by this table, it was favorable to spend more distortion on tackling the biggest threats. Similarly, on the Movie dataset, the genres most correlated with gender are Comedy+Romance, Romance, and Romance+Drama, and once again these are the ones that are modified the most. This intuitive property shows that the mappings learned depend on the underlying prior distribution in a smart way, such that with a limited distortion budget the priority is on the biggest privacy threats.
[67] Scalability: Having established good privacy performance, we now assess how the runtime performance scales with the size of the problem, shown in FIG. 6. We fix the distortion constraint to be proportional to the size of the problem, in order to keep a similar difficulty as we grow the size. As discussed before, the time complexity is linear in L. This trend is evident as we observe the gaps between the lines for L = 3, 10, and 100. Importantly, we see that our method scales with the problem size better than the optimal solution in terms of computational complexity. We observe that the computation of the exponential mechanism is very fast, and note that this is indeed one of the salient properties of that mechanism. Overall, FIG. 6 shows that our method is tractable and can compute the privacy preserving mappings quickly, for example within a few minutes, for problems whose alphabet size is in the order of tens of thousands.
TABLE 2 (Census dataset: I(A; F) before and after SPPM for each public attribute, including Country)
TABLE 3
Feature F I(A; F) I(A; F) after SPPM
Action + Sci-Fi 0.0347 0.0227
Action + Thriller 0.0115 0.0072
Adventure + Action 0.0178 0.0128
Adventure + Sci-Fi 0.0176 0.0131
Animation + Children's 0.0115 0.0084
Comedy + Drama 0.0124 0.0089
Musical 0.0186 0.0130
Romance 0.0568 0.0380
Romance + Drama 0.0452 0.0281
Sci-Fi 0.0178 0.0127
Thriller + Sci-Fi 0.0193 0.0139

[68] In the present principles, we apply large scale LP optimization techniques to the problem of designing privacy preserving mappings. We show that our privacy preserving mappings can be close to optimal, and consistently outperform a state of the art technique called the Exponential Mechanism. Our solution achieves better privacy with less distortion than existing solutions, when privacy leakage is measured by a mutual information metric. We demonstrate that our method can scale, even for systems with many users and a large underlying alphabet that describes their profiles. We can compute mappings in a short time for systems with a large number (for example, tens of thousands) of potential alternate profiles.

[69] FIG. 7 illustrates an exemplary method 700 for distorting public data to be released in order to preserve privacy according to the present principles. Method 700 starts at 705. At step 710, it performs initialization, for example, determines possible values for public data or private data (i.e., determines A and B), and sets up a utility constraint. At step 720, it collects statistical information about public or private data, for example, from the users who are not concerned about privacy of their public data or private data. We denote these users as "public users," and denote the users who wish to distort public data to be released as "private users."
[70] The statistics may be collected by crawling the web, accessing different databases, or may be provided by a data aggregator, for example, by bluekai.com. Which statistical information can be gathered depends on what the public users release. For example, if the public users release both private data and public data, an estimate of the joint distribution p_{A,B} can be obtained. In another example, if the public users only release public data, an estimate of the marginal probability measure p_B can be obtained, but not the joint distribution p_{A,B}. In another example, we may only be able to get the mean and variance of the public data. In the worst case, we may be unable to get any information about the public data or private data.
[71] At step 730, it determines a sparse privacy preserving mapping based on the statistical information, given the utility constraint. Using the sparsity property of the privacy mapping, we design sparse privacy mappings where each value in the public data is mapped to a limited selection of values, thus enabling a fast design of the privacy preserving mapping. The mapping can be obtained using Algorithm 1, which provides a fast solution to optimization problem (1). When the optimization problem is formulated differently, the sparsity property of the privacy mapping may also be used.

[72] At step 740, the public data of a current private user is distorted, according to the determined privacy preserving mapping, before it is released to, for example, a service provider or a data collecting agency, at step 750. Given the value B = b for the private user, a value B̂ = b̂ is sampled according to the distribution p_{B̂|B=b}. This value b̂ is released instead of the true b. Note that the use of the privacy mapping to generate the released b̂ does not require knowing the value of the private data A = a of the private user. Method 700 ends at step 799.
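Step 740 then amounts to sampling from one column of the learned mapping, as in the brief sketch below; the renormalization only guards against numerical drift, and the function name is illustrative.

import numpy as np

def release(b_index, X, rng=None):
    # Sample b_hat from p(. | b), i.e., column b_index of the mapping X.
    # The private value A = a is never needed here.
    rng = np.random.default_rng() if rng is None else rng
    column = X[:, b_index] / X[:, b_index].sum()
    return int(rng.choice(X.shape[0], p=column))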
[73] A privacy agent is an entity that provides privacy service to a user. A privacy agent may perform any of the following:
- receive from the user what data he deems private, what data he deems public, and what level of privacy he wants;
- compute the privacy preserving mapping;
- implement the privacy preserving mapping for the user (i.e., distort his data according to the mapping); and
- release the distorted data, for example, to a service provider or a data collecting agency.
[74] The present principles can be used in a privacy agent that protects the privacy of user data. FIG. 8 depicts a block diagram of an exemplary system 800 where a privacy agent can be used. Public users 810 release their private data (A) and/or public data (B). As discussed before, public users may release public data as is, that is, B̂ = B. The information released by the public users becomes statistical information useful for a privacy agent.
[75] A privacy agent 880 includes statistics collecting module 820, privacy preserving mapping decision module 830, and privacy preserving module 840. Statistics collecting module 820 may be used to collect the joint distribution p_{A,B}, the marginal probability measure p_B, and/or the mean and covariance of the public data. Statistics collecting module 820 may also receive statistics from data aggregators, such as bluekai.com. Depending on the available statistical information, privacy preserving mapping decision module 830 designs a privacy preserving mapping mechanism p_{B̂|B}, for example, based on the optimization problem formulated as Eqs. (1)-(3), using SPPM described in Algorithm 1. Privacy preserving module 840 distorts the public data of private user 860 before it is released, according to the conditional probability p_{B̂|B}. In one embodiment, statistics collecting module 820, privacy preserving mapping decision module 830, and privacy preserving module 840 can be used to perform steps 720, 730, and 740 in method 700, respectively.

[76] Note that the privacy agent needs only the statistics to work, without knowledge of the entire data that was collected in the data collection module. Thus, in another embodiment, the data collection module could be a standalone module that collects data and then computes statistics, and needs not be part of the privacy agent. The data collection module shares the statistics with the privacy agent.

[77] A privacy agent sits between a user and a receiver of the user data (for example, a service provider). For example, a privacy agent may be located at a user device, for example, a computer or a set-top box (STB). In another example, a privacy agent may be a separate entity.

[78] All the modules of a privacy agent may be located at one device, or may be distributed over different devices. For example, statistics collecting module 820 may be located at a data aggregator who only releases statistics to the module 830; the privacy preserving mapping decision module 830 may be located at a "privacy service provider" or at the user end on the user device connected to a module 820; and the privacy preserving module 840 may be located at a privacy service provider, who then acts as an intermediary between the user and the service provider to whom the user would like to release data, or at the user end on the user device.
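The division of labor among modules 820, 830 and 840 can be pictured with the toy arrangement below; the class and method names are purely illustrative, and any of the three pieces could live on a different device as described above.

class PrivacyAgent:
    # Wires together the three roles: collecting statistics (820), designing the
    # sparse privacy preserving mapping (830), and distorting/releasing data (840).
    def __init__(self, statistics_source, mapping_designer, releaser):
        self.statistics_source = statistics_source
        self.mapping_designer = mapping_designer
        self.releaser = releaser

    def process(self, public_value, distortion_budget):
        stats = self.statistics_source.collect()
        mapping = self.mapping_designer.design(stats, distortion_budget)
        return self.releaser.release(public_value, mapping)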
[79] The privacy agent may provide the released data to a service provider 850, for example, Comcast or Netflix, in order for private user 860 to improve the received service based on the released data; for example, a recommendation system provides movie recommendations to a user based on the user's released movie rankings.
[80] In FIG. 9, we show that there can be multiple privacy agents in the system. In different variations, there need not be privacy agents everywhere, as this is not a requirement for the privacy system to work. For example, there could be a privacy agent only at the user device, or only at the service provider, or at both. In FIG. 9, we show that the same privacy agent "C" is used for both Netflix and Facebook. In another embodiment, the privacy agents at Facebook and Netflix can, but need not, be the same.
[81] The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.

[82] Reference to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation", as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
[83] Additionally, this application or its claims may refer to "determining" various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
[84] Further, this application or its claims may refer to "accessing" various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
[85] Additionally, this application or its claims may refer to "receiving" various pieces of information. Receiving is, as with "accessing", intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, "receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
[86] As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry the bitstream of a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.

CLAIMS:
1. A method for processing user data for a user, comprising:
accessing the user data, which includes private data and public data;
determining a set of values that the public data of the user can map to, wherein size of the set of values is small;
determining (730) a privacy preserving mapping that maps the public data to released data, wherein the public data of the user only maps to values within the determined set of values;
modifying (740) the public data of the user based on the privacy preserving mapping; and
releasing (750) the modified data as the released data to at least one of a service provider and a data collecting agency.
2. The method of claim 1, wherein the size of the set of values is in the magnitude of order of ten.
3. The method of claim 1, wherein the method further processes user data for a second user, comprising determining a second set of values that public data of the second user can map to, wherein size of the determined second set of values is small, and the determined set of values is different from the determined second set of values, and wherein the public data of the second user only maps to values within the determined second set of values.
4. The method of claim 1, wherein the public data comprises data that the user has indicated can be publicly released, and the private data comprises data that the user has indicated is not to be publicly released.
5. The method of claim 1, wherein a value in the set of values is non-numeric.
6. The method of claim 1, wherein a value in the set of values corresponds to a vector.
7. The method of claim 1, wherein the determining a privacy preserving mapping is responsive to a sequence of linear programs.
8. The method of claim 7, wherein the determining a privacy preserving mapping is based on the Frank-Wolfe algorithm.
9. The method of claim 7, wherein the linear programs are generated incrementally by a greedy procedure.
10. The method of claim 1, wherein the determining a set of values is based on Dantzig-Wolfe decomposition.
11. An apparatus for processing user data for a user, comprising:
a statistics collecting module (820) configured to access the user data, which includes private data and public data;
a privacy preserving mapping decision module (830) configured to determine a set of values that the public data of the user can map to, wherein size of the set of values is small, and to determine a privacy preserving mapping that maps the public data to released data, wherein the public data of the user only maps to values within the determined set of values; and
a privacy preserving module (840) configured to modify the public data of the user based on the privacy preserving mapping, and release the modified data as the released data to at least one of a service provider and a data collecting agency.
12. The apparatus of claim 11, wherein the size of the set of values is in the magnitude of order of ten.
13. The apparatus of claim 11, wherein the apparatus is further configured to process user data for a second user, wherein the privacy preserving mapping decision module (830) is configured to determine a second set of values that public data of the second user can map to, wherein size of the determined second set of values is small, and the determined set of values is different from the determined second set of values, and wherein the public data of the second user only maps to values within the determined second set of values.
14. The apparatus of claim 11, wherein the public data comprises data that the user has indicated can be publicly released, and the private data comprises data that the user has indicated is not to be publicly released.
15. The apparatus of claim 11, wherein a value in the set of values is non-numeric.
16. The apparatus of claim 11, wherein a value in the set of values corresponds to a vector.
17. The apparatus of claim 11, wherein the privacy preserving mapping decision module (830) is configured to determine the privacy preserving mapping responsive to a sequence of linear programs.
18. The apparatus of claim 17, wherein the privacy preserving mapping decision module (830) is based on the Frank-Wolfe algorithm.
19. The apparatus of claim 17, wherein the linear programs are generated
incrementally by a greedy procedure.
20. The apparatus of claim 11, wherein the privacy preserving mapping decision module (830) is based on Dantzig-Wolfe decomposition.
PCT/US2015/023336 2014-04-11 2015-03-30 Method and apparatus for sparse privacy preserving mapping WO2015157020A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201461978260P 2014-04-11 2014-04-11
US61/978,260 2014-04-11

Publications (1)

Publication Number Publication Date
WO2015157020A1 true WO2015157020A1 (en) 2015-10-15

Family

ID=53039578

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/023336 WO2015157020A1 (en) 2014-04-11 2015-03-30 Method and apparatus for sparse privacy preserving mapping

Country Status (1)

Country Link
WO (1) WO2015157020A1 (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090049069A1 (en) * 2007-08-09 2009-02-19 International Business Machines Corporation Method, apparatus and computer program product for preserving privacy in data mining

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SALMAN SALAMATIAN: "How to hide the elephant- or the donkey- in the room: Practical privacy against statistical inference for large data", IEEE GLOBALSIP, 2013

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10489605B2 (en) 2015-11-02 2019-11-26 LeapYear Technologies, Inc. Differentially private density plots
US10229287B2 (en) 2015-11-02 2019-03-12 LeapYear Technologies, Inc. Differentially private processing and database storage
US11100247B2 (en) 2015-11-02 2021-08-24 LeapYear Technologies, Inc. Differentially private processing and database storage
WO2017078808A1 (en) * 2015-11-02 2017-05-11 LeapYear Technologies, Inc. Differentially private processing and database storage
US10192069B2 (en) 2015-11-02 2019-01-29 LeapYear Technologies, Inc. Differentially private processing and database storage
US10467234B2 (en) 2015-11-02 2019-11-05 LeapYear Technologies, Inc. Differentially private database queries involving rank statistics
US10242224B2 (en) 2015-11-02 2019-03-26 LeapYear Technologies, Inc. Differentially private processing and database storage
US10586068B2 (en) 2015-11-02 2020-03-10 LeapYear Technologies, Inc. Differentially private processing and database storage
US10726153B2 (en) 2015-11-02 2020-07-28 LeapYear Technologies, Inc. Differentially private machine learning using a random forest classifier
US10733320B2 (en) 2015-11-02 2020-08-04 LeapYear Technologies, Inc. Differentially private processing and database storage
US11055432B2 (en) 2018-04-14 2021-07-06 LeapYear Technologies, Inc. Budget tracking in a differentially private database system
US11893133B2 (en) 2018-04-14 2024-02-06 Snowflake Inc. Budget tracking in a differentially private database system
CN109491705A (en) * 2018-11-16 2019-03-19 中国银行股份有限公司 A kind of delivery method and device
US10430605B1 (en) 2018-11-29 2019-10-01 LeapYear Technologies, Inc. Differentially private database permissions system
US10789384B2 (en) 2018-11-29 2020-09-29 LeapYear Technologies, Inc. Differentially private database permissions system
US11755769B2 (en) 2019-02-01 2023-09-12 Snowflake Inc. Differentially private query budget refunding
CN110097119A (en) * 2019-04-30 2019-08-06 西安理工大学 Difference secret protection support vector machine classifier algorithm based on dual variable disturbance
US11188547B2 (en) 2019-05-09 2021-11-30 LeapYear Technologies, Inc. Differentially private budget tracking using Renyi divergence
US10642847B1 (en) 2019-05-09 2020-05-05 LeapYear Technologies, Inc. Differentially private budget tracking using Renyi divergence
US11328084B2 (en) 2020-02-11 2022-05-10 LeapYear Technologies, Inc. Adaptive differentially private count
US11861032B2 (en) 2020-02-11 2024-01-02 Snowflake Inc. Adaptive differentially private count
US20230368018A1 (en) * 2020-03-06 2023-11-16 The Regents Of The University Of California Methods of providing data privacy for neural network based inference


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15719890

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15719890

Country of ref document: EP

Kind code of ref document: A1