EP3036679A1 - Method and apparatus for utility-aware privacy preserving mapping through additive noise - Google Patents
Method and apparatus for utility-aware privacy preserving mapping through additive noise
- Publication number
- EP3036679A1 (application EP13812234.6A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- user
- noise
- public
- released
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Definitions
- This invention relates to a method and an apparatus for preserving privacy, and more particularly, to a method and an apparatus for adding noise to user data to preserve privacy.
- This service, or other benefit that the user derives from allowing access to the user's data, may be referred to as utility.
- privacy risks arise as some of the collected data may be deemed sensitive by the user, e.g., political opinion, health status, income level, or may seem harmless at first sight, e.g., product ratings, yet lead to the inference of more sensitive data with which it is correlated.
- the latter threat refers to an inference attack, a technique of inferring private data by exploiting its correlation with publicly released data.
- the present principles provide a method for processing user data for a user, comprising the steps of: accessing the user data, which includes private data and public data, the private data corresponding to a first category of data, and the public data corresponding to a second category of data; determining a covariance matrix of the public data; generating a Gaussian noise responsive to the covariance matrix and a constraint on utility; and adding the generated noise to the public data of the user to generate released data for the user as described below.
- the present principles also provide an apparatus for performing these steps.
- the present principles also provide a method for processing user data for a user, comprising the steps of: accessing the user data, which includes private data and public data; accessing a constraint on utility D, the utility being responsive to the public data and released data of the user; generating a random noise Z responsive to the utility constraint, wherein the random noise follows a maximum entropy probability distribution under the utility constraint; and adding the generated noise to the public data of the user to generate the released data for the user as described below.
- the present principles also provide an apparatus for performing these steps.
- the present principles also provide a computer readable storage medium having stored thereon instructions for processing user data for a user according to the methods described above.
- FIG. 1 is a flow diagram depicting an exemplary method for preserving privacy by adding Gaussian noise to continuous data, in accordance with an embodiment of the present principles.
- FIG. 2 is a flow diagram depicting an exemplary method for preserving privacy by adding discrete noise to discrete data, in accordance with an embodiment of the present principles.
- FIG. 3 is a block diagram depicting an exemplary privacy agent, in accordance with an embodiment of the present principles.
- FIG. 4 is a block diagram depicting an exemplary system that has multiple privacy agents, in accordance with an embodiment of the present principles.
- the term "analyst," which for example may be a part of a service provider's system, as used in the present application, refers to a receiver of the released data, who ostensibly uses the data in order to provide utility to the user.
- often the analyst is a legitimate receiver of the released data.
- however, an analyst could also exploit the released data to infer the user's private data through the correlation described above.
- a user may release a "distorted version" of data, generated according to a conditional probabilistic mapping, called “privacy preserving mapping,” designed under a utility constraint.
- the user's political opinion is considered to be private data for this user
- the TV ratings are considered to be public data
- the released modified TV ratings are considered to be the released data.
- another user may be willing to release both political opinion and TV ratings without modifications, and thus, for this other user, there is no distinction between private data, public data and released data when only political opinion and TV ratings are considered. If many people release political opinions and TV ratings, an analyst may be able to derive the correlation between political opinions and TV ratings, and thus, may be able to infer the political opinion of the user who wants to keep it private.
- private data: this refers to data that the user not only indicates should not be publicly released, but also does not want to be inferred from other data that he would release.
- Public data is data that the user would allow the privacy agent to release, possibly in a distorted way to prevent the inference of the private data.
- public data is the data that the service provider requests from the user in order to provide him with the service. The user, however, will distort (i.e., modify) it before releasing it to the service provider.
- public data is the data that the user indicates as being "public” in the sense that he would not mind releasing it as long as the release takes a form that protects against inference of the private data.
- in the present application, we use the distortion between the released data and the public data as a measure of utility.
- when the distortion is larger, the released data is more different from the public data, and more privacy is preserved, but the utility derived from the distorted data may be lower for the user.
- when the distortion is smaller, the released data is a more accurate representation of the public data and the user may receive more utility, for example, more accurate content recommendations.
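- As a concrete sketch (the squared-error form is an assumption, consistent with the ℓ₂-norm distortion used later for the Gaussian mechanism and the p-th moment used for the discrete mechanism), the utility constraint can be written as a bound D on the expected distortion between the released data Y and the public data X:

```latex
\mathbb{E}\big[d(X,Y)\big] \le D,
\qquad \text{e.g. } d(x,y) = \lVert y - x \rVert_2^{2} \text{ for continuous data, or } d(x,y) = |y-x|^{p} \text{ for discrete data.}
```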
- finding the privacy preserving mapping relies on the fundamental assumption that the prior joint distribution that links private data and released data is known and can be provided as an input to the optimization problem.
- the true prior distribution may not be known, but rather some prior statistics may be estimated from a set of sample data that can be observed.
- the prior joint distribution could be estimated from a set of users who do not have privacy concerns when releasing their data.
- the marginal distribution of the public data to be released, or simply its second order statistics may be estimated from a set of users who only release their public data. The statistics estimated based on this set of samples are then used to design the privacy preserving mapping mechanism that will be applied to new users, who are concerned about their privacy. In practice, there may also exist a mismatch between the estimated prior statistics and the true prior statistics, due for example to a small number of observable samples, or to the incompleteness of the observable data.
- the public data is denoted by a random variable X ∈ 𝒳 with the probability distribution P_X.
- X is correlated with the private data, denoted by random variable S ∈ 𝒮.
- the correlation of S and X is defined by the joint distribution P_{S,X}.
- the released data, denoted by random variable Y ∈ 𝒴, is a distorted version of X.
- Y is achieved via passing X through a kernel, P_{Y|X}.
- the term "kernel" refers to a conditional probability that maps data X to data Y probabilistically. That is, the kernel P_{Y|X} is the privacy preserving mapping that we wish to design.
- D(·) is the K-L divergence
- E(·) is the expectation of a random variable
- H(·) is the entropy
- ε ∈ [0,1] is called the leakage factor
- I(S; Y) represents the information leakage.
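- Putting these quantities together, a sketch of the design problem (how exactly the patent combines the leakage objective and the utility constraint is an assumption here) is to choose the kernel that minimizes the information leakage subject to the utility constraint:

```latex
\min_{P_{Y\mid X}} \; I(S;Y)
\quad \text{subject to} \quad
\mathbb{E}\big[d(X,Y)\big] \le D
```

- the leakage factor ε ∈ [0,1] can then be read, for example, as requiring I(S;Y) ≤ ε·H(S).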
- the present principles propose methods to design utility-aware privacy preserving mapping mechanisms when only partial statistical knowledge of the prior is available. More specifically, the present principles provide privacy preserving mapping mechanisms in the class of additive noise mechanisms, wherein noise is added to public data before it is released. In the analysis, we assume the mean value of the noise to be zero. The mechanism can also be applied when the mean is non-zero.
- Exemplary continuous public data may be the height or blood pressure of a user.
- the mapping is obtained by knowing VAR(X) (or the covariance matrix in the case of multi-dimensional X), without knowing P_X and P_{S,X}.
- the Gaussian noise has zero mean and variance (ℓ₂-norm distortion) not greater than D, for some D ≥ 0.
- Gaussian noise is the best, in the following sense:
- the Gaussian mechanism proceeds by the steps illustrated in FIG. 1.
- Method 100 starts at 105.
- at step 110, it estimates statistical information based on public data released by users who are not concerned about privacy of their public data or private data. We denote these users as "public users," and denote the users who wish to protect their private data as "private users."
- the statistics may be collected by crawling the web, accessing different databases, or may be provided by a data aggregator, for example, by bluekai.com. Which statistical information can be gathered depends on what the public users release. Note that it requires less data to characterize the variance than to characterize the marginal distribution P_X. Hence we may be in a situation where we can estimate the variance, but not the marginal distribution, accurately. In one example, we may only be able to get the mean and variance (or covariance) of the public data at step 120 based on the collected statistical information. At step 130, we take the eigenvalue decomposition of the covariance matrix C_X.
- the covariance matrix of the Gaussian noise N_G has the same eigenvectors as the eigenvectors of C_X. Moreover, the corresponding eigenvalues of C_N are given by solving the following optimization problem.
- the distorted data is then released to, for example, a service provider or a data collecting agency, at step 150.
- Method 100 ends at step 199.
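- A minimal Python sketch of method 100 follows, assuming the covariance C_X is estimated from public users' data as in steps 110-130. Because the optimal eigenvalue allocation of equation (17) is not reproduced in this excerpt, the sketch substitutes a simple placeholder that splits the ℓ₂ distortion budget D equally across the eigen-directions of C_X; the function names and the synthetic data are illustrative, not from the patent.

```python
import numpy as np

def estimate_covariance(public_user_data):
    """Steps 110/120: estimate mean and covariance from data of 'public users'."""
    X = np.asarray(public_user_data, dtype=float)   # shape (n_samples, n_features)
    return X.mean(axis=0), np.cov(X, rowvar=False)

def gaussian_noise_covariance(C_x, D):
    """Steps 130/140 (sketch): the noise shares the eigenvectors of C_x.
    Placeholder eigenvalue allocation: split the l2 distortion budget D equally.
    The patent's optimal allocation (its equation (17)) is not reproduced here."""
    eigvals, eigvecs = np.linalg.eigh(C_x)           # eigen-decomposition of C_x
    noise_eigvals = np.full_like(eigvals, D / len(eigvals))
    return eigvecs @ np.diag(noise_eigvals) @ eigvecs.T

def release(x, C_n, rng=None):
    """Steps 140/150: add zero-mean Gaussian noise and release the distorted data."""
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.multivariate_normal(np.zeros(len(x)), C_n)

# Illustrative usage with synthetic data
rng = np.random.default_rng(0)
public_users = rng.normal(size=(1000, 3)) @ np.diag([3.0, 1.0, 0.5])
_, C_x = estimate_covariance(public_users)
C_n = gaussian_noise_covariance(C_x, D=1.5)
y = release(np.array([1.0, -0.2, 0.4]), C_n, rng)   # released data for a private user
```

- With the equal split above, the expected ℓ₂ distortion equals the trace of C_N, i.e., exactly D; the patent's optimal allocation differs only in how D is divided among eigen-directions.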
- the proposed Gaussian mechanism is optimal under the ℓ₂-norm distortion constraint.
- Theorem 3: Assuming ℓ₂-norm distortion and a given distortion level D, the optimum Gaussian noise in the Gaussian mechanism that minimizes mutual information satisfies the following: the covariance matrix of the optimum noise, N_G, has the same eigenvectors as C_X, and the eigenvalues are given in (17).
- Example 5: It can be shown that, by adding Gaussian noise with variance σ² on the order of 2 log(2/δ)/ε², we can achieve (ε, δ)-differential privacy. This scheme results in a distortion D proportional to the noise variance.
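- As a hedged illustration of Example 5, the sketch below calibrates i.i.d. Gaussian noise using the widely used (ε, δ)-differential-privacy bound σ ≥ Δ·sqrt(2 ln(1.25/δ))/ε, which matches the 2 log(2/δ) form above up to constants; the sensitivity Δ, the function names, and the parameter values are assumptions for illustration, not the patent's exact scheme.

```python
import numpy as np

def dp_gaussian_sigma(epsilon, delta, sensitivity=1.0):
    """Standard (epsilon, delta)-DP Gaussian calibration (not the patent's exact constant).
    Valid for epsilon in (0, 1) under the classic analysis."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def dp_release(x, epsilon, delta, sensitivity=1.0, rng=None):
    """Release x with i.i.d. Gaussian noise; expected l2 distortion is len(x) * sigma**2."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = dp_gaussian_sigma(epsilon, delta, sensitivity)
    return x + rng.normal(0.0, sigma, size=len(x))

y = dp_release(np.array([0.7, 1.2]), epsilon=0.5, delta=1e-5)
```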
- the optimization problem is to locate the maximum entropy discrete probability distribution P*_D, subject to a constraint on the p-th moment.
- the maximum entropy is denoted by H*(p, D).
- the constraints are that the probabilities of the noise values sum to 1 and E[|Z|^p] ≤ D.
- the discrete mechanism proceeds by the steps illustrated in FIG. 2.
- Method 200 starts at 205.
- it accesses parameters, for example, p and D, to define a distortion measure.
- the distribution P_{p,D} is only determined by p and D, but the resulting privacy-accuracy tradeoff will depend on X because the distortion constraint couples the privacy and the accuracy.
- Method 200 ends at step 299.
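- A sketch of the noise design behind method 200: a maximum entropy distribution under a p-th moment constraint has the exponential-family form P(z) ∝ exp(−λ|z|^p), and λ can be found numerically so that E[|Z|^p] ≈ D. The truncated support, the bisection tolerance, and the function names below are illustrative assumptions, not the patent's exact P_{p,D}.

```python
import numpy as np

def max_entropy_discrete_noise(p, D, support_radius=100):
    """Max-entropy noise distribution on {-R,...,R} under E[|Z|^p] <= D.
    Uses the exponential-family form P(z) proportional to exp(-lam * |z|**p)."""
    z = np.arange(-support_radius, support_radius + 1)

    def moment(lam):
        w = np.exp(-lam * np.abs(z) ** p)
        P = w / w.sum()
        return (P * np.abs(z) ** p).sum(), P

    lo, hi = 1e-8, 50.0                      # bisection on lam: the moment decreases in lam
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        m, _ = moment(mid)
        if m > D:
            lo = mid                         # too much spread -> increase lam
        else:
            hi = mid
    return z, moment(hi)[1]

def release_discrete(x, p=1, D=2.0, rng=None):
    """Add discrete noise drawn from the max-entropy distribution to integer data x."""
    rng = np.random.default_rng() if rng is None else rng
    z, P = max_entropy_discrete_noise(p, D)
    return np.asarray(x) + rng.choice(z, size=np.shape(x), p=P)

y = release_discrete([3, 0, 7], p=1, D=2.0)
```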
- the privacy guarantee (i.e., the information leakage) of the discrete mechanism is upper-bounded by the right-hand term of the corresponding bound, which depends both on D and the average ℓ_p norm of X.
- the beauty of the additive noise technique is that it does not require knowledge of the full prior joint distribution P_{S,X}; only limited statistics of the public data, such as its variance or covariance, are needed.
- a privacy agent is an entity that provides privacy service to a user.
- a privacy agent may perform any of the following:
- FIG. 3 depicts a block diagram of an exemplary system 300 where a privacy agent can be used.
- a privacy agent 380 includes statistics collecting module 320, additive noise generator 330, and privacy preserving module 340.
- Statistics collecting module 320 may be used to collect covariance of public data.
- Statistics collecting module 320 may also receive statistics from data aggregators, such as bluekai.com.
- additive noise generator 330 designs a noise, for example, based on the Gaussian mechanism or discrete mechanism.
- Privacy preserving module 340 distorts public data of private user 360 before it is released, by adding the generated noise.
- statistics collecting module 320, additive noise generator 330, and privacy preserving module 340 can be used to perform steps 110, 130, and 140 in method 100, respectively.
- the privacy agent needs only the statistics to work without the knowledge of the entire data that was collected in the data collection module.
- the data collection module could be a standalone module that collects data and then computes statistics, and needs not be part of the privacy agent.
- the data collection module shares the statistics with the privacy agent.
- additive noise generator 330, and privacy preserving module 340 can be used to perform steps 220 and 230 in method 200, respectively.
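- The modular structure of FIG. 3 can be sketched as follows, wiring statistics collecting module 320, additive noise generator 330, and privacy preserving module 340 to steps 110, 130, and 140; the class names and the placeholder noise design (an equal split of the distortion budget, as in the earlier sketch) are illustrative assumptions, and only aggregate statistics cross module boundaries.

```python
import numpy as np

class StatisticsCollectingModule:
    """Module 320 (sketch): collects only aggregate statistics of public data."""
    def collect(self, public_user_data):
        X = np.asarray(public_user_data, dtype=float)
        return {"mean": X.mean(axis=0), "cov": np.cov(X, rowvar=False)}

class AdditiveNoiseGenerator:
    """Module 330 (sketch): designs noise from the statistics and a distortion budget D."""
    def __init__(self, D):
        self.D = D
    def design(self, stats):
        eigvals, eigvecs = np.linalg.eigh(stats["cov"])
        noise_eigvals = np.full_like(eigvals, self.D / len(eigvals))  # placeholder allocation
        return eigvecs @ np.diag(noise_eigvals) @ eigvecs.T

class PrivacyPreservingModule:
    """Module 340 (sketch): distorts a private user's public data before release."""
    def __init__(self, noise_cov, rng=None):
        self.noise_cov = noise_cov
        self.rng = np.random.default_rng() if rng is None else rng
    def release(self, x):
        return x + self.rng.multivariate_normal(np.zeros(len(x)), self.noise_cov)

# Privacy agent 380 (sketch): only statistics cross module boundaries, never the raw collected data.
stats = StatisticsCollectingModule().collect(np.random.default_rng(1).normal(size=(500, 2)))
noise_cov = AdditiveNoiseGenerator(D=1.0).design(stats)
released = PrivacyPreservingModule(noise_cov).release(np.array([0.3, -1.1]))
```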
- a privacy agent sits between a user and a receiver of the user data (for example, a service provider).
- a privacy agent may be located at a user device, for example, a computer, or a set-top box (STB).
- a privacy agent may be a separate entity.
- All the modules of a privacy agent may be located at one device, or may be distributed over different devices, for example, statistics collecting module 320 may be located at a data aggregator who only releases statistics to the module 330, the additive noise generator 330 may be located at a "privacy service provider" or at the user end on the user device connected to a module 320, and the privacy preserving module 340 may be located at a privacy service provider, who then acts as an intermediary between the user and the service provider to whom the user would like to release data, or at the user end on the user device.
- the privacy agent may provide released data to a service provider, for example, Comcast or Netflix, in order for private user 360 to improve received service based on the released data, for example, a recommendation system provides movie recommendations to a user based on the user's released movie rankings.
- In FIG. 4, we show that there are multiple privacy agents in the system. In different variations, there need not be privacy agents everywhere, as this is not a requirement for the privacy system to work. For example, there could be a privacy agent only at the user device, or only at the service provider, or at both. In FIG. 4, we show the same privacy agent "C" for both Netflix and Facebook. In another embodiment, the privacy agents at Facebook and Netflix can, but need not, be the same.
- the implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program).
- An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
- the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device.
- processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
- the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation", as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
- Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
- Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
- Receiving is, as with “accessing”, intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory).
- receiving is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
- implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
- the information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
- a signal may be formatted to carry the bitstream of a described embodiment.
- Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
- the formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
- the information that the signal carries may be, for example, analog or digital information.
- the signal may be transmitted over a variety of different wired or wireless links, as is known.
- the signal may be stored on a processor-readable medium.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Physics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Operations Research (AREA)
- Probability & Statistics with Applications (AREA)
- Algebra (AREA)
- Storage Device Security (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201361867546P | 2013-08-19 | 2013-08-19 | |
| PCT/US2013/071290 WO2015026386A1 (en) | 2013-08-19 | 2013-11-21 | Method and apparatus for utility-aware privacy preserving mapping through additive noise |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP3036679A1 true EP3036679A1 (en) | 2016-06-29 |
Family
ID=49880942
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP13812234.6A Withdrawn EP3036679A1 (en) | 2013-08-19 | 2013-11-21 | Method and apparatus for utility-aware privacy preserving mapping through additive noise |
Country Status (5)
| Country | Link |
|---|---|
| EP (1) | EP3036679A1 |
| JP (1) | JP2016531513A |
| KR (1) | KR20160044553A |
| CN (1) | CN105659249A |
| WO (1) | WO2015026386A1 |
Families Citing this family (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3342131B1 (en) * | 2015-10-14 | 2020-03-11 | Samsung Electronics Co., Ltd. | A system and method for privacy management of infinite data streams |
| US10956603B2 (en) * | 2016-04-07 | 2021-03-23 | Samsung Electronics Co., Ltd. | Private data aggregation framework for untrusted servers |
| CN106130675B (zh) * | 2016-06-06 | 2018-11-09 | Lenovo (Beijing) Ltd. | Noise addition processing method and apparatus |
| US10452865B2 (en) * | 2016-12-30 | 2019-10-22 | Mitsubishi Electric Research Laboratories, Inc. | Method and systems using privacy-preserving analytics for aggregate data |
| US10333822B1 (en) | 2017-05-23 | 2019-06-25 | Cisco Technology, Inc. | Techniques for implementing loose hop service function chains price information |
| US11132453B2 (en) * | 2017-12-18 | 2021-09-28 | Mitsubishi Electric Research Laboratories, Inc. | Data-driven privacy-preserving communication |
| CN109543445B (zh) * | 2018-10-29 | 2022-12-20 | Fudan University | Privacy-preserving data publishing method based on conditional probability distribution |
| CN111209531B (zh) * | 2018-11-21 | 2023-08-08 | Baidu Online Network Technology (Beijing) Co., Ltd. | Correlation degree processing method, apparatus and storage medium |
| CN109753921A (zh) * | 2018-12-29 | 2019-05-14 | Shanghai Jiao Tong University | Privacy-preserving recognition method for face feature vectors |
| KR102055864B1 (ko) * | 2019-05-08 | 2019-12-13 | Sogang University Industry-University Cooperation Foundation | Method for publishing time-interval data with differential privacy |
| US12321478B2 (en) | 2019-05-14 | 2025-06-03 | Telefonaktiebolaget Lm Ericsson (Publ) | Utility optimized differential privacy system |
| CN110648289B (zh) * | 2019-08-29 | 2023-07-11 | Tencent Technology (Shenzhen) Co., Ltd. | Image noise addition processing method and apparatus |
| CN113449313B (zh) * | 2020-03-25 | 2025-01-14 | Zhongguancun Haihua Institute for Frontier Information Technology | Encryption method, noise generation method, device and storage medium for energy data |
| SE2050534A1 (en) | 2020-05-07 | 2021-11-08 | Dpella Ab | Estimating Accuracy of Privacy-Preserving Data Analyses |
| CN112231764B (zh) * | 2020-09-21 | 2023-07-04 | Beijing University of Posts and Telecommunications | Time-series data privacy protection method and related device |
| CN114282084A (zh) * | 2020-09-28 | 2022-04-05 | Alibaba Group Holding Ltd. | Data analysis method, noise construction method, device and storage medium |
| CN112364372A (zh) * | 2020-10-27 | 2021-02-12 | Chongqing University | Privacy protection method for supervised matrix completion |
| CN113821577B (zh) * | 2021-08-27 | 2024-02-02 | Tongji University | Location obfuscation method based on geo-indistinguishability for indoor environments |
| CN116305292B (zh) * | 2023-05-17 | 2023-08-08 | The 15th Research Institute of China Electronics Technology Group Corporation | Government data publishing method and system based on differential privacy protection |
| CN117196012A (zh) * | 2023-09-07 | 2023-12-08 | Nanjing University of Information Science and Technology | Personalized federated learning recognition method and system based on differential privacy |
| CN118053596B (zh) * | 2024-03-04 | 2024-08-06 | Feitu Cloud Technology (Shandong) Co., Ltd. | Intelligent medical platform data management method and system |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070016528A1 (en) * | 2003-08-08 | 2007-01-18 | Verhaegh Wilhelmus F J | System for processing data and method thereof |
| US7302420B2 (en) * | 2003-08-14 | 2007-11-27 | International Business Machines Corporation | Methods and apparatus for privacy preserving data mining using statistical condensing approach |
| US7363192B2 (en) * | 2005-12-09 | 2008-04-22 | Microsoft Corporation | Noisy histograms |
| US7853545B2 (en) * | 2007-02-26 | 2010-12-14 | International Business Machines Corporation | Preserving privacy of one-dimensional data streams using dynamic correlations |
| US8619984B2 (en) * | 2009-09-11 | 2013-12-31 | Microsoft Corporation | Differential privacy preserving recommendation |
-
2013
- 2013-11-21 KR KR1020167007121A patent/KR20160044553A/ko not_active Withdrawn
- 2013-11-21 CN CN201380078968.XA patent/CN105659249A/zh active Pending
- 2013-11-21 JP JP2016536079A patent/JP2016531513A/ja not_active Withdrawn
- 2013-11-21 WO PCT/US2013/071290 patent/WO2015026386A1/en not_active Ceased
- 2013-11-21 EP EP13812234.6A patent/EP3036679A1/en not_active Withdrawn
Non-Patent Citations (1)
| Title |
|---|
| See references of WO2015026386A1 * |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20160044553A (ko) | 2016-04-25 |
| CN105659249A (zh) | 2016-06-08 |
| JP2016531513A (ja) | 2016-10-06 |
| WO2015026386A1 (en) | 2015-02-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP3036679A1 (en) | Method and apparatus for utility-aware privacy preserving mapping through additive noise | |
| US20160210463A1 (en) | Method and apparatus for utility-aware privacy preserving mapping through additive noise | |
| Xiong et al. | A comprehensive survey on local differential privacy | |
| Asoodeh et al. | Estimation efficiency under privacy constraints | |
| US11106809B2 (en) | Privacy-preserving transformation of continuous data | |
| US20160203333A1 (en) | Method and apparatus for utility-aware privacy preserving mapping against inference attacks | |
| Makhdoumi et al. | From the information bottleneck to the privacy funnel | |
| US20150235051A1 (en) | Method And Apparatus For Privacy-Preserving Data Mapping Under A Privacy-Accuracy Trade-Off | |
| EP3036678A1 (en) | Method and apparatus for utility-aware privacy preserving mapping in view of collusion and composition | |
| US20160006700A1 (en) | Privacy against inference attacks under mismatched prior | |
| US20150339493A1 (en) | Privacy protection against curious recommenders | |
| WO2015157020A1 (en) | Method and apparatus for sparse privacy preserving mapping | |
| EP3036677A1 (en) | Method and apparatus for utility-aware privacy preserving mapping against inference attacks | |
| CN114239860A (zh) | Model training method and apparatus based on privacy protection | |
| Sharma et al. | A practical approach to navigating the tradeoff between privacy and precise utility | |
| US20160203334A1 (en) | Method and apparatus for utility-aware privacy preserving mapping in view of collusion and composition | |
| EP3267353A1 (en) | Privacy protection against curious recommenders | |
| Guo et al. | AdaLinUCB: Opportunistic learning for contextual bandits | |
| Wu et al. | FCER: A Federated Cloud-Edge Recommendation Framework With Cluster-Based Edge Selection | |
| Qi et al. | Privacy protection and statistical efficiency trade-off for federated learning | |
| CN105989154A (zh) | Similarity measurement method and device | |
| WO2018184463A1 (en) | Statistics-based multidimensional data cloning | |
| Yuliana | Improving performance of secret key generation from wireless channel using filtering techniques | |
| Jiang | Improving Privacy-utility Tradeoffs in Privacy-preserving Data Release with Context Information | |
| Mak et al. | Uncertainty quantification and design for noisy matrix completion-a unified framework |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
| 17P | Request for examination filed |
Effective date: 20160315 |
|
| AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
| AX | Request for extension of the european patent |
Extension state: BA ME |
|
| DAX | Request for extension of the european patent (deleted) | ||
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
|
| 18W | Application withdrawn |
Effective date: 20190429 |