CN110457940B - Differential privacy measurement method based on graph theory and mutual information quantity - Google Patents


Info

Publication number
CN110457940B
CN110457940B (application CN201910621081.XA)
Authority
CN
China
Prior art keywords
graph
privacy
information
data set
differential privacy
Prior art date
Legal status
Active
Application number
CN201910621081.XA
Other languages
Chinese (zh)
Other versions
CN110457940A (en)
Inventor
彭长根
王毛妮
何文竹
丁兴
丁红发
Current Assignee
Guizhou University
Original Assignee
Guizhou University
Priority date
Filing date
Publication date
Application filed by Guizhou University filed Critical Guizhou University
Priority to CN201910621081.XA priority Critical patent/CN110457940B/en
Publication of CN110457940A publication Critical patent/CN110457940A/en
Application granted granted Critical
Publication of CN110457940B publication Critical patent/CN110457940B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes


Abstract

本发明公开了一种基于图论和互信息量的差分隐私度量方法。本发明以信息论通信模型重构了差分隐私保护框架,构造了差分隐私的信息通信模型,将原始数据集表示为信源,发布数据集表示为信宿,查询机制和噪音机制表示为通信信道;所提出的差分隐私度量模型以信息通信模型为基础,利用图的特性结合信息熵给出隐私泄露量的互信息化计算方法,隐私泄露量的界仅依赖于原始数据集的属性数量、属性值数量及差分隐私预算参数,对任意分布的原始数据集,任意攻击能力的敌手都成立。本发明提出的差分隐私度量方法可给出差分隐私保护的隐私泄露互信息上界,限制条件较少,适用于所有信道,且不依赖原始数据集的分布。

Figure 201910621081

The invention discloses a differential privacy measurement method based on graph theory and mutual information. The invention reconstructs the differential privacy protection framework with the information-theoretic communication model and constructs an information communication model of differential privacy: the original data set is represented as the information source, the published data set as the information sink, and the query and noise mechanisms as the communication channel. The proposed differential privacy measurement model is based on this communication model and uses the properties of graphs combined with information entropy to give a mutual-information calculation method for the amount of privacy leakage. The bound on the privacy leakage depends only on the number of attributes of the original data set, the number of attribute values, and the differential privacy budget parameter, and it holds for an original data set with an arbitrary distribution and for an adversary with arbitrary attack capability. The differential privacy measurement method proposed by the invention gives an upper bound on the mutual-information privacy leakage of differential privacy protection, has few restrictive conditions, is applicable to all channels, and does not depend on the distribution of the original data set.


Description

一种基于图论和互信息量的差分隐私度量方法A differential privacy measurement method based on graph theory and mutual information

技术领域Technical Field

本发明涉及信息安全技术领域,尤其是一种基于图论和互信息量的差分隐私度量方法。The present invention relates to the field of information security technology, and in particular to a differential privacy measurement method based on graph theory and mutual information.

背景技术Background Art

大数据时代的到来和移动互联网的普及,在产生巨大商业和社会价值的同时,引发了人们对隐私的广泛关注和担忧,更加隐蔽、多样的数据收集和存储及数据挖掘,导致隐私泄露和隐私窃取更加频繁,危害和影响更加巨大。一方面,数据拥有者未经任何保护处理直接发布含有隐私信息的数据,将会造成个人隐私信息的泄露;另一方面,恶意攻击者利用已成熟的数据挖掘等技术窃取发布数据中的敏感信息。因此,解决隐私泄露问题迫在眉睫。The advent of the big data era and the popularization of mobile Internet, while generating huge commercial and social value, have also triggered widespread concern and worry about privacy. More hidden and diverse data collection, storage, and data mining have led to more frequent privacy leaks and thefts, with greater harm and impact. On the one hand, data owners directly publish data containing private information without any protection, which will cause the leakage of personal privacy information; on the other hand, malicious attackers use mature data mining and other technologies to steal sensitive information in the published data. Therefore, it is urgent to solve the problem of privacy leakage.

数据的隐私保护问题研究已久,其最早可以追溯到1977年统计学家Dalenius提出的数据库隐私信息的概念,他认为,在访问数据的过程中,即使攻击者拥有背景知识也无法获得关于任何个体的确切信息。在该定义下,相应的隐私保护模型及方法被相继提出。早期的隐私保护技术主要是基于匿名模型,基本思想是通过对记录中的准标识符进行匿名化处理,使得所有记录被划分为若干个等价类,从而实现将一条记录隐藏在另一组记录中。尽管传统的匿名保护模型及其衍生的算法模型能够在一定程度上保护用户的个人隐私信息,但是均无法抵御背景知识攻击、同质攻击和相似性攻击。直到2006年微软研究院的Dwork提出差分隐私保护概念,该模型不受最大背景知识攻击的影响,保证至多相差一条记录的邻近数据集在概率输出上具有不可区分性。The issue of data privacy protection has been studied for a long time; it can be traced back to the concept of database privacy proposed by the statistician Dalenius in 1977, who held that, in the process of accessing data, an attacker should not be able to obtain exact information about any individual even with background knowledge. Under this definition, corresponding privacy protection models and methods were proposed one after another. Early privacy protection technology was mainly based on anonymity models, whose basic idea is to anonymize the quasi-identifiers in the records so that all records are divided into several equivalence classes, thereby hiding one record among a group of records. Although traditional anonymity models and their derived algorithms can protect users' personal privacy to a certain extent, none of them can resist background-knowledge attacks, homogeneity attacks or similarity attacks. It was not until 2006 that Dwork of Microsoft Research proposed the concept of differential privacy; this model withstands even maximal background-knowledge attacks and guarantees that neighboring data sets differing in at most one record are probabilistically indistinguishable in their outputs.

差分隐私保护是一种基于数据失真的隐私保护技术,通过在原始数据集或统计结果中添加噪声扰动来实现隐私保护,同时保持数据集中的某些数据属性或统计属性不变。差分隐私保护技术确保了数据集中单个记录的变化不会影响查询结果,即使攻击者具有无限背景知识也可以保证邻近数据集的查询具有概率不可区分性。Differential privacy protection is a privacy protection technology based on data distortion. It achieves privacy protection by adding noise perturbations to the original data set or statistical results, while keeping certain data attributes or statistical attributes in the data set unchanged. Differential privacy protection technology ensures that changes in a single record in the data set will not affect the query results, and even if the attacker has unlimited background knowledge, it can ensure that queries on adjacent data sets are probabilistically indistinguishable.
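
As an illustration of this probabilistic indistinguishability, the following sketch (not part of the patent; the Laplace mechanism and the counting-query scenario are standard assumptions added for illustration) adds Laplace noise of scale 1/ε to a counting query of sensitivity 1 and checks that the output densities on neighboring data sets differ by a factor of at most e^ε:

```python
import math
import random

def laplace_mechanism(true_count, epsilon, rng):
    """Release a counting-query answer (sensitivity 1) with Laplace
    noise of scale 1/epsilon, sampled by inverse-CDF."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

def laplace_pdf(z, mu, scale):
    return math.exp(-abs(z - mu) / scale) / (2.0 * scale)

epsilon = 0.5
scale = 1.0 / epsilon
count_d1, count_d2 = 42, 43  # neighboring data sets: counts differ by 1

# epsilon-DP guarantee of the Laplace mechanism: at every output z the
# density ratio between neighboring inputs is bounded by e^epsilon.
for z in (-5.0, 0.0, 41.5, 42.0, 43.0, 100.0):
    ratio = laplace_pdf(z, count_d1, scale) / laplace_pdf(z, count_d2, scale)
    assert ratio <= math.exp(epsilon) + 1e-12

noisy_answer = laplace_mechanism(count_d1, epsilon, random.Random(7))
```

The density-ratio check is exactly the probabilistic indistinguishability described above: no single output value lets an observer tell the two neighboring data sets apart with confidence better than e^ε.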

差分隐私保护根据实现环境不同可分为两大类:交互式差分隐私和非交互式差分隐私。交互式差分隐私保护机制是指用户通过查询接口向数据拥有者递交查询请求,数据拥有者根据查询请求在原始数据集中进行查询,然后将查询结果添加噪声扰动后反馈给用户。非交互式差分隐私保护机制是指数据管理者直接发布一个满足差分隐私保护的发布数据集,再依据用户的请求对发布数据集进行查询操作。Differential privacy protection can be divided into two categories according to the implementation environment: interactive differential privacy and non-interactive differential privacy. In the interactive mechanism, the user submits a query request to the data owner through a query interface; the data owner runs the query on the original data set and returns the result to the user after adding noise perturbation. In the non-interactive mechanism, the data manager directly publishes a data set that satisfies differential privacy, and queries are then performed on the published data set according to the user's requests.

差分隐私的隐私预算参数ε代表隐私保护强度,该参数的选取高度依赖经验,仍然缺乏有效的信息量化方法对差分隐私强度和隐私泄露量进行预先量化,因此,如何利用信息论的方法对其隐私泄漏量进行量化,以及对给定数据集的差分隐私保护程度上界的量化方法,已成为优化差分隐私算法和设计隐私风险评估方案的关键。The privacy budget parameter ε of differential privacy represents the strength of privacy protection, and its selection is highly dependent on experience; there is still a lack of effective information-quantification methods to quantify the strength of differential privacy and the amount of privacy leakage in advance. Therefore, how to quantify the privacy leakage with information-theoretic methods, and how to quantify an upper bound on the degree of differential privacy protection for a given data set, have become key to optimizing differential privacy algorithms and designing privacy risk assessment schemes.

发明内容Summary of the invention

本发明所要解决的技术问题是提供一种基于图论和互信息量的差分隐私度量方法,它解决差分隐私保护机制中隐私泄露的量化中存在的难题:(1)目前对差分隐私保护的强度和效果仅能后验评估,且高度依赖于经验性选择的隐私预算参数ε,难以预先对隐私保护强度和隐私泄露量进行量化。(2)在差分隐私保护机制中,隐私预算ε一旦耗尽将会破坏差分隐私保护,隐私保护算法将失去其意义。现有的隐私度量方法主要是基于信息熵的隐私度量模型,如何将香农信息论与差分隐私结合对差分隐私保护机制中的隐私泄露进行量化,并证明求解差分隐私保护机制中隐私泄露的上界值是本发明重点解决的难点问题。The technical problem to be solved by the present invention is to provide a differential privacy measurement method based on graph theory and mutual information, which solves the difficulties in quantifying privacy leakage in differential privacy protection mechanisms: (1) At present, the strength and effect of differential privacy protection can only be evaluated a posteriori, and it is highly dependent on the privacy budget parameter ε selected empirically, making it difficult to quantify the privacy protection strength and privacy leakage in advance. (2) In the differential privacy protection mechanism, once the privacy budget ε is exhausted, the differential privacy protection will be destroyed, and the privacy protection algorithm will lose its meaning. The existing privacy measurement method is mainly based on the privacy measurement model of information entropy. How to combine Shannon information theory with differential privacy to quantify the privacy leakage in the differential privacy protection mechanism and prove the upper limit of the privacy leakage in the differential privacy protection mechanism is the difficult problem that the present invention focuses on solving.

本发明是这样实现的:基于图论与互信息量的差分隐私度量方法,包含如下步骤进行:The present invention is implemented as follows: A differential privacy measurement method based on graph theory and mutual information includes the following steps:

步骤1:首先以信息论通信模型重构差分隐私保护框架,构造差分隐私的信息通信模型,将差分隐私保护机制中原始数据集表示为信源,发布数据集表示为信宿,差分隐私保护机制表示为通信信道;Step 1: First, reconstruct the differential privacy protection framework based on the information theory communication model, construct the information communication model of differential privacy, represent the original data set in the differential privacy protection mechanism as the information source, represent the published data set as the information destination, and represent the differential privacy protection mechanism as the communication channel;

步骤2:构造隐私量化模型,将差分隐私通信模型中的通信信道建模为查询机制和噪音机制:Step 2: Construct a privacy quantification model and model the communication channel in the differential privacy communication model as a query mechanism and a noise mechanism:

步骤3:再将信源和信宿视为图形结构,以此将信道转移矩阵视为信源图和信宿图的复合图;Step 3: The signal source and the signal sink are then regarded as graph structures, so that the channel transfer matrix is regarded as a composite graph of the signal source graph and the signal sink graph;

步骤4:信道矩阵M转换为最大对角线矩阵M′;将信道矩阵M前n列中每一列元素的最大值移动到对角线上,矩阵M′仍满足ε-差分隐私且原始数据集与发布数据集间的条件熵H(X|Y)不变;Step 4: The channel matrix M is converted to the maximum diagonal matrix M′; the maximum value of each column element in the first n columns of the channel matrix M is moved to the diagonal. The matrix M′ still satisfies ε-differential privacy and the conditional entropy H(X|Y) between the original data set and the published data set remains unchanged.

步骤5:基于图的距离正则和点传递将最大对角线矩阵M′转换为汉明矩阵M″,使得对角线上的元素都相等且等于矩阵中的最大元素,且原始数据集与发布数据集间的条件熵H(X|Y)不变;Step 5: Based on the distance regularity and vertex transitivity of the graph, the maximum diagonal matrix M′ is converted into a Hamming matrix M″, so that the elements on the diagonal are all equal to the maximum element in the matrix, and the conditional entropy H(X|Y) between the original dataset and the published dataset remains unchanged;

步骤6:利用图的自同构、邻接关系,通过放缩公式的方法证明差分隐私保护机制隐私泄露量存在上界,并给出一个计算隐私泄露上界的公式。Step 6: Using the automorphism and adjacency relationship of the graph, we prove that there is an upper bound on the privacy leakage of the differential privacy protection mechanism through the scaling formula method, and give a formula for calculating the upper bound of privacy leakage.
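
Steps 4 and 5 rely on the claim that rearranging the channel matrix leaves H(X|Y) unchanged. A toy demonstration of the underlying fact (the 3×3 matrix and uniform prior are hypothetical illustration values, not taken from the patent): with a uniform source, relabeling the rows of the channel matrix — which is how a column maximum can be brought onto the diagonal — does not change the conditional entropy.

```python
import math

def cond_entropy(channel, prior):
    """H(X|Y) for the joint distribution p(x_i, y_j) = prior[i] * channel[i][j]."""
    n_cols = len(channel[0])
    p_y = [sum(prior[i] * channel[i][j] for i in range(len(channel)))
           for j in range(n_cols)]
    h = 0.0
    for i, row in enumerate(channel):
        for j, m in enumerate(row):
            p_xy = prior[i] * m
            if p_xy > 0:
                h -= p_xy * math.log2(p_xy / p_y[j])  # sum of -p(x,y) log2 p(x|y)
    return h

M = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.2, 0.7]]
prior = [1 / 3] * 3
# Swapping two rows relabels source symbols; with a uniform prior the joint
# probabilities form the same multiset, so H(X|Y) is unchanged.
M_perm = [M[1], M[0], M[2]]
assert abs(cond_entropy(M, prior) - cond_entropy(M_perm, prior)) < 1e-12
```

This is only the permutation-invariance part of the argument; the patent's full claim (that the transformed matrix also still satisfies ε-differential privacy) is proved in the description below.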

所述步骤2当中基于通信机制的隐私量化模型,将差分隐私保护机制中原始数据集表示为信源,发布数据集表示为信宿,查询机制和噪音机制表示为通信信道。In the privacy quantification model based on the communication mechanism in step 2, the original data set in the differential privacy protection mechanism is represented as the source, the published data set as the sink, and the query mechanism and the noise mechanism as the communication channel.

步骤4和步骤5的基于图论的隐私量化方法,利用图的自同构、点传递和距离正则性质给出隐私泄露上界,其中自同构指点集V(G)上的置换σ称为图G的自同构,即对任意顶点v,v′∈V,如果v~v′则σ(v)~σ(v′);点传递指图G中任意顶点v,v′∈V存在自同构σ使得σ(v)=v′,则称图G为点传递图;距离正则指如果存在整数b_d和c_d(d∈{0,1,…,d_max})使得图G中任意顶点v,v′,其中d(v,v′)=d,顶点v有b_d个邻点属于集合V^{<d+1>}(v),顶点v′有c_d个邻点属于集合V^{<d-1>}(v),则称图G为距离正则图。The graph-theoretic privacy quantification method of steps 4 and 5 uses the automorphism, vertex-transitivity and distance-regularity properties of the graph to give an upper bound on privacy leakage. A permutation σ on the vertex set V(G) is called an automorphism of the graph G if for any vertices v, v′∈V, v~v′ implies σ(v)~σ(v′); vertex transitivity means that for any vertices v, v′∈V in the graph G there exists an automorphism σ such that σ(v)=v′, and G is then called vertex-transitive; distance regularity means that if there exist integers b_d and c_d (d∈{0,1,…,d_max}) such that for any vertices v, v′ in G with d(v,v′)=d, vertex v has b_d neighbors in the set V^{<d+1>}(v) and vertex v′ has c_d neighbors in the set V^{<d-1>}(v), then G is called a distance-regular graph.
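
The Hamming graph used implicitly in the derivation below (vertices are u-tuples over v attribute values, adjacency at Hamming distance 1) is both vertex-transitive and distance-regular. A small brute-force check (H(3,2), i.e. u=3 attributes with v=2 values each, is an arbitrary toy choice) confirms that the number of vertices at distance d from any vertex is the same everywhere and equals C(u,d)(v-1)^d — the quantity denoted N_d later in the description:

```python
from itertools import product
from math import comb

u, v = 3, 2  # toy Hamming graph H(3,2)
vertices = list(product(range(v), repeat=u))

def hamming(a, b):
    """Hamming distance between two u-tuples."""
    return sum(x != y for x, y in zip(a, b))

# Distance-regularity / vertex-transitivity check: for every vertex x the
# count of vertices at distance d depends only on d, and equals C(u,d)(v-1)^d.
for x in vertices:
    for d in range(u + 1):
        n_d = sum(1 for y in vertices if hamming(x, y) == d)
        assert n_d == comb(u, d) * (v - 1) ** d
```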

本发明以信息论通信模型重构了差分隐私保护框架,构造了差分隐私的信息通信模型,将原始数据集表示为信源,发布数据集表示为信宿,查询机制和噪音机制表示为通信信道;进一步将信源和信宿视为图,以此将信道转移矩阵视为信源图和信宿图的复合图,并基于图的距离正则和点传递将信道转移矩阵转换为汉明图,提出差分隐私的隐私泄露互信息量化方法;利用图的自同构、邻接关系,通过放缩公式的方法证明差分隐私保护机制隐私泄露量存在上界,并提出一个计算隐私泄露上界的公式。所提出的差分隐私度量模型以信息通信模型为基础,利用图的特性结合信息熵给出隐私泄露量的互信息化计算方法,隐私泄露量的界仅依赖于原始数据集的属性数量、属性值数量及差分隐私预算参数,对任意分布的原始数据集,任意攻击能力的敌手都成立。The present invention reconstructs the differential privacy protection framework with the information-theoretic communication model and constructs the information communication model of differential privacy: the original data set is represented as the information source, the published data set as the information sink, and the query and noise mechanisms as the communication channel. The source and sink are further regarded as graphs, so that the channel transfer matrix is regarded as a composite graph of the source graph and the sink graph; the channel transfer matrix is converted into a Hamming graph based on the distance regularity and vertex transitivity of the graph, yielding a mutual-information quantification method for the privacy leakage of differential privacy. Using the automorphism and adjacency relations of the graph, a scaling argument proves that the privacy leakage of the differential privacy protection mechanism has an upper bound, and a formula for calculating this upper bound is proposed. The proposed differential privacy measurement model is based on the information communication model and uses the properties of graphs combined with information entropy to give a mutual-information calculation method for the privacy leakage; the bound on the privacy leakage depends only on the number of attributes of the original data set, the number of attribute values and the differential privacy budget parameter, and it holds for an original data set with arbitrary distribution and an adversary with arbitrary attack capability.

方案中利用了图论与互信息量的结合,相对于传统的依赖于经验性选择隐私预算的后验评估方法,本方法不仅给出一种具体的隐私量化方法,而且考虑到查询中隐私泄露的上界问题,利用图的自同构、邻接关系,通过放缩公式的方法证明差分隐私保护机制隐私泄露量存在上界且给出隐私泄露上界的计算公式。通过分析证明,本发明提出的差分隐私度量方法可给出差分隐私保护的隐私泄露互信息上界,限制条件较少,适用于所有信道,且不依赖原始数据集的分布。The scheme combines graph theory with mutual information. Compared with the traditional a posteriori evaluation methods that rely on an empirically selected privacy budget, this method not only provides a concrete privacy quantification method but also addresses the upper bound of privacy leakage in queries: using the automorphism and adjacency relations of the graph, a scaling argument proves that the privacy leakage of the differential privacy protection mechanism has an upper bound, and a formula for computing this upper bound is given. Analysis shows that the differential privacy measurement method proposed in the present invention gives an upper bound on the mutual-information privacy leakage of differential privacy protection, has few restrictive conditions, is applicable to all channels, and is independent of the distribution of the original data set.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为本发明的流程示意图;Fig. 1 is a schematic diagram of the process of the present invention;

图2为本发明的差分隐私度量模型图;FIG2 is a diagram of a differential privacy measurement model of the present invention;

图3为本发明的通信模型中信道矩阵转换图。FIG. 3 is a diagram showing a channel matrix conversion in the communication model of the present invention.

具体实施方式DETAILED DESCRIPTION

下面结合附图和实施例对本发明做进一步的说明。The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

本发明的实施例:一种基于图论与互信息量的差分隐私度量方法与技术流程如图1所示,包括6个步骤,构造差分隐私信道模型和隐私量化模型,再基于图论和互信息量给出度量方法并证明差分隐私保护机制存在隐私泄露上界且给出计算公式。An embodiment of the present invention: A differential privacy measurement method and technical process based on graph theory and mutual information is shown in Figure 1, which includes 6 steps, constructing a differential privacy channel model and a privacy quantification model, and then giving a measurement method based on graph theory and mutual information, proving that the differential privacy protection mechanism has an upper bound on privacy leakage and giving a calculation formula.

所述的通信模型中的信道转移过程如图3所示,包括两个步骤:信道矩阵M转换为最大对角线矩阵M′,将信道矩阵M前n列中每一列元素的最大值移动到对角线上;基于图的距离正则和点传递将最大对角线矩阵M′转换为汉明矩阵M″,使得对角线上的元素都相等且等于矩阵中的最大元素。The channel matrix conversion process in the communication model is shown in FIG. 3 and includes two steps: the channel matrix M is converted into the maximum diagonal matrix M′ by moving the maximum value of each column of elements in the first n columns of M onto the diagonal; then, based on the distance regularity and vertex transitivity of the graph, the maximum diagonal matrix M′ is converted into a Hamming matrix M″, so that the elements on the diagonal are all equal to the maximum element in the matrix.

基于图论和互信息量的差分隐私度量模型如图2所示,基于该差分隐私度量模型对原始数据集与发布数据集间的隐私泄漏量进行量化,记隐私度量模型中发布数据集对原始数据集的最大隐私泄漏量为ML。首先,将原始数据集和发布数据集视为无向图,构造出基于二元图形结构的信道矩阵(信道图),并利用图的距离正则和点传递对信道矩阵M进行转换处理得到汉明矩阵(汉明图);然后,通过汉明矩阵的邻接关系证明原始数据集与发布数据集的条件熵存在下界,进一步利用汉明矩阵的对称性和自同构关系得到条件熵的下界,并对任意分布的输入数据集,利用互信息的计算方法,计算隐私泄露量的上界。The differential privacy measurement model based on graph theory and mutual information is shown in FIG. 2. The privacy leakage between the original data set and the published data set is quantified based on this model; the maximum privacy leakage of the published data set about the original data set is denoted ML. First, the original data set and the published data set are regarded as undirected graphs, a channel matrix (channel graph) based on the binary graph structure is constructed, and the channel matrix M is transformed using the distance regularity and vertex transitivity of the graph to obtain a Hamming matrix (Hamming graph). Then, the adjacency relation of the Hamming matrix is used to prove that the conditional entropy between the original data set and the published data set has a lower bound, and the symmetry and automorphism of the Hamming matrix are further used to obtain this lower bound; for an input data set with arbitrary distribution, the upper bound of the privacy leakage is computed using the mutual information calculation.
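
The leakage quantity being bounded is the mutual information I(X;Y) = H(X) - H(X|Y) between source and sink. A minimal sketch (the 2×2 channels are hypothetical examples, not from the patent) computes it directly from a prior and a channel matrix:

```python
import math

def mutual_information(prior, channel):
    """I(X;Y) in bits for a discrete memoryless channel p(y_j|x_i) = channel[i][j]."""
    p_y = [sum(prior[i] * channel[i][j] for i in range(len(prior)))
           for j in range(len(channel[0]))]
    mi = 0.0
    for i, row in enumerate(channel):
        for j, m in enumerate(row):
            if prior[i] * m > 0:
                mi += prior[i] * m * math.log2(m / p_y[j])
    return mi

# A noiseless identity channel leaks all of H(X) = 1 bit;
# a completely random channel leaks 0 bits.
assert abs(mutual_information([0.5, 0.5], [[1, 0], [0, 1]]) - 1.0) < 1e-12
assert abs(mutual_information([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])) < 1e-12
```

The two extreme channels bracket the quantity ML: a useful differentially private channel sits strictly between them, and the derivation below bounds how close to the noiseless extreme it can get.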

由条件熵定义知:According to the definition of conditional entropy:

Figure BDA0002125512330000051

再由信息熵的定义及均匀分布最大熵原理得According to the definition of information entropy and the principle of maximum entropy of uniform distribution, we can get

Figure BDA0002125512330000052

又因M″_{i,j}≤max(M″),故Since M″_{i,j} ≤ max(M″),

Figure BDA0002125512330000061

又因Also because

Figure BDA0002125512330000062

和and

Figure BDA0002125512330000063

故Therefore

Figure BDA0002125512330000064

因信道矩阵转换为汉明矩阵后,原始数据集和发布数据集间的条件熵不变,即H_M(X|Y)=H_{M″}(X|Y),故After the channel matrix is converted into the Hamming matrix, the conditional entropy between the original data set and the published data set remains unchanged, that is, H_M(X|Y) = H_{M″}(X|Y), so

H(X|Y) ≥ -log_2 max(M″)

由差分隐私的扩展定义知,假设信道矩阵M满足ε-差分隐私,则对于任意列j,以及任意一对行i和h(i~h),有According to the extended definition of differential privacy, assuming that the channel matrix M satisfies ε-differential privacy, then for any column j and any pair of rows i and h (i~h), we have

Figure BDA0002125512330000065

当h=j时,矩阵M″对角线上的元素相等且等于最大元素值,故对于每一个元素M″_{i,j}有When h=j, the elements on the diagonal of the matrix M″ are all equal to the maximum element value, so for each element M″_{i,j} we have

max(M″) ≤ e^{εd(i,j)} M″_{i,j}

又因为矩阵M″中任意行元素均为概率分布,则∑_j M″_{i,j}=1,故And because any row of the matrix M″ is a probability distribution, ∑_j M″_{i,j} = 1, so

Figure BDA0002125512330000066

且根据图形结构元素的距离分组可得And by grouping the elements of the graph structure according to distance, we obtain

Figure BDA0002125512330000067

通过不等式变换得到By transforming the inequality, we get

Figure BDA0002125512330000071

若通信模型的输入图形结构为距离正则图和点传递图,则对于每一个d∈S_G,|X^{<d>}(i)|值均相同且只取决于d,将其值记为N_d,即N_d=|X^{<d>}(i)|。故If the input graph structure of the communication model is distance-regular and vertex-transitive, then for each d∈S_G the value of |X^{<d>}(i)| is the same and depends only on d; denote this value N_d, i.e., N_d = |X^{<d>}(i)|. Therefore

Figure BDA0002125512330000072

通过改变表示i的u元组中的个体的值,可以得到距x距离为d的每个元素j。这些个体有By changing the values of the individuals in the u-tuple representing i, each element j at distance d from x can be obtained. These individuals admit

Figure BDA0002125512330000073

种可能选择,每一种选择有(v-1)种可能情况,故possible choices, and each choice has (v-1) possible values, so

Figure BDA0002125512330000074

则Then

Figure BDA0002125512330000075

Therefore

Figure BDA0002125512330000076

又因为Also because

Figure BDA0002125512330000081

Therefore

Figure BDA0002125512330000082

当原始数据集的概率分布为均匀分布时,信息熵有最大值,即H(X)=log_2 n=log_2 v^u。根据互信息量的定义知When the probability distribution of the original data set is uniform, the information entropy attains its maximum value, that is, H(X) = log_2 n = log_2 v^u. According to the definition of mutual information,

Figure BDA0002125512330000083

由上述证明推导可知,当原始数据集的概率分布为均匀分布时,原始数据集有最大信息熵,此时互信息量泄露最大,故当原始数据集为任意概率分布时,结果仍然成立。因此,互信息量上界对原始数据集上的任意分布都是有效的。此外,由于所提出的模型仍满足差分隐私机制,故互信息上界对对手可能具有的任何背景知识都是有效的。From the above derivation, when the probability distribution of the original data set is uniform, the original data set has maximum information entropy and the mutual-information leakage is largest; therefore the result still holds when the original data set follows an arbitrary probability distribution. Thus the mutual-information upper bound is valid for any distribution on the original data set. In addition, since the proposed model still satisfies the differential privacy mechanism, the upper bound is valid against any background knowledge the adversary may have.
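
The equations above are rendered as images in the original text; piecing the derivation together (row normalizer ∑_d N_d e^{-εd} with N_d = C(u,d)(v-1)^d, hence (1+(v-1)e^{-ε})^u by the binomial theorem), the final bound appears to have the closed form ML = u·log_2(v / (1+(v-1)e^{-ε})). This is a reconstruction, not a verbatim quote of the patent's formula. The sketch below builds the Hamming channel M″ with entries proportional to e^{-εd(i,j)}, computes I(X;Y) under a uniform (worst-case) source, and checks it against that reconstructed bound; u, v and ε are arbitrary toy values:

```python
import math
from itertools import product

def leakage_bound(u, v, epsilon):
    """Reconstructed upper bound ML = u * log2(v / (1 + (v-1)e^{-eps}))."""
    return u * math.log2(v / (1 + (v - 1) * math.exp(-epsilon)))

u, v, epsilon = 2, 3, 0.8
xs = list(product(range(v), repeat=u))
n = len(xs)  # n = v^u

def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))

# Hamming channel: M''_{i,j} proportional to e^{-eps * d(i,j)}; by
# vertex-transitivity every row has the same normalizer (1+(v-1)e^{-eps})^u.
rows = []
for x in xs:
    w = [math.exp(-epsilon * hamming(x, y)) for y in xs]
    s = sum(w)
    rows.append([wi / s for wi in w])

# Mutual information I(X;Y) with a uniform (maximum-entropy) source.
prior = [1.0 / n] * n
p_y = [sum(prior[i] * rows[i][j] for i in range(n)) for j in range(n)]
mi = sum(prior[i] * rows[i][j] * math.log2(rows[i][j] / p_y[j])
         for i in range(n) for j in range(n))

assert mi <= leakage_bound(u, v, epsilon) + 1e-9
```

Note that the constructed channel satisfies ε-differential privacy (for neighboring rows i~h, M″_{i,j}/M″_{h,j} = e^{-ε(d(i,j)-d(h,j))} ≤ e^ε), so the numerical check exercises exactly the setting of the theorem.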

以上结合具体附图对本发明进行了详细的说明,这些并非构成对发明的限制。在不脱离本发明原理的情况下,本领域的技术人员还可以作出许多变形和改进,这些也应属于本发明的保护范围。The present invention has been described in detail above with reference to the specific drawings, which do not constitute a limitation of the invention. Without departing from the principle of the present invention, those skilled in the art may make many variations and improvements, which should also fall within the protection scope of the present invention.

Claims (3)

1. A differential privacy measurement method based on graph theory and mutual information content is characterized by comprising the following steps:
step 1: firstly, reconstructing a differential privacy protection framework by using an information theory communication model, constructing an information communication model of differential privacy, representing an original data set in a differential privacy protection mechanism as an information source, representing a released data set as an information sink, and representing the differential privacy protection mechanism as a communication channel;
step 2: constructing a privacy quantification model, and modeling a communication channel in the differential privacy communication model into a query mechanism and a noise mechanism:
and step 3: then, the information source and the information sink are regarded as graph structures, and the channel transfer matrix is regarded as a composite graph of the information source graph and the information sink graph;
and 4, step 4: converting the channel matrix M into a maximum diagonal matrix M′; moving the maximum value of each column of elements in the first n columns of the channel matrix M onto the diagonal, wherein the matrix M′ still satisfies ε-differential privacy and the conditional entropy H(X|Y) between the original data set and the published data set is unchanged;
and 5: transforming the maximum diagonal matrix M′ into a Hamming matrix M″ based on the distance regularity and vertex transitivity of the graph, such that the elements on the diagonal are all equal to the maximum element in the matrix, and the conditional entropy H(X|Y) between the original dataset and the published dataset is unchanged; vertex transitivity means that for any vertices v, v′ ∈ V in the graph G there exists an automorphism σ such that σ(v) = v′, and the graph G is then called vertex-transitive;
step 6: using the automorphism and adjacency relations of the graph, it is proved by means of a scaling argument that the privacy disclosure quantity of the differential privacy protection mechanism has an upper bound, and a formula for calculating the upper bound of the privacy disclosure is provided;
defined by conditional entropy:
Figure FDA0004116093600000011
then the method is obtained by the definition of information entropy and the principle of uniformly distributing maximum entropy
Figure FDA0004116093600000021
Also due to M″_{i,j} ≤ max(M″), therefore
Figure FDA0004116093600000022
And due to
Figure FDA0004116093600000023
and
Figure FDA0004116093600000024
therefore
Figure FDA0004116093600000025
After the channel matrix is converted into the Hamming matrix, the conditional entropy between the original data set and the published data set is unchanged, i.e. H_M(X|Y) = H_{M″}(X|Y), so
H(X|Y) ≥ -log_2 max(M″)
As defined by the extended definition of differential privacy, assuming that the channel matrix M satisfies ε-differential privacy, for any column j and any pair of rows i and h (i~h), there is
Figure FDA0004116093600000026
When h = j, the elements on the diagonal of the matrix M″ are all equal to the maximum element value, so for each element M″_{i,j} there is
max(M″) ≤ e^{εd(i,j)} M″_{i,j}
Since any row of elements in the matrix M″ is a probability distribution, ∑_j M″_{i,j} = 1, therefore
Figure FDA0004116093600000031
And by grouping the elements of the graph structure according to distance, we obtain
Figure FDA0004116093600000032
Obtained by inequality transformation
Figure FDA0004116093600000033
If the input graph structure of the communication model is distance-regular and vertex-transitive, then for each d ∈ S_G the value of |X^{<d>}(i)| is the same and depends only on d; denote it N_d, i.e., N_d = |X^{<d>}(i)|; therefore
Figure FDA0004116093600000034
By changing the value of an individual in the u-tuple representing i, each element j at a distance d from x can be obtained; these individuals have
Figure FDA0004116093600000035
possible choices, and each choice has (v-1) possible values; therefore
Figure FDA0004116093600000036
Then
Figure FDA0004116093600000041
Therefore, it is
Figure FDA0004116093600000042
And because of
Figure FDA0004116093600000043
Therefore, it is
Figure FDA0004116093600000044
When the probability distribution of the original data set is uniform, the information entropy has its maximum value, i.e., H(X) = log_2 n = log_2 v^u; according to the definition of mutual information,
Figure FDA0004116093600000045
2. The differential privacy measurement method based on graph theory and mutual information quantity according to claim 1, characterized in that: in the step 2, based on the privacy quantization model of the communication mechanism, the original data set in the differential privacy protection mechanism is represented as an information source, the issued data set is represented as an information sink, and the inquiry mechanism and the noise mechanism are represented as communication channels.
3. The differential privacy measurement method based on graph theory and mutual information quantity according to claim 1, characterized in that: the graph-theoretic privacy quantification method in steps 4 and 5 uses the automorphism, vertex-transitivity and distance-regularity properties of the graph to give an upper bound on privacy disclosure. An automorphism of the graph G is a permutation σ on the vertex set V(G) such that for any vertices v, v' ∈ V, if v ~ v' then σ(v) ~ σ(v'). The graph G is called vertex-transitive if for any vertices v, v' ∈ V there exists an automorphism σ such that σ(v) = v'. The graph G is called a distance-regular graph if there exist integers b_d and c_d (d ∈ {0, 1, ..., d_max}) such that for any vertices v, v' in G with d(v, v') = d, vertex v has b_d neighbors in the set V^{<d+1>}(v') and c_d neighbors in the set V^{<d-1>}(v').
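The graph used in the proof of claim 1 is the Hamming graph over u-tuples of v-valued records. As an illustrative check (sizes and names are our own choices), the sketch below verifies two facts the proof relies on: every vertex sees the same distance distribution, as vertex-transitivity implies, and the count N_d of vertices at distance d matches the closed form C(u,d)(v−1)^d.

```python
from itertools import product
from math import comb
from collections import Counter

u, v = 3, 3  # small illustrative sizes: 3 records, 3 possible values each
vertices = list(product(range(v), repeat=u))
ham = lambda a, b: sum(x != y for x, y in zip(a, b))

# Distance distribution seen from each vertex of the Hamming graph H(u, v).
dists = [Counter(ham(x, y) for y in vertices) for x in vertices]

# Vertex-transitivity implies every vertex sees the same distribution ...
assert all(c == dists[0] for c in dists)

# ... and N_d matches the closed form C(u, d) * (v-1)^d used in the proof.
for d in range(u + 1):
    assert dists[0][d] == comb(u, d) * (v - 1) ** d
print(dict(dists[0]))  # {0: 1, 1: 6, 2: 12, 3: 8}
```

The counts sum to v^u = 27, confirming that the distance classes partition the vertex set, which is what lets the proof group row elements of the channel matrix by distance.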
CN201910621081.XA 2019-07-10 2019-07-10 Differential privacy measurement method based on graph theory and mutual information quantity Active CN110457940B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910621081.XA CN110457940B (en) 2019-07-10 2019-07-10 Differential privacy measurement method based on graph theory and mutual information quantity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910621081.XA CN110457940B (en) 2019-07-10 2019-07-10 Differential privacy measurement method based on graph theory and mutual information quantity

Publications (2)

Publication Number Publication Date
CN110457940A CN110457940A (en) 2019-11-15
CN110457940B true CN110457940B (en) 2023-04-11

Family

ID=68482567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910621081.XA Active CN110457940B (en) 2019-07-10 2019-07-10 Differential privacy measurement method based on graph theory and mutual information quantity

Country Status (1)

Country Link
CN (1) CN110457940B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117240982B (en) * 2023-11-09 2024-01-26 沐城测绘(北京)有限公司 Video desensitization method based on privacy protection
CN117371046B (en) * 2023-12-07 2024-03-01 清华大学 A data privacy enhancement method and device for multi-party collaborative optimization

Citations (3)

Publication number Priority date Publication date Assignee Title
WO2015026384A1 (en) * 2013-08-19 2015-02-26 Thomson Licensing Method and apparatus for utility-aware privacy preserving mapping against inference attacks
WO2015077542A1 (en) * 2013-11-22 2015-05-28 The Trustees Of Columbia University In The City Of New York Database privacy protection devices, methods, and systems
CN109766710A (en) * 2018-12-06 2019-05-17 广西师范大学 A Differential Privacy Protection Method for Linked Social Network Data


Non-Patent Citations (4)

Title
DP2G_(sister): a differential privacy social network graph publishing model; Yin Yiping et al.; Information Technology and Network Security; 2018-06-10 (No. 06); full text *
Group Differential Privacy-Preserving Disclosure of Multi-level Association Graphs; Balaji Palanisamy et al.; IEEE; 2017-07-17; full text *
Differential privacy data publishing for social networks based on hierarchical random graphs; Zhang Wei et al.; Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition); 2016-06-29 (No. 03); full text *
An information entropy model of privacy protection and its measurement methods; Peng Changgen; Journal of Software; 2016-12-31; pp. 1891-1902 *

Also Published As

Publication number Publication date
CN110457940A (en) 2019-11-15

Similar Documents

Publication Publication Date Title
CN107871087B (en) Personalized differential privacy protection method for high-dimensional data release in distributed environment
Yang et al. Survey on improving data utility in differentially private sequential data publishing
Zhang et al. Towards accurate histogram publication under differential privacy
CN112329056B (en) A localized differential privacy method for government data sharing
CN101888341A (en) Access Control Method Based on Computable Reputation in Distributed Multi-trust Domain Environment
CN110457940B (en) Differential privacy measurement method based on graph theory and mutual information quantity
CN110866263B (en) A method and system for protecting user privacy information against vertical attacks
Yin et al. GANs Based Density Distribution Privacy‐Preservation on Mobility Data
CN106572111A (en) Big-data-oriented privacy information release exposure chain discovery method
Liu et al. Face image publication based on differential privacy
CN109766710B (en) A Differential Privacy Protection Method for Linked Social Network Data
CN108197492A (en) A kind of data query method and system based on difference privacy budget allocation
CN104573560A (en) Differential private data publishing method based on wavelet transformation
Yuan et al. Privacy‐preserving mechanism for mixed data clustering with local differential privacy
CN114328640A (en) A method and system for differential privacy protection and data mining based on dynamic sensitive data of mobile users
CN113094746A (en) High-dimensional data publishing method based on localized differential privacy and related equipment
CN108154185A (en) A kind of k-means clustering methods of secret protection
Zhou et al. Optimizing the numbers of queries and replies in convex federated learning with differential privacy
Reijsbergen et al. {TAP}: Transparent and {Privacy-Preserving} Data Services
CN114662157B (en) Block compressed sensing indistinguishable protection method and device for social text data stream
Li et al. A Differentially private hybrid decomposition algorithm based on quad-tree
CN113378223B (en) K-anonymous data processing method and system based on double coding and cluster mapping
CN113704787B (en) Privacy protection clustering method based on differential privacy
Delaney et al. Differentially private ad conversion measurement
Feng et al. Local differential privacy for unbalanced multivariate nominal attributes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant