CN115495778A - Differential privacy histogram publishing method and device based on grouping combination - Google Patents

Info

Publication number
CN115495778A
Authority
CN
China
Prior art keywords
merging
histogram
grouping
scheme
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211109967.4A
Other languages
Chinese (zh)
Inventor
孟博
张国兴
王德军
李子茂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Kongtian Software Technology Co ltd
South Central Minzu University
Original Assignee
Wuhan Kongtian Software Technology Co ltd
South Central University for Nationalities
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Kongtian Software Technology Co ltd and South Central University for Nationalities
Priority to CN202211109967.4A
Publication of CN115495778A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245: Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method and device for publishing differentially private histograms based on group merging. The method comprises steps S1 to S5. By grouping and merging the buckets of the histogram, the invention partitions the data more accurately, which reduces the error of the published data and improves its usability. It aims at an optimal partition of the histogram into groups, substantially reduces the effect of noise on data accuracy, and improves data usability and publishing efficiency while satisfying the differential privacy constraint.

Description

Method and device for publishing differentially private histograms based on group merging

Technical field

Embodiments of the present invention relate to the technical field of privacy protection for histogram data, and in particular to a method and device for publishing differentially private histograms based on group merging.

Background

Histogram publishing is a widely used technique in the field of data publishing. A histogram shows the statistical characteristics of the data intuitively, so users can quickly obtain the information they need from it. However, an attacker can easily attack a published histogram and steal users' private information. To prevent such leaks, data publishers usually apply differential privacy to the histogram before publishing it. Differential privacy protects the data but also reduces its usability, a problem that has long troubled researchers in histogram publishing. Developing a differentially private histogram publishing method and device based on group merging that effectively overcomes these defects of the related art has therefore become a technical problem that the industry urgently needs to solve.

Summary of the invention

In view of the above problems in the prior art, embodiments of the present invention provide a method and device for publishing differentially private histograms based on group merging.

In a first aspect, an embodiment of the present invention provides a method for publishing differentially private histograms based on group merging, comprising:

S1: Privacy budget division. The privacy budget $\varepsilon$ is divided into $\varepsilon_1$ and $\varepsilon_2$.

S2: Group merging. First, the final number of merged groups $K$ is set, and each bucket of the original histogram $H=\{h_1,h_2,\ldots,h_n\}$ is treated as a separate group, giving the group set $g(g_1,g_2,\ldots,g_n)$. The groups in $g$ are then merged pairwise: all merging schemes are enumerated, the scheme distance of each merging scheme is computed, and one merging scheme is selected by the exponential mechanism and carried out as an approximate merge. The merged result is treated as a new group that replaces the original two groups. This merging process is repeated until the final number of groups $K$ is reached, which yields the final merging scheme and the histogram grouping $G(G_1,G_2,\ldots,G_K)$. Here $H=\{h_1,h_2,\ldots,h_n\}$ denotes the original histogram sequence; $h_1,h_2,\ldots,h_n$ are the buckets of the histogram and $n$ is the total number of buckets in the original histogram; $g(g_1,g_2,\ldots,g_n)$ is the initial group set before merging, whose groups $g_1,g_2,\ldots,g_n$ each consist of one bucket, so the initial number of groups is $n$; $G(G_1,G_2,\ldots,G_K)$ is the final group set obtained by merging, $G_1,G_2,\ldots,G_K$ are its groups, and $G_K$ is the $K$-th group.

S3: The mean of each group of $G$ is computed, giving $\bar{G}(\bar{G}_1,\bar{G}_2,\ldots,\bar{G}_K)$, where $\bar{G}$ is the set of mean groups and $\bar{G}_K$ is the $K$-th mean group.

S4: Laplace noise is added to $\bar{G}$, giving $\tilde{G}(\tilde{G}_1,\tilde{G}_2,\ldots,\tilde{G}_K)$, where $\tilde{G}$ is the set of noisy groups obtained by adding noise to the set of mean groups and $\tilde{G}_K$ is the $K$-th noisy group.

S5: The original histogram order is restored for $\tilde{G}$, giving the published differentially private histogram $\tilde{H}=\{\tilde{h}_1,\tilde{h}_2,\ldots,\tilde{h}_n\}$, where $\tilde{H}$ is the differentially private histogram obtained from $\tilde{G}$ by restoring the original histogram order, $\tilde{h}_1,\tilde{h}_2,\ldots,\tilde{h}_n$ are the noisy buckets of the histogram, and $n$ is the total number of buckets in the differentially private histogram.

On the basis of the above method embodiment, in the method for publishing differentially private histograms based on group merging provided in the embodiment of the present invention, the privacy budget $\varepsilon$ in step S1 is a given positive value, $\varepsilon_1$ is used for group merging, and $\varepsilon_2$ is used for adding noise to the groups.
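The text fixes only that $\varepsilon$ is positive and is split into a merging share and a noise share; it does not say how large each share is. The following is a minimal sketch of step S1, assuming the two shares sum to $\varepsilon$ and that the split is controlled by a hypothetical parameter `merge_fraction` (both the function name and the parameter are illustrative assumptions).

```python
from typing import Tuple


def split_budget(epsilon: float, merge_fraction: float = 0.5) -> Tuple[float, float]:
    """Split a total privacy budget into (epsilon1, epsilon2).

    merge_fraction is a hypothetical tuning knob, not a value given in the text.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0 < merge_fraction < 1:
        raise ValueError("merge_fraction must lie strictly between 0 and 1")
    epsilon1 = epsilon * merge_fraction  # share spent on group merging (S2)
    epsilon2 = epsilon - epsilon1        # share spent on Laplace noise (S4)
    return epsilon1, epsilon2


eps1, eps2 = split_budget(1.0, merge_fraction=0.3)
```

A smaller `merge_fraction` leaves more budget for the noise step, at the cost of a more random choice of merges.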

On the basis of the above method embodiment, in the method for publishing differentially private histograms based on group merging provided in the embodiment of the present invention, step S2 specifically comprises:

S2.1: Set the final number of groups $K$, where $K \in \{1,2,\ldots,n\}$ and $n$ is the number of groups of the histogram.

S2.2: Treat each bucket of the original histogram $H=\{h_1,h_2,\ldots,h_n\}$ as a separate group, giving the group set $g(g_1,g_2,\ldots,g_n)$.

S2.3: Merge the groups of the histogram pairwise and enumerate all possible merging schemes $P(p_1,p_2,\ldots,p_y)$, where $P$ is the set of merging schemes, $p_1,p_2,\ldots,p_y$ are all the possible merging schemes, and $y$ is the total number of schemes.

S2.4: Compute the scheme distance $u(p,p_s)$ of each merging scheme and use it as the utility function:

$$u(p,p_s) = -\min_{h\in g_i,\; h'\in g_j}\lvert h-h'\rvert$$

where $p_s(g_i,g_j)$ is a merging scheme in the scheme set $P$ that contains the two groups $g_i$ and $g_j$, $h$ is a bucket in $g_i$, $h'$ is a bucket in $g_j$, and $\min_{h\in g_i,\, h'\in g_j}\lvert h-h'\rvert$ is the minimum distance between the two groups, which measures their similarity. The utility function must satisfy the requirement that the smaller the distance between two groups, the greater their probability of being merged. So that the subsequent probability calculation meets this requirement, the negative of the group distance, $-\min_{h\in g_i,\, h'\in g_j}\lvert h-h'\rvert$, is used to construct the utility function.

S2.5: Use the exponential mechanism together with the scheme distance to compute the sampling probability $\Pr(p,p_s)$ of each merging scheme:

$$\Pr(p,p_s)=\frac{\exp\left(\dfrac{\varepsilon_1\,u(p,p_s)}{2\Delta u}\right)}{\sum_{t=1}^{y}\exp\left(\dfrac{\varepsilon_1\,u(p,p_t)}{2\Delta u}\right)}$$

where $\Pr(p,p_s)$ is the probability that merging scheme $p_s$ is selected, $\varepsilon_1$ is the privacy budget, $\Delta u$ is the global sensitivity, and $u(p,p_s)$ is the utility function of $p_s$. By the definition of global sensitivity, deleting any single record from the data set changes the utility function by at most 1, so $\Delta u=1$; $y$ is the total number of schemes; $\exp\left(\varepsilon_1 u(p,p_s)/(2\Delta u)\right)$ is the fitness function of merging scheme $p_s$. The numerator is the fitness value of $p_s$, and the denominator is the sum of the fitness values of all merging schemes.

S2.6: Select a merging scheme by roulette-wheel selection according to the sampling probability of each merging scheme.

S2.7: Carry out the selected merge and treat the result as a new group that replaces the original two groups.

S2.8: Repeat S2.3 to S2.7 until the groups have been merged into $K$ groups, at which point the loop ends.

S2.9: Return the final merging scheme $G(G_1,G_2,\ldots,G_K)$.
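Steps S2.1 to S2.9 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the group distance is the smallest absolute difference between bucket counts, and that the exponential-mechanism weight takes the standard form $\exp(\varepsilon_1 u / (2\Delta u))$. The function names, the representation of groups as lists of bucket indices, and the `rng` parameter are all illustrative assumptions.

```python
import itertools
import math
import random
from typing import List, Optional


def group_distance(values: List[float], gi: List[int], gj: List[int]) -> float:
    """Smallest absolute difference between a bucket of gi and a bucket of gj."""
    return min(abs(values[a] - values[b]) for a in gi for b in gj)


def merge_groups(values: List[float], k: int, epsilon1: float,
                 delta_u: float = 1.0,
                 rng: Optional[random.Random] = None) -> List[List[int]]:
    """Merge single-bucket groups down to k groups of bucket indices."""
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= n")
    rng = rng or random.Random()
    groups = [[i] for i in range(len(values))]  # S2.2: one group per bucket
    while len(groups) > k:                      # S2.8: repeat until k groups remain
        # S2.3: every unordered pair of current groups is a candidate merge.
        pairs = list(itertools.combinations(range(len(groups)), 2))
        # S2.4: utility is the negated group distance.
        utilities = [-group_distance(values, groups[i], groups[j]) for i, j in pairs]
        # S2.5: exponential-mechanism weights. Subtracting the largest utility
        # avoids underflow and does not change the normalised probabilities.
        top = max(utilities)
        weights = [math.exp(epsilon1 * (u - top) / (2.0 * delta_u)) for u in utilities]
        # S2.6: roulette-wheel selection proportional to the weights.
        i, j = rng.choices(pairs, weights=weights, k=1)[0]
        # S2.7: the merged group replaces the two selected groups.
        merged = sorted(groups[i] + groups[j])
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)]
        groups.append(merged)
    return groups                               # S2.9


histogram = [2.0, 3.0, 10.0, 11.0]
final_groups = merge_groups(histogram, k=2, epsilon1=1.0, rng=random.Random(0))
print(final_groups)
```

Keeping bucket indices rather than bucket values inside each group is a design choice of this sketch: it makes restoring the original histogram order in step S5 a simple index lookup.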

On the basis of the above method embodiment, in step S2 of the method for publishing differentially private histograms based on group merging provided in the embodiment of the present invention, if the histogram has $n$ buckets, then there are also $n$ initial groups.

On the basis of the above method embodiment, in the method for publishing differentially private histograms based on group merging provided in the embodiment of the present invention, the Laplace noise added to $\bar{G}$ in step S4 is of size $\varepsilon_2$, that is, it is calibrated to the privacy budget $\varepsilon_2$.
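Steps S3 to S5 (group means, Laplace noise, restoring the original order) can be sketched as below. The text does not give the noise scale explicitly, so this sketch assumes the common calibration `scale = sensitivity / epsilon2` with a hypothetical `sensitivity` parameter defaulting to 1. Groups are assumed to be lists of bucket indices, and all function names are illustrative.

```python
import random
from typing import List, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample, drawn as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_histogram(values: List[float], groups: List[List[int]],
                      epsilon2: float, sensitivity: float = 1.0,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Return the noisy histogram in the original bucket order."""
    if epsilon2 <= 0:
        raise ValueError("epsilon2 must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon2  # assumed calibration, not stated in the text
    published = [0.0] * len(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)  # S3: group mean
        noisy_mean = mean + laplace_noise(scale, rng)      # S4: add Laplace noise
        for i in group:                                    # S5: write back by index
            published[i] = noisy_mean
    return published


histogram = [2.0, 4.0, 10.0, 12.0]
noisy = publish_histogram(histogram, [[0, 1], [2, 3]], epsilon2=0.5,
                          rng=random.Random(0))
print(noisy)
```

Because every bucket in a group receives the same noisy mean, only one noise draw is needed per group rather than one per bucket.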

On the basis of the above method embodiment, in the method for publishing differentially private histograms based on group merging provided in the embodiment of the present invention, the differentially private histogram $\tilde{H}$ in step S5 is one-dimensional.

In a second aspect, an embodiment of the present invention provides a device for publishing differentially private histograms based on group merging, comprising:

a first main module, configured to implement S1: privacy budget division, in which the privacy budget $\varepsilon$ is divided into $\varepsilon_1$ and $\varepsilon_2$;

a second main module, configured to implement S2: group merging, in which the final number of merged groups $K$ is first set and each bucket of the original histogram $H=\{h_1,h_2,\ldots,h_n\}$ is treated as a separate group, giving the group set $g(g_1,g_2,\ldots,g_n)$; the groups in $g$ are then merged pairwise, all merging schemes are enumerated, the scheme distance of each merging scheme is computed, one merging scheme is selected by the exponential mechanism and carried out as an approximate merge, and the merged result is treated as a new group that replaces the original two groups; this merging process is repeated until the final number of groups $K$ is reached, which yields the final merging scheme and the histogram grouping $G(G_1,G_2,\ldots,G_K)$; here $H=\{h_1,h_2,\ldots,h_n\}$ denotes the original histogram sequence, $h_1,h_2,\ldots,h_n$ are the buckets of the histogram, $n$ is the total number of buckets in the original histogram, $g(g_1,g_2,\ldots,g_n)$ is the initial group set before merging, whose groups $g_1,g_2,\ldots,g_n$ each consist of one bucket, so the initial number of groups is $n$, $G(G_1,G_2,\ldots,G_K)$ is the final group set obtained by merging, $G_1,G_2,\ldots,G_K$ are its groups, and $G_K$ is the $K$-th group;

a third main module, configured to implement S3: computing the mean of each group of $G$ to obtain $\bar{G}(\bar{G}_1,\bar{G}_2,\ldots,\bar{G}_K)$, where $\bar{G}$ is the set of mean groups and $\bar{G}_K$ is the $K$-th mean group;

a fourth main module, configured to implement S4: adding Laplace noise to $\bar{G}$ to obtain $\tilde{G}(\tilde{G}_1,\tilde{G}_2,\ldots,\tilde{G}_K)$, where $\tilde{G}$ is the set of noisy groups obtained by adding noise to the set of mean groups and $\tilde{G}_K$ is the $K$-th noisy group;

a fifth main module, configured to implement S5: restoring the original histogram order for $\tilde{G}$ to obtain the published differentially private histogram $\tilde{H}=\{\tilde{h}_1,\tilde{h}_2,\ldots,\tilde{h}_n\}$, where $\tilde{H}$ is the differentially private histogram obtained from $\tilde{G}$ by restoring the original histogram order, $\tilde{h}_1,\tilde{h}_2,\ldots,\tilde{h}_n$ are the noisy buckets of the histogram, and $n$ is the total number of buckets in the differentially private histogram.

In a third aspect, an embodiment of the present invention provides an electronic device, comprising:

at least one processor; and

at least one memory communicatively connected to the processor, wherein:

the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, can perform the method for publishing differentially private histograms based on group merging provided by any one of the implementations of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium that stores computer instructions, the computer instructions causing a computer to perform the method for publishing differentially private histograms based on group merging provided by any one of the implementations of the first aspect.

The method and device for publishing differentially private histograms based on group merging provided by the embodiments of the present invention group and merge the buckets of the histogram so that the data is partitioned more accurately. This reduces the error of the published data and improves its usability. The approach aims at an optimal partition of the histogram into groups, substantially reduces the effect of noise on data accuracy, and improves data usability and publishing efficiency while satisfying the differential privacy constraint.

Description of drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the method for publishing differentially private histograms based on group merging provided by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of the device for publishing differentially private histograms based on group merging provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of the physical structure of the electronic device provided by an embodiment of the present invention.

Detailed description

To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention. In addition, the technical features of the various embodiments, or of a single embodiment, may be combined with one another arbitrarily to form a feasible technical solution. Such a combination is not restricted by the order of steps or by the structural composition, but it must be one that a person of ordinary skill in the art can implement. When a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and falls outside the protection scope claimed by the present invention.

The present invention divides the privacy budget, sets the final number of groups $K$, treats each bucket of the histogram as a separate group, and enumerates all possible merging schemes by group merging. The scheme distance of each merging scheme is computed and combined with the exponential mechanism to select a merging scheme approximately; the selected merge is carried out and its result replaces the original two groups. The group merging process is then repeated until the final grouping is formed. Finally, the mean of each resulting histogram group is computed, Laplace noise is added, and the original order is restored, giving the published differentially private histogram. Based on this idea, an embodiment of the present invention provides a method for publishing differentially private histograms based on group merging. Referring to Fig. 1, the method comprises:

S1: Privacy budget division. The privacy budget $\varepsilon$ is divided into $\varepsilon_1$ and $\varepsilon_2$.

S2: Group merging. First, the final number of merged groups $K$ is set, and each bucket of the original histogram $H=\{h_1,h_2,\ldots,h_n\}$ is treated as a separate group, giving the group set $g(g_1,g_2,\ldots,g_n)$. The groups in $g$ are then merged pairwise: all merging schemes are enumerated, the scheme distance of each merging scheme is computed, and one merging scheme is selected by the exponential mechanism and carried out as an approximate merge. The merged result is treated as a new group that replaces the original two groups. This merging process is repeated until the final number of groups $K$ is reached, which yields the final merging scheme and the histogram grouping $G(G_1,G_2,\ldots,G_K)$. Here $H=\{h_1,h_2,\ldots,h_n\}$ denotes the original histogram sequence; $h_1,h_2,\ldots,h_n$ are the buckets of the histogram and $n$ is the total number of buckets in the original histogram; $g(g_1,g_2,\ldots,g_n)$ is the initial group set before merging, whose groups $g_1,g_2,\ldots,g_n$ each consist of one bucket, so the initial number of groups is $n$; $G(G_1,G_2,\ldots,G_K)$ is the final group set obtained by merging, $G_1,G_2,\ldots,G_K$ are its groups, and $G_K$ is the $K$-th group.

S3: The mean of each group of $G$ is computed, giving $\bar{G}(\bar{G}_1,\bar{G}_2,\ldots,\bar{G}_K)$, where $\bar{G}$ is the set of mean groups and $\bar{G}_K$ is the $K$-th mean group.

S4: Laplace noise is added to $\bar{G}$, giving $\tilde{G}(\tilde{G}_1,\tilde{G}_2,\ldots,\tilde{G}_K)$, where $\tilde{G}$ is the set of noisy groups obtained by adding noise to the set of mean groups and $\tilde{G}_K$ is the $K$-th noisy group.

S5: The original histogram order is restored for $\tilde{G}$, giving the published differentially private histogram $\tilde{H}=\{\tilde{h}_1,\tilde{h}_2,\ldots,\tilde{h}_n\}$, where $\tilde{H}$ is the differentially private histogram obtained from $\tilde{G}$ by restoring the original histogram order, $\tilde{h}_1,\tilde{h}_2,\ldots,\tilde{h}_n$ are the noisy buckets of the histogram, and $n$ is the total number of buckets in the differentially private histogram.

Based on the above method embodiment, as an optional embodiment, in the method for publishing differentially private histograms based on group merging provided in the embodiment of the present invention, the privacy budget $\varepsilon$ in step S1 is a given positive value, $\varepsilon_1$ is used for group merging, and $\varepsilon_2$ is used for adding noise to the groups.

Based on the above method embodiment, as an optional embodiment, in the method for publishing differentially private histograms based on group merging provided in the embodiment of the present invention, step S2 specifically comprises:

S2.1: Set the final number of groups $K$, where $K \in \{1,2,\ldots,n\}$ and $n$ is the number of groups of the histogram.

S2.2: Treat each bucket of the original histogram $H=\{h_1,h_2,\ldots,h_n\}$ as a separate group, giving the group set $g(g_1,g_2,\ldots,g_n)$.

S2.3: Merge the groups of the histogram pairwise and enumerate all possible merging schemes $P(p_1,p_2,\ldots,p_y)$, where $P$ is the set of merging schemes, $p_1,p_2,\ldots,p_y$ are all the possible merging schemes, and $y$ is the total number of schemes.

S2.4: Compute the scheme distance $u(p,p_s)$ of each merging scheme and use it as the utility function:

$$u(p,p_s) = -\min_{h\in g_i,\; h'\in g_j}\lvert h-h'\rvert$$

where $p_s(g_i,g_j)$ is a merging scheme in the scheme set $P$ that contains the two groups $g_i$ and $g_j$, $h$ is a bucket in $g_i$, $h'$ is a bucket in $g_j$, and $\min_{h\in g_i,\, h'\in g_j}\lvert h-h'\rvert$ is the minimum distance between the two groups, which measures their similarity. The utility function must satisfy the requirement that the smaller the distance between two groups, the greater their probability of being merged. So that the subsequent probability calculation meets this requirement, the negative of the group distance, $-\min_{h\in g_i,\, h'\in g_j}\lvert h-h'\rvert$, is used to construct the utility function.

S2.5: Use the exponential mechanism together with the scheme distance to compute the sampling probability $\Pr(p,p_s)$ of each merging scheme:

$$\Pr(p,p_s)=\frac{\exp\left(\dfrac{\varepsilon_1\,u(p,p_s)}{2\Delta u}\right)}{\sum_{t=1}^{y}\exp\left(\dfrac{\varepsilon_1\,u(p,p_t)}{2\Delta u}\right)}$$

where $\Pr(p,p_s)$ is the probability that merging scheme $p_s$ is selected, $\varepsilon_1$ is the privacy budget, $\Delta u$ is the global sensitivity, and $u(p,p_s)$ is the utility function of $p_s$. By the definition of global sensitivity, deleting any single record from the data set changes the utility function by at most 1, so $\Delta u=1$; $y$ is the total number of schemes; $\exp\left(\varepsilon_1 u(p,p_s)/(2\Delta u)\right)$ is the fitness function of merging scheme $p_s$. The numerator is the fitness value of $p_s$, and the denominator is the sum of the fitness values of all merging schemes.

S2.6: Select a merging scheme by roulette-wheel selection according to the sampling probability of each merging scheme.

S2.7: Carry out the selected merge and treat the result as a new group that replaces the original two groups.

S2.8: Repeat S2.3 to S2.7 until the groups have been merged into $K$ groups, at which point the loop ends.

S2.9: Return the final merging scheme $G(G_1,G_2,\ldots,G_K)$.
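As a worked illustration of S2.4 and S2.5, consider a hypothetical three-bucket histogram $H=\{2,3,10\}$ with $\varepsilon_1=1$ and $\Delta u=1$. The example assumes the group distance is the absolute difference of bucket counts and that the weight has the standard exponential-mechanism form $\exp(\varepsilon_1 u/(2\Delta u))$; the numbers are illustrative and do not come from the patent.

```latex
\begin{aligned}
u(g_1,g_2) &= -\lvert 2-3\rvert = -1,  & w_{12} &= e^{-0.5} \approx 0.6065,\\
u(g_1,g_3) &= -\lvert 2-10\rvert = -8, & w_{13} &= e^{-4}   \approx 0.0183,\\
u(g_2,g_3) &= -\lvert 3-10\rvert = -7, & w_{23} &= e^{-3.5} \approx 0.0302,\\[4pt]
\Pr(g_1,g_2) &\approx \frac{0.6065}{0.6550} \approx 0.926, &
\Pr(g_1,g_3) &\approx 0.028, \qquad \Pr(g_2,g_3) \approx 0.046.
\end{aligned}
```

The closest pair is selected with probability of about 0.93, so similar buckets are merged most of the time while every other merge keeps a nonzero probability.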

Based on the above method embodiment, as an optional embodiment, in step S2 of the method for publishing differentially private histograms based on group merging provided in the embodiment of the present invention, if the histogram has $n$ buckets, then there are also $n$ initial groups.

Based on the above method embodiment, as an optional embodiment of the differential privacy histogram publishing method based on group merging provided by the embodiments of the present invention, in step S4 the magnitude of the Laplace noise added to

Figure BDA0003842726240000071
is ε2 (that is, the noise is calibrated to the privacy budget ε2).
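The noise-adding step S4 can be illustrated with a short sketch. It assumes the usual Laplace mechanism, in which the noise scale is sensitivity / ε2. The text only states that the noise is determined by ε2, so the `sensitivity` parameter and the function names are assumptions for illustration. The Laplace sample is drawn as the difference of two exponential variates, which needs only the standard library.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_laplace_noise(group_means, eps2, sensitivity=1.0, rng=None):
    """Add independent Laplace noise with scale sensitivity / eps2 to each mean.

    Assumption: `sensitivity` is supplied by the caller; the source does not
    state the sensitivity of a group mean.
    """
    if eps2 <= 0:
        raise ValueError("eps2 must be positive")
    rng = rng or random.Random()
    scale = sensitivity / eps2
    return [m + laplace_noise(scale, rng) for m in group_means]
```

A smaller ε2 yields a larger scale and therefore noisier published means. The sketch makes this trade-off explicit through the single `scale` variable.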

Based on the above method embodiment, as an optional embodiment of the differential privacy histogram publishing method based on group merging provided by the embodiments of the present invention, the differential privacy histogram in step S5

Figure BDA0003842726240000072
is one-dimensional. Specifically, privacy is proved as follows. Let the group merging process be M1 and the process of averaging and adding noise be M2. First, during merging, a merging scheme is selected each time with probability proportional to
Figure BDA0003842726240000073
and the whole process does not leak privacy, so the merging process satisfies ε1-differential privacy. Because the noise added to the mean groups is calibrated to ε2, process M2 satisfies ε2-differential privacy. Since ε = ε1 + ε2, the sequential composition property of differential privacy implies that the method as a whole satisfies ε-differential privacy.
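The composition step of this argument can be written out as a short derivation. It takes as given, as the text states, that M1 as a whole satisfies ε1-differential privacy and that M2 satisfies ε2-differential privacy for every fixed grouping G. The symbols D, D′ (neighbouring datasets), the noisy output \tilde{H}, and the density notation p are introduced here for illustration only.

```latex
\begin{aligned}
p\bigl[(M_1, M_2)(D) = (G, \tilde{H})\bigr]
  &= p\bigl[M_1(D) = G\bigr] \cdot p\bigl[M_2(D, G) = \tilde{H}\bigr] \\
  &\le e^{\varepsilon_1}\, p\bigl[M_1(D') = G\bigr] \cdot
       e^{\varepsilon_2}\, p\bigl[M_2(D', G) = \tilde{H}\bigr] \\
  &= e^{\varepsilon_1 + \varepsilon_2}\,
       p\bigl[(M_1, M_2)(D') = (G, \tilde{H})\bigr]
   = e^{\varepsilon}\, p\bigl[(M_1, M_2)(D') = (G, \tilde{H})\bigr].
\end{aligned}
```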

The differential privacy histogram publishing method based on group merging provided by the embodiments of the present invention partitions the data reasonably and accurately by grouping and merging the histogram. This improves the accuracy of the group partition, which in turn reduces the error of the published data and improves data utility. The method can achieve an optimal partition of the histogram groups and greatly reduces the impact of noise on data accuracy, improving data utility and publishing efficiency while satisfying the differential privacy constraint.

Each embodiment of the present invention is implemented through programmed processing by a device with processor functionality. In engineering practice, therefore, the technical solutions of the embodiments and their functions can be packaged into modules. On this basis, and building on the above embodiments, an embodiment of the present invention provides a differential privacy histogram publishing device based on group merging, which is used to execute the differential privacy histogram publishing method based on group merging of the above method embodiments. Referring to Fig. 2, the device includes: a first main module, configured to implement S1: privacy budget partitioning, dividing the privacy budget ε into ε1 and ε2; a second main module, configured to implement S2: group merging: first set the final number of groups K and treat each bucket of the original histogram H = {h1, h2, ..., hn} as a separate group, obtaining the group set g(g1, g2, ..., gn); then merge the groups in g pairwise, enumerate all merging schemes, compute the scheme distance of each merging scheme, select a merging scheme through the exponential mechanism for approximate merging, treat the merged result as a new group replacing the original two groups, and repeat this merging process until the final number of groups K is reached, obtaining the final merging scheme and forming the histogram grouping G(G1, G2, ..., GK); where: H = {h1, h2, ..., hn} denotes the original histogram sequence; h1, h2, ..., hn denote the buckets of the histogram, and n is the total number of buckets of the original histogram; g(g1, g2, ..., gn) denotes the initial group set before merging; g1, g2, ..., gn denote the groups in the initial group set, each consisting of one bucket, so the initial number of groups is n; G(G1, G2, ..., GK) denotes the final group set obtained by merging, G1, G2, ..., GK denote the groups in the final group set, and GK denotes the K-th group; a third main module, configured to implement S3: take the mean of each group of G to obtain

Figure BDA0003842726240000081
where
Figure BDA0003842726240000082
denotes the set of mean groups after averaging;
Figure BDA0003842726240000083
denotes the K-th mean group; a fourth main module, configured to implement S4: add Laplace noise to
Figure BDA0003842726240000084
to obtain
Figure BDA0003842726240000085
where:
Figure BDA0003842726240000086
denotes the set of noisy groups obtained by adding noise to the set of mean groups;
Figure BDA0003842726240000087
denotes the K-th noisy group; a fifth main module, configured to implement S5: restore the original histogram order of
Figure BDA0003842726240000088
to obtain the published differential privacy histogram
Figure BDA0003842726240000089
where:
Figure BDA00038427262400000810
denotes the differential privacy histogram obtained by restoring
Figure BDA00038427262400000811
to the original histogram order;
Figure BDA00038427262400000812
denotes the noisy buckets of the histogram; n denotes the total number of buckets of the differential privacy histogram.
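The averaging step S3 and the order-restoring step S5 can be sketched as below. The sketch assumes that each group is stored as a list of original bucket indices and that, on restoration, every bucket of a group receives that group's noisy mean. The text does not spell out this representation, so it is an assumption, as are the function names.

```python
def group_means(histogram, groups):
    """S3: mean bucket count of each group (groups hold bucket indices)."""
    return [sum(histogram[i] for i in g) / len(g) for g in groups]


def restore_order(groups, noisy_means, n):
    """S5: write each group's noisy mean back to its buckets' original positions."""
    published = [0.0] * n
    for g, value in zip(groups, noisy_means):
        for i in g:
            published[i] = value
    return published


# Usage: buckets 0 and 2 were merged, as were buckets 1 and 3.
hist = [2, 10, 4, 12]
groups = [[0, 2], [1, 3]]
means = group_means(hist, groups)             # [3.0, 11.0]
# The noisy means here are illustrative placeholders, not real mechanism output.
published = restore_order(groups, [3.2, 10.7], len(hist))
```

Storing indices rather than counts inside each group is what makes the restoration possible after non-adjacent buckets have been merged.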

The differential privacy histogram publishing device based on group merging provided by the embodiments of the present invention uses the modules shown in Fig. 2 to partition the data reasonably and accurately by grouping and merging the histogram. This improves the accuracy of the group partition, which in turn reduces the error of the published data and improves data utility. The device can achieve an optimal partition of the histogram groups and greatly reduces the impact of noise on data accuracy, improving data utility and publishing efficiency while satisfying the differential privacy constraint.

It should be noted that the device in the device embodiments provided by the present invention can be used to implement not only the method of the above method embodiments but also the methods of the other method embodiments provided by the present invention. The only difference lies in the corresponding functional modules, and the principle is essentially the same as that of the above device embodiment. A person skilled in the art can, on the basis of the above device embodiment and with reference to the specific technical solutions of the other method embodiments, obtain corresponding technical means by combining technical features, as well as technical solutions composed of those means. Provided the resulting technical solution remains practical, the device of the above device embodiment can be modified in this way to obtain corresponding device embodiments that implement the methods of the other method embodiments. For example:

Based on the above device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on group merging provided by the embodiments of the present invention further includes a first submodule, configured to implement the following: the privacy budget ε in step S1 is a given positive value, ε1 is used for group merging, and ε2 is used for adding noise to the groups.

Based on the above device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on group merging provided by the embodiments of the present invention further includes a second submodule, configured to implement step S2 as follows. S2.1: set the final number of groups K, where K = 1, 2, ..., n and n is the number of groups of the histogram. S2.2: treat each bucket of the original histogram H = {h1, h2, ..., hn} as a separate group, obtaining the group set g(g1, g2, ..., gn). S2.3: merge the groups in the histogram pairwise and enumerate all possible merging schemes P(p1, p2, ..., py), where P(p1, p2, ..., py) denotes the set of merging schemes, p1, p2, ..., py denote all possible merging schemes, and y denotes the total number of schemes. S2.4: compute the scheme distance u(p, p_s) of each merging scheme and set it as the utility function:

Figure BDA0003842726240000091

where p_s(g_i, g_j) is a merging scheme in the scheme set P that contains the two groups g_i and g_j, h is a bucket in g_i, h′ is a bucket in g_j, and

Figure BDA0003842726240000092
denotes the minimum distance between the two groups and is used to measure their similarity. The utility function must satisfy one requirement: the smaller the distance between two groups, the greater the probability that they are merged. So that the subsequent probability calculation meets this requirement, the negative of the group distance
Figure BDA0003842726240000093
is used to construct the utility function. S2.5: using the exponential mechanism together with the scheme distance, compute the sampling probability Pr(p, p_s) of each merging scheme:

Figure BDA0003842726240000094

where Pr(p, p_s) is the probability that merging scheme p_s is selected; ε1 is the privacy budget; Δu is the global sensitivity; u(p, p_s) is the utility function of p_s. By the definition of global sensitivity, deleting any single record from the dataset changes the utility function by at most 1, so Δu = 1; y is the total number of schemes;

Figure BDA0003842726240000095
is the fitness function of merging scheme p_s: the numerator is the fitness value of p_s, and the denominator is the sum of the fitness values of all merging schemes. S2.6: select a merging scheme by roulette-wheel selection according to the sampling probability of each scheme. S2.7: merge the two groups of the selected scheme and treat the result as a new group that replaces the original two. S2.8: repeat S2.3 to S2.7 until K groups remain, then end the loop. S2.9: return the final merging scheme G(G1, G2, ..., GK).
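A small worked example may make the sampling probabilities of S2.5 concrete. It assumes the standard exponential-mechanism weight exp(ε1·u / (2Δu)) with the utility taken as the negative distance. The formula images are not reproduced in this text, so this form is an assumption. The three-bucket histogram H = {2, 3, 10} with ε1 = 1 and Δu = 1 is likewise hypothetical.

```latex
\begin{aligned}
&p_1 = (g_1, g_2):\ u = -|2 - 3| = -1, \qquad
 p_2 = (g_1, g_3):\ u = -|2 - 10| = -8, \qquad
 p_3 = (g_2, g_3):\ u = -|3 - 10| = -7, \\
&e^{-1/2} \approx 0.6065, \qquad e^{-8/2} \approx 0.0183, \qquad
 e^{-7/2} \approx 0.0302, \qquad \text{sum} \approx 0.6550, \\
&\Pr(p, p_1) \approx \frac{0.6065}{0.6550} \approx 0.926, \qquad
 \Pr(p, p_2) \approx \frac{0.0183}{0.6550} \approx 0.028, \qquad
 \Pr(p, p_3) \approx \frac{0.0302}{0.6550} \approx 0.046.
\end{aligned}
```

Under these assumptions, the two buckets with the closest counts (2 and 3) are merged with probability of about 0.93, while the other two merges remain possible with small probability.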

Based on the above device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on group merging provided by the embodiments of the present invention further includes a third submodule, configured to implement the following: in step S2, if the histogram has n buckets, the number of groups is also n.

Based on the above device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on group merging provided by the embodiments of the present invention further includes a fourth submodule, configured to implement the following: in step S4, the magnitude of the Laplace noise added to

Figure BDA0003842726240000101
is ε2 (that is, the noise is calibrated to the privacy budget ε2).

Based on the above device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on group merging provided by the embodiments of the present invention further includes a fifth submodule, configured to implement the following: the differential privacy histogram in step S5

Figure BDA0003842726240000102
is one-dimensional.

The method of the embodiments of the present invention is implemented on electronic equipment, so the relevant equipment is introduced here. To this end, an embodiment of the present invention provides an electronic device. As shown in Fig. 3, the electronic device includes at least one processor, a communication interface, at least one memory, and a communication bus, where the at least one processor, the communication interface, and the at least one memory communicate with one another through the communication bus. The at least one processor can invoke logic instructions in the at least one memory to execute all or some of the steps of the methods provided by the foregoing method embodiments.

In addition, when the logic instructions in the at least one memory are implemented as software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, or a part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the method embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.

From the above description of the embodiments, a person skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware. On this understanding, the above technical solution, or the part of it that contributes to the prior art, can be embodied as a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in parts of the embodiments.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. Accordingly, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

In this patent, the terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A differential privacy histogram publishing method based on group merging, characterized by comprising the following steps: S1: privacy budget partitioning: divide the privacy budget ε into ε1 and ε2; S2: group merging: first set the final number of groups K and treat each bucket of the original histogram H = {h1, h2, ..., hn} as a separate group, obtaining the group set g(g1, g2, ..., gn); then merge the groups in g pairwise, enumerate all merging schemes, compute the scheme distance of each merging scheme, select a merging scheme through the exponential mechanism for approximate merging, treat the merged result as a new group replacing the original two groups, and repeat the merging process until the final number of groups K is reached, obtaining the final merging scheme and forming the histogram grouping G(G1, G2, ..., GK); wherein: H = {h1, h2, ..., hn} denotes the original histogram sequence; h1, h2, ..., hn denote the buckets of the histogram, and n is the total number of buckets of the original histogram; g(g1, g2, ..., gn) denotes the initial group set before merging; g1, g2, ..., gn denote the groups in the initial group set, each consisting of one bucket, so the initial number of groups is n; G(G1, G2, ..., GK) denotes the final group set obtained by merging, G1, G2, ..., GK denote the groups in the final group set, and GK denotes the K-th group; S3: take the mean of each group of G to obtain
Figure FDA0003842726230000011
wherein
Figure FDA0003842726230000012
denotes the set of mean groups after averaging;
Figure FDA0003842726230000013
denotes the K-th mean group; S4: add Laplace noise to
Figure FDA0003842726230000014
to obtain
Figure FDA0003842726230000015
wherein:
Figure FDA0003842726230000016
denotes the set of noisy groups obtained by adding noise to the set of mean groups;
Figure FDA0003842726230000017
denotes the K-th noisy group; S5: restore the original histogram order of
Figure FDA0003842726230000018
to obtain the published differential privacy histogram
Figure FDA0003842726230000019
wherein:
Figure FDA00038427262300000110
denotes the differential privacy histogram obtained by restoring
Figure FDA00038427262300000111
to the original histogram order;
Figure FDA00038427262300000112
denotes the noisy buckets of the histogram; n denotes the total number of buckets of the differential privacy histogram.
2. The differential privacy histogram publishing method based on group merging according to claim 1, characterized in that the privacy budget ε in step S1 is a given positive value, ε1 is used for group merging, and ε2 is used for adding noise to the groups.
3. The differential privacy histogram publishing method based on group merging according to claim 2, characterized in that step S2 specifically comprises: S2.1: set the final number of groups K, where K = 1, 2, ..., n and n is the number of groups of the histogram; S2.2: treat each bucket of the original histogram H = {h1, h2, ..., hn} as a separate group, obtaining the group set g(g1, g2, ..., gn); S2.3: merge the groups in the histogram pairwise and enumerate all possible merging schemes P(p1, p2, ..., py); wherein: P(p1, p2, ..., py) denotes the set of merging schemes; p1, p2, ..., py denote all possible merging schemes; y denotes the total number of schemes; S2.4: compute the scheme distance u(p, p_s) of each merging scheme and set it as the utility function:
Figure FDA0003842726230000021
wherein: p_s(g_i, g_j) is a merging scheme in the scheme set P that contains the two groups g_i and g_j, h is a bucket in g_i, h′ is a bucket in g_j,
Figure FDA0003842726230000022
denotes the minimum distance between the two groups and is used to measure their similarity; the utility function must satisfy the requirement that the smaller the distance between two groups, the greater the probability that they are merged; so that the subsequent probability calculation meets this requirement, the negative of the group distance
Figure FDA0003842726230000023
is used to construct the utility function; S2.5: using the exponential mechanism together with the scheme distance, compute the sampling probability Pr(p, p_s) of each merging scheme:
Figure FDA0003842726230000024
wherein Pr(p, p_s) is the probability that merging scheme p_s is selected; ε1 is the privacy budget; Δu is the global sensitivity; u(p, p_s) is the utility function of p_s; by the definition of global sensitivity, deleting any single record from the dataset changes the utility function by at most 1, so Δu = 1; y is the total number of schemes;
Figure FDA0003842726230000025
is the fitness function of merging scheme p_s; the numerator is the fitness value of p_s, and the denominator is the sum of the fitness values of all merging schemes; S2.6: select a merging scheme by roulette-wheel selection according to the sampling probability of each scheme; S2.7: merge the two groups of the selected scheme and treat the result as a new group that replaces the original two groups; S2.8: repeat S2.3 to S2.7 until K groups remain, then end the loop; S2.9: return the final merging scheme G(G1, G2, ..., GK).
4. The differential privacy histogram publishing method based on group merging according to claim 3, characterized in that in step S2, if the histogram has n buckets, the number of groups is also n.
5. The differential privacy histogram publishing method based on group merging according to claim 4, characterized in that in step S4 the magnitude of the Laplace noise added to
Figure FDA0003842726230000031
is ε2.
6. The differential privacy histogram publishing method based on group merging according to claim 5, characterized in that the differential privacy histogram in step S5
Figure FDA0003842726230000032
is one-dimensional.
7. A differential privacy histogram publishing device based on group merging, characterized by comprising: a first main module, configured to implement S1: privacy budget partitioning: divide the privacy budget ε into ε1 and ε2; a second main module, configured to implement S2: group merging: first set the final number of groups K and treat each bucket of the original histogram H = {h1, h2, ..., hn} as a separate group, obtaining the group set g(g1, g2, ..., gn); then merge the groups in g pairwise, enumerate all merging schemes, compute the scheme distance of each merging scheme, select a merging scheme through the exponential mechanism for approximate merging, treat the merged result as a new group replacing the original two groups, and repeat the merging process until the final number of groups K is reached, obtaining the final merging scheme and forming the histogram grouping G(G1, G2, ..., GK); wherein: H = {h1, h2, ..., hn} denotes the original histogram sequence; h1, h2, ..., hn denote the buckets of the histogram, and n is the total number of buckets of the original histogram; g(g1, g2, ..., gn) denotes the initial group set before merging; g1, g2, ..., gn denote the groups in the initial group set, each consisting of one bucket, so the initial number of groups is n; G(G1, G2, ..., GK) denotes the final group set obtained by merging, G1, G2, ..., GK denote the groups in the final group set, and GK denotes the K-th group; a third main module, configured to implement S3: take the mean of each group of G to obtain
Figure FDA0003842726230000033
wherein
Figure FDA0003842726230000034
denotes the set of mean groups after averaging;
Figure FDA0003842726230000035
denotes the K-th mean group; a fourth main module, configured to implement S4: add Laplace noise to
Figure FDA0003842726230000036
to obtain
Figure FDA0003842726230000037
wherein:
Figure FDA0003842726230000038
denotes the set of noisy groups obtained by adding noise to the set of mean groups;
Figure FDA0003842726230000039
denotes the K-th noisy group; a fifth main module, configured to implement S5: restore the original histogram order of
Figure FDA00038427262300000310
to obtain the published differential privacy histogram
Figure FDA00038427262300000311
wherein:
Figure FDA00038427262300000312
denotes the differential privacy histogram obtained by restoring
Figure FDA00038427262300000313
to the original histogram order;
Figure FDA00038427262300000314
denotes the noisy buckets of the histogram; n denotes the total number of buckets of the differential privacy histogram.
8. An electronic device, comprising:
at least one processor, at least one memory, and a communication interface; wherein,
the processor, the memory, and the communication interface communicate with one another;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 6.
CN202211109967.4A 2022-09-13 2022-09-13 Differential privacy histogram publishing method and device based on grouping combination Pending CN115495778A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211109967.4A CN115495778A (en) 2022-09-13 2022-09-13 Differential privacy histogram publishing method and device based on grouping combination

Publications (1)

Publication Number Publication Date
CN115495778A (en) 2022-12-20

Family

ID=84467535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211109967.4A Pending CN115495778A (en) 2022-09-13 2022-09-13 Differential privacy histogram publishing method and device based on grouping combination

Country Status (1)

Country Link
CN (1) CN115495778A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688614A (en) * 2024-02-01 2024-03-12 杭州海康威视数字技术股份有限公司 Differential privacy protection data availability enhancement method and device and electronic equipment
CN117688614B (en) * 2024-02-01 2024-04-30 杭州海康威视数字技术股份有限公司 Differential privacy protection data availability enhancement method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination