CN112667712B - Grouped accurate histogram data publishing method based on differential privacy - Google Patents

Info

Publication number
CN112667712B
Authority
CN
China
Prior art keywords
histogram
grouping
noise
function
bucket
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011637291.7A
Other languages
Chinese (zh)
Other versions
CN112667712A (en)
Inventor
陶陶
李思文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Technology AHUT
Original Assignee
Anhui University of Technology AHUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Technology AHUT
Priority to CN202011637291.7A
Publication of CN112667712A
Application granted
Publication of CN112667712B

Abstract

The invention discloses a grouped accurate histogram data publishing method based on differential privacy, belonging to the technical field of data privacy protection. The proposed method (AGHP) proceeds in three stages. First, following the idea of smooth grouping, an exponential mechanism performs a global approximate sorting of the frequencies of the original histogram buckets. Second, a dynamic programming algorithm finds, on the sorted histogram, the global grouping with the best error balance, trading off the grouping reconstruction error against the noise error. Finally, Laplace noise is added to the grouped histogram, which is then published. While satisfying differential privacy, the algorithm reduces the error of published histogram data, improves its usability, and broadens the practical application of differential privacy.

Description

Grouped accurate histogram data publishing method based on differential privacy
Technical Field
The invention relates to the technical field of data privacy protection, and in particular to a grouped accurate histogram data publishing method based on differential privacy.
Background
In the era of big data, a large amount of personal information is generated every day, and digitization allows organizations to collect it easily, publish statistical results in various forms, and carry out data analysis. Although analysing and mining such data supports research, publishing it can expose private information to theft.
A histogram is a common technique for displaying the distribution of data and is often used to publish statistics. It partitions the data into disjoint buckets by some attribute and represents the data by the frequency of each bucket. If a statistical histogram is published directly without privacy protection, an attacker can combine background knowledge with the true bucket counts to infer user data, and user privacy is leaked.
Differential privacy, a relatively new privacy protection model, is now widely applied to histogram publishing. It protects privacy by transforming the original data and adding noise to statistical results. Most existing differential-privacy histogram publishing techniques add noise and reconstruct the histogram, where reconstruction usually merges positionally adjacent buckets and replaces them with their mean in order to reduce the global sensitivity. This approach cannot identify buckets with similar frequencies across the whole histogram, so the grouping may incur a large reconstruction error; the bucket counts should therefore be sorted before reconstruction. In addition, the common grouping methods are fixed-length grouping or greedy clustering, which do not balance the reconstruction error and the noise error well and so reduce the usability of the published histogram. The global grouping should therefore balance the reconstruction error against the noise error, improving the usability of the published data while satisfying differential privacy.
A search of the prior art found Chinese patent No. ZL201811273045.0, filed on October 30, 2018 and entitled "A privacy protection method for data release". In that application, batch data is retrieved from a database according to a batch query request submitted by a user to an open data platform, random noise satisfying the given differential privacy requirement is added to the batch data, and the perturbed result is returned to the user as a published histogram. However, the method adds noise to the data a second time, which produces a large data error, and it does not filter the noised data; it guarantees privacy but does not consider the usability of the data.
As another example, Chinese patent application No. ZL202010573117.4, filed on June 22, 2020 and entitled "A data publishing method based on differential privacy", adds Laplace noise to the input histogram data, filters the noised data, sorts the noised histogram by frequency value using a reordering method, and finally finds the grouping with the minimum SSE using a clustering strategy based on dynamic programming. However, the method adds excessive noise, so its privacy cost is high; moreover, its dynamic programming reconstruction of the sorted histogram considers only the reconstruction error of the grouping and does not balance it against the noise error that the grouping introduces.
Based on the above analysis, there is a need for a histogram data publishing method that satisfies differential privacy and introduces less error during publication.
Disclosure of Invention
1. Technical problem to be solved by the invention
Existing privacy-preserving histogram publishing methods cannot satisfy differential privacy and preserve the usability of the histogram at the same time. The grouped accurate histogram data publishing method based on differential privacy provided by the invention reduces the noise error added during histogram publication, effectively balances the grouping reconstruction error against the noise error, and thereby improves the usability of the published statistical data without revealing private information.
2. Technical scheme
To achieve this purpose, the invention provides the following technical scheme:
The grouped accurate histogram data publishing method based on differential privacy of the invention comprises the following steps:
Step one: obtain the numerical histogram statistics field and enter the histogram frequencies into the histogram data set $H = (H_1, H_2, \ldots, H_n)$; at the same time specify the privacy budget $\varepsilon$ and $\Delta f$, where $\Delta f$ is the $L_1$ distance between the data set $H$ and its neighboring data sets;
Step two: take the first histogram bucket $H_1$ as the base bucket $H_i$, add it to the ordered histogram sequence $H^*$, and delete it from $H$;
Step three: compute the neighbor bucket set $L(H_i)$ of the base bucket $H_i$ and the exponential scoring function $u(H_i)$; select $H_j$ from $L(H_i)$ with probability proportional to $\exp\left(\frac{\varepsilon_1 u(H_i)}{2\Delta u}\right)$, where the privacy budget $\varepsilon_1 = \varepsilon/2$ and $\Delta u$ is the sensitivity of the scoring function; add $H_j$ to the ordered histogram sequence $H^*$ and take $H_j$ as the new base bucket;
Step four: repeat step three until the original histogram data set $H$ is empty;
Step five: perform dynamic programming grouping on the ordered histogram sequence $H^*$ according to the global error Err, and select the histogram grouping structure $H_G$ with the minimum global error;
Step six: represent the bucket frequencies of each group by the group mean, add Laplace noise $Lap(1/\varepsilon_2)$ to each bucket frequency to obtain the noised histogram sequence $\tilde{H}$, and publish it.
Further, in step one, each $H_i$ is the frequency of a unit interval, and the privacy budget $\varepsilon$ is less than 1.
Further, in step three, a bucket whose frequency is close to that of the base bucket is selected from the neighbor bucket set $L(H_i)$ of the base bucket $H_i$ according to the scoring function $u(H_i)$, where $L(H_i)$ and $u(H_i)$ are computed by formula (1) and formula (2) respectively:
$L(H_i) = \{H_j : |H_j - H_i| \le \delta\}$ (1)
$u(H_i) = -(|H_j - H_i| + |j - i|)$ (2)
where $\delta$ is a threshold that controls the number of buckets in the neighbor bucket set.
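Formulas (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function names `neighbor_set` and `score`, the 0-based indexing, and the sample frequencies are assumptions, not part of the patent.

```python
def neighbor_set(H, i, remaining, delta):
    """Formula (1): indices j of remaining buckets with |H_j - H_i| <= delta."""
    return [j for j in remaining if abs(H[j] - H[i]) <= delta]


def score(H, i, j):
    """Formula (2): u = -(|H_j - H_i| + |j - i|)."""
    return -(abs(H[j] - H[i]) + abs(j - i))


# Hypothetical bucket frequencies; bucket 0 is the current base bucket.
H = [120, 100, 118, 300, 90]
remaining = [1, 2, 3, 4]
candidates = neighbor_set(H, 0, remaining, delta=50)
scores = {j: score(H, 0, j) for j in candidates}
print(candidates, scores)
```

A bucket scores higher when it is both close in frequency and close in position to the base bucket, so bucket 2 (frequency 118) outranks bucket 1 (frequency 100) here.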
Further, in step five, the error evaluation function of the dynamic programming is the global error $Err(\ldots, H_l, H_r)$, as shown in formula (3):
[Formula (3): image not reproduced in the source text]
where $\overline{G_i}$ denotes the mean frequency of the group and $|G_i|$ denotes the number of buckets in the group. One term of formula (3) is the reconstruction error AE and the other is the noise error. The privacy budget $\varepsilon_2 = \varepsilon/2$ determines the magnitude of the Laplace noise added to the group mean $\overline{G_i}$: the Laplace noise added to $\overline{G_i}$ is $Lap(1/\varepsilon_2)/|G_i|$.
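As a worked illustration of the budget split and the noise scale described above (the values $\varepsilon = 1$ and $|G_i| = 4$ are assumed purely for illustration and do not come from the patent):

```latex
\varepsilon = 1 \;\Rightarrow\; \varepsilon_1 = \varepsilon_2 = \tfrac{\varepsilon}{2} = 0.5,
\qquad b = \tfrac{1}{\varepsilon_2} = 2,
\qquad \frac{Lap(b)}{|G_i|} = \frac{Lap(2)}{4} \sim Lap(0.5).
```

Dividing a Laplace variable by $|G_i|$ divides its scale by the same factor, so a larger group receives proportionally less noise on its mean, at the price of a larger reconstruction error.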
Further, in step five, the sorted histogram $H^*$ is grouped by dynamic programming and the minimum global error $T(n,k)$ of each grouping structure with $k$ groups is recorded; the grouping structure $H_G$ with the lowest $T_{Err}$ is selected and the optimal number of groups $k$ is recorded, as shown in formula (6):
$T_{Err} = \min_{1 \le k \le n} T(n,k)$ (6)
where $n$ is the number of histogram buckets and $k$ ranges over all possible numbers of groups, $1 \le k \le n$.
Further, in step six, the frequencies of the buckets in each group of the grouped histogram $H_G$ are replaced by the group mean, so that the frequency of each histogram bucket $H_j$ in group $G_i$ is:
$H_j = \overline{G_i} = \frac{1}{|G_i|}\sum_{H_m \in G_i} H_m$
Laplace noise $Lap(b)$ with $b = 1/\varepsilon_2$ is then added to each bucket frequency to obtain the noised histogram sequence $\tilde{H}$ [defining formula image not reproduced in the source text].
Further, in step six, noise is added to the original data set as follows: a probability density function obeying the Laplace distribution is constructed, its inverse cumulative distribution function is derived, and a uniformly distributed random variable is fed into that function to obtain Laplace noise.
Further, the specific steps for obtaining the Laplace noise are:
S1: construct the Laplace distribution $Lap(b)$ with location parameter $\mu = 0$ and scale parameter $b$, whose probability density function $p(x)$ is given by formula (7):
$p(x) = \frac{1}{2b}\exp\left(-\frac{|x|}{b}\right)$ (7)
S2: substitute a uniformly distributed random variable $\alpha \sim U(0,1)$ into the inverse of the Laplace cumulative distribution function to obtain a noise value satisfying the condition, as shown in formula (8):
$F^{-1}(\alpha) = \begin{cases} b\ln(2\alpha), & \alpha < 0.5 \\ -b\ln(2 - 2\alpha), & \alpha \ge 0.5 \end{cases}$ (8)
S3: take instead $\alpha \sim U(-0.5, 0.5)$, which merges the piecewise function of formula (8) into formula (9):
$F^{-1}(\alpha) = 0 - b \cdot \mathrm{sign}(\alpha) \cdot \ln(1 - 2\,\mathrm{abs}(\alpha))$ (9)
where the sign function returns the sign of its argument and the abs function returns its absolute value.
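Steps S1 to S3 amount to inverse-transform sampling. A minimal Python sketch of formula (9) follows; the function name `laplace_noise`, the optional `rng` argument, and the guard against $\alpha = -0.5$ (where the logarithm is undefined) are assumptions added for illustration.

```python
import math
import random


def laplace_noise(b, rng=random):
    """Draw one Lap(b) sample via formula (9) with alpha ~ U(-0.5, 0.5)."""
    alpha = rng.random() - 0.5
    while alpha == -0.5:               # ln(0) is undefined, so redraw this endpoint
        alpha = rng.random() - 0.5
    sign = (alpha > 0) - (alpha < 0)   # sign(alpha) in {-1, 0, 1}
    return 0.0 - b * sign * math.log(1.0 - 2.0 * abs(alpha))


rng = random.Random(0)
samples = [laplace_noise(2.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
mean_abs = sum(abs(s) for s in samples) / len(samples)
print(mean, mean_abs)
```

For a Laplace distribution with scale $b$ the mean is 0 and the expected absolute value is $b$, so the two printed statistics should be close to 0 and 2 respectively.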
3. Advantageous effects
Compared with the prior art, the technical scheme provided by the invention has the following notable effects:
(1) Traditional differential-privacy histogram publishing methods consider only positionally adjacent buckets with similar counts when grouping the histogram and cannot identify buckets with similar counts across the whole histogram, so grouping produces a large reconstruction error. The method of the invention uses an approximate sorting algorithm based on the exponential mechanism: the frequencies of the original histogram buckets are globally and approximately sorted by the exponential mechanism according to the differences between bucket counts, which improves the accuracy of the grouping.
(2) Traditional differential-privacy histogram publishing methods group the original histogram by fixed-length grouping or greedy clustering, which easily falls into a local optimum and does not balance the approximation error and the Laplace error well, reducing the usability of the published histogram. The invention groups the histogram adaptively with an optimized dynamic programming technique, so the number of groups need not be fixed in advance. The global error Err, composed of the approximation error and the Laplace error, serves as the error evaluation function; adaptive grouping follows the dynamic programming recurrence and yields, from all possible groupings, the grouping scheme $H_G$ with the minimum global error. Buckets with similar counts are merged into one group, which achieves the global grouping with the best error balance on the sorted histogram and improves the accuracy of the published histogram.
Drawings
FIG. 1 is a theoretical architecture diagram of the method of the present invention;
FIG. 2 is a block flow diagram of the method of the present invention.
Detailed Description
For a further understanding of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings and examples.
Existing differential-privacy histogram publishing methods add Laplace noise directly to every bucket of the original histogram. Although this protects private data effectively, the large amount of added noise easily reduces the usability of the histogram and can lead to a high cumulative error in long range-count queries.
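The cumulative-error effect of this baseline can be seen in a small simulation. The sketch below is illustrative only: the function names, the flat test histogram, and the choice to draw Laplace noise as the difference of two exponential variables (a standard identity, not the patent's sampling procedure) are all assumptions.

```python
import random


def laplace(b, rng):
    # The difference of two independent exponentials with mean b is Laplace(0, b).
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def publish_direct(H, epsilon, rng):
    """Baseline: independent Lap(1/epsilon) noise on every bucket count."""
    return [h + laplace(1.0 / epsilon, rng) for h in H]


def range_count(hist, lo, hi):
    """Range-count query: sum of buckets lo..hi inclusive (0-based)."""
    return sum(hist[lo:hi + 1])


rng = random.Random(7)
H = [10] * 100                          # hypothetical histogram with 100 buckets
noisy = publish_direct(H, epsilon=0.5, rng=rng)
short_error = range_count(noisy, 0, 0) - range_count(H, 0, 0)
long_error = range_count(noisy, 0, 99) - range_count(H, 0, 99)
print(abs(short_error), abs(long_error))
```

Because each bucket carries independent noise with variance $2b^2$, a query spanning $m$ buckets has error variance $2mb^2$, which grows linearly with the length of the range.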
To improve the accuracy of histogram publishing, reduce the noise error and improve data usability, two strategies are in common use. Strategy 1 first adds Laplace noise directly to the count of each bucket to perturb the true counts; because so much noise is added, such methods have a high privacy cost. Strategy 2 reverses the order: the original histogram is first reconstructed to reduce the global sensitivity, and noise is then added to the reconstructed counts. Reconstruction introduces a reconstruction error, but the amount of added noise is reduced. Strategy 2 generally answers queries more accurately; the remaining problem is how to balance the reconstruction error against the noise error so that histogram publication stays private while usability improves.
The invention adopts strategy 2. The input histogram data is first sorted and then grouped and reconstructed. Sorting places buckets with similar frequencies next to each other, which reduces the error of group reconstruction. Dynamic programming grouping based on the minimum global error then selects, from all possible groupings, the grouping scheme with the minimum global error, achieving the global grouping with the best error balance on the sorted histogram and balancing the approximation error against the Laplace error. Finally, Laplace noise is added to the grouped histogram, which is published in the original bucket order. This markedly reduces the amount of added noise and improves the usability of the published histogram data.
When the histogram is grouped, the optimized dynamic programming technique performs adaptive grouping, so the number of groups need not be fixed in advance. The global error Err, composed of the approximation error and the Laplace error, serves as the error evaluation function; adaptive grouping follows the dynamic programming recurrence and yields, from all possible groupings, the grouping scheme $H_G$ with the minimum global error. Buckets with similar counts are merged into one group, which achieves the global grouping with the best error balance on the sorted histogram and improves the accuracy of the published histogram.
The invention improves the traditional histogram data publishing method based on differential privacy and achieves higher usability while protecting private data.
Example 1
With reference to FIG. 1, the grouped accurate histogram data publishing method based on differential privacy of this embodiment comprises the following steps:
Step one: obtain the numerical histogram statistics field and enter the histogram frequencies into the histogram data set $H = (H_1, H_2, \ldots, H_n)$; at the same time specify the privacy budget $\varepsilon$ and $\Delta f$, where $\Delta f$ is the $L_1$ distance between the data set $H$ and its neighboring data sets:
First, the numerical histogram statistics field to be published is read from a data source such as a database or a CSV file, and the statistic of each interval (i.e. the histogram frequency) is entered into the histogram data set $H$; this completes the input of the original histogram data set $H$. The privacy budget $\varepsilon$ and $\Delta f$ are then set. $\varepsilon$ is specified manually and is generally less than 1; the smaller $\varepsilon$ is, the stronger the privacy protection and the lower the data usability. $\Delta f$ is the $L_1$ distance between the data set $H$ and its neighboring data sets; a larger $\Delta f$ means that more noise must be added, and for each bucket of the histogram the $L_1$ distance is 1.
Step two: take the first histogram bucket $H_1$ as the base bucket $H_i$, add it to the ordered histogram sequence $H^*$, and delete it from $H$.
Step three: compute the neighbor bucket set $L(H_i)$ of the base bucket $H_i$ and the exponential scoring function $u(H_i)$; select $H_j$ from $L(H_i)$ with probability proportional to $\exp\left(\frac{\varepsilon_1 u(H_i)}{2\Delta u}\right)$, where the privacy budget $\varepsilon_1 = \varepsilon/2$; add $H_j$ to the ordered histogram sequence $H^*$ and take $H_j$ as the new base bucket:
To obtain a better grouping result, the bucket frequencies are sorted by the exponential mechanism to obtain a more accurate ordering. A bucket whose frequency is close to that of the base bucket is selected from the neighbor bucket set $L(H_i)$ of the base bucket $H_i$ according to the scoring function $u(H_i)$, where $L(H_i)$ and $u(H_i)$ are computed by formula (1) and formula (2) respectively:
$L(H_i) = \{H_j : |H_j - H_i| \le \delta\}$ (1)
$u(H_i) = -(|H_j - H_i| + |j - i|)$ (2)
where $\delta$ is a threshold that controls the number of buckets in the neighbor bucket set and can be adjusted according to the overall bucket counts; in this example $\delta$ is taken to be 50. If the difference between the frequency of bucket $H_j$ and that of the base bucket $H_i$ is within the threshold $\delta$, bucket $H_j$ belongs to the neighbor bucket set $L(H_i)$ of the base bucket. The exponential scoring function $u(H_i)$ is the negative of the sum of the absolute frequency difference between $H_j$ and $H_i$ and the absolute difference of their positions. The exponential mechanism is defined as follows:
Let the randomized algorithm $M$ take the data set $H$ as input and output an entity object $H_j \in R$, let $u(H)$ be the scoring function of the exponential mechanism, and let $\Delta u$ be the sensitivity of $u(H)$. If $M$ selects and outputs $H_j$ with probability proportional to $\exp\left(\frac{\varepsilon\, u(H)}{2\Delta u}\right)$, then $M$ provides $\varepsilon$-differential privacy.
From formula (2) and the definition of the exponential mechanism, the mechanism scores every candidate output with the scoring function $u$ and gives outputs with higher scores a higher probability of being selected. With the scoring function $u(H_i) = -(|H_j - H_i| + |j - i|)$, the exponential mechanism therefore repeatedly selects from the neighbor bucket set $L(H_i)$ a bucket whose frequency is close to that of the previous base bucket $H_i$, forming the ordered histogram $H^*$.
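Steps two to four can be sketched as the following loop. This is a minimal illustration, not the patented implementation: the function name `exponential_sort`, the default sensitivity `delta_u=1.0`, and the fallback to all remaining buckets when the neighbor set is empty are assumptions, since the text does not specify them.

```python
import math
import random


def exponential_sort(H, epsilon1, delta, delta_u=1.0, rng=None):
    """Return bucket indices (0-based) in the approximate order H* of steps two to four."""
    if not H:
        return []
    rng = rng or random.Random()
    remaining = list(range(1, len(H)))
    base = 0                      # step two: the first bucket is the initial base bucket
    order = [base]
    while remaining:              # step four: repeat until no bucket is left
        neighbors = [j for j in remaining if abs(H[j] - H[base]) <= delta]    # formula (1)
        if not neighbors:         # assumption: fall back to every remaining bucket
            neighbors = list(remaining)
        scores = [-(abs(H[j] - H[base]) + abs(j - base)) for j in neighbors]  # formula (2)
        top = max(scores)         # shifting by the maximum keeps the best weight at 1
        weights = [math.exp(epsilon1 * (s - top) / (2.0 * delta_u)) for s in scores]
        chosen = rng.choices(neighbors, weights=weights, k=1)[0]
        order.append(chosen)
        remaining.remove(chosen)
        base = chosen             # step three: the selected bucket becomes the new base
    return order


H = [10, 200, 12, 205, 11]        # hypothetical bucket frequencies
order = exponential_sort(H, epsilon1=0.5, delta=50, rng=random.Random(0))
print(order, [H[j] for j in order])
```

Subtracting the maximum score before exponentiating leaves the selection probabilities unchanged, because they depend only on the ratios of the weights.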
Step four: repeat step three until the original histogram data set $H$ is empty.
Step five: perform dynamic programming grouping on the ordered histogram sequence $H^*$ according to the global error Err, and select the histogram grouping structure $H_G$ with the minimum global error:
The error evaluation function of the dynamic programming is the global error $Err(\ldots, H_l, H_r)$, as shown in formula (3):
[Formula (3): image not reproduced in the source text]
where $\overline{G_i}$ denotes the mean frequency of the group and $|G_i|$ denotes the number of buckets in the group. One term of formula (3) is the reconstruction error AE and the other is the noise error. The privacy budget $\varepsilon_2 = \varepsilon/2$ determines the magnitude of the Laplace noise added to the group mean $\overline{G_i}$: the Laplace noise added to $\overline{G_i}$ is $Lap(1/\varepsilon_2)/|G_i|$.
Then the sorted histogram $H^*$ is grouped by dynamic programming according to the global error Err:
1) When the number of groups $k = 1$, compute the error $Err(\ldots, H_1, H_i)$ of placing the first $i$ items ($1 \le i \le n$) of $H^*$ in a single group, denoted $T(i,1)$ and computed as shown in formula (4):
[Formula (4): image not reproduced in the source text]
where the mean appearing in formula (4) is the mean of the counts from the 1st bucket to the $i$-th bucket of $H^*$;
2) When $k > 1$, the minimum global error of dividing the first $i$ items of $H^*$ into $k$ groups, denoted $T(i,k)$, is computed by dynamic programming with the state transition formula (5):
$T(i,k) = \min_{k-1 \le j \le i-1}\left[T(j,k-1) + Err(\ldots, H_{j+1}, H_i)\right]$ (5)
3) To reduce the amount of computation and improve efficiency, $H^*$ is reconstructed using the dynamic programming grouping strategy: the $n$ buckets are divided into 1 group, 2 groups, ..., $k$ groups, the minimum global error $T(n,k)$ for each number of groups is recorded, the grouping that minimizes $T_{Err}$ is selected, and the optimal partition structure and the optimal number of groups $k$ are recorded, as shown in formula (6):
$T_{Err} = \min_{1 \le k \le n} T(n,k)$ (6)
where $n$ is the number of original histogram buckets and $k$ ranges over all possible numbers of groups, $1 \le k \le n$.
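The grouping procedure in 1) to 3) can be sketched as a standard interval-partition dynamic program. The sketch rests on loudly stated assumptions: since the image of formula (3) is not available, `group_error` assumes the reconstruction error is the sum of absolute deviations from the group mean and the noise error is $1/\varepsilon_2$ per group (the expected total absolute noise when every bucket of a group receives $Lap(1/\varepsilon_2)/|G_i|$). The function names and the half-open index ranges are also hypothetical.

```python
def group_error(values, epsilon2):
    """Assumed per-group Err: absolute reconstruction error plus an expected noise term."""
    mean = sum(values) / len(values)
    reconstruction = sum(abs(v - mean) for v in values)  # AE, assumed to be absolute error
    noise = 1.0 / epsilon2                               # assumed noise error per group
    return reconstruction + noise


def dp_grouping(h_sorted, epsilon2):
    """Return (groups, error): half-open index ranges over h_sorted and their total error."""
    n = len(h_sorted)
    inf = float("inf")
    # best[k][i]: minimum error of splitting the first i buckets into k groups.
    best = [[inf] * (n + 1) for _ in range(n + 1)]
    cut = [[0] * (n + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for k in range(1, n + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):          # the last group is h_sorted[j:i]
                if best[k - 1][j] == inf:
                    continue
                cost = best[k - 1][j] + group_error(h_sorted[j:i], epsilon2)
                if cost < best[k][i]:
                    best[k][i] = cost
                    cut[k][i] = j
    k_opt = min(range(1, n + 1), key=lambda k: best[k][n])   # formula (6)
    groups, i = [], n
    for k in range(k_opt, 0, -1):              # walk the recorded cut points backwards
        j = cut[k][i]
        groups.append((j, i))
        i = j
    groups.reverse()
    return groups, best[k_opt][n]


groups, error = dp_grouping([10, 11, 12, 200, 205], epsilon2=0.1)
print(groups, error)
```

This naive version recomputes each group error from scratch and is meant only to show the recurrence; prefix sums or cached interval errors would be needed for large histograms.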
Step six: replace the frequencies of the buckets in each group of the grouped histogram $H_G$ by the group mean, so that the frequency of each histogram bucket $H_j$ in group $G_i$ is:
$H_j = \overline{G_i} = \frac{1}{|G_i|}\sum_{H_m \in G_i} H_m$
then add Laplace noise $Lap(b)$ with $b = 1/\varepsilon_2$ to each bucket frequency to obtain the noised histogram sequence $\tilde{H}$ [defining formula image not reproduced in the source text], and publish it.
Noise is added to the original data set as follows: a probability density function obeying the Laplace distribution is constructed, its inverse cumulative distribution function is derived, and a uniformly distributed random variable is fed into that function to obtain Laplace noise. The specific steps are:
S1: construct the Laplace distribution $Lap(b)$ with location parameter $\mu = 0$ and scale parameter $b$, whose probability density function $p(x)$ is given by formula (7):
$p(x) = \frac{1}{2b}\exp\left(-\frac{|x|}{b}\right)$ (7)
S2: substitute a uniformly distributed random variable $\alpha \sim U(0,1)$ into the inverse of the Laplace cumulative distribution function to obtain a noise value satisfying the condition, as shown in formula (8):
$F^{-1}(\alpha) = \begin{cases} b\ln(2\alpha), & \alpha < 0.5 \\ -b\ln(2 - 2\alpha), & \alpha \ge 0.5 \end{cases}$ (8)
S3: take instead $\alpha \sim U(-0.5, 0.5)$, which merges the piecewise function of formula (8) into formula (9):
$F^{-1}(\alpha) = 0 - b \cdot \mathrm{sign}(\alpha) \cdot \ln(1 - 2\,\mathrm{abs}(\alpha))$ (9)
where the sign function returns the sign of its argument and the abs function returns its absolute value. The Laplace noise is obtained by having the computer generate pseudo-random numbers distributed as $\alpha \sim U(-0.5, 0.5)$ and substituting them for $\alpha$ in formula (9); adding this noise to the bucket frequencies gives the noised data $\tilde{H}$.
With reference to FIG. 2, the histogram is first sorted by the exponential mechanism and then grouped and reconstructed; noise is added to the group means of the sorted and grouped data, and once appropriate Laplace noise has been added the final histogram can be published.
The invention and its embodiments have been described above schematically and without limitation; the drawings show only one embodiment of the invention, and the actual structure is not limited to it. Therefore, structures and embodiments similar to this technical scheme that a person skilled in the art, enlightened by it, designs without inventive effort and without departing from the spirit of the invention shall fall within the scope of protection of the invention.

Claims (2)

1. A grouped accurate histogram data publishing method based on differential privacy, characterized by comprising the following steps:
step one: obtaining a numerical histogram statistics field and entering the histogram frequencies into the histogram data set $H = (H_1, H_2, \ldots, H_n)$, and at the same time specifying the privacy budget $\varepsilon$ and $\Delta f$, where $\Delta f$ is the $L_1$ distance between the data set $H$ and its neighboring data sets;
step two: taking the first histogram bucket $H_1$ as the base bucket $H_i$, adding it to the ordered histogram sequence $H^*$, and deleting it from $H$;
step three: computing the neighbor bucket set $L(H_i)$ of the base bucket $H_i$ and the exponential scoring function $u(H_i)$; selecting $H_j$ from $L(H_i)$ with probability proportional to $\exp\left(\frac{\varepsilon_1 u(H_i)}{2\Delta u}\right)$, where the privacy budget $\varepsilon_1 = \varepsilon/2$; adding $H_j$ to the ordered histogram sequence $H^*$ and taking $H_j$ as the new base bucket;
step four: repeating step three until the original histogram data set $H$ is empty;
step five: performing dynamic programming grouping on the ordered histogram sequence $H^*$ according to the global error Err, and selecting the histogram grouping structure $H_G$ with the minimum global error;
step six: representing the bucket frequencies of each group by the group mean, adding Laplace noise $Lap(1/\varepsilon_2)$ to each bucket frequency to obtain the noised histogram sequence $\tilde{H}$, and publishing it;
in step five, the error evaluation function of the dynamic programming is the global error $Err(\ldots, H_l, H_r)$, as shown in formula (3):
[Formula (3): image not reproduced in the source text]
where $\overline{G_i}$ denotes the mean frequency of the group and $|G_i|$ denotes the number of buckets in the group; one term of formula (3) is the reconstruction error AE and the other is the noise error; the privacy budget $\varepsilon_2 = \varepsilon/2$ determines the magnitude of the Laplace noise added to the group mean $\overline{G_i}$, and the Laplace noise added to $\overline{G_i}$ is $Lap(1/\varepsilon_2)/|G_i|$;
in step five, the histogram $H$ is grouped by dynamic programming and the minimum global error $T(n,k)$ of each grouping structure with $k$ groups is recorded; the grouping structure $H_G$ with the lowest $T_{Err}$ is selected and the optimal number of groups $k$ is recorded, as shown in formula (6):
$T_{Err} = \min_{1 \le k \le n} T(n,k)$ (6)
where $n$ is the number of histogram buckets and $k$ ranges over all possible numbers of groups, $1 \le k \le n$;
in step three, a bucket whose frequency is close to that of the base bucket is selected from the neighbor bucket set $L(H_i)$ of the base bucket $H_i$ according to the scoring function $u(H_i)$, where $L(H_i)$ and $u(H_i)$ are computed by formula (1) and formula (2) respectively:
$L(H_i) = \{H_j : |H_j - H_i| \le \delta\}$ (1)
$u(H_i) = -(|H_j - H_i| + |j - i|)$ (2)
where $\delta$ is a threshold that controls the number of buckets in the neighbor bucket set;
in step six, the frequencies of the buckets in each group of the grouped histogram $H_G$ are replaced by the group mean, so that the frequency of each histogram bucket $H_j$ in group $G_i$ is:
$H_j = \overline{G_i} = \frac{1}{|G_i|}\sum_{H_m \in G_i} H_m$
and Laplace noise $Lap(b)$ with $b = 1/\varepsilon_2$ is then added to each bucket frequency to obtain the noised histogram sequence $\tilde{H}$ [defining formula image not reproduced in the source text];
In the sixth step, the noise adding process of the original data set is to construct a probability density function obeying Laplace distribution, an inverse cumulative distribution function of the probability density function is obtained according to the probability density function, and then uniformly distributed random variables are input into the function, so that Laplace noise can be obtained;
the specific steps for obtaining the laplace noise are,
s1, setting a constructed obedience position parameter mu to be 0, setting Laplace distribution with a scale parameter b to be Lap (b), and enabling a probability density function p (x) to be shown as a formula (7),
Figure QLYQS_14
s2, random variables alpha-U (0,1) meeting the uniform distribution are brought into the inverse function of the Laplace cumulative distribution function, the noise value meeting the condition can be obtained as shown in a formula (8),
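$$F^{-1}(\alpha) = \begin{cases} b \ln(2\alpha), & \alpha < 0.5 \\ -b \ln\bigl(2(1 - \alpha)\bigr), & \alpha \ge 0.5 \end{cases} \qquad (8)$$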
S3, taking α ~ U(−0.5, 0.5) instead, the piecewise function of formula (8) merges into the single expression of formula (9),
$$F^{-1}(\alpha) = 0 - b \cdot \operatorname{sign}(\alpha) \cdot \ln\bigl(1 - 2\,\operatorname{abs}(\alpha)\bigr) \qquad (9)$$
where the sign function returns the sign of its argument and the abs function returns its absolute value.
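Steps S1 to S3 can be sketched in a few lines. The function names are hypothetical; `laplace_inv_cdf` is the standard piecewise inverse CDF of a zero-location Laplace distribution for a uniform draw on (0, 1), and `laplace_merged` follows formula (9) for a draw on (−0.5, 0.5). Re-drawing at the endpoint −0.5 is an added safeguard against ln(0), not something the text specifies.

```python
import math
import random


def laplace_inv_cdf(u: float, b: float) -> float:
    """Piecewise inverse CDF of Lap(b) for u in (0, 1)."""
    if u < 0.5:
        return b * math.log(2.0 * u)
    return -b * math.log(2.0 * (1.0 - u))


def laplace_merged(alpha: float, b: float) -> float:
    """Formula (9): single expression for alpha in (-0.5, 0.5)."""
    sign = (alpha > 0) - (alpha < 0)  # -1, 0 or 1
    return 0.0 - b * sign * math.log(1.0 - 2.0 * abs(alpha))


def sample_laplace(b: float, rng: random.Random) -> float:
    """Draw one Lap(b) noise value by inverse transform sampling."""
    alpha = rng.random() - 0.5      # uniform on [-0.5, 0.5)
    while alpha == -0.5:            # safeguard: avoid log(0) at the endpoint
        alpha = rng.random() - 0.5
    return laplace_merged(alpha, b)


rng = random.Random(0)
samples = [sample_laplace(1.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
mean_abs = sum(abs(x) for x in samples) / len(samples)
```

Shifting the uniform draw by 0.5 makes the two branches mirror images of each other, which is why a single expression with sign and abs suffices. For Lap(b) the sample mean should be close to 0 and the mean absolute value close to b.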
2. The grouped accurate histogram data publishing method based on differential privacy as claimed in claim 1, wherein: in step one, each $H_i$ is the frequency of a unit interval, and the privacy protection budget ε is less than 1.
CN202011637291.7A 2020-12-31 2020-12-31 Grouped accurate histogram data publishing method based on differential privacy Active CN112667712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011637291.7A CN112667712B (en) 2020-12-31 2020-12-31 Grouped accurate histogram data publishing method based on differential privacy

Publications (2)

Publication Number Publication Date
CN112667712A CN112667712A (en) 2021-04-16
CN112667712B true CN112667712B (en) 2023-03-17

Family

ID=75413692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011637291.7A Active CN112667712B (en) 2020-12-31 2020-12-31 Grouped accurate histogram data publishing method based on differential privacy

Country Status (1)

Country Link
CN (1) CN112667712B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672979B (en) * 2021-08-19 2024-02-09 安徽工业大学 Differential privacy non-equidistant histogram release method and device based on barrel structure division

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446568A (en) * 2018-03-19 2018-08-24 西北大学 A kind of histogram data dissemination method going trend analysis difference secret protection

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7965643B1 (en) * 2008-07-10 2011-06-21 At&T Intellectual Property Ii, L.P. Method and apparatus for using histograms to produce data summaries
US11727124B2 (en) * 2017-12-12 2023-08-15 Google Llc Oblivious access with differential privacy
CN109492047A (en) * 2018-11-22 2019-03-19 河南财经政法大学 A kind of dissemination method of the accurate histogram based on difference privacy
CN111737744B (en) * 2020-06-22 2022-09-30 安徽工业大学 Data publishing method based on differential privacy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
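Sun Lan; Wu Yingjie; Zhang Xilin; Xie Yi. Greedy algorithm for differentially private histogram publication based on bucket partitioning. 2013, Vol. 52 (No. 06), pp. 770-775. *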

Also Published As

Publication number Publication date
CN112667712A (en) 2021-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant