CN115495778A - Differential privacy histogram publishing method and device based on grouping combination - Google Patents
- Publication number
- CN115495778A
- Authority
- CN
- China
- Prior art keywords
- merging
- histogram
- packet
- packets
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides a differential privacy histogram publishing method and device based on grouping and merging. The method comprises steps S1 to S5: privacy budget partitioning, grouping and merging, averaging each group, adding Laplace noise, and restoring the original histogram order. By grouping and merging the histogram buckets, the invention partitions the data reasonably and accurately, which improves the accuracy of the grouping, reduces the error of the published data, and improves data usability. It can achieve an optimal grouping of the histogram, greatly reduces the influence of noise on data accuracy, and improves the usability and publishing efficiency of the data while satisfying the differential privacy constraint.
Description
Technical Field
The embodiments of the invention relate to the technical field of histogram data privacy protection, and in particular to a differential privacy histogram publishing method and device based on grouping and merging.
Background
Histogram publishing is a widely used technique in the field of data publishing. A histogram reflects the statistical characteristics of data clearly and intuitively, and a user can quickly obtain the required information from it. However, an attacker can easily attack a published histogram and steal user privacy. To prevent the leakage of private data, a data publisher therefore usually applies differential privacy to the histogram before publishing it. Differential privacy, however, introduces a problem of its own while protecting privacy: the usability of the data decreases. This problem troubles researchers in the field of histogram data publishing. Developing a differential privacy histogram publishing method and device based on grouping and merging that effectively overcomes these drawbacks of the related art is therefore an urgent technical problem in the field.
Disclosure of Invention
In view of the problems in the prior art, the embodiments of the invention provide a differential privacy histogram publishing method and device based on grouping and merging.
In a first aspect, an embodiment of the present invention provides a differential privacy histogram publishing method based on grouping and merging, including:
S1: privacy budget partitioning: divide the privacy budget $\varepsilon$ into $\varepsilon_1$ and $\varepsilon_2$;
S2: grouping and merging: first set the final number of merged groups $K$, and treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g = (g_1, g_2, \ldots, g_n)$; then merge the groups in the group set pairwise, traverse all merging schemes, compute the scheme distance of each merging scheme, select a merging scheme through the exponential mechanism for approximate merging, and treat the merged result as a new group that replaces the original two groups; repeat the merging process until the final number of groups $K$ is reached, obtaining the final merging scheme and forming the histogram grouping $G = (G_1, G_2, \ldots, G_K)$; wherein: $H = \{h_1, h_2, \ldots, h_n\}$ denotes the original histogram sequence; $h_1, h_2, \ldots, h_n$ denote the buckets in the histogram, and $n$ is the total number of original histogram buckets; $g = (g_1, g_2, \ldots, g_n)$ denotes the initial group set before merging; $g_1, g_2, \ldots, g_n$ denote the groups in the initial group set, each consisting of one bucket, so the initial number of groups is $n$; $G = (G_1, G_2, \ldots, G_K)$ denotes the final group set obtained by merging, $G_1, G_2, \ldots, G_K$ denote the groups in the final group set, and $G_K$ denotes the $K$-th group;
S3: average each group of $G$ to obtain $\bar{G} = (\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_K)$, where $\bar{G}$ denotes the mean group set obtained after averaging and $\bar{G}_K$ denotes the $K$-th mean group;
S4: add Laplace noise to $\bar{G}$ to obtain $\tilde{G} = (\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_K)$, where $\tilde{G}$ denotes the noisy group set obtained by adding noise to the mean group set and $\tilde{G}_K$ denotes the $K$-th noise-added group;
S5: restore $\tilde{G}$ to the original histogram order to obtain the published differential privacy histogram $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n\}$, where $\tilde{H}$ denotes the differential privacy histogram obtained by restoring $\tilde{G}$ to the original histogram order, $\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n$ denote the noise-added buckets in the histogram, and $n$ denotes the total number of differential privacy histogram buckets.
Based on the content of the above method embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, the privacy budget $\varepsilon$ in step S1 is a given positive value, $\varepsilon_1$ is used for group merging, and $\varepsilon_2$ is used for adding noise to the groups.
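Step S1 can be sketched in a few lines of Python. The source only requires that $\varepsilon$ be positive and be divided into $\varepsilon_1$ and $\varepsilon_2$; the function name `split_budget` and the `merge_fraction` parameter (with its default of 0.5) are assumptions introduced here for illustration, not values taken from the patent.

```python
def split_budget(epsilon: float, merge_fraction: float = 0.5) -> tuple[float, float]:
    """Split a total privacy budget into (eps1, eps2) with eps1 + eps2 == epsilon.

    eps1 is spent on group merging (S2) and eps2 on noise addition (S4).
    merge_fraction is a hypothetical tuning knob; the source fixes no ratio.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be a positive value")
    if not 0 < merge_fraction < 1:
        raise ValueError("merge_fraction must lie strictly between 0 and 1")
    eps1 = epsilon * merge_fraction
    eps2 = epsilon - eps1  # the remainder, so the two parts always sum to epsilon
    return eps1, eps2
```

A larger `merge_fraction` makes the merge selection more accurate at the cost of noisier published means, so the ratio is a utility trade-off rather than a privacy one.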
Based on the content of the foregoing method embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, step S2 specifically includes:
S2.1: set the final number of groups $K$, where $K \in \{1, 2, \ldots, n\}$ and $n$ is the number of groups of the histogram;
S2.2: treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g = (g_1, g_2, \ldots, g_n)$;
S2.3: merge the groups in the histogram pairwise and traverse all possible merging schemes $P = (p_1, p_2, \ldots, p_y)$; wherein: $P = (p_1, p_2, \ldots, p_y)$ denotes the set of merging schemes; $p_1, p_2, \ldots, p_y$ denote all possible merging schemes; $y$ denotes the total number of schemes;
S2.4: compute the scheme distance $u(p, p_s)$ of each merging scheme and set it as the utility function:
$$u(p, p_s) = -\min_{h \in g_i,\ h' \in g_j} |h - h'|$$
wherein: $p_s(g_i, g_j)$ is a merging scheme in the scheme set $P$; $p_s$ contains two groups $g_i$ and $g_j$; $h$ is a bucket in $g_i$ and $h'$ is a bucket in $g_j$; $\min_{h \in g_i,\ h' \in g_j} |h - h'|$ denotes the minimum distance between the groups and measures the similarity between them. The utility function must satisfy the requirement that the smaller the distance between two groups, the greater their probability of being merged; so that the subsequent probability calculation meets this requirement, the negative of the group distance, $-\min_{h \in g_i,\ h' \in g_j} |h - h'|$, is used to construct the utility function;
S2.5: using the exponential mechanism together with the scheme distances, compute the sampling probability $\Pr(p, p_s)$ of each merging scheme:
$$\Pr(p, p_s) = \frac{\exp\left(\dfrac{\varepsilon_1\, u(p, p_s)}{2\Delta u}\right)}{\sum_{t=1}^{y} \exp\left(\dfrac{\varepsilon_1\, u(p, p_t)}{2\Delta u}\right)}$$
wherein: $\Pr(p, p_s)$ denotes the probability that merging scheme $p_s$ is selected; $\varepsilon_1$ is the privacy budget; $\Delta u$ is the global sensitivity; $u(p, p_s)$ is the utility function of $p_s$; by the definition of global sensitivity, deleting any record in the data set changes the utility function by at most 1, so $\Delta u = 1$; $y$ denotes the total number of schemes; $\exp\left(\varepsilon_1 u(p, p_s) / (2\Delta u)\right)$ is the fitness function of merging scheme $p_s$; the numerator computes the fitness value of merging scheme $p_s$, and the denominator computes the sum of the fitness values of all merging schemes;
S2.6: select a merging scheme by roulette-wheel selection according to the sampling probability of each merging scheme;
S2.7: merge the two groups of the selected merging scheme and treat the result as a new group that replaces the original two groups;
S2.8: repeat S2.3 to S2.7 until the groups have been merged into $K$ groups, then end the loop;
S2.9: return the final merging scheme $G = (G_1, G_2, \ldots, G_K)$.
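The loop S2.1 to S2.9 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: groups are stored as lists of bucket indices, the utility is the negative minimum absolute difference between bucket counts, the exponent uses the standard exponential-mechanism form with $2\Delta u$ and $\Delta u = 1$, and $\varepsilon_1$ is applied in every round exactly as the formula states (the source does not say whether it should be divided across rounds). All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def utility(hist, gi, gj):
    """S2.4: negative of the minimum distance between a bucket of gi and one of gj."""
    return -min(abs(hist[a] - hist[b]) for a in gi for b in gj)


def sampling_probabilities(hist, groups, eps1, delta_u=1.0):
    """S2.3-S2.5: enumerate all pairwise merges and their selection probabilities."""
    pairs = list(combinations(range(len(groups)), 2))
    utils = [utility(hist, groups[i], groups[j]) for i, j in pairs]
    top = max(utils)  # subtracting the max avoids underflow; the ratios are unchanged
    weights = [math.exp(eps1 * (u - top) / (2.0 * delta_u)) for u in utils]
    total = sum(weights)
    return pairs, [w / total for w in weights]


def merge_groups(hist, k, eps1, rng=None):
    """S2.1-S2.9: merge n singleton groups down to k groups."""
    if not 1 <= k <= len(hist):
        raise ValueError("k must satisfy 1 <= k <= n")
    rng = rng or random.Random()
    groups = [[i] for i in range(len(hist))]  # S2.2: one group per bucket index
    while len(groups) > k:  # S2.8: stop once k groups remain
        pairs, probs = sampling_probabilities(hist, groups, eps1)
        i, j = rng.choices(pairs, weights=probs, k=1)[0]  # S2.6: roulette wheel
        merged = groups[i] + groups[j]  # S2.7: the merged pair becomes one group
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)] + [merged]
    return groups
```

Because every round enumerates all pairs of the current groups, the sketch costs on the order of $n^2$ utility evaluations per round, which is acceptable for illustration but may need caching for large histograms.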
Based on the content of the foregoing method embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, in step S2, if the histogram has $n$ buckets, the number of initial groups is also $n$.
Based on the content of the above method embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, the Laplace noise added to $\bar{G}$ in step S4 is governed by the privacy budget $\varepsilon_2$.
Based on the content of the foregoing method embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, the differential privacy histogram $\tilde{H}$ in step S5 is one-dimensional.
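Steps S3 to S5 can be sketched together in Python. The source states only that the Laplace noise is governed by $\varepsilon_2$; the noise scale used below, $1 / (\varepsilon_2 \cdot |G_i|)$, is an assumption that follows if one record changes one bucket count by 1, so that the mean of a group of size $|G_i|$ changes by at most $1/|G_i|$. The function names and the representation of groups as lists of bucket indices are likewise hypothetical.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish(hist, groups, eps2, rng=None):
    """S3-S5: average each group, add Laplace noise, restore the bucket order."""
    rng = rng or random.Random()
    published = [0.0] * len(hist)
    for group in groups:
        mean = sum(hist[i] for i in group) / len(group)  # S3: group mean
        scale = 1.0 / (eps2 * len(group))  # assumed sensitivity 1/|G_i| for the mean
        noisy = mean + laplace(scale, rng)  # S4: noise-added group value
        for i in group:  # S5: write the value back at each original bucket position
            published[i] = noisy
    return published
```

Every bucket in a group receives the same noisy mean, so larger groups carry less noise per bucket but more approximation error, which is the trade-off the grouping step is meant to balance.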
In a second aspect, an embodiment of the present invention provides a differential privacy histogram publishing device based on grouping and merging, including:
a first main module, configured to implement S1: privacy budget partitioning: divide the privacy budget $\varepsilon$ into $\varepsilon_1$ and $\varepsilon_2$;
a second main module, configured to implement S2: grouping and merging: first set the final number of merged groups $K$, and treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g = (g_1, g_2, \ldots, g_n)$; then merge the groups in the group set pairwise, traverse all merging schemes, compute the scheme distance of each merging scheme, select a merging scheme through the exponential mechanism for approximate merging, and treat the merged result as a new group that replaces the original two groups; repeat the merging process until the final number of groups $K$ is reached, obtaining the final merging scheme and forming the histogram grouping $G = (G_1, G_2, \ldots, G_K)$; wherein: $H = \{h_1, h_2, \ldots, h_n\}$ denotes the original histogram sequence; $h_1, h_2, \ldots, h_n$ denote the buckets in the histogram, and $n$ is the total number of original histogram buckets; $g = (g_1, g_2, \ldots, g_n)$ denotes the initial group set before merging; $g_1, g_2, \ldots, g_n$ denote the groups in the initial group set, each consisting of one bucket, so the initial number of groups is $n$; $G = (G_1, G_2, \ldots, G_K)$ denotes the final group set obtained by merging, $G_1, G_2, \ldots, G_K$ denote the groups in the final group set, and $G_K$ denotes the $K$-th group;
a third main module, configured to implement S3: average each group of $G$ to obtain $\bar{G} = (\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_K)$, where $\bar{G}$ denotes the mean group set obtained after averaging and $\bar{G}_K$ denotes the $K$-th mean group;
a fourth main module, configured to implement S4: add Laplace noise to $\bar{G}$ to obtain $\tilde{G} = (\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_K)$, where $\tilde{G}$ denotes the noisy group set obtained by adding noise to the mean group set and $\tilde{G}_K$ denotes the $K$-th noise-added group;
a fifth main module, configured to implement S5: restore $\tilde{G}$ to the original histogram order to obtain the published differential privacy histogram $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n\}$, where $\tilde{H}$ denotes the differential privacy histogram obtained by restoring $\tilde{G}$ to the original histogram order, $\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n$ denote the noise-added buckets in the histogram, and $n$ denotes the total number of differential privacy histogram buckets.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the differential privacy histogram publishing method based on grouping and merging provided in any of the various implementations of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the differential privacy histogram publishing method based on grouping and merging provided in any of the various implementations of the first aspect.
According to the differential privacy histogram publishing method and device based on grouping and merging, the data are reasonably and accurately partitioned by adopting a grouping and merging mode for the histogram, the accuracy of grouping and partitioning is effectively improved, the error of publishing the data is reduced, the usability of the data is improved, the optimal partitioning of the histogram grouping can be realized, the influence of noise on the accuracy of the data is greatly reduced, and the usability and the publishing efficiency of the data are effectively improved while the differential privacy constraint is met.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below to the drawings required for the embodiments or the technical solutions in the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is also possible for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a differential privacy histogram publishing method based on grouping and merging according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a differential privacy histogram publishing device based on grouping and merging according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. In addition, the technical features of the various embodiments or individual embodiments provided in the present invention may be arbitrarily combined with each other to form a feasible technical solution, and the combination is not limited by the sequence of steps and/or the structural composition mode, but must be based on the realization of the capability of a person skilled in the art, and when the technical solution combination is contradictory or cannot be realized, the technical solution combination should be considered to be absent and not to be within the protection scope of the present invention.
The invention first divides the privacy budget. It then sets the final number of groups $K$, treats each bucket in the histogram as a separate group, and traverses all possible merging schemes by merging groups pairwise. A merging scheme is selected approximately by computing the scheme distance of each merging scheme and applying the exponential mechanism, and the selected scheme is merged to replace the original two groups; the group merging process is repeated until the final grouping scheme is formed. Finally, the mean of each resulting histogram group is computed, Laplace noise is added, and the original order is restored to obtain the published differential privacy histogram. Based on this idea, an embodiment of the present invention provides a differential privacy histogram publishing method based on grouping and merging which, referring to Fig. 1, includes:
S1: privacy budget partitioning: divide the privacy budget $\varepsilon$ into $\varepsilon_1$ and $\varepsilon_2$;
S2: grouping and merging: first set the final number of merged groups $K$, and treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g = (g_1, g_2, \ldots, g_n)$; then merge the groups in the group set pairwise, traverse all merging schemes, compute the scheme distance of each merging scheme, select a merging scheme through the exponential mechanism for approximate merging, and treat the merged result as a new group that replaces the original two groups; repeat the merging process until the final number of groups $K$ is reached, obtaining the final merging scheme and forming the histogram grouping $G = (G_1, G_2, \ldots, G_K)$; wherein: $H = \{h_1, h_2, \ldots, h_n\}$ denotes the original histogram sequence; $h_1, h_2, \ldots, h_n$ denote the buckets in the histogram, and $n$ is the total number of original histogram buckets; $g = (g_1, g_2, \ldots, g_n)$ denotes the initial group set before merging; $g_1, g_2, \ldots, g_n$ denote the groups in the initial group set, each consisting of one bucket, so the initial number of groups is $n$; $G = (G_1, G_2, \ldots, G_K)$ denotes the final group set obtained by merging, $G_1, G_2, \ldots, G_K$ denote the groups in the final group set, and $G_K$ denotes the $K$-th group;
S3: average each group of $G$ to obtain $\bar{G} = (\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_K)$, where $\bar{G}$ denotes the mean group set obtained after averaging and $\bar{G}_K$ denotes the $K$-th mean group;
S4: add Laplace noise to $\bar{G}$ to obtain $\tilde{G} = (\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_K)$, where $\tilde{G}$ denotes the noisy group set obtained by adding noise to the mean group set and $\tilde{G}_K$ denotes the $K$-th noise-added group;
S5: restore $\tilde{G}$ to the original histogram order to obtain the published differential privacy histogram $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n\}$, where $\tilde{H}$ denotes the differential privacy histogram obtained by restoring $\tilde{G}$ to the original histogram order, $\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n$ denote the noise-added buckets in the histogram, and $n$ denotes the total number of differential privacy histogram buckets.
Based on the content of the foregoing method embodiment, as an optional embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, the privacy budget $\varepsilon$ in step S1 is a given positive value, $\varepsilon_1$ is used for group merging, and $\varepsilon_2$ is used for adding noise to the groups.
Based on the content of the foregoing method embodiment, as an optional embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, step S2 specifically includes:
S2.1: set the final number of groups $K$, where $K \in \{1, 2, \ldots, n\}$ and $n$ is the number of groups of the histogram;
S2.2: treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g = (g_1, g_2, \ldots, g_n)$;
S2.3: merge the groups in the histogram pairwise and traverse all possible merging schemes $P = (p_1, p_2, \ldots, p_y)$; wherein: $P = (p_1, p_2, \ldots, p_y)$ denotes the set of merging schemes; $p_1, p_2, \ldots, p_y$ denote all possible merging schemes; $y$ denotes the total number of schemes;
S2.4: compute the scheme distance $u(p, p_s)$ of each merging scheme and set it as the utility function:
$$u(p, p_s) = -\min_{h \in g_i,\ h' \in g_j} |h - h'|$$
wherein: $p_s(g_i, g_j)$ is a merging scheme in the scheme set $P$; $p_s$ contains two groups $g_i$ and $g_j$; $h$ is a bucket in $g_i$ and $h'$ is a bucket in $g_j$; $\min_{h \in g_i,\ h' \in g_j} |h - h'|$ denotes the minimum distance between the groups and measures the similarity between them. The utility function must satisfy the requirement that the smaller the distance between two groups, the greater their probability of being merged; so that the subsequent probability calculation meets this requirement, the negative of the group distance, $-\min_{h \in g_i,\ h' \in g_j} |h - h'|$, is used to construct the utility function;
S2.5: using the exponential mechanism together with the scheme distances, compute the sampling probability $\Pr(p, p_s)$ of each merging scheme:
$$\Pr(p, p_s) = \frac{\exp\left(\dfrac{\varepsilon_1\, u(p, p_s)}{2\Delta u}\right)}{\sum_{t=1}^{y} \exp\left(\dfrac{\varepsilon_1\, u(p, p_t)}{2\Delta u}\right)}$$
wherein: $\Pr(p, p_s)$ denotes the probability that merging scheme $p_s$ is selected; $\varepsilon_1$ is the privacy budget; $\Delta u$ is the global sensitivity; $u(p, p_s)$ is the utility function of $p_s$; by the definition of global sensitivity, deleting any record in the data set changes the utility function by at most 1, so $\Delta u = 1$; $y$ denotes the total number of schemes; $\exp\left(\varepsilon_1 u(p, p_s) / (2\Delta u)\right)$ is the fitness function of merging scheme $p_s$; the numerator computes the fitness value of merging scheme $p_s$, and the denominator computes the sum of the fitness values of all merging schemes;
S2.6: select a merging scheme by roulette-wheel selection according to the sampling probability of each merging scheme;
S2.7: merge the two groups of the selected merging scheme and treat the result as a new group that replaces the original two groups;
S2.8: repeat S2.3 to S2.7 until the groups have been merged into $K$ groups, then end the loop;
S2.9: return the final merging scheme $G = (G_1, G_2, \ldots, G_K)$.
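As a worked illustration of S2.4 and S2.5, consider a hypothetical three-bucket histogram $H = \{1, 2, 10\}$ with $\varepsilon_1 = 1$ and $\Delta u = 1$. The bucket values and the budget are invented for this example, and the exponent assumes the standard exponential-mechanism form with the factor $2\Delta u$. The three possible schemes are $p_1(g_1, g_2)$, $p_2(g_1, g_3)$ and $p_3(g_2, g_3)$:

```latex
\begin{align*}
u(p, p_1) &= -|1 - 2| = -1, \qquad u(p, p_2) = -|1 - 10| = -9, \qquad u(p, p_3) = -|2 - 10| = -8,\\[4pt]
e^{-1/2} &\approx 0.6065, \qquad e^{-9/2} \approx 0.0111, \qquad e^{-8/2} \approx 0.0183,\\[4pt]
\Pr(p, p_1) &\approx \frac{0.6065}{0.6065 + 0.0111 + 0.0183} \approx 0.954, \qquad
\Pr(p, p_2) \approx 0.017, \qquad
\Pr(p, p_3) \approx 0.029 .
\end{align*}
```

The two buckets with the closest counts are merged with high probability, yet every scheme keeps a nonzero chance of selection, which is what makes the choice randomized rather than deterministic.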
Based on the content of the foregoing method embodiment, as an optional embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, in step S2, if the histogram has $n$ buckets, the number of initial groups is also $n$.
Based on the content of the foregoing method embodiment, as an optional embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, the Laplace noise added to $\bar{G}$ in step S4 is governed by the privacy budget $\varepsilon_2$.
Based on the content of the above method embodiment, as an optional embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, the differential privacy histogram $\tilde{H}$ in step S5 is one-dimensional. Specifically, privacy is proved as follows. Let the group merging process be $M_1$ and the averaging and noise-adding process be $M_2$. First, the merging process selects each merging scheme with probability proportional to $\exp\left(\varepsilon_1 u(p, p_s) / (2\Delta u)\right)$, and the process as a whole does not disclose private information, so the merging process satisfies $\varepsilon_1$-differential privacy. Since the noise added to the mean groups is governed by $\varepsilon_2$, process $M_2$ satisfies $\varepsilon_2$-differential privacy. Because $\varepsilon = \varepsilon_1 + \varepsilon_2$, it follows from the sequential composition property of differential privacy that the method as a whole satisfies $\varepsilon$-differential privacy.
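The composition step of this argument can be written out explicitly. The following is a sketch of the standard sequential-composition bound, taking the individual guarantees for $M_1$ and $M_2$ as given; $D$ and $D'$ denote neighbouring data sets, $G$ a fixed grouping and $\tilde{H}$ a fixed published histogram (for the continuous Laplace output, the second factor should be read as a density):

```latex
\begin{align*}
\Pr\big[M(D) = (G, \tilde{H})\big]
  &= \Pr\big[M_1(D) = G\big] \cdot \Pr\big[M_2(D, G) = \tilde{H}\big] \\
  &\le e^{\varepsilon_1} \Pr\big[M_1(D') = G\big] \cdot e^{\varepsilon_2} \Pr\big[M_2(D', G) = \tilde{H}\big] \\
  &= e^{\varepsilon_1 + \varepsilon_2} \Pr\big[M(D') = (G, \tilde{H})\big]
   = e^{\varepsilon} \Pr\big[M(D') = (G, \tilde{H})\big].
\end{align*}
```

The bound holds for every grouping $G$ that $M_1$ may output, which is why the budgets of the two stages add rather than multiply.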
According to the differential privacy histogram publishing method based on grouping and merging provided by the embodiment of the invention, the data is reasonably and accurately partitioned by adopting a grouping and merging mode for the histogram, the accuracy of grouping and partitioning is effectively improved, the error of publishing the data is further reduced, the usability of the data is improved, the optimal partitioning of the histogram grouping can be realized, the influence of noise on the accuracy of the data is greatly reduced, and the usability and the publishing efficiency of the data are effectively improved while the differential privacy constraint is met.
The various embodiments of the present invention are implemented through programmed processing performed by a device having a processor function. In engineering practice, the technical solutions and functions of the embodiments can therefore be packaged into various modules. On this basis, and building on the foregoing embodiments, an embodiment of the present invention provides a differential privacy histogram publishing device based on grouping and merging, which is used to execute the differential privacy histogram publishing method based on grouping and merging in the foregoing method embodiments. Referring to Fig. 2, the device includes:
a first main module, configured to implement S1: privacy budget partitioning: divide the privacy budget $\varepsilon$ into $\varepsilon_1$ and $\varepsilon_2$;
a second main module, configured to implement S2: grouping and merging: first set the final number of merged groups $K$, and treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g = (g_1, g_2, \ldots, g_n)$; then merge the groups in the group set pairwise, traverse all merging schemes, compute the scheme distance of each merging scheme, select a merging scheme through the exponential mechanism for approximate merging, and treat the merged result as a new group that replaces the original two groups; repeat the merging process until the final number of groups $K$ is reached, obtaining the final merging scheme and forming the histogram grouping $G = (G_1, G_2, \ldots, G_K)$; wherein: $H = \{h_1, h_2, \ldots, h_n\}$ denotes the original histogram sequence; $h_1, h_2, \ldots, h_n$ denote the buckets in the histogram, and $n$ is the total number of original histogram buckets; $g = (g_1, g_2, \ldots, g_n)$ denotes the initial group set before merging; $g_1, g_2, \ldots, g_n$ denote the groups in the initial group set, each consisting of one bucket, so the initial number of groups is $n$; $G = (G_1, G_2, \ldots, G_K)$ denotes the final group set obtained by merging, $G_1, G_2, \ldots, G_K$ denote the groups in the final group set, and $G_K$ denotes the $K$-th group;
a third main module, configured to implement S3: average each group of $G$ to obtain $\bar{G} = (\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_K)$, where $\bar{G}$ denotes the mean group set obtained after averaging and $\bar{G}_K$ denotes the $K$-th mean group;
a fourth main module, configured to implement S4: add Laplace noise to $\bar{G}$ to obtain $\tilde{G} = (\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_K)$, where $\tilde{G}$ denotes the noisy group set obtained by adding noise to the mean group set and $\tilde{G}_K$ denotes the $K$-th noise-added group;
a fifth main module, configured to implement S5: restore $\tilde{G}$ to the original histogram order to obtain the published differential privacy histogram $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n\}$, where $\tilde{H}$ denotes the differential privacy histogram obtained by restoring $\tilde{G}$ to the original histogram order, $\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n$ denote the noise-added buckets in the histogram, and $n$ denotes the total number of differential privacy histogram buckets.
According to the differential privacy histogram publishing device based on grouping and merging, provided by the embodiment of the invention, the data are reasonably and accurately divided by adopting a grouping and merging mode for the histogram through adopting a plurality of modules in the graph shown in fig. 2, so that the accuracy of grouping and dividing is effectively improved, the error of publishing the data is further reduced, the usability of the data is improved, the optimal grouping and dividing of the histogram can be realized, the influence of noise on the accuracy of the data is greatly reduced, and the usability and the publishing efficiency of the data are effectively improved while the differential privacy constraint is met.
It should be noted that, the apparatus in the apparatus embodiment provided by the present invention may be used for implementing methods in other method embodiments provided by the present invention, except that corresponding function modules are provided, and the principle of the apparatus embodiment provided by the present invention is basically the same as that of the apparatus embodiment provided by the present invention, so long as a person skilled in the art obtains corresponding technical means by combining technical features on the basis of the apparatus embodiment described above, and obtains a technical solution formed by these technical means, on the premise of ensuring that the technical solution has practicability, the apparatus in the apparatus embodiment described above may be modified, so as to obtain a corresponding apparatus class embodiment, which is used for implementing methods in other method class embodiments. For example:
Based on the content of the foregoing device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on grouping and merging provided in the embodiment of the present invention further includes: a first sub-module, configured to implement that the privacy budget $\varepsilon$ in step S1 is a given positive value, $\varepsilon_1$ is used for group merging, and $\varepsilon_2$ is used for adding noise to the groups.
Based on the content of the foregoing device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on grouping and merging provided in the embodiment of the present invention further includes: a second sub-module, configured to implement that step S2 specifically includes:
S2.1: set the final number of groups $K$, where $K \in \{1, 2, \ldots, n\}$ and $n$ is the number of groups of the histogram;
S2.2: treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g(g_1, g_2, \ldots, g_n)$;
S2.3: merge the groups in the histogram pairwise and traverse all possible merging schemes $P(p_1, p_2, \ldots, p_y)$, where $P(p_1, p_2, \ldots, p_y)$ denotes the set of merging schemes, $p_1, p_2, \ldots, p_y$ denote the possible merging schemes, and $y$ denotes the total number of schemes;
S2.4: calculate the scheme distance of each merging scheme and use it to set the utility function $u(P, p_s)$:

$$u(P, p_s) = -\min_{h \in g_i,\; h' \in g_j} |h - h'|$$

where $p_s(g_i, g_j)$ is a merging scheme in the scheme set $P$ that contains the two groups $g_i$ and $g_j$, $h$ is a bucket in $g_i$, $h'$ is a bucket in $g_j$, and $\min_{h \in g_i,\, h' \in g_j} |h - h'|$ is the minimum distance between the two groups, which measures their similarity. The utility function should satisfy the requirement that the smaller the distance between two groups, the greater their probability of being merged; so that the subsequent probability calculation meets this requirement, the utility function is constructed from the negative of the group distance;
S2.5: using the exponential mechanism together with the scheme distance, calculate the sampling probability $\Pr(P, p_s)$ of each merging scheme:

$$\Pr(P, p_s) = \frac{\exp\left(\dfrac{\varepsilon_1\, u(P, p_s)}{2\Delta u}\right)}{\sum_{t=1}^{y} \exp\left(\dfrac{\varepsilon_1\, u(P, p_t)}{2\Delta u}\right)}$$

where $\Pr(P, p_s)$ is the probability that merging scheme $p_s$ is selected, $\varepsilon_1$ is the privacy budget, $\Delta u$ is the global sensitivity, $u(P, p_s)$ is the utility function of $p_s$, and $y$ is the total number of schemes. From the definition of global sensitivity, deleting any record in the data set changes the utility function by at most 1, so $\Delta u = 1$. The term $\exp\left(\varepsilon_1 u(P, p_s) / (2\Delta u)\right)$ is the fitness function of merging scheme $p_s$; the numerator is the fitness value of $p_s$ and the denominator is the sum of the fitness values of all merging schemes;
S2.6: select a merging scheme by roulette-wheel selection according to the sampling probability of each merging scheme;
S2.7: merge the two groups of the selected merging scheme and treat the result as a new group that replaces the original two groups;
S2.8: repeat S2.3 to S2.7 until the groups have been merged into $K$ groups, then end the loop;
S2.9: return the final merging scheme $G(G_1, G_2, \ldots, G_K)$.
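Steps S2.1 to S2.9 can be sketched in Python as follows. This is a hedged illustration under stated assumptions rather than the patented implementation: the function names are hypothetical, groups are represented as lists of bucket indices, the fitness weight is assumed to take the standard exponential-mechanism form $\exp(\varepsilon_1 u / (2\Delta u))$, and the full $\varepsilon_1$ is used in every merging round because the source does not say how the budget is accounted for across rounds.

```python
import math
import random
from itertools import combinations


def group_distance(hist, g_i, g_j):
    """Minimum absolute difference between a bucket of g_i and a bucket of g_j."""
    return min(abs(hist[a] - hist[b]) for a in g_i for b in g_j)


def scheme_probabilities(hist, groups, epsilon_1, delta_u=1.0):
    """Exponential-mechanism probabilities over all pairwise merging schemes."""
    schemes = list(combinations(range(len(groups)), 2))
    # Utility is the negative group distance, so closer groups score higher.
    utilities = [-group_distance(hist, groups[i], groups[j]) for i, j in schemes]
    u_max = max(utilities)  # shifting by the max avoids underflow, ratios unchanged
    weights = [math.exp(epsilon_1 * (u - u_max) / (2.0 * delta_u)) for u in utilities]
    total = sum(weights)
    return schemes, [w / total for w in weights]


def merge_groups(hist, k, epsilon_1, rng=None):
    """Merge single-bucket groups down to k groups of bucket indices."""
    if not 1 <= k <= len(hist):
        raise ValueError("k must satisfy 1 <= k <= n")
    rng = rng or random.Random()
    groups = [[idx] for idx in range(len(hist))]  # S2.2: one group per bucket
    while len(groups) > k:  # S2.8: stop once k groups remain
        schemes, probs = scheme_probabilities(hist, groups, epsilon_1)  # S2.3-S2.5
        i, j = rng.choices(schemes, weights=probs, k=1)[0]  # S2.6: roulette wheel
        merged = groups[i] + groups[j]  # S2.7: the merged pair becomes one group
        groups = [g for t, g in enumerate(groups) if t not in (i, j)] + [merged]
    return groups  # S2.9
```

Calling `merge_groups([2, 3, 10, 11, 30], 2, 1.0)` returns two lists of bucket indices that together cover all five buckets; which buckets end up together varies between runs because the selection is randomized.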
Based on the content of the foregoing device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on grouping and merging provided in the embodiment of the present invention further includes: a third sub-module, configured to implement that, in step S2, if the number of buckets of the histogram is $n$, the number of groups is also $n$.
Based on the content of the foregoing device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on grouping and merging provided in the embodiment of the present invention further includes: a fourth sub-module, configured to implement that the privacy budget of the Laplace noise added to the mean group set $\bar{G}$ in step S4 is $\varepsilon_2$.
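The averaging and noise-addition steps (S3 and S4) can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, groups are assumed to be lists of bucket indices, and the sensitivity of a group mean is assumed to be $1/|G_i|$ (one record changes one bucket count by 1), which the source does not state explicitly.

```python
import random


def group_means(hist, groups):
    """S3: mean bucket count of each group (groups hold bucket indices)."""
    return [sum(hist[i] for i in g) / len(g) for g in groups]


def laplace_sample(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_laplace_noise(means, groups, epsilon_2, rng=None):
    """S4: perturb each group mean with Laplace noise under budget epsilon_2."""
    if epsilon_2 <= 0:
        raise ValueError("epsilon_2 must be positive")
    rng = rng or random.Random()
    noisy = []
    for mean, g in zip(means, groups):
        # Assumption: one record shifts a group mean by at most 1/|G_i|.
        sensitivity = 1.0 / len(g)
        noisy.append(mean + laplace_sample(sensitivity / epsilon_2, rng))
    return noisy
```

Under this assumption a larger group receives noise of smaller scale on its mean, which is the usual motivation for merging similar buckets before adding noise.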
Based on the content of the foregoing device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on grouping and merging provided in the embodiment of the present invention further includes: a fifth sub-module, configured to implement that the differential privacy histogram $\tilde{H}$ in step S5 is one-dimensional.
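Restoring the original bucket order in step S5 can be sketched as follows. This is a minimal illustration with a hypothetical function name; it assumes that groups are lists of original bucket indices and that every bucket is published as the noisy mean of the group it belongs to.

```python
def restore_histogram(noisy_means, groups, n):
    """S5: write each group's noisy mean back to its buckets' original positions."""
    published = [None] * n
    for value, g in zip(noisy_means, groups):
        for idx in g:
            published[idx] = value
    if any(v is None for v in published):
        raise ValueError("groups must cover every bucket index 0..n-1")
    return published
```

For instance, with groups `[[0, 2], [1]]` and noisy means `[5.0, 9.0]`, buckets 0 and 2 both receive 5.0 and bucket 1 receives 9.0, giving a one-dimensional histogram of the original length.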
The method of the embodiment of the present invention is implemented on an electronic device, so the relevant electronic device is introduced here. To this end, an embodiment of the present invention provides an electronic device, as shown in Fig. 3, including: at least one processor, a communication interface, at least one memory, and a communication bus, wherein the at least one processor, the communication interface, and the at least one memory communicate with one another through the communication bus. The at least one processor may invoke logic instructions in the at least one memory to perform all or part of the steps of the methods provided by the method embodiments described above.
In addition, the logic instructions in the at least one memory may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the method embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above-described embodiments of the apparatus are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. Based on this recognition, each block in the flowchart or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In this patent, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (9)
1. A differential privacy histogram publishing method based on grouping and merging, characterized by comprising the following steps:
S1: privacy budget partitioning: partition the privacy budget $\varepsilon$ into $\varepsilon_1$ and $\varepsilon_2$;
S2: grouping and merging: first set the final number of merged groups $K$ and treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g(g_1, g_2, \ldots, g_n)$; then merge the groups in $g$ pairwise, traverse all merging schemes, calculate the scheme distance of each merging scheme, select a merging scheme through the exponential mechanism for approximate merging, and treat the merged result as a new group that replaces the original two groups; repeat the merging process until the final number of groups $K$ is reached, obtaining the final merging scheme and forming the histogram grouping $G(G_1, G_2, \ldots, G_K)$; where $H = \{h_1, h_2, \ldots, h_n\}$ denotes the original histogram sequence; $h_1, h_2, \ldots, h_n$ denote the buckets in the histogram and $n$ is the total number of original histogram buckets; $g(g_1, g_2, \ldots, g_n)$ denotes the initial group set before merging; $g_1, g_2, \ldots, g_n$ denote the groups in the initial group set, each consisting of one bucket, so the initial total number of groups is $n$; $G(G_1, G_2, \ldots, G_K)$ denotes the final group set obtained by merging, $G_1, G_2, \ldots, G_K$ denote the groups in the final group set, and $G_K$ denotes the $K$-th group;
S3: average each group of $G$ to obtain $\bar{G}(\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_K)$, where $\bar{G}(\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_K)$ denotes the mean group set after averaging and $\bar{G}_K$ denotes the $K$-th mean group;
S4: add Laplace noise to $\bar{G}$ to obtain $\tilde{G}(\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_K)$, where $\tilde{G}(\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_K)$ denotes the noisy group set obtained by adding noise to the mean group set and $\tilde{G}_K$ denotes the $K$-th noise-added group;
S5: restore the original histogram order of $\tilde{G}$ to obtain the published differential privacy histogram $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n\}$, where $\tilde{H}$ denotes the differential privacy histogram obtained by restoring $\tilde{G}$ to the original histogram order, $\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n$ denote the noise-added buckets in the histogram, and $n$ denotes the total number of differential privacy histogram buckets.
2. The differential privacy histogram publishing method based on grouping and merging according to claim 1, characterized in that the privacy budget $\varepsilon$ in step S1 is a given positive value, $\varepsilon_1$ is used for group merging, and $\varepsilon_2$ is used for adding noise to the groups.
3. The differential privacy histogram publishing method based on grouping and merging according to claim 2, characterized in that step S2 specifically includes:
S2.1: set the final number of groups $K$, where $K \in \{1, 2, \ldots, n\}$ and $n$ is the number of groups of the histogram;
S2.2: treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g(g_1, g_2, \ldots, g_n)$;
S2.3: merge the groups in the histogram pairwise and traverse all possible merging schemes $P(p_1, p_2, \ldots, p_y)$, where $P(p_1, p_2, \ldots, p_y)$ denotes the set of merging schemes, $p_1, p_2, \ldots, p_y$ denote the possible merging schemes, and $y$ denotes the total number of schemes;
S2.4: calculate the scheme distance of each merging scheme and use it to set the utility function $u(P, p_s)$:

$$u(P, p_s) = -\min_{h \in g_i,\; h' \in g_j} |h - h'|$$

where $p_s(g_i, g_j)$ is a merging scheme in the scheme set $P$ that contains the two groups $g_i$ and $g_j$, $h$ is a bucket in $g_i$, $h'$ is a bucket in $g_j$, and $\min_{h \in g_i,\, h' \in g_j} |h - h'|$ is the minimum distance between the two groups, which measures their similarity; the utility function should satisfy the requirement that the smaller the distance between two groups, the greater their probability of being merged; so that the subsequent probability calculation meets this requirement, the utility function is constructed from the negative of the group distance;
S2.5: using the exponential mechanism together with the scheme distance, calculate the sampling probability $\Pr(P, p_s)$ of each merging scheme:

$$\Pr(P, p_s) = \frac{\exp\left(\dfrac{\varepsilon_1\, u(P, p_s)}{2\Delta u}\right)}{\sum_{t=1}^{y} \exp\left(\dfrac{\varepsilon_1\, u(P, p_t)}{2\Delta u}\right)}$$

where $\Pr(P, p_s)$ is the probability that merging scheme $p_s$ is selected, $\varepsilon_1$ is the privacy budget, $\Delta u$ is the global sensitivity, $u(P, p_s)$ is the utility function of $p_s$, and $y$ is the total number of schemes; from the definition of global sensitivity, deleting any record in the data set changes the utility function by at most 1, so $\Delta u = 1$; the term $\exp\left(\varepsilon_1 u(P, p_s) / (2\Delta u)\right)$ is the fitness function of merging scheme $p_s$; the numerator is the fitness value of $p_s$ and the denominator is the sum of the fitness values of all merging schemes;
S2.6: select a merging scheme by roulette-wheel selection according to the sampling probability of each merging scheme;
S2.7: merge the two groups of the selected merging scheme and treat the result as a new group that replaces the original two groups;
S2.8: repeat S2.3 to S2.7 until the groups have been merged into $K$ groups, then end the loop;
S2.9: return the final merging scheme $G(G_1, G_2, \ldots, G_K)$.
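As a worked illustration of the probability calculation in S2.5, consider a hypothetical three-bucket histogram $H = \{2, 3, 10\}$ with $\varepsilon_1 = 1$ and $\Delta u = 1$, where $p_1$, $p_2$, and $p_3$ merge the bucket pairs $(2, 3)$, $(2, 10)$, and $(3, 10)$ respectively. The numbers are illustrative only and assume that the fitness weight takes the standard exponential-mechanism form $\exp(\varepsilon_1 u / (2\Delta u))$ and that the scheme distance is the minimum absolute difference between buckets of the two groups.

```latex
\begin{aligned}
u(P, p_1) &= -|2 - 3| = -1,  & \exp(-0.5) &\approx 0.6065,\\
u(P, p_2) &= -|2 - 10| = -8, & \exp(-4)   &\approx 0.0183,\\
u(P, p_3) &= -|3 - 10| = -7, & \exp(-3.5) &\approx 0.0302,\\
\Pr(P, p_1) &\approx \frac{0.6065}{0.6550} \approx 0.926, &
\Pr(P, p_2) &\approx 0.028, \quad \Pr(P, p_3) \approx 0.046.
\end{aligned}
```

Under these assumptions the two closest buckets are merged with a probability of about 0.93, while the dissimilar pairs remain selectable with a small probability, which is the randomness the exponential mechanism relies on.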
4. The method according to claim 3, wherein in step S2, if the number of the histogram buckets is n, the number of the groups is n.
7. A differential privacy histogram publishing device based on grouping and merging, comprising:
a first main module, configured to implement S1: privacy budget partitioning: partition the privacy budget $\varepsilon$ into $\varepsilon_1$ and $\varepsilon_2$;
a second main module, configured to implement S2: grouping and merging: first set the final number of merged groups $K$ and treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g(g_1, g_2, \ldots, g_n)$; then merge the groups in $g$ pairwise, traverse all merging schemes, calculate the scheme distance of each merging scheme, select a merging scheme through the exponential mechanism for approximate merging, and treat the merged result as a new group that replaces the original two groups; repeat the merging process until the final number of groups $K$ is reached, obtaining the final merging scheme and forming the histogram grouping $G(G_1, G_2, \ldots, G_K)$; where $H = \{h_1, h_2, \ldots, h_n\}$ denotes the original histogram sequence; $h_1, h_2, \ldots, h_n$ denote the buckets in the histogram and $n$ is the total number of original histogram buckets; $g(g_1, g_2, \ldots, g_n)$ denotes the initial group set before merging; $g_1, g_2, \ldots, g_n$ denote the groups in the initial group set, each consisting of one bucket, so the initial total number of groups is $n$; $G(G_1, G_2, \ldots, G_K)$ denotes the final group set obtained by merging, $G_1, G_2, \ldots, G_K$ denote the groups in the final group set, and $G_K$ denotes the $K$-th group;
a third main module, configured to implement S3: average each group of $G$ to obtain $\bar{G}(\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_K)$, where $\bar{G}(\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_K)$ denotes the mean group set after averaging and $\bar{G}_K$ denotes the $K$-th mean group;
a fourth main module, configured to implement S4: add Laplace noise to $\bar{G}$ to obtain $\tilde{G}(\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_K)$, where $\tilde{G}(\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_K)$ denotes the noisy group set obtained by adding noise to the mean group set and $\tilde{G}_K$ denotes the $K$-th noise-added group;
a fifth main module, configured to implement S5: restore the original histogram order of $\tilde{G}$ to obtain the published differential privacy histogram $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n\}$, where $\tilde{H}$ denotes the differential privacy histogram obtained by restoring $\tilde{G}$ to the original histogram order, $\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n$ denote the noise-added buckets in the histogram, and $n$ denotes the total number of differential privacy histogram buckets.
8. An electronic device, comprising:
at least one processor, at least one memory, and a communication interface; wherein,
the processor, the memory, and the communication interface communicate with each other;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211109967.4A CN115495778A (en) | 2022-09-13 | 2022-09-13 | Differential privacy histogram publishing method and device based on grouping combination |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115495778A (en) | 2022-12-20 |
Family
ID=84467535
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211109967.4A Pending CN115495778A (en) | 2022-09-13 | 2022-09-13 | Differential privacy histogram publishing method and device based on grouping combination |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115495778A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117688614A (en) * | 2024-02-01 | 2024-03-12 | 杭州海康威视数字技术股份有限公司 | Differential privacy protection data availability enhancement method and device and electronic equipment |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117688614A (en) * | 2024-02-01 | 2024-03-12 | 杭州海康威视数字技术股份有限公司 | Differential privacy protection data availability enhancement method and device and electronic equipment |
CN117688614B (en) * | 2024-02-01 | 2024-04-30 | 杭州海康威视数字技术股份有限公司 | Differential privacy protection data availability enhancement method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107181797B (en) | | Block compression method and system of block chain |
CN106682906B (en) | | Risk identification and service processing method and equipment |
EP3321819A1 (en) | | Device, method and program for securely reducing an amount of records in a database |
CN106327340B (en) | | Abnormal node set detection method and device for financial network |
CN110517029B (en) | | Method, device, equipment and blockchain system for verifying blockchain cross-chain transaction |
CN106909575B (en) | | Text clustering method and device |
CN110348238B (en) | | Privacy protection grading method and device for application |
CN115495778A (en) | | Differential privacy histogram publishing method and device based on grouping combination |
CN111723159A (en) | | Data verification method and device based on block chain |
CN112214402B (en) | | Code verification algorithm selection method, device and storage medium |
JP6310345B2 (en) | | Privacy protection device, privacy protection method, and database creation method |
CN104239753B (en) | | Tamper detection method for text documents in cloud storage environment |
US20240340293A1 (en) | | Reconstructing a Dataset After Detection of a Network Security Threat in a Network |
CN110276050B (en) | | Method and device for comparing high-dimensional vector similarity |
CN109977131A (en) | | A kind of house type matching system |
CN116306831A (en) | | Model authentication method and device for generating countermeasure network |
CN113111687A (en) | | Data processing method and system and electronic equipment |
CN112528068B (en) | | Voiceprint feature storage method, voiceprint feature matching method, voiceprint feature storage device and electronic equipment |
CN109800823B (en) | | Clustering method and device for POS terminals |
CN114443629A (en) | | Cluster bloom filter data duplication removing method, terminal equipment and storage medium |
CN115577390A (en) | | Differential privacy histogram publishing method and device based on sampling grouping |
CN111431869A (en) | | Method and device for acquiring vulnerability information heat |
CN116797781B (en) | | Target detection method and device, electronic equipment and storage medium |
CN113723915A (en) | | Nuclear power plant business form encoding method, device, equipment and readable storage medium |
CN113127712B (en) | | Filing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||