CN115495778A - Differential privacy histogram publishing method and device based on grouping combination - Google Patents


Info

Publication number
CN115495778A
Authority
CN
China
Prior art keywords
merging
histogram
packet
packets
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211109967.4A
Other languages
Chinese (zh)
Inventor
孟博
张国兴
王德军
李子茂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Kongtian Software Technology Co ltd
South Central Minzu University
Original Assignee
Wuhan Kongtian Software Technology Co ltd
South Central University for Nationalities
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Kongtian Software Technology Co ltd and South Central University for Nationalities
Priority to CN202211109967.4A
Publication of CN115495778A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a differential privacy histogram publishing method and device based on grouping combination. The method comprises the following steps: step S1 to step S5. According to the invention, the data is reasonably and accurately divided by adopting a grouping and merging mode for the histogram, so that the accuracy of grouping and dividing is effectively improved, the error of data distribution is reduced, the usability of the data is improved, the optimal grouping of the histogram can be realized, the influence of noise on the accuracy of the data is greatly reduced, and the usability and the distribution efficiency of the data are effectively improved while the differential privacy constraint is satisfied.

Description

Differential privacy histogram publishing method and device based on grouping combination
Technical Field
The embodiment of the invention relates to the technical field of histogram data privacy protection, in particular to a differential privacy histogram issuing method and device based on grouping combination.
Background
Histogram publishing is a widely used technology in the field of data publishing. A histogram reflects the statistical characteristics of data visually and clearly, and a user can quickly obtain the required information from it. However, an attacker can easily attack a published histogram and steal user privacy. To prevent the leakage of data privacy, a data publisher therefore usually applies differential privacy to the histogram before publishing it. Differential privacy, while protecting data privacy, introduces a problem of its own: the availability of the data decreases. This problem troubles researchers in the field of histogram data publishing. Developing a differential privacy histogram publishing method and device based on grouping and merging that effectively overcomes the above drawbacks of the related art is therefore an urgent technical problem in the field.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a differential privacy histogram issuing method and device based on grouping and merging.
In a first aspect, an embodiment of the present invention provides a differential privacy histogram publishing method based on grouping and merging, including:
S1: Privacy budget partitioning. The privacy budget $\varepsilon$ is divided into $\varepsilon_1$ and $\varepsilon_2$.
S2: Grouping and merging. First, the final number of merged groups $K$ is set, and each bucket of the original histogram $H = \{h_1, h_2, \dots, h_n\}$ is treated as a separate group, giving the group set $g(g_1, g_2, \dots, g_n)$. The groups in $g$ are then merged pairwise: all merging schemes are traversed, the scheme distance of each merging scheme is calculated, and a merging scheme is selected through the exponential mechanism for approximate merging. The merged result is regarded as a new group that replaces the original two groups. The merging process is repeated until the final number of groups $K$ is reached, which yields the final merging scheme and forms the histogram grouping $G(G_1, G_2, \dots, G_K)$. Here $H = \{h_1, h_2, \dots, h_n\}$ denotes the original histogram sequence; $h_1, h_2, \dots, h_n$ denote the buckets in the histogram, and $n$ is the total number of buckets in the original histogram; $g(g_1, g_2, \dots, g_n)$ denotes the initial group set before merging, and $g_1, g_2, \dots, g_n$ denote its groups, each consisting of one bucket, so the initial number of groups is $n$; $G(G_1, G_2, \dots, G_K)$ denotes the final group set obtained by merging, and $G_K$ denotes the $K$-th group.
S3: Each group of $G$ is averaged to obtain $\bar{G}(\bar{G}_1, \bar{G}_2, \dots, \bar{G}_K)$, where $\bar{G}$ denotes the mean group set after averaging and $\bar{G}_K$ denotes the $K$-th mean group.
S4: Laplace noise is added to $\bar{G}$ to obtain $\tilde{G}(\tilde{G}_1, \tilde{G}_2, \dots, \tilde{G}_K)$, where $\tilde{G}$ denotes the noisy group set obtained by adding noise to the mean group set and $\tilde{G}_K$ denotes the $K$-th noise-added group.
S5: The original histogram order is restored for $\tilde{G}$ to obtain the published differential privacy histogram $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \dots, \tilde{h}_n\}$, where $\tilde{H}$ denotes the differential privacy histogram obtained by restoring the original histogram order from $\tilde{G}$, $\tilde{h}_1, \tilde{h}_2, \dots, \tilde{h}_n$ denote the noise-added buckets in the histogram, and $n$ denotes the total number of buckets in the differential privacy histogram.
Based on the content of the above method embodiments, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiments of the present invention, the privacy budget $\varepsilon$ in step S1 is a given positive value, $\varepsilon_1$ is used for group merging, and $\varepsilon_2$ is used for adding noise to the groups.
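The budget split of step S1 can be sketched in a few lines. The source only states that a positive budget is divided into a merging part and a noise part; the function name `split_budget` and the `merge_fraction` parameter (with its default of 0.5) are assumptions introduced here for illustration.

```python
def split_budget(eps, merge_fraction=0.5):
    """Split a positive privacy budget eps into (eps1, eps2).

    eps1 is spent on group merging, eps2 on the Laplace noise.
    merge_fraction is a hypothetical tuning knob; the source does not
    state how the budget should be divided.
    """
    if eps <= 0:
        raise ValueError("the privacy budget must be positive")
    if not 0.0 < merge_fraction < 1.0:
        raise ValueError("merge_fraction must lie strictly between 0 and 1")
    eps1 = eps * merge_fraction
    eps2 = eps - eps1  # the remainder, so that eps1 + eps2 == eps
    return eps1, eps2
```

Computing the second part as the remainder keeps the two parts summing to the original budget, which is what the composition argument relies on.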
On the basis of the content of the foregoing method embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, step S2 specifically includes:
S2.1: Set the final number of groups $K$, where $K \in \{1, 2, \dots, n\}$ and $n$ is the number of groups of the histogram.
S2.2: Treat each bucket of the original histogram $H = \{h_1, h_2, \dots, h_n\}$ as a separate group, giving the group set $g(g_1, g_2, \dots, g_n)$.
S2.3: Merge the groups in the histogram pairwise and traverse all possible merging schemes $P(p_1, p_2, \dots, p_y)$, where $P$ denotes the set of merging schemes, $p_1, p_2, \dots, p_y$ denote all possible merging schemes, and $y$ denotes the total number of schemes.
S2.4: Calculate the scheme distance $u(p, p_s)$ of each merging scheme and set it as the utility function:
$$u(p, p_s) = -\min_{h \in g_i,\ h' \in g_j} |h - h'|$$
where $p_s(g_i, g_j)$ is a merging scheme in the scheme set $P$ that contains the two groups $g_i$ and $g_j$, $h$ is a bucket in $g_i$, $h'$ is a bucket in $g_j$, and $\min_{h \in g_i,\ h' \in g_j} |h - h'|$ is the minimum distance between the groups, which measures their similarity. The utility function should satisfy the requirement that the smaller the distance between two groups, the greater their probability of being merged. So that the subsequent probability calculation meets this requirement, the negative of the group distance, $-\min_{h \in g_i,\ h' \in g_j} |h - h'|$, is used to construct the utility function.
S2.5: Calculate the sampling probability $\Pr(p, p_s)$ of each merging scheme by using the exponential mechanism together with the scheme distance:
$$\Pr(p, p_s) = \frac{\exp\left(\frac{\varepsilon_1\, u(p, p_s)}{2\Delta u}\right)}{\sum_{t=1}^{y} \exp\left(\frac{\varepsilon_1\, u(p, p_t)}{2\Delta u}\right)}$$
where $\Pr(p, p_s)$ is the probability that merging scheme $p_s$ is selected, $\varepsilon_1$ is the privacy budget, $\Delta u$ is the global sensitivity, and $u(p, p_s)$ is the utility function of $p_s$. By the definition of global sensitivity, deleting any record from the data set changes the utility function by at most 1, so $\Delta u = 1$; $y$ is the total number of schemes. $\exp\left(\frac{\varepsilon_1\, u(p, p_s)}{2\Delta u}\right)$ is the fitness function of merging scheme $p_s$: the numerator is the fitness value of $p_s$, and the denominator is the sum of the fitness values of all merging schemes.
S2.6: Select a merging scheme by roulette-wheel selection according to the sampling probability of each merging scheme.
S2.7: Merge the selected merging scheme and regard the result as a new group that replaces the original two groups.
S2.8: Repeat S2.3 to S2.7 until the groups have been merged into $K$ groups, then end the loop.
S2.9: Return the final merging scheme $G(G_1, G_2, \dots, G_K)$.
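Steps S2.1 to S2.9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the filing's implementation: the group distance is taken to be the smallest absolute difference between bucket counts, the exponent uses the customary factor $2\Delta u$ of the exponential mechanism, and $\varepsilon_1$ is passed unchanged to every merging round as the text describes. The names `scheme_utility` and `merge_groups` are hypothetical.

```python
import math
import random


def scheme_utility(group_a, group_b, counts):
    """Negative of the smallest bucket-count gap between two groups (assumed distance)."""
    return -min(abs(counts[i] - counts[j]) for i in group_a for j in group_b)


def merge_groups(counts, k, eps1, delta_u=1.0, rng=None):
    """Merge single-bucket groups down to k groups; each group is a list of bucket indices."""
    if not 1 <= k <= len(counts):
        raise ValueError("k must lie between 1 and the number of buckets")
    rng = rng or random.Random()
    groups = [[i] for i in range(len(counts))]  # S2.2: one group per bucket
    while len(groups) > k:  # S2.8: repeat until k groups remain
        # S2.3: enumerate every pairwise merging scheme.
        pairs = [(a, b) for a in range(len(groups)) for b in range(a + 1, len(groups))]
        # S2.4: utility of each scheme.
        utils = [scheme_utility(groups[a], groups[b], counts) for a, b in pairs]
        # S2.5: exponential-mechanism weights. Subtracting the best utility leaves
        # the probabilities unchanged and keeps at least one weight equal to 1.
        best = max(utils)
        weights = [math.exp(eps1 * (u - best) / (2.0 * delta_u)) for u in utils]
        # S2.6: roulette-wheel selection proportional to the weights.
        a, b = rng.choices(pairs, weights=weights, k=1)[0]
        # S2.7: the merged group replaces the two original groups.
        merged = groups[a] + groups[b]
        groups = [g for idx, g in enumerate(groups) if idx not in (a, b)] + [merged]
    return groups  # S2.9
```

Because every pair of groups is a candidate, merged groups need not be contiguous in the original bucket order, which is why the later steps restore that order explicitly.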
Based on the content of the foregoing method embodiment, in the differential privacy histogram distribution method based on grouping combination provided in the embodiment of the present invention, in step S2, if the number of buckets of the histogram is n, the number of groups is also n.
Based on the content of the above method embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, the magnitude of the Laplace noise added to $\bar{G}$ in step S4 is $\varepsilon_2$.
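Steps S3 and S4 can be sketched as below. The source gives only the budget $\varepsilon_2$ for the noise, so the noise scale is an assumption: the sketch supposes that one record changes one bucket count by at most 1, so a group mean over $m$ buckets changes by at most $1/m$, and it uses scale $1/(\varepsilon_2 m)$. The helper names `laplace_sample` and `noisy_group_means` are hypothetical.

```python
import random


def laplace_sample(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    # expovariate takes a rate, so rate 1/scale gives mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_group_means(counts, groups, eps2, rng=None):
    """Return (means, noisy_means) for groups given as lists of bucket indices."""
    if eps2 <= 0:
        raise ValueError("eps2 must be positive")
    rng = rng or random.Random()
    # S3: average the bucket counts inside each group.
    means = [sum(counts[i] for i in g) / len(g) for g in groups]
    # S4: add Laplace noise; the scale 1 / (eps2 * group size) is an assumption.
    noisy = [m + laplace_sample(1.0 / (eps2 * len(g)), rng)
             for m, g in zip(means, groups)]
    return means, noisy
```

Larger groups receive less noise per bucket under this assumption, which is the usual motivation for merging similar buckets before adding noise.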
Based on the content of the foregoing method embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, the differential privacy histogram $\tilde{H}$ in step S5 is one-dimensional.
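One way to read step S5 is that every bucket is published with the noisy mean of the group it was merged into, written back at the bucket's original position. The sketch below follows that reading, which is an assumption; the function name `restore_order` is hypothetical, and groups are represented as lists of original bucket indices.

```python
def restore_order(groups, noisy_means, n):
    """Build the one-dimensional published histogram of n buckets.

    groups[i] lists the original bucket indices merged into group i and
    noisy_means[i] is the noise-added mean of that group.
    """
    published = [0.0] * n
    for group, value in zip(groups, noisy_means):
        for bucket_index in group:
            # Every bucket of a group is published with the group's noisy mean.
            published[bucket_index] = value
    return published
```

For example, `restore_order([[2, 3], [0, 1]], [10.5, 0.9], 4)` places 0.9 in the first two positions and 10.5 in the last two, regardless of the order in which the groups were formed.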
In a second aspect, an embodiment of the present invention provides a differential privacy histogram publishing device based on grouping and merging, including:
a first master module, configured to implement S1: privacy budget partitioning, in which the privacy budget $\varepsilon$ is divided into $\varepsilon_1$ and $\varepsilon_2$;
a second master module, configured to implement S2: grouping and merging. First, the final number of merged groups $K$ is set, and each bucket of the original histogram $H = \{h_1, h_2, \dots, h_n\}$ is treated as a separate group, giving the group set $g(g_1, g_2, \dots, g_n)$. The groups in $g$ are then merged pairwise: all merging schemes are traversed, the scheme distance of each merging scheme is calculated, and a merging scheme is selected through the exponential mechanism for approximate merging. The merged result is regarded as a new group that replaces the original two groups. The merging process is repeated until the final number of groups $K$ is reached, which yields the final merging scheme and forms the histogram grouping $G(G_1, G_2, \dots, G_K)$. Here $H = \{h_1, h_2, \dots, h_n\}$ denotes the original histogram sequence; $h_1, h_2, \dots, h_n$ denote the buckets in the histogram, and $n$ is the total number of buckets in the original histogram; $g(g_1, g_2, \dots, g_n)$ denotes the initial group set before merging, and $g_1, g_2, \dots, g_n$ denote its groups, each consisting of one bucket, so the initial number of groups is $n$; $G(G_1, G_2, \dots, G_K)$ denotes the final group set obtained by merging, and $G_K$ denotes the $K$-th group;
a third master module, configured to implement S3: each group of $G$ is averaged to obtain $\bar{G}(\bar{G}_1, \bar{G}_2, \dots, \bar{G}_K)$, where $\bar{G}$ denotes the mean group set after averaging and $\bar{G}_K$ denotes the $K$-th mean group;
a fourth master module, configured to implement S4: Laplace noise is added to $\bar{G}$ to obtain $\tilde{G}(\tilde{G}_1, \tilde{G}_2, \dots, \tilde{G}_K)$, where $\tilde{G}$ denotes the noisy group set obtained by adding noise to the mean group set and $\tilde{G}_K$ denotes the $K$-th noise-added group;
a fifth master module, configured to implement S5: the original histogram order is restored for $\tilde{G}$ to obtain the published differential privacy histogram $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \dots, \tilde{h}_n\}$, where $\tilde{H}$ denotes the differential privacy histogram obtained by restoring the original histogram order from $\tilde{G}$, $\tilde{h}_1, \tilde{h}_2, \dots, \tilde{h}_n$ denote the noise-added buckets in the histogram, and $n$ denotes the total number of buckets in the differential privacy histogram.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the method for issuing a differential privacy histogram based on grouping combination provided in any of the various implementations of the first aspect.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a differential privacy histogram publication method based on packet merging, as provided in any of the various implementations of the first aspect.
According to the differential privacy histogram publishing method and device based on grouping and merging, the data are reasonably and accurately partitioned by adopting a grouping and merging mode for the histogram, the accuracy of grouping and partitioning is effectively improved, the error of publishing the data is reduced, the usability of the data is improved, the optimal partitioning of the histogram grouping can be realized, the influence of noise on the accuracy of the data is greatly reduced, and the usability and the publishing efficiency of the data are effectively improved while the differential privacy constraint is met.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below to the drawings required for the embodiments or the technical solutions in the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is also possible for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a differential privacy histogram publication method based on packet merging according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a differential privacy histogram publishing device based on grouping and merging according to an embodiment of the present invention;
Fig. 3 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. In addition, the technical features of the various embodiments or individual embodiments provided in the present invention may be arbitrarily combined with each other to form a feasible technical solution, and the combination is not limited by the sequence of steps and/or the structural composition mode, but must be based on the realization of the capability of a person skilled in the art, and when the technical solution combination is contradictory or cannot be realized, the technical solution combination should be considered to be absent and not to be within the protection scope of the present invention.
The invention divides the privacy budget, sets the final number of groups $K$, treats each bucket in the histogram as a separate group, and traverses all possible merging schemes by means of group merging. A merging scheme is approximately selected by combining the scheme distance of each merging scheme with the exponential mechanism, and the selected scheme is merged to replace the original two groups; the merging process is then repeated until the final grouping scheme is formed. Finally, the resulting histogram groups are averaged, Laplace noise is added, and the original order is restored to obtain the published differential privacy histogram. Based on this idea, an embodiment of the present invention provides a differential privacy histogram publishing method based on grouping and merging. Referring to Fig. 1, the method includes:
S1: Privacy budget partitioning. The privacy budget $\varepsilon$ is divided into $\varepsilon_1$ and $\varepsilon_2$.
S2: Grouping and merging. First, the final number of merged groups $K$ is set, and each bucket of the original histogram $H = \{h_1, h_2, \dots, h_n\}$ is treated as a separate group, giving the group set $g(g_1, g_2, \dots, g_n)$. The groups in $g$ are then merged pairwise: all merging schemes are traversed, the scheme distance of each merging scheme is calculated, and a merging scheme is selected through the exponential mechanism for approximate merging. The merged result is regarded as a new group that replaces the original two groups. The merging process is repeated until the final number of groups $K$ is reached, which yields the final merging scheme and forms the histogram grouping $G(G_1, G_2, \dots, G_K)$. Here $H = \{h_1, h_2, \dots, h_n\}$ denotes the original histogram sequence; $h_1, h_2, \dots, h_n$ denote the buckets in the histogram, and $n$ is the total number of buckets in the original histogram; $g(g_1, g_2, \dots, g_n)$ denotes the initial group set before merging, and $g_1, g_2, \dots, g_n$ denote its groups, each consisting of one bucket, so the initial number of groups is $n$; $G(G_1, G_2, \dots, G_K)$ denotes the final group set obtained by merging, and $G_K$ denotes the $K$-th group.
S3: Each group of $G$ is averaged to obtain $\bar{G}(\bar{G}_1, \bar{G}_2, \dots, \bar{G}_K)$, where $\bar{G}$ denotes the mean group set after averaging and $\bar{G}_K$ denotes the $K$-th mean group.
S4: Laplace noise is added to $\bar{G}$ to obtain $\tilde{G}(\tilde{G}_1, \tilde{G}_2, \dots, \tilde{G}_K)$, where $\tilde{G}$ denotes the noisy group set obtained by adding noise to the mean group set and $\tilde{G}_K$ denotes the $K$-th noise-added group.
S5: The original histogram order is restored for $\tilde{G}$ to obtain the published differential privacy histogram $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \dots, \tilde{h}_n\}$, where $\tilde{H}$ denotes the differential privacy histogram obtained by restoring the original histogram order from $\tilde{G}$, $\tilde{h}_1, \tilde{h}_2, \dots, \tilde{h}_n$ denote the noise-added buckets in the histogram, and $n$ denotes the total number of buckets in the differential privacy histogram.
Based on the content of the foregoing method embodiment, as an optional embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, the privacy budget $\varepsilon$ in step S1 is a given positive value, $\varepsilon_1$ is used for group merging, and $\varepsilon_2$ is used for adding noise to the groups.
Based on the content of the foregoing method embodiment, as an optional embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, step S2 specifically includes:
S2.1: Set the final number of groups $K$, where $K \in \{1, 2, \dots, n\}$ and $n$ is the number of groups of the histogram.
S2.2: Treat each bucket of the original histogram $H = \{h_1, h_2, \dots, h_n\}$ as a separate group, giving the group set $g(g_1, g_2, \dots, g_n)$.
S2.3: Merge the groups in the histogram pairwise and traverse all possible merging schemes $P(p_1, p_2, \dots, p_y)$, where $P$ denotes the set of merging schemes, $p_1, p_2, \dots, p_y$ denote all possible merging schemes, and $y$ denotes the total number of schemes.
S2.4: Calculate the scheme distance $u(p, p_s)$ of each merging scheme and set it as the utility function:
$$u(p, p_s) = -\min_{h \in g_i,\ h' \in g_j} |h - h'|$$
where $p_s(g_i, g_j)$ is a merging scheme in the scheme set $P$ that contains the two groups $g_i$ and $g_j$, $h$ is a bucket in $g_i$, $h'$ is a bucket in $g_j$, and $\min_{h \in g_i,\ h' \in g_j} |h - h'|$ is the minimum distance between the groups, which measures their similarity. The utility function should satisfy the requirement that the smaller the distance between two groups, the greater their probability of being merged. So that the subsequent probability calculation meets this requirement, the negative of the group distance, $-\min_{h \in g_i,\ h' \in g_j} |h - h'|$, is used to construct the utility function.
S2.5: Calculate the sampling probability $\Pr(p, p_s)$ of each merging scheme by using the exponential mechanism together with the scheme distance:
$$\Pr(p, p_s) = \frac{\exp\left(\frac{\varepsilon_1\, u(p, p_s)}{2\Delta u}\right)}{\sum_{t=1}^{y} \exp\left(\frac{\varepsilon_1\, u(p, p_t)}{2\Delta u}\right)}$$
where $\Pr(p, p_s)$ is the probability that merging scheme $p_s$ is selected, $\varepsilon_1$ is the privacy budget, $\Delta u$ is the global sensitivity, and $u(p, p_s)$ is the utility function of $p_s$. By the definition of global sensitivity, deleting any record from the data set changes the utility function by at most 1, so $\Delta u = 1$; $y$ is the total number of schemes. $\exp\left(\frac{\varepsilon_1\, u(p, p_s)}{2\Delta u}\right)$ is the fitness function of merging scheme $p_s$: the numerator is the fitness value of $p_s$, and the denominator is the sum of the fitness values of all merging schemes.
S2.6: Select a merging scheme by roulette-wheel selection according to the sampling probability of each merging scheme.
S2.7: Merge the selected merging scheme and regard the result as a new group that replaces the original two groups.
S2.8: Repeat S2.3 to S2.7 until the groups have been merged into $K$ groups, then end the loop.
S2.9: Return the final merging scheme $G(G_1, G_2, \dots, G_K)$.
Based on the content of the foregoing method embodiment, as an optional embodiment, in the differential privacy histogram distribution method based on grouping combination provided in the embodiment of the present invention, in step S2, if the number of buckets of the histogram is n, the number of the groups is also n.
Based on the content of the foregoing method embodiment, as an optional embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, the magnitude of the Laplace noise added to $\bar{G}$ in step S4 is $\varepsilon_2$.
Based on the content of the above method embodiment, as an optional embodiment, in the differential privacy histogram publishing method based on grouping and merging provided in the embodiment of the present invention, the differential privacy histogram $\tilde{H}$ in step S5 is one-dimensional. Specifically, privacy is proved as follows. Let the group merging process be $M_1$ and the averaging and noise-adding process be $M_2$. First, the merging process selects a merging scheme with probability proportional to $\exp\left(\frac{\varepsilon_1\, u(p, p_s)}{2\Delta u}\right)$, so the whole process does not cause privacy leakage, and the merging process therefore satisfies $\varepsilon_1$-differential privacy. Since the noise added to the mean groups is $\varepsilon_2$, process $M_2$ satisfies $\varepsilon_2$-differential privacy. Because $\varepsilon = \varepsilon_1 + \varepsilon_2$, it follows from the composition property of differential privacy that the method as a whole satisfies $\varepsilon$-differential privacy.
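The composition step can be spelled out as follows. This is a sketch of the standard sequential composition argument rather than text from the filing; the neighbouring data sets $D$ and $D'$, the grouping outcome $G$, and the set of published outputs $O$ are symbols introduced here for illustration, and $M$ denotes $M_1$ followed by $M_2$.

```latex
\begin{align*}
\Pr[M_1(D) = G] &\le e^{\varepsilon_1}\,\Pr[M_1(D') = G], \\
\Pr[M_2(D, G) \in O] &\le e^{\varepsilon_2}\,\Pr[M_2(D', G) \in O], \\
\Pr[M(D) \in O] = \sum_{G} \Pr[M_1(D) = G]\,\Pr[M_2(D, G) \in O]
  &\le e^{\varepsilon_1 + \varepsilon_2}\,\Pr[M(D') \in O].
\end{align*}
```

The last line multiplies the first two bounds term by term inside the sum, which gives the overall budget $\varepsilon = \varepsilon_1 + \varepsilon_2$.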
According to the differential privacy histogram publishing method based on grouping and merging provided by the embodiment of the invention, the data is reasonably and accurately partitioned by adopting a grouping and merging mode for the histogram, the accuracy of grouping and partitioning is effectively improved, the error of publishing the data is further reduced, the usability of the data is improved, the optimal partitioning of the histogram grouping can be realized, the influence of noise on the accuracy of the data is greatly reduced, and the usability and the publishing efficiency of the data are effectively improved while the differential privacy constraint is met.
The various embodiments of the present invention are implemented through programmed processing performed by a device having a processor function. In engineering practice, the technical solutions of the embodiments and their functions can therefore be packaged into various modules. On the basis of the foregoing embodiments, an embodiment of the present invention provides a differential privacy histogram publishing device based on grouping and merging, which is used for executing the differential privacy histogram publishing method based on grouping and merging in the foregoing method embodiments. Referring to Fig. 2, the device includes:
a first master module, configured to implement S1: privacy budget partitioning, in which the privacy budget $\varepsilon$ is divided into $\varepsilon_1$ and $\varepsilon_2$;
a second master module, configured to implement S2: grouping and merging. First, the final number of merged groups $K$ is set, and each bucket of the original histogram $H = \{h_1, h_2, \dots, h_n\}$ is treated as a separate group, giving the group set $g(g_1, g_2, \dots, g_n)$. The groups in $g$ are then merged pairwise: all merging schemes are traversed, the scheme distance of each merging scheme is calculated, and a merging scheme is selected through the exponential mechanism for approximate merging. The merged result is regarded as a new group that replaces the original two groups. The merging process is repeated until the final number of groups $K$ is reached, which yields the final merging scheme and forms the histogram grouping $G(G_1, G_2, \dots, G_K)$. Here $H = \{h_1, h_2, \dots, h_n\}$ denotes the original histogram sequence; $h_1, h_2, \dots, h_n$ denote the buckets in the histogram, and $n$ is the total number of buckets in the original histogram; $g(g_1, g_2, \dots, g_n)$ denotes the initial group set before merging, and $g_1, g_2, \dots, g_n$ denote its groups, each consisting of one bucket, so the initial number of groups is $n$; $G(G_1, G_2, \dots, G_K)$ denotes the final group set obtained by merging, and $G_K$ denotes the $K$-th group;
a third master module, configured to implement S3: each group of $G$ is averaged to obtain $\bar{G}(\bar{G}_1, \bar{G}_2, \dots, \bar{G}_K)$, where $\bar{G}$ denotes the mean group set after averaging and $\bar{G}_K$ denotes the $K$-th mean group;
a fourth master module, configured to implement S4: Laplace noise is added to $\bar{G}$ to obtain $\tilde{G}(\tilde{G}_1, \tilde{G}_2, \dots, \tilde{G}_K)$, where $\tilde{G}$ denotes the noisy group set obtained by adding noise to the mean group set and $\tilde{G}_K$ denotes the $K$-th noise-added group;
a fifth master module, configured to implement S5: the original histogram order is restored for $\tilde{G}$ to obtain the published differential privacy histogram $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \dots, \tilde{h}_n\}$, where $\tilde{H}$ denotes the differential privacy histogram obtained by restoring the original histogram order from $\tilde{G}$, $\tilde{h}_1, \tilde{h}_2, \dots, \tilde{h}_n$ denote the noise-added buckets in the histogram, and $n$ denotes the total number of buckets in the differential privacy histogram.
According to the differential privacy histogram publishing device based on grouping and merging, provided by the embodiment of the invention, the data are reasonably and accurately divided by adopting a grouping and merging mode for the histogram through adopting a plurality of modules in the graph shown in fig. 2, so that the accuracy of grouping and dividing is effectively improved, the error of publishing the data is further reduced, the usability of the data is improved, the optimal grouping and dividing of the histogram can be realized, the influence of noise on the accuracy of the data is greatly reduced, and the usability and the publishing efficiency of the data are effectively improved while the differential privacy constraint is met.
It should be noted that, the apparatus in the apparatus embodiment provided by the present invention may be used for implementing methods in other method embodiments provided by the present invention, except that corresponding function modules are provided, and the principle of the apparatus embodiment provided by the present invention is basically the same as that of the apparatus embodiment provided by the present invention, so long as a person skilled in the art obtains corresponding technical means by combining technical features on the basis of the apparatus embodiment described above, and obtains a technical solution formed by these technical means, on the premise of ensuring that the technical solution has practicability, the apparatus in the apparatus embodiment described above may be modified, so as to obtain a corresponding apparatus class embodiment, which is used for implementing methods in other method class embodiments. For example:
Based on the content of the foregoing device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on group merging provided in the embodiment of the present invention further includes: a first sub-module, configured to implement that the privacy budget $\varepsilon$ in step S1 is a given positive value, with $\varepsilon_1$ used for group merging and $\varepsilon_2$ used for adding noise to the groups.
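The budget split of step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_budget` and the parameter `merge_fraction` are hypothetical, and the text does not state the ratio in which the budget is divided, so the 50/50 default is purely an assumption. The sketch only enforces that the budget is positive and that the two parts sum to the total.

```python
from typing import Tuple


def split_budget(epsilon: float, merge_fraction: float = 0.5) -> Tuple[float, float]:
    """Split a total privacy budget into (eps1, eps2).

    eps1 is reserved for group merging and eps2 for adding noise.
    merge_fraction is a hypothetical knob; the source gives no ratio.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be a positive value")
    if not 0.0 < merge_fraction < 1.0:
        raise ValueError("merge_fraction must lie strictly between 0 and 1")
    eps1 = epsilon * merge_fraction
    eps2 = epsilon - eps1  # the two parts always sum to the total budget
    return eps1, eps2
```

For example, `split_budget(1.0, 0.25)` reserves a quarter of the budget for merging and the rest for noise.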
Based on the content of the foregoing device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on group merging provided in the embodiment of the present invention further includes: a second sub-module, configured to implement step S2, which specifically includes:
S2.1: set the final number of groups K, where $K \in \{1, 2, \ldots, n\}$ and n is the number of groups of the histogram;
S2.2: treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g(g_1, g_2, \ldots, g_n)$;
S2.3: merge the groups in the histogram pairwise and traverse all possible merging schemes $P(p_1, p_2, \ldots, p_y)$, wherein $P(p_1, p_2, \ldots, p_y)$ denotes the set of merging schemes, $p_1, p_2, \ldots, p_y$ denote all possible merging schemes, and y denotes the total number of schemes;
S2.4: compute the scheme distance $u(P, p_s)$ of each merging scheme and set it as the utility function:
$$u(P, p_s) = -\min_{h \in g_i,\ h' \in g_j} |h - h'|$$
wherein $p_s(g_i, g_j)$ is a merging scheme in the scheme set P that contains the two groups $g_i$ and $g_j$, h is a bucket in $g_i$, h' is a bucket in $g_j$, and $\min_{h \in g_i,\ h' \in g_j} |h - h'|$ is the minimum distance between the two groups, used to measure their similarity. The utility function must satisfy the requirement that the smaller the distance between two groups, the greater their probability of being merged; so that the subsequent probability calculation meets this requirement, the negative of the group distance is used to construct the utility function;
S2.5: using the exponential mechanism together with the scheme distances, compute the sampling probability $\Pr(P, p_s)$ of each merging scheme, which in the standard form of the exponential mechanism reads:
$$\Pr(P, p_s) = \frac{\exp\left(\frac{\varepsilon_1\, u(P, p_s)}{2\Delta u}\right)}{\sum_{t=1}^{y} \exp\left(\frac{\varepsilon_1\, u(P, p_t)}{2\Delta u}\right)}$$
wherein $\Pr(P, p_s)$ denotes the probability that merging scheme $p_s$ is selected, $\varepsilon_1$ is the privacy budget, $\Delta u$ is the global sensitivity, and $u(P, p_s)$ is the utility function of $p_s$. By the definition of global sensitivity, deleting any record in the data set changes the utility function by at most 1, so $\Delta u = 1$; y denotes the total number of schemes. The term $\exp\left(\frac{\varepsilon_1\, u(P, p_s)}{2\Delta u}\right)$ is the fitness function of merging scheme $p_s$: the numerator is the fitness value of $p_s$, and the denominator is the sum of the fitness values of all merging schemes;
S2.6: according to the sampling probability of each merging scheme, select a merging scheme by roulette-wheel selection;
S2.7: merge the groups of the selected merging scheme and treat the result as a new group that replaces the original two groups;
S2.8: repeat S2.3 to S2.7 until the groups have been merged into K groups, then end the loop;
S2.9: return the final merging scheme $G(G_1, G_2, \ldots, G_K)$.
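Steps S2.1 to S2.9 can be sketched in a few lines. The sketch below is illustrative only and rests on stated assumptions: the function names (`group_distance`, `merge_probabilities`, `merge_groups`) are hypothetical; the weight `exp(eps1 * u / (2 * sensitivity))` is the standard exponential-mechanism form, which the source's formula image is assumed to match; groups are stored as lists of bucket indices; and `eps1` is applied in every merging round, since the text does not say how the budget is divided across rounds.

```python
import math
import random
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def group_distance(counts: Sequence[float], gi: List[int], gj: List[int]) -> float:
    """Minimum absolute difference between a bucket in gi and a bucket in gj."""
    return min(abs(counts[a] - counts[b]) for a in gi for b in gj)


def merge_probabilities(
    counts: Sequence[float],
    groups: List[List[int]],
    eps1: float,
    sensitivity: float = 1.0,
) -> Tuple[List[Tuple[int, int]], List[float]]:
    """Return every candidate pair of groups and its sampling probability."""
    pairs = list(combinations(range(len(groups)), 2))
    # Utility is the negative distance, so closer groups score higher.
    utilities = [-group_distance(counts, groups[i], groups[j]) for i, j in pairs]
    best = max(utilities)  # shift for numerical stability; ratios are unchanged
    weights = [math.exp(eps1 * (u - best) / (2.0 * sensitivity)) for u in utilities]
    total = sum(weights)
    return pairs, [w / total for w in weights]


def merge_groups(
    counts: Sequence[float],
    k: int,
    eps1: float,
    rng: Optional[random.Random] = None,
) -> List[List[int]]:
    """Merge single-bucket groups pairwise until k groups remain."""
    if not 1 <= k <= len(counts):
        raise ValueError("k must satisfy 1 <= k <= number of buckets")
    rng = rng or random.Random()
    groups = [[i] for i in range(len(counts))]  # S2.2: one group per bucket
    while len(groups) > k:  # S2.8: repeat until k groups remain
        pairs, probs = merge_probabilities(counts, groups, eps1)
        # S2.6: roulette-wheel selection proportional to the probabilities.
        i, j = rng.choices(pairs, weights=probs, k=1)[0]
        merged = sorted(groups[i] + groups[j])  # S2.7: replace the two groups
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)]
        groups.append(merged)
    return groups
```

Calling `merge_groups([10, 11, 50, 52], k=2, eps1=1.0)` returns two groups of bucket indices; with a larger `eps1`, the close pairs (10, 11) and (50, 52) are more likely to end up together.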
Based on the content of the foregoing device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on group merging provided in the embodiment of the present invention further includes: a third sub-module, configured to implement that, in step S2, if the histogram has n buckets, the number of groups is also n.
Based on the content of the foregoing device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on group merging provided in the embodiment of the present invention further includes: a fourth sub-module, configured to implement that the magnitude of the Laplace noise added to the mean group set in step S4 is $\varepsilon_2$.
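Steps S3 and S4 (averaging each group, then adding Laplace noise) might look like the sketch below. Several points are assumptions rather than statements from the source: the names `laplace_noise` and `noisy_group_means` are hypothetical, the noise scale is taken as `sensitivity / eps2` as in the standard Laplace mechanism, and the default `sensitivity` of 1.0 is a placeholder because the text does not give the scale explicitly. The Laplace draw is built from two exponential draws so that only the standard library is needed.

```python
import random
from typing import List, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_group_means(
    counts: Sequence[float],
    groups: List[List[int]],
    eps2: float,
    sensitivity: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Average each group of bucket indices (S3), then add Laplace noise (S4)."""
    if eps2 <= 0:
        raise ValueError("eps2 must be positive")
    rng = rng or random.Random()
    scale = sensitivity / eps2  # assumed scale; smaller eps2 means more noise
    means = [sum(counts[i] for i in g) / len(g) for g in groups]
    return [m + laplace_noise(scale, rng) for m in means]
```

With a very large `eps2` the noise becomes negligible and the output approaches the plain group means, which is a convenient sanity check.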
Based on the content of the foregoing device embodiment, as an optional embodiment, the differential privacy histogram publishing device based on group merging provided in the embodiment of the present invention further includes: a fifth sub-module, configured to implement that the differential privacy histogram obtained in step S5 is one-dimensional.
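Step S5, restoring the original histogram order, can be illustrated with the sketch below. It assumes that groups are lists of original bucket indices and that every bucket in a group is published with that group's noisy mean; this is a common convention for grouping-based histogram publishing, but the source does not spell it out. The name `restore_histogram` is hypothetical.

```python
from typing import List, Sequence


def restore_histogram(
    groups: Sequence[Sequence[int]],
    noisy_means: Sequence[float],
    n: int,
) -> List[float]:
    """Write each group's noisy mean back to its buckets' original positions."""
    published = [0.0] * n  # one-dimensional histogram with n buckets
    for group, value in zip(groups, noisy_means):
        for index in group:
            published[index] = value
    return published
```

For instance, groups `[[0, 2], [1]]` with noisy means `[5.5, 9.0]` yield a three-bucket histogram in which buckets 0 and 2 share the value 5.5.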
The method of the embodiment of the invention is implemented on an electronic device, so the relevant electronic device is introduced here. To this end, an embodiment of the present invention provides an electronic device, as shown in fig. 3, including: at least one processor, a communication interface, at least one memory, and a communication bus, wherein the at least one processor, the communication interface, and the at least one memory communicate with each other through the communication bus. The at least one processor may invoke logic instructions in the at least one memory to perform all or part of the steps of the methods provided by the method embodiments described above.
In addition, the logic instructions in the at least one memory may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the method embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. Based on this recognition, each block in the flowchart or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In this patent, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A differential privacy histogram publishing method based on group merging, characterized by comprising the following steps:
S1: privacy budget partitioning: divide the privacy budget $\varepsilon$ into $\varepsilon_1$ and $\varepsilon_2$;
S2: group merging: first set the final number of merged groups K and treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g(g_1, g_2, \ldots, g_n)$; then merge the groups in g pairwise, traverse all merging schemes, compute the scheme distance of each merging scheme, select a merging scheme through the exponential mechanism to perform an approximate merge, treat the merged result as a new group that replaces the original two groups, and repeat the merging process until the final group number K is reached, obtaining the final merging scheme and forming the histogram grouping $G(G_1, G_2, \ldots, G_K)$; wherein: $H = \{h_1, h_2, \ldots, h_n\}$ denotes the original histogram sequence; $h_1, h_2, \ldots, h_n$ denote the buckets in the histogram, n being the total number of original histogram buckets; $g(g_1, g_2, \ldots, g_n)$ denotes the initial group set before merging; $g_1, g_2, \ldots, g_n$ denote the groups in the initial group set, each group consisting of one bucket, so the initial total number of groups is n; $G(G_1, G_2, \ldots, G_K)$ denotes the final group set resulting from the merging, $G_1, G_2, \ldots, G_K$ denote the groups in the final group set, and $G_k$ denotes the k-th group;
S3: average each group of G to obtain $\bar{G}(\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_K)$, wherein $\bar{G}(\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_K)$ denotes the mean group set after averaging and $\bar{G}_K$ denotes the K-th mean group;
S4: add Laplace noise to $\bar{G}$ to obtain $\tilde{G}(\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_K)$, wherein $\tilde{G}(\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_K)$ denotes the noise group set obtained by adding noise to the mean group set and $\tilde{G}_k$ denotes the k-th noise-added group;
S5: restore $\tilde{G}$ to the original histogram order to obtain the published differential privacy histogram $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n\}$, wherein $\tilde{H}$ denotes the differential privacy histogram obtained by restoring $\tilde{G}$ to the original histogram order, $\tilde{h}_i$ denotes a noise-added bucket in the histogram, and n denotes the total number of differential privacy histogram buckets.
2. The differential privacy histogram publishing method based on group merging according to claim 1, characterized in that the privacy budget $\varepsilon$ in step S1 is a given positive value, $\varepsilon_1$ is used for group merging, and $\varepsilon_2$ is used for adding noise to the groups.
3. The differential privacy histogram publishing method based on group merging according to claim 2, wherein step S2 specifically comprises:
S2.1: set the final number of groups K, where $K \in \{1, 2, \ldots, n\}$ and n is the number of groups of the histogram;
S2.2: treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g(g_1, g_2, \ldots, g_n)$;
S2.3: merge the groups in the histogram pairwise and traverse all possible merging schemes $P(p_1, p_2, \ldots, p_y)$, wherein $P(p_1, p_2, \ldots, p_y)$ denotes the set of merging schemes, $p_1, p_2, \ldots, p_y$ denote all possible merging schemes, and y denotes the total number of schemes;
S2.4: compute the scheme distance $u(P, p_s)$ of each merging scheme and set it as the utility function:
$$u(P, p_s) = -\min_{h \in g_i,\ h' \in g_j} |h - h'|$$
wherein $p_s(g_i, g_j)$ is a merging scheme in the scheme set P that contains the two groups $g_i$ and $g_j$, h is a bucket in $g_i$, h' is a bucket in $g_j$, and $\min_{h \in g_i,\ h' \in g_j} |h - h'|$ is the minimum distance between the two groups, used to measure their similarity. The utility function must satisfy the requirement that the smaller the distance between two groups, the greater their probability of being merged; so that the subsequent probability calculation meets this requirement, the negative of the group distance is used to construct the utility function;
S2.5: using the exponential mechanism together with the scheme distances, compute the sampling probability $\Pr(P, p_s)$ of each merging scheme, which in the standard form of the exponential mechanism reads:
$$\Pr(P, p_s) = \frac{\exp\left(\frac{\varepsilon_1\, u(P, p_s)}{2\Delta u}\right)}{\sum_{t=1}^{y} \exp\left(\frac{\varepsilon_1\, u(P, p_t)}{2\Delta u}\right)}$$
wherein $\Pr(P, p_s)$ denotes the probability that merging scheme $p_s$ is selected, $\varepsilon_1$ is the privacy budget, $\Delta u$ is the global sensitivity, and $u(P, p_s)$ is the utility function of $p_s$. By the definition of global sensitivity, deleting any record in the data set changes the utility function by at most 1, so $\Delta u = 1$; y denotes the total number of schemes. The term $\exp\left(\frac{\varepsilon_1\, u(P, p_s)}{2\Delta u}\right)$ is the fitness function of merging scheme $p_s$: the numerator is the fitness value of $p_s$, and the denominator is the sum of the fitness values of all merging schemes;
S2.6: according to the sampling probability of each merging scheme, select a merging scheme by roulette-wheel selection;
S2.7: merge the groups of the selected merging scheme and treat the result as a new group that replaces the original two groups;
S2.8: repeat S2.3 to S2.7 until the groups have been merged into K groups, then end the loop;
S2.9: return the final merging scheme $G(G_1, G_2, \ldots, G_K)$.
4. The method according to claim 3, wherein in step S2, if the number of the histogram buckets is n, the number of the groups is n.
5. The differential privacy histogram publishing method based on group merging according to claim 4, wherein the magnitude of the Laplace noise added to the mean group set in step S4 is $\varepsilon_2$.
6. The differential privacy histogram publishing method based on group merging according to claim 5, wherein the differential privacy histogram obtained in step S5 is one-dimensional.
7. A differential privacy histogram publishing device based on group merging, comprising:
a first main module, configured to implement S1: privacy budget partitioning: divide the privacy budget $\varepsilon$ into $\varepsilon_1$ and $\varepsilon_2$;
a second main module, configured to implement S2: group merging: first set the final number of merged groups K and treat each bucket of the original histogram $H = \{h_1, h_2, \ldots, h_n\}$ as a separate group, obtaining the group set $g(g_1, g_2, \ldots, g_n)$; then merge the groups in g pairwise, traverse all merging schemes, compute the scheme distance of each merging scheme, select a merging scheme through the exponential mechanism to perform an approximate merge, treat the merged result as a new group that replaces the original two groups, and repeat the merging process until the final group number K is reached, obtaining the final merging scheme and forming the histogram grouping $G(G_1, G_2, \ldots, G_K)$; wherein: $H = \{h_1, h_2, \ldots, h_n\}$ denotes the original histogram sequence; $h_1, h_2, \ldots, h_n$ denote the buckets in the histogram, n being the total number of original histogram buckets; $g(g_1, g_2, \ldots, g_n)$ denotes the initial group set before merging; $g_1, g_2, \ldots, g_n$ denote the groups in the initial group set, each group consisting of one bucket, so the initial total number of groups is n; $G(G_1, G_2, \ldots, G_K)$ denotes the final group set resulting from the merging, $G_1, G_2, \ldots, G_K$ denote the groups in the final group set, and $G_k$ denotes the k-th group;
a third main module, configured to implement S3: average each group of G to obtain $\bar{G}(\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_K)$, wherein $\bar{G}(\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_K)$ denotes the mean group set after averaging and $\bar{G}_K$ denotes the K-th mean group;
a fourth main module, configured to implement S4: add Laplace noise to $\bar{G}$ to obtain $\tilde{G}(\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_K)$, wherein $\tilde{G}(\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_K)$ denotes the noise group set obtained by adding noise to the mean group set and $\tilde{G}_k$ denotes the k-th noise-added group;
a fifth main module, configured to implement S5: restore $\tilde{G}$ to the original histogram order to obtain the published differential privacy histogram $\tilde{H} = \{\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n\}$, wherein $\tilde{H}$ denotes the differential privacy histogram obtained by restoring $\tilde{G}$ to the original histogram order, $\tilde{h}_i$ denotes a noise-added bucket in the histogram, and n denotes the total number of differential privacy histogram buckets.
8. An electronic device, comprising:
at least one processor, at least one memory, and a communication interface; wherein,
the processor, the memory, and the communication interface communicate with each other;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211109967.4A CN115495778A (en) 2022-09-13 2022-09-13 Differential privacy histogram publishing method and device based on grouping combination

Publications (1)

Publication Number Publication Date
CN115495778A true CN115495778A (en) 2022-12-20

Family

ID=84467535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211109967.4A Pending CN115495778A (en) 2022-09-13 2022-09-13 Differential privacy histogram publishing method and device based on grouping combination

Country Status (1)

Country Link
CN (1) CN115495778A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688614A (en) * 2024-02-01 2024-03-12 杭州海康威视数字技术股份有限公司 Differential privacy protection data availability enhancement method and device and electronic equipment
CN117688614B (en) * 2024-02-01 2024-04-30 杭州海康威视数字技术股份有限公司 Differential privacy protection data availability enhancement method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination