CN114662152B - Real-time data-oriented localization differential privacy data stream publishing method - Google Patents

Real-time data-oriented localization differential privacy data stream publishing method

Info

Publication number
CN114662152B
Authority
CN
China
Prior art keywords
data
privacy
sliding window
histogram
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210352928.0A
Other languages
Chinese (zh)
Other versions
CN114662152A (en)
Inventor
陶陶
张福南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Technology AHUT
Original Assignee
Anhui University of Technology AHUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Technology AHUT
Priority to CN202210352928.0A
Publication of CN114662152A
Application granted
Publication of CN114662152B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention discloses a localized differential privacy data stream publishing method for real-time data, and belongs to the technical field of data privacy protection. The method combines localized differential privacy with a sliding window model, uses a similarity measure to calculate the similarity between the sliding-window data at adjacent moments, and adds a random perturbation to the similarity result. When the similarity result is positive, a greedy clustering algorithm is used to reduce the error before noise is added. A PBA privacy budget allocation strategy is then applied to avoid excessive consumption of the privacy budget, and finally a noisy histogram that meets the privacy requirement is published. The invention can resist attacks from untrusted third parties and effectively reduces the error of histogram publishing, which broadens the prospects of localized differential privacy in practical applications.

Description

Real-time data-oriented localization differential privacy data stream publishing method
Technical Field
The invention belongs to the technical field of data privacy protection, and particularly relates to a localized differential privacy data stream publishing method for real-time data.
Background
Privacy protection has become particularly important in the big-data era, in which information technology is developing rapidly. For example, the access records, purchase records, and comments of large numbers of users collected by internet search engines can be analyzed to identify current trends; in building smart government services, sharing data between departments makes cooperative work among them easier. When such data sets contain sensitive personal information, performing statistical analysis without revealing personal privacy is an active topic in data privacy research. Localized differential privacy (LDP) provides a stronger guarantee than centralized differential privacy: it not only resists attacks by adversaries with arbitrary background knowledge, but also prevents privacy attacks from untrusted third parties. Apple, aleba, and other companies already use the LDP model to collect information such as the default browser home page and search engine that users have set.
Many algorithms in the field of data privacy protection target static data. Real-time data streams have broader application scenarios, yet few histogram publishing methods address them, so research on privacy protection for real-time data is particularly needed.
A search of the prior art found Chinese patent application No. 2019107977157, filed on August 27, 2019, entitled "A multi-dimensional crowd-sourced data true value discovery method based on localized differential privacy". That application addresses the problems that an adversary with arbitrary background knowledge may leak sensitive user data and that accurate answers cannot be obtained from noisy data sets. It also lets any third party estimate the original data distribution without knowing the users' sensitive information, so that user data privacy is protected while accurate results are still obtained in each crowdsourcing project. The method also adopts localized differential privacy, but it targets only statistical analysis of crowdsourced, static data; although real-time data streams have broader application scenarios, privacy protection methods for real-time data remain little studied.
As another example, Chinese patent application No. 2018105071444, filed on May 24, 2018, is entitled "A fractal dimension-based streaming data differential privacy protection release method". That application uses a sliding window to partition the data stream, and the data that meet the conditions are held statically in the sliding window. The data are then initially clustered, the fractal dimension of each class is calculated from the initial clustering result, and a fractal tree is constructed. The windowed data from the first step are sent to a fractal clustering module for cluster analysis and fractal dimension calculation; the arriving data are fractally clustered, and class statistics over the clustering result form the groups to be published. The set of inter-group differences is calculated as the reference for judging approximate groups when such groups are fused, and approximate groups are replaced by their group mean. Noise is added to the groups after the grouping and fusion optimization, and the noisy group data are published. When the amount of grouped data reaches the size of the sliding window, the window moves forward and the steps are repeated until the final data release is complete. The method combines differential privacy with a sliding window but protects data privacy using centralized differential privacy, which presupposes a trusted third party. This assumption often does not hold in practice, which to some extent limits the application of traditional differential privacy.
Based on the above analysis, there is a need in the art for a data publishing method that both supports statistical publishing of real-time data and does not rely on a trusted third party.
Disclosure of Invention
1. Technical problem to be solved by the invention
Most current methods for real-time data are based on centralized differential privacy. Although they provide some privacy guarantee for real-time data, they suffer from low availability of the published data, large publishing error, and exhaustion of the privacy budget caused by noise accumulation. In view of these problems in the prior art, the present invention provides a localized differential privacy data stream publishing method for real-time data. The invention combines localized differential privacy with a sliding window model so that the similarity statistic is computed locally for each user rather than as an average over all users. After the similarity measure between the sliding-window data at adjacent moments is obtained, a random perturbation algorithm is applied to the similarity result. A more reasonable privacy budget allocation strategy is then used during noise addition, and finally a noisy histogram satisfying privacy protection is published, so that user privacy is protected and data availability is improved.
2. Technical solution
In order to achieve the above purpose, the technical scheme provided by the invention is as follows:
The invention discloses a localized differential privacy data stream publishing method for real-time data, which comprises the following steps:
Step 1, inputting the raw data set D = {D_1, D_2, …, D_i | 1 ≤ i ≤ N}, and initializing the parameters to determine the privacy protection budget ε;
Step 2, calculating the correlation distance between the sliding-window data at adjacent moments according to a decision algorithm, comparing the correlation distance with a threshold to obtain a similarity result v_i, and adding a random perturbation to v_i to obtain v_i′;
Step 3, according to the result of the decision algorithm, if v_i is positive, reducing the error between the sliding-window data at adjacent moments through a greedy clustering algorithm, then allocating privacy budget for adding noise to the data, and then publishing the noisy histogram of the current moment i; otherwise, directly publishing the noisy histogram of moment i-1;
Step 4, allocating the privacy budget according to the PBA privacy budget allocation strategy, and publishing the noisy histogram according to that strategy.
Further, the original data set input in step 1 is statistical data, and the privacy preserving budget epsilon is smaller than 1.
Further, in step 2, the similarity between the sliding-window data at adjacent moments is calculated to determine whether a new noisy histogram needs to be published, and a random perturbation is added to the similarity result to obtain v_i′.
Further, the decision algorithm in step 2 is specifically as follows:
1) Calculating the correlation distance between the data in the sliding windows at adjacent moments, as shown in formula (1):
Figure BDA0003581495990000031
where d(x_i, y_i) is the correlation distance between the sliding-window data at adjacent moments, x_ik denotes the k original data items at moment i, and x_jk denotes the k data items after noise addition at moment j;
2) Taking the result of comparing the correlation distance with the threshold as the similarity result v_i, and adding a random perturbation to the similarity result to obtain v_i′, as shown in formula (2):
Figure BDA0003581495990000032
In formula (2), v_i is the result of comparing the correlation distance with the threshold, and v_i′ is the similarity result after the random perturbation is added. If the correlation distance is greater than the threshold, v_i′ is assigned 1 with probability p; if the correlation distance is smaller than the threshold, v_i′ is assigned 0 with probability p; otherwise, the similarity result is left unprocessed with probability 1 − 2p.
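The perturbation in formula (2) can be illustrated with a short sketch. Because the formula image is not reproduced here, the sketch assumes one plausible reading of the prose: report 1 with probability p, report 0 with probability p, and keep the true bit with probability 1 − 2p. The function names `perturb_similarity` and `epsilon_of` are hypothetical and not taken from the patent.

```python
import math
import random


def perturb_similarity(v: int, p: float, rng: random.Random) -> int:
    """Perturb a similarity bit v in {0, 1}.

    Assumed reading of formula (2): output 1 with probability p, output 0
    with probability p, and keep the true bit with probability 1 - 2p.
    """
    if v not in (0, 1):
        raise ValueError("v must be 0 or 1")
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie strictly between 0 and 0.5")
    u = rng.random()
    if u < p:
        return 1
    if u < 2 * p:
        return 0
    return v


def epsilon_of(p: float) -> float:
    """Budget implied by the assumed mechanism.

    Pr[out = 1 | v = 1] = 1 - p and Pr[out = 1 | v = 0] = p, so the
    worst-case probability ratio is (1 - p) / p.
    """
    return math.log((1 - p) / p)
```

Under this assumption a smaller p keeps the true bit more often and therefore corresponds to a larger ε, that is, weaker protection.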
Furthermore, step 2 applies random perturbation to the similarity result of the sliding-window data sets at adjacent moments, and the random perturbation algorithm satisfies localized differential privacy.
Still further, the decision algorithm satisfies w-event level privacy, with the privacy budget allocated as
Figure BDA0003581495990000033
where w is the length of the sliding window.
Further, step 3 calculates the merge error and the non-merge error of each frequency and its adjacent frequency through formulas (4) and (5); selects the smaller of the two errors to group the intervals; replaces the frequencies in each group with the group mean; then allocates privacy budget according to the privacy budget allocation strategy and publishes the final noisy histogram;
Figure BDA0003581495990000034
Figure BDA0003581495990000035
In formula (4), y_1 is the merge error of each frequency and its adjacent frequency; in formula (5), y_2 is the non-merge error of each frequency and its adjacent frequency; where D_i is the original data at moment i,
Figure BDA0003581495990000036
is the noise-added data at moment j, w is the size of the sliding window, k is the number of data items in the sliding window, n is the total amount of data in the original histogram at moment i, j is the moment of the most recently published noisy histogram, and ε is the privacy budget.
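The grouping step can be sketched as follows. Formulas (4) and (5) are not reproduced in this text, so the error model below is an assumption rather than the patent's own: a group's error is taken as its squared deviation from the group mean plus a noise term `noise_var / group_size`, which models noise added once to the group sum. The names `group_error`, `greedy_group`, and `smooth`, and the parameter `noise_var`, are hypothetical.

```python
from typing import List


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (approximation error)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def group_error(values: List[float], noise_var: float) -> float:
    """Assumed error model: approximation error plus averaged noise."""
    return sse(values) + noise_var / len(values)


def greedy_group(counts: List[float], noise_var: float) -> List[List[float]]:
    """Scan bins left to right; merge a bin into the current group when the
    merge error does not exceed the non-merge (split) error."""
    if not counts:
        raise ValueError("counts must not be empty")
    groups = [[counts[0]]]
    for c in counts[1:]:
        current = groups[-1]
        merge_err = group_error(current + [c], noise_var)
        split_err = group_error(current, noise_var) + group_error([c], noise_var)
        if merge_err <= split_err:
            current.append(c)
        else:
            groups.append([c])
    return groups


def smooth(counts: List[float], noise_var: float) -> List[float]:
    """Replace every frequency by the mean of its group."""
    out: List[float] = []
    for g in greedy_group(counts, noise_var):
        out.extend([sum(g) / len(g)] * len(g))
    return out
```

With `counts = [10, 11, 10, 50, 52]` and `noise_var = 8.0`, this model groups the three small bins together and the two large bins together, so noise would be added to two group values instead of five bin values.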
Still further, the greedy clustering algorithm satisfies w-event level privacy.
Furthermore, the privacy budget allocation strategy in step 4 pre-allocates a budget ε_i to each of the w data items in the window, where
Figure BDA0003581495990000041
If the correlation distance between the sliding-window data at adjacent moments is less than the threshold, the noisy histogram of the previous moment is published and the privacy budget ε_i allocated to the current moment is reserved; otherwise, a parameter k records the number of histograms previously skipped (not published), and all the privacy budgets reserved at the skipped histograms are added to obtain the latest ε_i,
Figure BDA0003581495990000042
The data to which noise is to be added is then identified, and the remaining privacy budget is allocated to the histogram at that moment.
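A minimal sketch of the absorption idea follows. The formula for the per-moment share ε_i is not reproduced in this text, so the sketch takes it as a hypothetical parameter `share`; the function name `pba_budgets` is also an assumption. It implements only what the prose states: a skipped moment reserves its share, and a publishing moment spends its own share plus the shares reserved at the k moments skipped since the last publication.

```python
from typing import List


def pba_budgets(publish_flags: List[bool], share: float) -> List[float]:
    """Budget spent at each moment under the described absorption rule.

    publish_flags[i] is True when a new noisy histogram is published at
    moment i, and False when the previous histogram is re-published.
    `share` stands in for the pre-allocated per-moment budget eps_i.
    """
    spent: List[float] = []
    skipped = 0  # parameter k: histograms skipped since the last publication
    for publish in publish_flags:
        if publish:
            spent.append(share * (skipped + 1))  # own share + absorbed shares
            skipped = 0
        else:
            spent.append(0.0)  # nothing spent; the share is reserved
            skipped += 1
    return spent
```

The total spent never exceeds `share` times the number of moments, since each share is used at most once. The sketch does not by itself check the budget spent inside every window of length w; that constraint would need to be enforced separately.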
3. Advantageous effects
Compared with the prior art, the technical scheme provided by the invention has the following remarkable effects:
(1) In the localized differential privacy data stream publishing method for real-time data according to the invention, the real data is perturbed locally, so that only the users know the real data and the risk of data leakage by a third party is avoided. The correlation distance between the sliding-window data sets at adjacent moments is calculated, and a random perturbation is added to the similarity result. Compared with centralized differential privacy, the method not only resists attacks by adversaries with arbitrary background knowledge but also avoids attacks from untrusted third parties.
(2) In the localized differential privacy data stream publishing method for real-time data according to the invention, when the correlation distance between the sliding-window data at adjacent moments is greater than the threshold, a greedy clustering algorithm is used to further reduce the statistical error. The merge error and the non-merge error of each frequency and its adjacent frequency are calculated; the smaller of the two errors is selected to group the intervals; the frequencies in each group are replaced by the group mean; the privacy budget is allocated according to the privacy budget allocation strategy; noise is then added to the data; and finally a noisy histogram that meets the requirement is published. This effectively reduces the error of histogram publishing and broadens the prospects of localized differential privacy in practical applications.
Drawings
FIG. 1 is an algorithm framework diagram of the real-time data-oriented localized differential privacy data stream publishing method of the present invention;
FIG. 2 is a flow chart of the localized differential privacy processing for real-time data of the present invention.
Detailed Description
Most existing data protection methods based on localized differential privacy concentrate on single-valued attributes, and privacy disclosure easily occurs after the data has been extracted many times. For continuous data, two different privacy protection notions were proposed by Dwork: user-level privacy and event-level privacy. Event-level privacy protects a user at a single specific moment in the data stream, but not the user's privacy over the whole stream; user-level privacy, in contrast, protects user privacy by adding noise over the entire data stream, but this reduces the usability of the data. w-event level privacy was proposed to balance the drawbacks of the two. When w is 1, w-event level privacy becomes event-level privacy, and when w is infinite, it becomes user-level privacy. However, a problem remains: w-event level privacy was designed for centralized differential privacy, so it is vulnerable to privacy attacks from untrusted third parties, and its data similarity computation covers all users, which consumes a large amount of privacy budget.
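The budget constraint commonly used to obtain w-event level privacy, namely that the budgets spent in any w consecutive moments sum to at most ε, can be checked with a small helper. This is an illustrative sketch; the function names `max_window_spend` and `satisfies_w_event` are hypothetical.

```python
from typing import Sequence


def max_window_spend(spent: Sequence[float], w: int) -> float:
    """Largest total budget spent in any w consecutive moments."""
    if w < 1:
        raise ValueError("w must be at least 1")
    best = 0.0
    window_sum = 0.0
    for i, s in enumerate(spent):
        window_sum += s
        if i >= w:
            window_sum -= spent[i - w]  # drop the moment that left the window
        best = max(best, window_sum)
    return best


def satisfies_w_event(spent: Sequence[float], w: int, epsilon: float,
                      tol: float = 1e-12) -> bool:
    """True when every window of w moments spends at most epsilon."""
    return max_window_spend(spent, w) <= epsilon + tol
```

With w = 1 the check bounds each single moment, and as w grows toward the stream length it bounds the whole stream, which mirrors the two limiting cases described above.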
To address the vulnerability of centralized differential privacy to attack, and the excessive consumption of privacy budget caused by computing similarity statistics over the average of all users, the invention combines localized differential privacy with a sliding window model. In the decision stage, localized differential privacy is applied to similarity statistics computed locally for each user rather than over all users as in traditional methods, and a PBA privacy budget allocation strategy is provided to allocate the privacy budget reasonably and avoid consuming it excessively.
The invention improves on traditional differential-privacy-based data publishing methods for real-time data, and achieves higher usability while protecting private data.
For a further understanding of the present invention, the present invention will be described in detail with reference to the drawings and specific examples.
Example 1
Referring to fig. 1, the method for publishing a localized differential privacy data stream for real-time data in this embodiment includes the following steps:
Step 1, inputting the raw data set D = {D_1, D_2, …, D_i | 1 ≤ i ≤ N}, and initializing the parameters to determine the privacy protection budget ε;
First, in this embodiment, the numerical data field to be published is read from a data source such as a database or a CSV file. The data read is preprocessed and divided into units, and the frequency of the data in each unit is entered into the data set D, which completes the input of the raw data set D. The input raw data set is statistical data. Initializing the parameters determines the privacy protection budget ε. The value of ε is inversely related to the degree of privacy protection and directly related to data availability: a smaller ε means that more noise is added to the data, so the degree of protection is higher and data availability is worse. In this embodiment the privacy protection budget ε is smaller than 1.
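The preprocessing described above can be sketched as reading one numeric field and counting values per unit. The field name `age`, the unit boundaries, and the function names are hypothetical choices for illustration; the patent does not specify them.

```python
import csv
import io
from bisect import bisect_right
from typing import List, Sequence


def read_numeric_field(csv_text: str, field: str) -> List[float]:
    """Read one numeric column from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[field]) for row in reader]


def bin_frequencies(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per unit [edges[j], edges[j+1]); the last unit also
    includes its right edge. Values outside the range are ignored."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        if x < edges[0] or x > edges[-1]:
            continue
        j = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[j] += 1
    return counts


# Usage: one frequency vector like this would be one D_i of the data set D.
sample = "user,age\na,23\nb,35\nc,37\nd,61\n"
D_i = bin_frequencies(read_numeric_field(sample, "age"), [20, 30, 40, 50, 60, 70])
```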
Step 2, calculating the similarity between the sliding-window data at adjacent moments to determine whether a new noisy histogram needs to be published, and adding a random perturbation to the similarity result to obtain v_i′. The decision algorithm consists of two main steps:
1) Calculating the correlation distance between the data in the sliding windows at adjacent moments, as shown in formula (1):
Figure BDA0003581495990000051
where d(x_i, y_i) is the correlation distance between the sliding-window data at adjacent moments (hereinafter denoted by T), x_ik denotes the k original data items at moment i, and x_jk denotes the k data items after noise addition at moment j.
2) Taking the result of comparing the correlation distance with the threshold as the similarity result, and adding a random perturbation to the similarity result to obtain v_i′, as shown in formula (2):
Figure BDA0003581495990000052
The correlation distance T between the sliding-window data at adjacent moments is obtained from formula (1). In formula (2), v_i is the result of comparing the correlation distance T with the threshold T_0, that is, the similarity result; v_i′ is the similarity result after the random perturbation is added. If the correlation distance T is greater than the threshold T_0, v_i′ is assigned 1 with probability p; if T is smaller than T_0, v_i′ is assigned 0 with probability p; otherwise, the similarity result is left unprocessed with probability 1 − 2p.
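The comparison of T with T_0 and the resulting publishing decision can be sketched as below. Formula (1) is not reproduced in this text, so the Euclidean distance used here is only a stand-in for the patent's correlation distance; the function names are hypothetical.

```python
import math
from typing import Sequence


def correlation_distance(current: Sequence[float],
                         last_published: Sequence[float]) -> float:
    """Stand-in for formula (1): Euclidean distance between the current
    window data and the most recently published noisy data."""
    if len(current) != len(last_published):
        raise ValueError("both inputs must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(current, last_published)))


def similarity_result(current: Sequence[float],
                      last_published: Sequence[float],
                      threshold: float) -> int:
    """v_i = 1 when the distance T exceeds the threshold T_0, else 0."""
    return 1 if correlation_distance(current, last_published) > threshold else 0


def select_release(v_perturbed: int, fresh_histogram, previous_histogram):
    """Publish a fresh noisy histogram when the perturbed result is 1;
    otherwise re-publish the histogram of the previous moment."""
    return fresh_histogram if v_perturbed == 1 else previous_histogram
```

In the method described here, the value passed to `select_release` would be the perturbed result v_i′ rather than the raw comparison result.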
It should be noted that localized differential privacy (LDP) is defined as follows. Given a perturbation mechanism M with domain Dom(M) and range Range(M), if for any pair of inputs x, x′ ∈ Dom(M) and any output y ∈ Range(M) the following inequality holds, then M satisfies ε-LDP:
Pr[M(x) = y] ≤ e^ε · Pr[M(x′) = y]    (3)
Here the value of ε (the privacy budget) is inversely related to the degree of privacy protection and directly related to data availability: the smaller ε is, the better the privacy protection and the worse the data availability.
As formula (3) shows, compared with traditional similarity measures, the method of this embodiment applies random perturbation to the similarity result of the sliding-window data sets at adjacent moments, and the random perturbation algorithm satisfies localized differential privacy. Because the perturbation is added locally, the similarity measure of this embodiment not only avoids attacks from third parties but also further protects user privacy.
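For a mechanism with finitely many inputs and outputs, the ε-LDP condition can be checked directly from its conditional probabilities. The helper below is an illustrative sketch (the name `ldp_epsilon` and the matrix layout are assumptions); it returns the smallest ε for which every output probability ratio stays within e^ε.

```python
import math
from typing import Sequence


def ldp_epsilon(channel: Sequence[Sequence[float]]) -> float:
    """Smallest eps such that Pr[M(x) = y] <= e^eps * Pr[M(x') = y]
    for all inputs x, x' and outputs y, where channel[x][y] = Pr[M(x) = y]."""
    eps = 0.0
    for y in range(len(channel[0])):
        column = [row[y] for row in channel]
        hi, lo = max(column), min(column)
        if hi == 0.0:
            continue  # this output is never produced
        if lo == 0.0:
            return math.inf  # some input can never produce y: no finite eps
        eps = max(eps, math.log(hi / lo))
    return eps
```

For example, a binary mechanism that reports the true bit with probability 0.75 has `channel = [[0.75, 0.25], [0.25, 0.75]]` and satisfies ε-LDP with ε = ln 3, while a mechanism that always reports the true bit has no finite ε.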
The decision algorithm of this embodiment satisfies w-event privacy, with the privacy budget allocated as
Figure BDA0003581495990000062
The proof is as follows.
Here v_i is the result of comparing the correlation distance T with the threshold T_0, that is, the similarity result, and v_i′ is the similarity result after the random perturbation is added.
Figure BDA0003581495990000063
Similarly, when the input v_i = 0, the probability of the similarity result obtained by the decision algorithm is
Figure BDA0003581495990000064
From the two formulas above, the decision algorithm of this embodiment satisfies
Figure BDA0003581495990000065
By the sequential composition property of differential privacy, for a sliding window of length w, the privacy budget of algorithm M1 is the sum of the privacy budgets of M1.K; that is, the privacy budget of M1 within a sliding window of size w is:
Figure BDA0003581495990000066
where
Figure BDA0003581495990000071
Thus, it can be confirmed that M1 satisfies
Figure BDA0003581495990000072
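Since the formula images in the proof above are not reproduced, the following worked derivation shows the shape such an argument takes under one assumed reading of formula (2): output 1 with probability p, output 0 with probability p, and keep v_i with probability 1 − 2p. The symbol ε_1 for the per-moment budget of the decision algorithm is an assumption, and how the source relates w·ε_1 to the overall ε is not recoverable from this text.

```latex
\begin{align}
\Pr[v_i' = 1 \mid v_i = 1] &= p + (1 - 2p) = 1 - p, \\
\Pr[v_i' = 1 \mid v_i = 0] &= p, \\
\frac{\Pr[v_i' = 1 \mid v_i = 1]}{\Pr[v_i' = 1 \mid v_i = 0]}
  &= \frac{1 - p}{p} = e^{\varepsilon_1}
  \quad\Longrightarrow\quad \varepsilon_1 = \ln\frac{1 - p}{p}, \\
\sum_{k=1}^{w} \varepsilon_1 &= w\,\varepsilon_1 .
\end{align}
```

By symmetry the same ratio bounds the output v_i′ = 0, and the last line is the sequential composition over the w moments of one sliding window.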
Step 3, reducing the error between the sliding-window data at adjacent moments (greedy clustering algorithm):
The correlation distance between the sliding-window data at adjacent moments is calculated through formula (1). If the similarity v_i′ is 1, a greedy clustering algorithm is used to reduce the error between the correlation distance of the sliding-window data at adjacent moments and the threshold. Specifically: the merge error and the non-merge error of each frequency and its adjacent frequency are calculated through formulas (4) and (5); the smaller of the two errors is selected to group the intervals; the frequencies in each group are replaced by the group mean; the privacy budget is then allocated according to the privacy budget allocation strategy, and the final noisy histogram is published.
Figure BDA0003581495990000073
Figure BDA0003581495990000074
In formula (4), y_1 is the merge error of each frequency and its adjacent frequency; in formula (5), y_2 is the non-merge error of each frequency and its adjacent frequency; where D_i is the original data at moment i,
Figure BDA0003581495990000075
is the noise-added data at moment j, w is the size of the sliding window, k is the number of data items in the sliding window, n is the total amount of data in the original histogram at moment i, j is the moment of the most recently published noisy histogram, and ε is the privacy budget. The greedy clustering algorithm described in this embodiment satisfies w-event privacy, and the proof is as follows:
For a sliding window of length w, the greedy clustering algorithm satisfies differential privacy within each group by the parallel composition property of differential privacy, and across groups by the sequential composition property. It can therefore be confirmed that the greedy clustering algorithm satisfies differential privacy.
Step 4, allocating the privacy budget:
This embodiment proposes a reasonable privacy budget allocation strategy, PBA (Privacy Budget Absorption). The main idea of PBA is to pre-allocate a budget ε_i to each of the w data items in the window, where
Figure BDA0003581495990000076
If the correlation distance T between the sliding-window data at adjacent moments is smaller than the threshold T_0, the noisy histogram of the previous moment is published and the privacy budget ε_i allocated to the current moment is reserved; otherwise, a parameter k records the number of histograms previously skipped (not published), and all the privacy budgets reserved at the skipped histograms are added to obtain the latest ε_i,
Figure BDA0003581495990000077
which is then used to publish the current noisy histogram.
This embodiment combines localized differential privacy with a sliding window model so that the similarity statistic is computed locally for each user rather than as an average over all users. After the similarity measure between the sliding-window data at adjacent moments is obtained, a random perturbation algorithm is applied to the similarity result. Noise is then added, and finally a noisy histogram satisfying privacy protection is published, which protects user privacy and improves data usability.
The invention and its embodiments have been described above schematically and without limitation; what is shown in the drawings is only one embodiment of the invention, and the actual structure is not limited thereto. Therefore, if a person of ordinary skill in the art, informed by this disclosure and without departing from the gist of the invention, devises structures and embodiments similar to this technical solution without creative effort, they shall fall within the scope of protection of the invention.

Claims (7)

1. A real-time data-oriented localized differential privacy data stream publishing method, characterized by comprising the following steps:
Step 1, inputting the raw data set D = {D_1, D_2, …, D_i | 1 ≤ i ≤ N}, and initializing the parameters to determine the privacy protection budget ε;
Step 2, calculating the correlation distance between the sliding-window data at adjacent moments according to a decision algorithm, comparing the correlation distance with a threshold to obtain a similarity result v_i, and adding a random perturbation to the similarity result v_i to obtain v_i′; wherein the similarity between the sliding-window data at adjacent moments is calculated to determine whether a new noisy histogram needs to be published, and the random perturbation is added to the similarity result to obtain v_i′; the decision algorithm is specifically as follows:
1) Calculating the correlation distance between the data in the sliding windows at adjacent moments, as shown in formula (1):
Figure FDA0004180461030000011
In formula (1), d(x_i, y_i) is the correlation distance between the sliding-window data at adjacent moments, x_ik denotes the k original data items at moment i, and x_jk denotes the k data items after noise addition at moment j;
2) Taking the result of comparing the correlation distance with the threshold as the similarity result v_i, and adding a random perturbation to the similarity result to obtain v_i′, as shown in formula (2):
Figure FDA0004180461030000012
In formula (2), v_i is the result of comparing the correlation distance with the threshold; v_i′ is the similarity result after the random perturbation is added; if the correlation distance is greater than the threshold, v_i′ is assigned 1 with probability p; if the correlation distance is smaller than the threshold, v_i′ is assigned 0 with probability p; otherwise, the similarity result is left unprocessed with probability 1 − 2p;
Step 3, if v_i is positive, reducing the data error between the sliding windows at adjacent moments through a greedy clustering algorithm, allocating privacy budget for adding noise to the data, and then publishing the noisy histogram of the current moment i; otherwise, directly publishing the noisy histogram of moment i-1;
Step 4, allocating the privacy budget according to the PBA privacy budget allocation strategy, and publishing the noisy histogram according to that strategy.
2. The method for distributing localized differential privacy data stream for real-time data according to claim 1, wherein the method comprises the following steps: the original data set input in the step 1 is statistical data, and the privacy protection budget epsilon is smaller than 1.
3. The method for distributing localized differential privacy data stream for real-time data according to claim 2, wherein the method comprises the steps of: and 2, carrying out random disturbance processing on the similarity result of the sliding window data sets at adjacent moments, wherein a random disturbance algorithm meets localized differential privacy.
4. A method for distributing localized differential privacy data stream for real-time data according to claim 3, wherein: the decision algorithm satisfies w-event level privacy, which is assigned as
Figure FDA0004180461030000021
where w is the length of the sliding window.
5. The method for publishing a localized differential privacy data stream for real-time data according to claim 4, wherein: in step 3, the merging error and the non-merging error of each frequency with its adjacent frequency are calculated by formulas (4) and (5); the smaller of the merging error and the non-merging error is selected for interval grouping; the frequencies in each group are replaced by the group mean; the privacy budget is then allocated according to the privacy allocation strategy, and the final noisy histogram is published;
Figure FDA0004180461030000022
Figure FDA0004180461030000023
In formula (4), y_1 is the merging error of each frequency with its adjacent frequency; in formula (5), y_2 is the non-merging error of each frequency with its adjacent frequency; where D_i is the original data at moment i,
Figure FDA0004180461030000024
is the noise-added data at moment j, w is the size of the sliding window, k is the amount of data in the sliding window, n is the total amount of data in the original histogram at moment i, j is the moment of the most recently published noisy histogram, and ε is the privacy budget.
6. The method for publishing a localized differential privacy data stream for real-time data according to claim 5, wherein: the greedy clustering algorithm satisfies w-event level privacy.
7. The method for publishing a localized differential privacy data stream for real-time data according to claim 6, wherein: the privacy allocation strategy in step 4 pre-allocates a privacy budget ε_i to each of the w data items in the window, where
Figure FDA0004180461030000025
If the correlation distance between the sliding-window data at adjacent moments is smaller than the threshold, the noisy histogram of the previous moment is published and the privacy budget ε_i allocated to the current moment is reserved; otherwise, a parameter k records the number of preceding histograms to which no noise was added, and all privacy budgets reserved from those skipped histograms are added together to obtain the updated ε_i,
Figure FDA0004180461030000026
The data that meets the noise-addition condition is then found, and the remaining privacy budget is allocated to the histogram at that moment.
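
As an illustration of the claims above, the following Python sketch shows the random perturbation of the similarity result described for formula (2) and the budget-reservation bookkeeping of claim 7. It is a minimal sketch under stated assumptions, not the patented implementation: the function names `perturb_similarity` and `allocate_budgets` are hypothetical, the per-moment pre-allocation is assumed to be ε/w because the allocation formulas of the claims are available only as images, and the sketch does not by itself establish w-event privacy.

```python
import random


def perturb_similarity(v, p, rng=random):
    """Randomly perturb a binary similarity result v (0 or 1).

    With probability p the output is 1, with probability p it is 0,
    and with probability 1 - 2p the input is returned unchanged.
    """
    if not 0.0 <= p <= 0.5:
        raise ValueError("p must lie in [0, 0.5]")
    r = rng.random()
    if r < p:
        return 1
    if r < 2 * p:
        return 0
    return v


def allocate_budgets(flags, epsilon, w):
    """Return the privacy budget spent at each moment.

    flags[i] is the similarity result at moment i: 1 means a fresh
    noisy histogram is published, 0 means the previous noisy histogram
    is re-published and the budget of that moment is reserved.
    ASSUMPTION: every moment is pre-allocated epsilon / w.
    """
    base = epsilon / w
    reserved = 0.0  # budget saved by skipped moments
    spent = []
    for flag in flags:
        if flag:
            # Publish: use this moment's share plus everything reserved.
            spent.append(base + reserved)
            reserved = 0.0
        else:
            # Skip: keep this moment's share for a later publication.
            reserved += base
            spent.append(0.0)
    return spent


if __name__ == "__main__":
    print(allocate_budgets([1, 0, 0, 1], epsilon=0.5, w=4))
    # prints [0.125, 0.0, 0.0, 0.375]
```

In this example the two skipped moments each reserve 0.125, so the fourth moment spends 0.375. A complete scheme would additionally have to bound the accumulated budget so that any w consecutive moments spend at most ε in total, which this bookkeeping alone does not enforce.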
CN202210352928.0A 2022-04-06 2022-04-06 Real-time data-oriented localization differential privacy data stream publishing method Active CN114662152B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210352928.0A CN114662152B (en) 2022-04-06 2022-04-06 Real-time data-oriented localization differential privacy data stream publishing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210352928.0A CN114662152B (en) 2022-04-06 2022-04-06 Real-time data-oriented localization differential privacy data stream publishing method

Publications (2)

Publication Number Publication Date
CN114662152A CN114662152A (en) 2022-06-24
CN114662152B true CN114662152B (en) 2023-05-12

Family

ID=82035824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210352928.0A Active CN114662152B (en) 2022-04-06 2022-04-06 Real-time data-oriented localization differential privacy data stream publishing method

Country Status (1)

Country Link
CN (1) CN114662152B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115329898B (en) * 2022-10-10 2023-01-24 国网浙江省电力有限公司杭州供电公司 Multi-attribute data publishing method and system based on differential privacy policy

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131605A (en) * 2020-09-24 2020-12-25 合肥城市云数据中心股份有限公司 Differential privacy dynamic data publishing method based on mutual information correlation technology
CN112307514A (en) * 2020-11-26 2021-02-02 哈尔滨工程大学 Difference privacy greedy grouping method adopting Wasserstein distance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101935528B1 (en) * 2017-11-28 2019-01-04 서강대학교 산학협력단 System and method for traffic volume publication applying differential privacy

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131605A (en) * 2020-09-24 2020-12-25 合肥城市云数据中心股份有限公司 Differential privacy dynamic data publishing method based on mutual information correlation technology
CN112307514A (en) * 2020-11-26 2021-02-02 哈尔滨工程大学 Difference privacy greedy grouping method adopting Wasserstein distance

Also Published As

Publication number Publication date
CN114662152A (en) 2022-06-24

Similar Documents

Publication Publication Date Title
CN107871087B (en) Personalized differential privacy protection method for high-dimensional data release in distributed environment
Piao et al. Privacy-preserving governmental data publishing: A fog-computing-based differential privacy approach
Al-Hussaeni et al. Privacy-preserving trajectory stream publishing
CN105046160B (en) A kind of Data Flow Oriented difference privacy dissemination method based on histogram
CN108763956B (en) Fractal dimension-based streaming data differential privacy protection publishing method
CN110874488A (en) Stream data frequency counting method, device and system based on mixed differential privacy and storage medium
CN110471957B (en) Localized differential privacy protection frequent item set mining method based on frequent pattern tree
Kotsogiannis et al. One-sided differential privacy
CN114662152B (en) Real-time data-oriented localization differential privacy data stream publishing method
Zhao et al. Novel trajectory privacy-preserving method based on prefix tree using differential privacy
CN106980795A (en) Community network data-privacy guard method
CN115130119B (en) Utility optimization set data protection method based on local differential privacy
Ding et al. Differentially private publication of streaming trajectory data
CN109344643B (en) Privacy protection method and system for triangle data release in facing graph
Yan et al. Dynamic release of big location data based on adaptive sampling and differential privacy
Xu et al. Privacy preserving online matching on ridesharing platforms
CN110457940B (en) Differential privacy measurement method based on graph theory and mutual information quantity
KR101632073B1 (en) Method, device, system and non-transitory computer-readable recording medium for providing data profiling based on statistical analysis
CN108111968B (en) Generalization-based location privacy protection method
Ju et al. Local differential privacy-based privacy-preserving data range query scheme for electric vehicle charging
CN115033915A (en) Sensitive tag track data differential privacy publishing method based on generation countermeasure network
CN109450889B (en) Privacy protection release method for converged data streams in Internet of things
CN112464276A (en) Sparse position track privacy protection method
Yang et al. Differentially private geospatial data publication based on grid clustering
Zhao et al. A new collaborative filtering algorithm with combination of explicit trust and implicit trust

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant