CN114662152A - Real-time data-oriented localized differential privacy data stream publishing method - Google Patents


Publication number
CN114662152A
Authority
CN
China
Prior art keywords
data
privacy
sliding window
similarity
histogram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210352928.0A
Other languages
Chinese (zh)
Other versions
CN114662152B (en
Inventor
陶陶
张福南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Technology AHUT
Original Assignee
Anhui University of Technology AHUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Technology AHUT
Priority to CN202210352928.0A
Publication of CN114662152A
Application granted
Publication of CN114662152B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention discloses a real-time data-oriented localized differential privacy data stream publishing method, belonging to the technical field of data privacy protection. The method combines localized differential privacy with a sliding window model, uses a similarity measure to calculate the similarity between sliding-window data at adjacent moments, and adds random perturbation to the similarity result. When the similarity result is positive, a greedy clustering algorithm is applied to reduce the error before noise is added; a PBA privacy budget allocation strategy is then used to avoid excessive consumption of the privacy budget, and finally a noisy histogram that satisfies the privacy requirement is published. The method resists attacks from untrusted third parties and effectively reduces the error of histogram publication, giving localized differential privacy wider prospects in practical applications.

Description

Real-time data-oriented localized differential privacy data stream publishing method
Technical Field
The invention belongs to the technical field of data privacy protection, and particularly relates to a real-time data-oriented localized differential privacy data stream publishing method.
Background
In the age of big data, where information technology develops rapidly, privacy protection has become especially important. For example, internet search engines collect a large number of user access records, and users' purchase records and comments can be analyzed to identify current trends; in the construction of smart government, sharing data between departments facilitates cooperative work among them. These data sets contain sensitive personal information, and how to perform statistical analysis on the data without disclosing individual privacy is currently a focus of data privacy protection research. Localized differential privacy (LDP) provides a stronger guarantee than centralized differential privacy, because it not only defends against attackers with arbitrary background knowledge but also prevents privacy attacks from untrusted third parties. Companies such as Apple and Alibaba currently use the LDP model to collect information such as users' default browser home page and search engine settings.
At present, many algorithms in the field of data privacy protection target static data. Although real-time data streams have a wide range of application scenarios, few histogram publishing methods address them, so research into privacy protection methods for real-time data is particularly needed.
A search identified Chinese patent application No. 2019107977157, filed on August 27, 2019, entitled: a multidimensional crowdsourcing data truth finding method based on localized differential privacy. That application addresses the problems that an adversary with arbitrary background knowledge can expose sensitive user data and that accurate answers cannot be obtained from noisy data sets; at the same time, any third party can estimate the original data distribution without knowing the users' sensitive information, so that accurate results are obtained in each crowdsourcing project while the privacy of user data is ensured. Although that method also adopts the idea of localized differential privacy, it targets only crowdsourced data and the statistical analysis of static data; real-time data streams have wide application scenarios, yet privacy protection methods for real-time data remain little studied.
As another example, Chinese patent application No. 2018105071444, filed on May 24, 2018, is entitled: a stream data differential privacy protection publishing method based on fractal dimension. That application uses a sliding window technique to segment the data stream and presents the data that meet the conditions statically within a sliding window; it then performs initial clustering on the data, calculates the fractal dimensions of the initial clustering results, and constructs a fractal tree. The data of the segmented window from the first step are sent to a fractal clustering module for cluster analysis: the fractal dimension is calculated, the arriving data are fractally clustered, and class-based statistics over the clustering results form the groups to be published. The set of differences between groups is calculated as the reference for judging approximate groups during approximate-group fusion, similar groups are replaced by their mean, noise is added to the groups after fusion optimization, and the noisy group data are published. When the amount of group data reaches the sliding window size, the window moves forward and the above steps are repeated to complete the final data publication. That method combines differential privacy with a sliding window, but it protects data privacy using centralized differential privacy, which rests on a precondition: reliance on a trusted third party. This assumption does not hold in practical applications, which limits the application of traditional differential privacy to a certain extent.
Based on the above analysis, there is a need in the art for a data publishing method that does not rely on a trusted third party and supports the statistical publication of real-time data.
Disclosure of Invention
1. Technical problem to be solved by the invention
At present, most research methods for real-time data are based on centralized differential privacy. Although they provide a certain guarantee for the privacy of real-time data, they suffer from problems such as low availability of the published data, large publication error caused by noise accumulation, and exhaustion of the privacy budget. In view of these problems in the prior art, the present invention provides a real-time data-oriented localized differential privacy data stream publishing method. The invention combines localized differential privacy with a sliding window model so that the similarity statistic is computed locally for each user rather than as an average over all users. After the similarity measure between the sliding-window data at adjacent moments is obtained, random perturbation is added to the similarity result. A more reasonable privacy budget allocation strategy is then adopted when noise is added, and finally a noisy histogram that satisfies privacy protection is published, which both protects user privacy and improves data availability.
2. Technical scheme
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
The invention discloses a real-time data-oriented localized differential privacy data stream publishing method, which comprises the following steps:
Step 1: input the original data set D = {D_1, D_2, …, D_i}, 1 ≤ i ≤ N, and determine the privacy protection budget ε from the initialization parameters;
Step 2: calculate the correlation distance between the sliding-window data at adjacent moments according to the decision algorithm, compare the correlation distance with the threshold to obtain the similarity result v_i, and add random perturbation to v_i to obtain v_i';
Step 3: according to the result of the decision algorithm, if the value of v_i is positive, reduce the error between the two by a greedy clustering algorithm, then allocate the privacy budget and add noise to the data, and publish the noisy histogram for the current moment i; otherwise, directly publish the noisy histogram of moment i-1;
Step 4: allocate the privacy budget reasonably according to the PBA privacy budget allocation strategy, and publish the noisy histogram accordingly.
Further, the original data set input in step 1 is statistical data, and the privacy protection budget ε is less than 1.
Further, in step 2 the similarity v_i' between the sliding-window data at adjacent moments is calculated to decide whether a new noisy histogram needs to be published, random perturbation being added to the similarity result to obtain v_i'.
Further, the decision algorithm in step 2 is specifically as follows:
1) calculating the correlation distance between the data in the sliding window at the adjacent time, as shown in formula (1):
Figure BDA0003581495990000031
in the formula, d(x_i, y_i) is the correlation distance between the sliding-window data at adjacent moments, x_ik is the k-th original data value at moment i, and x_jk is the k-th data value after noise processing at moment j;
2) the comparison between the correlation distance and the threshold gives the similarity result v_i; random perturbation is added to the similarity result to obtain v_i', as in formula (2):
\[ v_i' = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } p \\ v_i, & \text{with probability } 1 - 2p \end{cases} \tag{2} \]
in the formula, v_i is the result of comparing the correlation distance with the threshold, and v_i' is the similarity result after random perturbation is added. With probability p, v_i' is set to 1 (the value that denotes a correlation distance larger than the threshold); with probability p, v_i' is set to 0 (the value that denotes a correlation distance smaller than the threshold); otherwise, with probability 1 - 2p, the similarity result is left unchanged.
Furthermore, step 2 applies random perturbation to the similarity result of the sliding-window data sets at adjacent moments, and the random perturbation algorithm satisfies localized differential privacy.
Further, the decision algorithm satisfies w-event-level privacy, and the privacy budget is allocated as
Figure BDA0003581495990000033
where w is the length of the sliding window.
Further, step 3 calculates the merging error and the non-merging error of each frequency with its adjacent frequency by formulas (4) and (5); the option with the smaller error is selected for interval grouping; the frequencies in each group are replaced by the group average, the privacy budget is then allocated according to the privacy allocation strategy, and the final noisy histogram is published;
Figure BDA0003581495990000034
Figure BDA0003581495990000035
in formula (4), y_1 is the merging error of each frequency with its adjacent frequency; in formula (5), y_2 is the non-merging error of each frequency with its adjacent frequency; where D_i is the original data at moment i,
Figure BDA0003581495990000036
is the data after noise is added at moment j, w is the size of the sliding window, k is the amount of data in the sliding window, n is the total number of data in the original histogram at the current moment i, j is the moment of the most recently published noisy histogram, and ε is the privacy budget.
Further, the greedy clustering algorithm satisfies w-event level privacy.
Furthermore, the privacy allocation strategy in step 4 is to divide the budget equally in advance among the w data in the window, each receiving a privacy budget ε_i, where
Figure BDA0003581495990000041
If the correlation distance between the sliding-window data at adjacent moments is smaller than the threshold, the noisy histogram of the previous moment is published and the privacy budget ε_i allocated to this moment is reserved; otherwise, a parameter k records the number of preceding histograms to which no noise was added, and all privacy budgets reserved at the skipped histograms are added together to obtain the latest ε_i
Figure BDA0003581495990000042
The noisy data are then computed, and the remaining privacy budget is assigned to the histogram at that moment.
3. Advantageous effects
Compared with the prior art, the technical scheme provided by the invention has the following remarkable effects:
(1) In the real-time data-oriented localized differential privacy data stream publishing method of the invention, the real data are perturbed locally, so that only the user knows the real data and the risk of leakage through a third party is avoided. The correlation distance between the sliding-window data sets at adjacent moments is calculated, and random perturbation is added to the similarity result. Compared with centralized differential privacy, the method resists both attacks from untrusted third parties and attacks by adversaries with arbitrary background knowledge.
(2) In the real-time data-oriented localized differential privacy data stream publishing method of the invention, when the correlation distance between the sliding-window data at adjacent moments is larger than the threshold, a greedy clustering algorithm is adopted to further reduce the statistical error. The merging error and the non-merging error of each frequency with its adjacent frequency are calculated; the option with the smaller error is selected for interval grouping; the frequencies in each group are replaced by the group average; the privacy budget is allocated reasonably according to the privacy budget allocation strategy; noise is then added to the data, and finally a noisy histogram that meets the requirement is published. This effectively reduces the error of histogram publication and gives localized differential privacy wider prospects in practical applications.
Drawings
FIG. 1 is a diagram of an algorithmic framework of a real-time data-oriented localized differential privacy data stream publication method of the present invention;
FIG. 2 is a flow diagram of the real-time data-oriented localized differential privacy processing of the present invention.
Detailed Description
Most existing data protection methods based on localized differential privacy concentrate on single-valued attributes, and privacy leakage occurs easily once data are collected repeatedly. For continuous data, Dwork proposed two different levels of privacy protection: user-level privacy and event-level privacy. Event-level privacy protects a user at a single specific moment of the data stream but not across the whole stream; user-level privacy is the opposite, protecting user privacy by adding noise to the entire data stream, which reduces the usability of the data. w-event-level privacy was proposed to balance the drawbacks of both: when w is 1, w-event-level privacy reduces to event-level privacy, and when w is infinite, it becomes user-level privacy. A problem remains, however: w-event-level privacy was designed for centralized differential privacy, so it is vulnerable to privacy attacks from untrusted third parties, and its data similarity calculation is taken over all users, which consumes a large amount of privacy budget.
The invention aims to overcome the vulnerability of centralized differential privacy and the excessive consumption of privacy budget caused by computing the similarity statistic from the average over all users. The method combines localized differential privacy with a sliding window model: in the decision stage, localized differential privacy is applied to the locally computed similarity statistic, acting on each user rather than on all users as a whole as in the traditional approach, and a PBA privacy budget allocation strategy is proposed to allocate the privacy budget reasonably and avoid its excessive consumption.
The invention improves the traditional data publishing method aiming at real-time data based on differential privacy, and achieves higher usability while protecting the privacy data.
For a further understanding of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings.
Example 1
With reference to fig. 1, the method for publishing a localized differential privacy data stream oriented to real-time data according to this embodiment includes the steps of:
Step 1: input the original data set D = {D_1, D_2, …, D_i}, 1 ≤ i ≤ N, and determine the privacy protection budget ε from the initialization parameters;
First, this embodiment reads the numerical data field to be published from a data source such as a database or a CSV file, preprocesses the data read, divides it into units, and inputs the data frequency of each unit into the data set D, which completes the input of the original data set D. The input original data set is statistical data. The initialization parameters determine the privacy protection budget ε. The value of ε is inversely related to the degree of privacy protection and directly related to data availability: a smaller ε means that more noise is added to the data, so the data are better protected but less usable. In this embodiment, the privacy protection budget ε is less than 1.
Step 2: calculate the similarity v_i' between the sliding-window data at adjacent moments to decide whether a new noisy histogram needs to be published, adding random perturbation to the similarity result to obtain v_i'. The decision algorithm is divided into two main steps:
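The unit division described above can be sketched as follows. This is only an illustrative sketch: the function name `build_histogram`, the fixed unit width, and the clamping of out-of-range values into the last unit are assumptions made here, not details given by the embodiment.

```python
def build_histogram(values, unit_width, num_units):
    """Count how many values fall into each fixed-width unit (bin).

    Assumption: values are non-negative numbers; anything beyond the last
    unit is counted in the last unit.
    """
    if unit_width <= 0 or num_units <= 0:
        raise ValueError("unit_width and num_units must be positive")
    counts = [0] * num_units
    for v in values:
        if v < 0:
            raise ValueError("negative values are not handled in this sketch")
        idx = min(int(v // unit_width), num_units - 1)  # clamp the top edge
        counts[idx] += 1
    return counts


# Frequencies per unit for one moment of the stream (made-up values).
D_i = build_histogram([1, 5, 12, 19, 25, 40], unit_width=10, num_units=3)
```

Each moment of the stream would yield one such frequency vector, and the sequence of vectors plays the role of the data set D.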
1) calculating the correlation distance between the data in the sliding window at the adjacent time, as shown in formula (1):
Figure BDA0003581495990000051
in the formula, d(x_i, y_i) is the correlation distance (hereinafter denoted by T) between the sliding-window data at adjacent moments, x_ik is the k-th original data value at moment i, and x_jk is the k-th data value after noise processing at moment j.
2) The comparison between the correlation distance and the threshold gives the similarity result; random perturbation is added to the similarity result to obtain v_i', as in formula (2):
\[ v_i' = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } p \\ v_i, & \text{with probability } 1 - 2p \end{cases} \tag{2} \]
The correlation distance T between the sliding-window data at adjacent moments is obtained from formula (1). In formula (2), v_i is the result of comparing the correlation distance T with the threshold T_0, i.e., the similarity result, and v_i' is the similarity result after random perturbation is added. With probability p, v_i' is set to 1 (the value that denotes T greater than T_0); with probability p, v_i' is set to 0 (the value that denotes T less than T_0); otherwise, with probability 1 - 2p, the similarity result is left unchanged.
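The two-step decision can be sketched as below. Because formula (1) is not reproduced in the text, the sketch substitutes a mean absolute difference for the correlation distance; that stand-in, and the names `correlation_distance`, `perturb`, and `decide`, are assumptions for illustration only. The perturbation follows the three-outcome rule described for formula (2).

```python
import random


def correlation_distance(current, last_published):
    """Stand-in distance (assumption): mean absolute difference per unit."""
    if len(current) != len(last_published) or not current:
        raise ValueError("histograms must be non-empty and of equal length")
    total = sum(abs(x - y) for x, y in zip(current, last_published))
    return total / len(current)


def perturb(v, p, rng=random):
    """Return 1 with probability p, 0 with probability p, else v unchanged."""
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie strictly between 0 and 0.5")
    u = rng.random()
    if u < p:
        return 1
    if u < 2 * p:
        return 0
    return v


def decide(current, last_published, threshold, p, rng=random):
    """Compare the distance T with the threshold T_0, then perturb the bit."""
    v = 1 if correlation_distance(current, last_published) > threshold else 0
    return perturb(v, p, rng)
```

A returned value of 1 would trigger publication of a new noisy histogram, and 0 would republish the previous one.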
It is worth noting that localized differential privacy (LDP) is defined as follows: given a perturbation mechanism M with domain Dom(M) and range Range(M), if for any pair of inputs x, x' in Dom(M) and any output y ∈ Range(M) produced by the perturbation the following inequality holds, then the perturbation mechanism M satisfies ε-LDP.
\[ \Pr[M(x) = y] \le e^{\varepsilon} \cdot \Pr[M(x') = y] \tag{3} \]
Here the value of ε (the privacy budget) is inversely related to the degree of privacy protection and directly related to data availability: the smaller the value of ε, the better the privacy protection of the data and the worse the data availability.
As equation (3) shows, compared with traditional similarity measurement methods, adding the random perturbation algorithm locally not only avoids attacks from third parties but also further protects user privacy.
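As a worked check of the ε-LDP inequality for the three-outcome perturbation, the sketch below assumes the mechanism outputs 1 with probability p, 0 with probability p, and the input bit otherwise. Under that assumption the worst-case probability ratio is (1 - p)/p, so ε = ln((1 - p)/p). The helper names are hypothetical, and this is not the proof given in the embodiment.

```python
import math


def p_from_epsilon(epsilon):
    """Perturbation probability p for which ln((1 - p) / p) equals epsilon."""
    return 1.0 / (1.0 + math.exp(epsilon))


def epsilon_from_p(p):
    """Privacy level implied by p under the assumed three-outcome mechanism."""
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie strictly between 0 and 0.5")
    return math.log((1.0 - p) / p)


def output_distribution(v, p):
    """Return (Pr[v' = 0], Pr[v' = 1]) for the input bit v."""
    keep = 1.0 - 2.0 * p
    pr_one = p + (keep if v == 1 else 0.0)
    return (1.0 - pr_one, pr_one)


def satisfies_ldp(p, epsilon, tol=1e-9):
    """Check Pr[M(x)=y] <= e^eps * Pr[M(x')=y] for both inputs and outputs."""
    bound = math.exp(epsilon) * (1.0 + tol)
    d0 = output_distribution(0, p)
    d1 = output_distribution(1, p)
    return all(a <= bound * b and b <= bound * a for a, b in zip(d0, d1))
```

For example, with ε = 1 the sketch gives p = 1/(1 + e), roughly 0.27, and the same p fails the check for any smaller ε.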
The decision algorithm of this embodiment satisfies w-event-level privacy, and the privacy budget is allocated as
Figure BDA0003581495990000062
The proof is as follows:
v_i is the result of comparing the correlation distance T with the threshold T_0, i.e., the similarity result; v_i' is the similarity result after random perturbation is added.
Figure BDA0003581495990000063
Similarly, the probability of the similarity result obtained by the decision algorithm when the input v_i equals 0 is
Figure BDA0003581495990000064
From these two formulas, the decision algorithm of this embodiment satisfies
Figure BDA0003581495990000065
By the sequential composition property of differential privacy, for a sliding window of length w, the privacy budget consumed by algorithm M1 over the window is the sum of the privacy budgets of M1,k; that is, the privacy budget of algorithm M1 within a sliding window of size w is:
Figure BDA0003581495990000066
where
Figure BDA0003581495990000071
Thus, it can be verified that M1 satisfies
Figure BDA0003581495990000072
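The sequential composition argument can be illustrated numerically: under sequential composition the budgets spent at the moments inside one window add up, so w-event-level privacy with total budget ε requires that no window of w consecutive moments spends more than ε. The sketch below checks that condition for a given schedule; the function names and the example schedules are assumptions for illustration.

```python
def max_window_budget(budgets, w):
    """Largest total budget spent in any w consecutive moments."""
    if w <= 0:
        raise ValueError("w must be positive")
    window_sum = sum(budgets[:w])
    best = window_sum
    for t in range(w, len(budgets)):
        # Slide the window one moment forward.
        window_sum += budgets[t] - budgets[t - w]
        best = max(best, window_sum)
    return best


def satisfies_w_event(budgets, w, epsilon, tol=1e-12):
    """True if every window of length w spends at most epsilon in total."""
    return max_window_budget(budgets, w) <= epsilon + tol
```

A schedule that spends 0.1 at every moment stays within ε = 0.5 for w = 4, whereas the schedule [0.1, 0.3, 0.2, 0.1, 0.0] exceeds it for w = 3.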
Step 3: reduce the error between the sliding-window data at adjacent moments, i.e., the greedy clustering algorithm:
The correlation distance between the sliding-window data at adjacent moments is calculated by formula (1). If the similarity v_i' is 1, the greedy clustering algorithm is applied to reduce the error. Specifically: the merging error and the non-merging error of each frequency with its adjacent frequency are calculated by formulas (4) and (5); the option with the smaller error is selected for interval grouping; the frequencies in each group are replaced by the group average; the privacy budget is then allocated according to the privacy allocation strategy, and the final noisy histogram is published.
Figure BDA0003581495990000073
Figure BDA0003581495990000074
In formula (4), y_1 is the merging error of each frequency with its adjacent frequency; in formula (5), y_2 is the non-merging error of each frequency with its adjacent frequency; where D_i is the original data at moment i,
Figure BDA0003581495990000075
is the data after noise is added at moment j, w is the size of the sliding window, k is the amount of data in the sliding window, n is the total number of data in the original histogram at the current moment i, j is the moment of the most recently published noisy histogram, and ε is the privacy budget. The greedy clustering algorithm in this embodiment satisfies w-event-level privacy, which is proved as follows:
For a sliding window of length w, the greedy clustering algorithm satisfies differential privacy within each group by the parallel composition property of differential privacy, and across groups by the sequential composition property. Thus, it can be verified that the greedy clustering algorithm satisfies differential privacy.
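The grouping step can be sketched as follows. Formulas (4) and (5) are not reproduced in the text, so the sketch substitutes an assumed error model: a merged group costs its squared deviation from the group mean plus a noise term 2/(ε²·m) for a group of m units (Laplace noise of variance 2/ε² averaged over the group), while an unmerged unit costs 2/ε². The error model and all names below are assumptions, not the patent's formulas.

```python
def group_error(values, epsilon):
    """Assumed error of publishing a group by its noisy mean."""
    m = len(values)
    mean = sum(values) / m
    approximation = sum((v - mean) ** 2 for v in values)
    noise = 2.0 / (epsilon ** 2 * m)
    return approximation + noise


def greedy_group(hist, epsilon):
    """Scan left to right; merge a unit into the current group if cheaper."""
    if not hist:
        return []
    groups = [[hist[0]]]
    for value in hist[1:]:
        merged = groups[-1] + [value]
        merge_cost = group_error(merged, epsilon)
        split_cost = group_error(groups[-1], epsilon) + group_error([value], epsilon)
        if merge_cost <= split_cost:
            groups[-1] = merged
        else:
            groups.append([value])
    return groups


def smooth(hist, epsilon):
    """Replace every frequency by the average of its group."""
    out = []
    for group in greedy_group(hist, epsilon):
        mean = sum(group) / len(group)
        out.extend([mean] * len(group))
    return out
```

With the histogram [10, 11, 10, 50, 52] and ε = 1, this sketch groups the first three units and the last two units separately; noise would then be added to the smoothed frequencies using the budget from the allocation step.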
Step 4, allocating the privacy budgets:
This embodiment proposes a reasonable privacy budget allocation strategy, PBA (Privacy Budget Allocation). The main idea of PBA is to divide the budget equally in advance among the w data in the window, each receiving a privacy budget ε_i, where
Figure BDA0003581495990000076
If the correlation distance T between the sliding-window data at adjacent moments is less than the threshold T_0, the noisy histogram of the previous moment is published and the privacy budget ε_i allocated to this moment is reserved; otherwise, a parameter k records the number of preceding histograms to which no noise was added, and all privacy budgets reserved at the skipped histograms are added together to obtain the latest ε_i
Figure BDA0003581495990000077
which is then used to publish the noisy histogram for this moment.
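The bookkeeping described for PBA can be sketched as below. The two allocation formulas are not reproduced in the text, so the sketch assumes a base share of ε/w per moment and a published budget of (k + 1) base shares after k skipped moments; the class name `PBAAllocator` and both formulas are assumptions for illustration.

```python
class PBAAllocator:
    """Reserve the share of skipped moments and spend it at the next release."""

    def __init__(self, epsilon, w):
        if epsilon <= 0 or w <= 0:
            raise ValueError("epsilon and w must be positive")
        self.base_share = epsilon / w  # assumed equal pre-division
        self.skipped = 0               # k: moments republished without new noise

    def step(self, publish_new):
        """Return the budget spent at this moment (0.0 when republishing)."""
        if not publish_new:
            self.skipped += 1
            return 0.0
        budget = (self.skipped + 1) * self.base_share
        self.skipped = 0
        return budget


allocator = PBAAllocator(epsilon=1.0, w=4)
spent = [allocator.step(flag) for flag in [True, False, False, True, True]]
```

The sketch only mirrors the reserve-and-accumulate bookkeeping; whether a particular schedule keeps the total spent in every window of w moments within ε would need to be checked separately.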
This embodiment combines localized differential privacy with a sliding window model so that the similarity statistic is computed locally for each user rather than as an average over all users. After the similarity measure between the sliding-window data at adjacent moments is obtained, random perturbation is added to the similarity result. Noise is then added, and finally a noisy histogram that satisfies privacy protection is published, which both protects user privacy and improves data availability.
The present invention and its embodiments have been described above schematically and without limitation; what is shown in the drawings is only one embodiment of the invention, and the actual structure is not limited thereto. Therefore, structures and embodiments similar to this technical solution that a person skilled in the art, enlightened by it, designs without inventive effort and without departing from the spirit of the invention shall fall within the protection scope of the invention.

Claims (8)

1. A real-time data-oriented localized differential privacy data stream publishing method, characterized by comprising the following steps:
step 1, inputting the original data set D = {D_1, D_2, …, D_i}, 1 ≤ i ≤ N, and determining the privacy protection budget ε from the initialization parameters;
step 2, calculating the correlation distance between the sliding-window data at adjacent moments according to a decision algorithm, comparing the correlation distance with the threshold to obtain the similarity result v_i, and adding random perturbation to v_i to obtain v_i';
step 3, according to the result of the decision algorithm, if the value of v_i is positive, reducing the error between the two by a greedy clustering algorithm, then allocating the privacy budget and adding noise to the data, and publishing the noisy histogram for the current moment i; otherwise, directly publishing the noisy histogram of moment i-1;
step 4, allocating the privacy budget reasonably according to the PBA privacy budget allocation strategy, and publishing the noisy histogram accordingly.
2. The real-time data-oriented localized differential privacy data stream publishing method according to claim 1, wherein: the original data set input in step 1 is statistical data, and the privacy protection budget ε is less than 1.
3. The real-time data-oriented localized differential privacy data stream publishing method according to claim 1 or 2, characterized in that: in step 2, the similarity v_i' between the sliding-window data at adjacent moments is calculated to decide whether a new noisy histogram needs to be published, and random perturbation is added to the similarity result to obtain v_i'; the decision algorithm is specifically as follows:
1) calculating the correlation distance between the data in the sliding window at the adjacent time, as shown in formula (1):
Figure FDA0003581495980000011
in formula (1), d(x_i, y_i) is the correlation distance between the sliding-window data at adjacent moments, x_ik is the k-th original data value at moment i, and x_jk is the k-th data value after noise processing at moment j;
2) the comparison between the correlation distance and the threshold gives the similarity result v_i; random perturbation is added to the similarity result to obtain v_i', as in formula (2):
\[ v_i' = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } p \\ v_i, & \text{with probability } 1 - 2p \end{cases} \tag{2} \]
in formula (2), v_i is the result of comparing the correlation distance with the threshold, and v_i' is the similarity result after random perturbation is added; with probability p, v_i' is set to 1 (the value that denotes a correlation distance larger than the threshold); with probability p, v_i' is set to 0 (the value that denotes a correlation distance smaller than the threshold); otherwise, with probability 1 - 2p, the similarity result is left unchanged.
4. The real-time data-oriented localized differential privacy data stream publishing method according to claim 3, wherein: and 2, carrying out random disturbance processing on the similarity result of the sliding window data sets at adjacent moments, wherein a random disturbance algorithm meets the requirement of localized differential privacy.
5. The real-time data oriented localized differential privacy number of claim 4The data stream issuing method is characterized in that: the decision algorithm satisfies w-event level privacy, and the privacy is distributed as
Figure FDA0003581495980000021
where w is the size of the sliding window.
6. The real-time data-oriented localized differential privacy data stream publishing method according to claim 5, wherein: in step 3, the merging error and the non-merging error between each frequency count and its adjacent frequency count are calculated by formulas (4) and (5); the smaller of the two errors is selected to perform interval grouping; the frequencies in each group are replaced by the group average, the privacy budget is then allocated according to the privacy allocation strategy, and the final noisy histogram is published;
Figure FDA0003581495980000022
Figure FDA0003581495980000023
in formula (4), y_1 is the merging error between each frequency count and its adjacent frequency count; in formula (5), y_2 is the non-merging error between each frequency count and its adjacent frequency count; wherein D_i is the original data at time i,
Figure FDA0003581495980000024
is the data after noise is added at time j, w is the size of the sliding window, k is the amount of data in the sliding window, n is the total number of data items in the original histogram at the current time i, j is the time of the most recently published noisy histogram, and ε is the privacy budget.
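The grouping step of claim 6 can be sketched in Python as follows. Formulas (4) and (5) are not reproduced in this text, so the two error measures are passed in as hypothetical callables (`merge_error` and `split_error`) rather than guessed; the toy error functions in the usage example are illustrative stand-ins only, not the patent's formulas. The function names `greedy_group` and `smooth` are also assumptions.

```python
from typing import Callable, List

ErrorFn = Callable[[List[float], float], float]


def greedy_group(freqs: List[float],
                 merge_error: ErrorFn,
                 split_error: ErrorFn) -> List[List[float]]:
    """Scan the frequency counts left to right and group adjacent ones.

    Each count joins the current group when the merging error is not larger
    than the non-merging error; otherwise it starts a new group.
    """
    if not freqs:
        return []
    groups = [[freqs[0]]]
    for f in freqs[1:]:
        if merge_error(groups[-1], f) <= split_error(groups[-1], f):
            groups[-1].append(f)
        else:
            groups.append([f])
    return groups


def smooth(groups: List[List[float]]) -> List[float]:
    """Replace every count by the average of its group."""
    return [sum(g) / len(g) for g in groups for _ in g]


if __name__ == "__main__":
    # Toy stand-ins: merging costs the distance to the group mean,
    # not merging costs a fixed amount (a proxy for extra noise).
    toy_merge = lambda g, f: abs(f - sum(g) / len(g))
    toy_split = lambda g, f: 2.0
    groups = greedy_group([10, 11, 30, 31, 29], toy_merge, toy_split)
    print(groups)          # [[10, 11], [30, 31, 29]]
    print(smooth(groups))  # [10.5, 10.5, 30.0, 30.0, 30.0]
```

Noise addition is deliberately left out here; per the claim, it follows the grouping once the privacy budget has been allocated.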
7. The real-time data-oriented localized differential privacy data stream publishing method according to claim 6, wherein: the greedy clustering algorithm satisfies w-event-level privacy.
8. The real-time data-oriented localized differential privacy data stream publishing method according to claim 7, wherein: in step 4, the privacy allocation strategy is to allocate in advance an equal privacy budget ε_i to each of the w data items in the window, wherein
Figure FDA0003581495980000025
If the correlation distance between the sliding window data at adjacent moments is smaller than the threshold value, the noisy histogram of the previous moment is published and the privacy budget ε_i allocated to that moment is preserved; otherwise, a parameter k records the number of preceding histograms that were skipped without adding noise, and all the privacy budgets preserved from the skipped histograms are added to obtain the latest ε_i:
Figure FDA0003581495980000026
The noisy data is then computed, and the remaining privacy budget is assigned to the histogram at that moment.
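The budget bookkeeping described in claim 8 can be sketched in Python. The two formulas in this claim are not reproduced in this text, so the sketch assumes that the equal share is ε_i = ε / w and that a published histogram receives its own share plus the shares preserved from the k skipped moments, i.e. (k + 1) · ε / w. Both are assumptions, as is the function name `allocate_budgets`. The sketch only tracks budgets; it adds no noise and does not check the w-event-level guarantee.

```python
from typing import List


def allocate_budgets(publish: List[bool], epsilon: float, w: int) -> List[float]:
    """Return the privacy budget spent at each moment.

    publish[t] is True when a new noisy histogram is published at moment t
    (distance above the threshold) and False when the previous noisy
    histogram is reused and the share for that moment is preserved.
    """
    if w <= 0:
        raise ValueError("w must be positive")
    share = epsilon / w        # assumed equal share per moment
    skipped = 0                # k: moments skipped since the last publication
    spent = []
    for is_new in publish:
        if is_new:
            spent.append((skipped + 1) * share)  # own share plus preserved ones
            skipped = 0
        else:
            spent.append(0.0)                    # reuse previous histogram
            skipped += 1
    return spent


if __name__ == "__main__":
    print(allocate_budgets([True, False, False, True], epsilon=1.0, w=4))
    # [0.25, 0.0, 0.0, 0.75]
```

In the example, the two skipped moments each preserve a share of 0.25, so the fourth moment publishes with 0.75.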
CN202210352928.0A 2022-04-06 2022-04-06 Real-time data-oriented localization differential privacy data stream publishing method Active CN114662152B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210352928.0A CN114662152B (en) 2022-04-06 2022-04-06 Real-time data-oriented localization differential privacy data stream publishing method

Publications (2)

Publication Number Publication Date
CN114662152A true CN114662152A (en) 2022-06-24
CN114662152B CN114662152B (en) 2023-05-12

Family

ID=82035824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210352928.0A Active CN114662152B (en) 2022-04-06 2022-04-06 Real-time data-oriented localization differential privacy data stream publishing method

Country Status (1)

Country Link
CN (1) CN114662152B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115329898A (en) * 2022-10-10 2022-11-11 国网浙江省电力有限公司杭州供电公司 Distributed machine learning method and system based on differential privacy policy

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190166100A1 (en) * 2017-11-28 2019-05-30 Sogang University Research Foundation System and method for traffic volume publication applying differential privacy
CN112131605A (en) * 2020-09-24 2020-12-25 合肥城市云数据中心股份有限公司 Differential privacy dynamic data publishing method based on mutual information correlation technology
CN112307514A (en) * 2020-11-26 2021-02-02 哈尔滨工程大学 Difference privacy greedy grouping method adopting Wasserstein distance

Also Published As

Publication number Publication date
CN114662152B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN107871087B (en) Personalized differential privacy protection method for high-dimensional data release in distributed environment
CN108763956B (en) Fractal dimension-based streaming data differential privacy protection publishing method
CN107862219B (en) Method for protecting privacy requirements in social network
CN110020546B (en) Privacy data grading protection method
CN110602145B (en) Track privacy protection method based on location-based service
CN114662152A (en) Real-time data-oriented localized differential privacy data stream publishing method
Li et al. A cloaking algorithm based on spatial networks for location privacy
CN114884682A (en) Crowd sensing data stream privacy protection method based on self-adaptive local differential privacy
CN115130119A (en) Local differential privacy-based utility optimization set data protection method
Dong et al. PADP-FedMeta: A personalized and adaptive differentially private federated meta learning mechanism for AIoT
Okegbile et al. Differentially Private federated multi-task learning framework for enhancing human-to-virtual connectivity in human digital twin
Zhao et al. A privacy-preserving trajectory publication method based on secure start-points and end-points
Acs et al. Probabilistic km-anonymity efficient anonymization of large set-valued datasets
Aldhyani et al. An integrated model for prediction of loading packets in network traffic
CN108111968B (en) Generalization-based location privacy protection method
Zhang et al. A novel attributes anonymity scheme in continuous query
Xu et al. IFTS: A location privacy protection method based on initial and final trajectory segments
Yang et al. SPoFC: A framework for stream data aggregation with local differential privacy
Liu et al. Fair differential privacy can mitigate the disparate impact on model accuracy
Gomez Rodriguez et al. Bridging offline and online social graph dynamics
Chen et al. Differentially private aggregated mobility data publication using moving characteristics
Yang et al. P4mobi: A probabilistic privacy-preserving framework for publishing mobility datasets
Huang et al. An Incentive-Based Differential Privacy-Preserving Truth Discovery over Streaming Data
Ning et al. Group relational privacy protection on time-constrained point of interests
CN114884688B (en) Federal anomaly detection method across multi-attribute networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant