CN114417423A - Infinite data stream real-time privacy protection method and system based on dynamic budget allocation - Google Patents


Publication number
CN114417423A
Authority
CN
China
Prior art keywords
strategy
privacy budget
budget
privacy
unbiased estimation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210098965.3A
Other languages
Chinese (zh)
Other versions
CN114417423B (en)
Inventor
杨树森
任雪斌
赵鹏
石亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Cumulus Technology Co ltd
Original Assignee
Hangzhou Cumulus Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Cumulus Technology Co ltd
Priority to CN202210098965.3A
Publication of CN114417423A
Application granted
Publication of CN114417423B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245: Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Complex Calculations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a system for real-time privacy protection of infinite data streams based on dynamic budget allocation. The method sets a time-window length and a total privacy budget and derives from them the privacy budget for deviation calculation and the privacy budget for release-strategy selection. At each moment the currently available budget is allocated first. One part of the budget is used to randomly perturb the raw data locally; the server then aggregates the perturbed data and estimates the deviation between the statistic at the current moment and the statistic released at the previous moment. The release strategy is selected by comparing this deviation with the error that would be incurred by spending the other part of the budget on estimating the statistic. The invention provides stronger, local differential privacy protection on the user side and improves the utility of the released data by balancing the real-time change of the data stream against the estimation error. The method applies to a wide range of scenarios, performs well in practice, is simple to implement, and is highly extensible.

Description

Infinite data stream real-time privacy protection method and system based on dynamic budget allocation
Technical Field
The invention belongs to the field of privacy protection, and particularly relates to an infinite data stream real-time privacy protection method and system based on dynamic budget allocation.
Background
In recent years, with the rapid development of the Internet of Things and 5G technology, large numbers of smart devices have entered production and daily life and interact continuously with people and their surroundings. The massive data they generate are widely collected, aggregated, stored, processed and analyzed, which creates great value for society and improves living standards, for example through traffic flow monitoring and personalized service recommendation. However, because these smart devices are closely tied to individuals, the generated data inevitably contain a great deal of personal private information, such as GPS tracks, health monitoring data and browsing records. Unrestricted use and release of these data carries a serious risk of privacy disclosure and may even endanger personal rights and safety and cause anxiety and panic. In particular, when an individual is inside a sensing system, sensitive data are monitored over long periods and released as statistics. Compared with traditional static data, a data stream is continuous, unbounded, real-time and temporally correlated, and these characteristics make privacy protection considerably harder. Privacy protection in real-time monitoring and statistical systems for sensitive data is therefore both urgent and necessary.
Researchers have proposed many approaches to privacy protection. Differential privacy has become a popular privacy protection model in recent years because it rigorously defines the strength of protection and is simple to implement. For the setting in which the data collector is untrusted, researchers have proposed local differential privacy, in which each user's data are perturbed locally before being uploaded to the server; the technique has been deployed by companies such as Google, Apple and Samsung. However, existing local differential privacy techniques target only static scenarios. For real-time statistical release of infinite data streams in a sensing system, existing solutions all design privacy-preserving release systems for the centralized setting. They do not protect the raw local data streams directly, and the raw data are still gathered centrally by the data collector, which leaves a serious hidden risk. Designing a real-time local privacy protection technique for infinite data streams is therefore of significant value, as is the closely related problem of improving the utility of the released data.
Disclosure of Invention
The invention aims to solve the above problems in the prior art by providing a method and a system for real-time privacy protection of infinite data streams based on dynamic budget allocation. The invention strengthens the privacy protection of data streams in a sensing system, achieves real-time and high-utility statistical release of private data, and is highly extensible.
To achieve this aim, the invention adopts the following technical solution:
An infinite data stream real-time privacy protection method based on dynamic budget allocation comprises the following steps:
Step 1: set the time-window length and the total privacy budget, and process them to obtain the privacy budget for deviation calculation and the privacy budget for release-strategy selection;
Step 2: obtain the release-strategy budget at the current moment from the pre-allocated release-strategy budget together with the budget absorbed, that is, the budget saved between the moment at which the perturbation strategy was last selected and the current moment;
Step 3: with the deviation-calculation budget, randomly perturb the raw data of all users to obtain an unbiased estimate of the true frequency for deviation calculation;
Step 4: from that unbiased estimate of the true frequency, obtain an unbiased estimate of the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment;
Step 5: with the release-strategy budget at the current moment, randomly perturb the raw data of all users to obtain an unbiased estimate of the true frequency for strategy selection;
Step 6: process the unbiased estimate of the true frequency for strategy selection to obtain its mean square error;
Step 7: judge whether the unbiased estimate of the deviation is larger than the mean square error. If so, select the perturbation strategy and output the unbiased estimate of the true frequency for strategy selection; if not, select the approximation strategy and output the value released at the previous adjacent moment as an approximation of the value released at the current moment, without consuming privacy budget.
The invention is further improved in that:
processing the time-window length and the total privacy budget to obtain the privacy budgets for deviation calculation and for release-strategy selection is specifically:
set the time-window length $w$ and the total privacy budget $\varepsilon$, and divide the total budget equally into $2w$ shares so that budget is always spent in multiples of $\varepsilon/(2w)$. At each moment one share is pre-allocated to deviation calculation and one share to release-strategy selection; that is, at the current moment $t$ the budget for deviation calculation is $\varepsilon_{t,1} = \varepsilon/(2w)$ and the pre-allocated budget for release-strategy selection is $\varepsilon_{t,2} = \varepsilon/(2w)$.
Obtaining the release-strategy budget at the current moment from the pre-allocated release-strategy budget and the budget saved between the moment at which the perturbation strategy was last selected and the current moment is specifically:
the pre-allocated release-strategy budget absorbs the budget saved between the moment $l$ at which the perturbation strategy was last selected and the current moment $t$. The budget nullified at the current moment $t$ by the release at moment $l$ is $\varepsilon_N = \varepsilon_{l,2} - \varepsilon/(2w)$, and the sum of the absorbed and pre-allocated budgets is $\varepsilon_A = (t-l)\cdot\varepsilon/(2w) - \varepsilon_N$. The budget $\varepsilon_{t,2}$ for release-strategy selection at moment $t$ is:
$$\varepsilon_{t,2} = \min\{\varepsilon_A,\ \varepsilon/2\}$$
where $\varepsilon_{l,2}$ is the privacy budget consumed at moment $l$, the moment at which the perturbation strategy was last selected.
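The absorption rule above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `publication_budget` and its argument names are assumptions, and the first moment (where the text fixes the perturbation strategy with the pre-allocated share) is left to the caller.

```python
def publication_budget(t, l, eps_l2, eps, w):
    """Budget available for release-strategy selection at time t.

    t      : current timestamp
    l      : timestamp at which the perturbation strategy was last selected (l < t)
    eps_l2 : budget consumed by that release
    eps, w : total privacy budget and sliding-window length
    (all names are illustrative, not taken from the source)
    """
    unit = eps / (2 * w)              # pre-allocated share per moment and task
    eps_n = eps_l2 - unit             # budget nullified by the release at time l
    eps_a = (t - l) * unit - eps_n    # pre-allocated plus absorbed budget
    return min(eps_a, eps / 2)        # never more than half the total budget


# With w = 3 and eps = 3 the per-task share is 0.5.
print(publication_budget(t=2, l=1, eps_l2=0.5, eps=3.0, w=3))  # no moment skipped
print(publication_budget(t=4, l=2, eps_l2=0.5, eps=3.0, w=3))  # one skipped share absorbed
print(publication_budget(t=5, l=4, eps_l2=1.0, eps=3.0, w=3))  # moment after an absorbing release
```

A non-positive result means the pre-allocated share of this moment has already been spent by an earlier release, so no budget is available.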
Randomly perturbing the raw data of all users with the deviation-calculation budget to obtain the unbiased estimate of the true frequency for deviation calculation is specifically:
with the deviation-calculation budget $\varepsilon_{t,1}$, local data are processed by an adaptively selected random perturbation algorithm. The raw value $v_t^j$ of each user is perturbed into a value $\tilde{v}_t^j$ in the domain $\Omega$ of all possible values of the $N$ users, and the perturbed value $\tilde{v}_t^j$ is sent to the server, where $j = 1, 2, \ldots, N$. From the frequencies of the collected perturbed values, the server computes an unbiased estimate $\hat{c}_{t,1}$ of the true frequency $c_t$.
Obtaining the unbiased estimate of the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment is specifically:
the server uses $\hat{c}_{t,1}$, the unbiased estimate of the true frequency obtained for deviation calculation, to compute dis, an unbiased estimate of the deviation between the true frequency $c_t$ and the statistic $r_{t-1}$ released at the previous adjacent moment; the deviation is measured by the squared distance.
Randomly perturbing the raw data of all users with the release-strategy budget at the current moment to obtain the unbiased estimate of the true frequency for strategy selection is specifically:
with the release-strategy budget $\varepsilon_{t,2}$ at the current moment, local data are processed by the adaptively selected random perturbation algorithm. The raw value $v_t^j$ of each user is perturbed into a value $\tilde{v}_t^j$ in the domain $\Omega$ of all possible values of the $N$ users, and the perturbed value $\tilde{v}_t^j$ is sent to the server, where $j = 1, 2, \ldots, N$. From the frequencies of the collected perturbed values, the server computes an unbiased estimate $\hat{c}_{t,2}$ of the true frequency $c_t$.
Processing the unbiased estimate of the true frequency for strategy selection to obtain its mean square error is specifically:
the mean square error err is computed as:
$$err = \frac{d-2+e^{\varepsilon_{t,2}}}{N\,(e^{\varepsilon_{t,2}}-1)^2} + \frac{d-2}{N\,d\,(e^{\varepsilon_{t,2}}-1)}$$
where $d$ is the number of possible states of all users, $N$ is the number of users, and $\varepsilon_{t,2}$ is the release-strategy budget at the current moment.
Judging whether the unbiased estimate of the deviation is larger than the mean square error and selecting the perturbation strategy or the approximation strategy accordingly is specifically:
when the unbiased estimate dis is larger than the mean square error err, the perturbation strategy is selected: the raw data are randomly perturbed with the release-strategy budget $\varepsilon_{t,2}$ at the current moment to obtain an unbiased estimate $\hat{c}_{t,2}$ of the true frequency $c_t$, the statistic released at the current moment is $r_t = \hat{c}_{t,2}$, and the moment $l$ of the last perturbation-strategy selection is set to the current moment $t$;
when dis is less than or equal to err, the approximation strategy is selected: the value $r_{t-1} = r_l$ released at the previous adjacent moment is used as an approximation of the value to release, the statistic released at the current moment is $r_t = r_{t-1}$, and no privacy budget is consumed, i.e. $\varepsilon_{t,2} = 0$.
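The decision rule can be sketched as follows. The names `select_release` and `collect_estimate` are hypothetical: `collect_estimate` stands in for the second round of user reports made with the release-strategy budget. The sketch assumes that this round is triggered only on the perturbation branch, which is one way to make the approximation branch consume no budget; the caller is assumed to record $l = t$ and the spent budget after a perturbation release.

```python
def select_release(dis, err, collect_estimate, prev_release):
    """Return (released_value, strategy) for the current moment.

    dis              : unbiased estimate of the deviation from the last release
    err              : mean square error expected from a fresh estimate
    collect_estimate : zero-argument callable that gathers perturbed reports
                       and returns the new frequency estimate (hypothetical)
    prev_release     : value released at the previous adjacent moment
    """
    if dis > err:
        # Data changed more than the noise a fresh estimate would add.
        return collect_estimate(), "perturbation"
    # Re-release the previous value; users report nothing in this round.
    return list(prev_release), "approximation"


calls = []

def fake_collect():
    calls.append("reported")          # records that users were asked to report
    return [0.6, 0.3, 0.1]

prev = [0.5, 0.4, 0.1]
print(select_release(0.002, 0.010, fake_collect, prev))
print(select_release(0.020, 0.010, fake_collect, prev))
print(len(calls))
```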
An infinite data stream real-time privacy protection system based on dynamic budget allocation comprises:
a processing module, configured to set the time-window length and the total privacy budget and to process them to obtain the privacy budget for deviation calculation and the privacy budget for release-strategy selection;
a privacy budget acquisition module, configured to obtain the release-strategy budget at the current moment from the pre-allocated release-strategy budget together with the budget saved between the moment at which the perturbation strategy was last selected and the current moment;
a first random perturbation module, configured to randomly perturb the raw data of all users with the deviation-calculation budget to obtain an unbiased estimate of the true frequency for deviation calculation;
an unbiased estimate acquisition module, configured to obtain, from that unbiased estimate of the true frequency, an unbiased estimate of the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment;
a second random perturbation module, configured to randomly perturb the raw data of all users with the release-strategy budget at the current moment to obtain an unbiased estimate of the true frequency for strategy selection;
a mean square error acquisition module, configured to process the unbiased estimate of the true frequency for strategy selection to obtain its mean square error;
a judging module, configured to judge whether the unbiased estimate of the deviation is larger than the mean square error; if so, to select the perturbation strategy and output the unbiased estimate of the true frequency for strategy selection; if not, to select the approximation strategy and output the value released at the previous adjacent moment as an approximation of the value released at the current moment, without consuming privacy budget.
Compared with the prior art, the invention has the following beneficial effects:
by setting the time-window length and the total privacy budget and deriving from them the budgets for deviation calculation and for release-strategy selection, the invention effectively protects the sensitive information contained in the data stream over any continuous time period of the window length and strengthens the protection offered by traditional privacy mechanisms. At the same time, without touching the raw data, it analyzes the change of the released data stream by estimating both the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment and the error that spending privacy budget on estimating the statistic would incur. The release strategy is adjusted accordingly, which improves the utility of the released results.
Drawings
In order to more clearly explain the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is an exemplary diagram of a dynamic privacy allocation mechanism;
FIG. 3 compares the average relative error of the method of the present invention and the baseline method on different data sets as the total privacy budget ε varies; FIG. 3(a) uses the synthetic data set LNS and FIG. 3(b) uses the real data set Taxi;
FIG. 4 compares the average relative error of the method of the present invention and the baseline method on different data sets as the sliding-window length w varies; FIG. 4(a) uses the synthetic data set LNS and FIG. 4(b) uses the real data set Taxi;
FIG. 5 compares the ROC curves obtained by the method of the present invention and the baseline method for outlier monitoring on the real data set Taxi;
FIG. 6 is a block diagram of an infinite data stream real-time privacy protection system based on dynamic budget allocation according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the embodiments of the present invention, it should be noted that terms such as "upper", "lower", "horizontal" and "inner", if used, indicate an orientation or positional relationship based on the drawings or on the way the product of the present invention is usually arranged in use. They are used only for convenience and simplicity of description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be understood as limiting the present invention. Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
Furthermore, the term "horizontal", if present, does not mean that the component is required to be absolutely horizontal, but may be slightly inclined. For example, "horizontal" merely means that the direction is more horizontal than "vertical" and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.
In the description of the embodiments of the present invention, it should be further noted that unless otherwise explicitly stated or limited, the terms "disposed," "mounted," "coupled," and "connected" should be interpreted broadly. A connection may, for example, be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
The invention is described in further detail below with reference to the accompanying drawings:
Referring to FIG. 1, the invention discloses a method for real-time privacy protection of an infinite data stream based on dynamic budget allocation, comprising the following steps:
Step 1: set the time-window length and the total privacy budget, and process them to obtain the privacy budget for deviation calculation and the privacy budget for release-strategy selection.
Take real-time aggregation and release of statistics as an example. Assume a sensing system with $N$ users in which a central server aggregates, in real time, statistics on the same sensed target and releases a frequency sequence $R = (r_1, \ldots, r_t, \ldots)$. The state value of every user lies in $\Omega = \{\omega_1, \omega_2, \ldots, \omega_d\}$ with $d = |\Omega|$, and $r_t = \langle r_t[1], r_t[2], \ldots, r_t[d] \rangle$ contains the frequency estimate of each value. To protect the local data sequences $(v_1^j, \ldots, v_t^j, \ldots)$, $j = 1, \ldots, N$, of all users, the privacy budget is allocated over a sliding time window taken as a whole. The window length $w$ and the total privacy budget $\varepsilon$ are set according to actual conditions and requirements; $\varepsilon$ also represents the strength of privacy protection. To balance the budget consumed by different tasks at different moments, before the algorithm runs the total budget is distributed evenly over $w$ moments and then split evenly between the two tasks of deviation calculation and release-strategy selection, so that each task is allocated $\varepsilon/(2w)$ at each moment. It is further specified that the perturbation strategy is always selected at the initial moment of the algorithm.
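As a small illustration of this uniform pre-allocation, the sketch below builds the initial plan for the first few moments and checks that one full window adds up to the total budget. The function name `preallocate` and the dictionary keys are assumptions made for the example; the values $w = 3$ and $\varepsilon = 3$ are those of the worked example for FIG. 2.

```python
def preallocate(eps, w, horizon):
    """Initial plan: every moment gets eps/(2w) for each of the two tasks."""
    unit = eps / (2 * w)
    return [
        {"t": t, "deviation": unit, "release": unit}
        for t in range(1, horizon + 1)
    ]


plan = preallocate(eps=3.0, w=3, horizon=6)
print(plan[0])

# Any w consecutive moments together hold exactly the total budget.
window_total = sum(row["deviation"] + row["release"] for row in plan[0:3])
print(window_total)
```

The later steps only move release-task shares between moments (absorbing skipped shares, nullifying later ones); the deviation-task share stays fixed.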
Step 2: obtain the release-strategy budget at the current moment from the pre-allocated release-strategy budget together with the budget saved between the moment at which the perturbation strategy was last selected and the current moment.
Following the initialization, the budget for deviation calculation at the current moment $t$ stays at $\varepsilon_{t,1} = \varepsilon/(2w)$. The budget for release-strategy selection can absorb the budget saved between the moment $l$ at which the perturbation strategy was last selected and the current moment. To ensure that the budget spent in any sliding window does not exceed $\varepsilon$, after budget has been absorbed an equal amount of the budget pre-allocated to the release-strategy task at later moments must be nullified. Hence the budget nullified at the current moment $t$ by the release at moment $l$ is $\varepsilon_N = \varepsilon_{l,2} - \varepsilon/(2w)$, the sum of the absorbed and pre-allocated budgets is $\varepsilon_A = (t-l)\cdot\varepsilon/(2w) - \varepsilon_N$, and the budget $\varepsilon_{t,2}$ available for release-strategy selection at moment $t$ is:
$$\varepsilon_{t,2} = \min\{\varepsilon_A,\ \varepsilon/2\}$$
Note that when $\varepsilon_A \le 0$ we have $\varepsilon_{t,2} \le 0$. This corresponds to a moment whose pre-allocated budget has been nullified, that is, a moment with no available budget. Therefore, when $\varepsilon_A \le 0$ the approximation strategy is selected directly and $r_{t-1}$ is released, i.e. $r_t = r_{t-1}$; $\varepsilon_{t,2}$ is recorded as 0 and the following steps are not carried out.
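To see absorption and nullification interact over several moments, the following sketch simulates only the budget bookkeeping. It is an illustration under stated assumptions: `wants_release(t)` is a hypothetical stand-in for the data-dependent test of step 7, and the pattern chosen below (skip only at $t = 3$) is invented for the example.

```python
def simulate_release_budget(eps, w, wants_release, horizon):
    """Release-task budget spent at moments 1..horizon."""
    unit = eps / (2 * w)
    spent = [unit]               # t = 1: perturbation strategy is always selected
    l, eps_l2 = 1, unit          # last perturbation release and its budget
    for t in range(2, horizon + 1):
        eps_n = eps_l2 - unit                    # nullified by the release at l
        eps_a = (t - l) * unit - eps_n           # absorbed + pre-allocated
        eps_t2 = min(eps_a, eps / 2)
        if eps_t2 <= 0 or not wants_release(t):
            spent.append(0.0)                    # approximation: r_t = r_{t-1}
        else:
            spent.append(eps_t2)                 # perturbation strategy
            l, eps_l2 = t, eps_t2
    return spent


w, eps = 3, 3.0
spent = simulate_release_budget(eps, w, wants_release=lambda t: t != 3, horizon=8)
print(spent)

# Largest release-task spend over any w consecutive moments.
worst = max(sum(spent[i:i + w]) for i in range(len(spent) - w + 1))
print(worst, "<=", eps / 2)
```

In this run the share skipped at $t = 3$ is absorbed at $t = 4$, and $t = 5$ is then nullified. Adding the fixed deviation-task share of $\varepsilon/(2w)$ per moment, each window stays within the total budget $\varepsilon$.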
Step 3: with the deviation-calculation budget, randomly perturb the raw data of all users to obtain an unbiased estimate of the true frequency for deviation calculation.
With the deviation-calculation budget $\varepsilon_{t,1}$, local data are processed by an adaptively selected random perturbation algorithm. The raw value $v_t^j$ of each user is perturbed into a value $\tilde{v}_t^j$ in the domain $\Omega$ of all possible values of the $N$ users, and the perturbed value $\tilde{v}_t^j$ is sent to the server, where $j = 1, 2, \ldots, N$. From the frequencies of the collected perturbed values, the server computes an unbiased estimate $\hat{c}_{t,1}$ of the true frequency $c_t$.
First, according to the allocated budget $\varepsilon_{t,1}$ and the number $d$ of possible state values of all users, one of the two random perturbation methods currently popular in local differential privacy, GRR or OUE, is selected adaptively. When
$$d < 3e^{\varepsilon_{t,1}} + 2$$
GRR is selected because its frequency-estimation variance is smaller; otherwise OUE is selected. Taking GRR as an example, the idea is that for any private value $v_t^j \in \Omega$, the user uploads the true value to the server with probability $p$, and with the remaining probability $1-p$ uploads a value chosen uniformly at random from $\Omega \setminus \{v_t^j\}$, so that each other value is reported with probability $q = (1-p)/(d-1)$. The perturbation mechanism $M$ has the following form:
$$\Pr\left[M(v_t^j) = \tilde{v}_t^j\right] = \begin{cases} p = \dfrac{e^{\varepsilon_{t,1}}}{e^{\varepsilon_{t,1}} + d - 1}, & \tilde{v}_t^j = v_t^j \\[2ex] q = \dfrac{1}{e^{\varepsilon_{t,1}} + d - 1}, & \tilde{v}_t^j \neq v_t^j \end{cases}$$
After collecting all perturbed data, the server first computes the frequency of the perturbed data, $\tilde{c}_t = \langle \tilde{c}_t[1], \ldots, \tilde{c}_t[d] \rangle$, where
$$\tilde{c}_t[k] = \frac{1}{N}\sum_{j=1}^{N} \mathbb{1}\{\tilde{v}_t^j = \omega_k\}$$
and from it computes an unbiased estimate $\hat{c}_t = \langle \hat{c}_t[1], \ldots, \hat{c}_t[d] \rangle$ of the true frequency $c_t = \langle c_t[1], \ldots, c_t[d] \rangle$. Under the GRR perturbation method, $\hat{c}_t[k]$ and its variance are computed as:
$$\hat{c}_t[k] = \frac{\tilde{c}_t[k] - q}{p - q}$$
$$\mathrm{Var}(\hat{c}_t[k]) = \frac{q(1-q)}{N(p-q)^2} + \frac{f_k(1-p-q)}{N(p-q)}$$
where $k = 1, 2, \ldots, d$, $f_k$ is the true frequency of $\omega_k$, and $\sum_{k=1}^{d} f_k = 1$.
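The GRR round described above can be sketched as follows. This is a minimal illustration using the standard GRR probabilities $p = e^{\varepsilon}/(e^{\varepsilon}+d-1)$ and $q = 1/(e^{\varepsilon}+d-1)$ and the commonly used selection rule $d < 3e^{\varepsilon}+2$. All function names are assumptions, OUE itself is not implemented, and the domain is assumed to contain at least two values.

```python
import math
import random
from collections import Counter


def choose_mechanism(eps, d):
    """GRR has the smaller variance when d < 3*e^eps + 2, otherwise OUE."""
    return "GRR" if d < 3 * math.exp(eps) + 2 else "OUE"


def grr_probs(eps, d):
    """Probability of reporting the true value (p) or one given other value (q)."""
    denom = math.exp(eps) + d - 1
    return math.exp(eps) / denom, 1.0 / denom


def grr_perturb(value, domain, eps, rng):
    """User side: keep the true value with probability p, else pick another one."""
    p, _ = grr_probs(eps, len(domain))
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate(reports, domain, eps):
    """Server side: correct the observed frequencies into unbiased estimates."""
    p, q = grr_probs(eps, len(domain))
    counts = Counter(reports)
    n = len(reports)
    return [(counts[v] / n - q) / (p - q) for v in domain]


def grr_variance(f_k, n, eps, d):
    """Variance of one estimated frequency whose true frequency is f_k."""
    p, q = grr_probs(eps, d)
    return q * (1 - q) / (n * (p - q) ** 2) + f_k * (1 - p - q) / (n * (p - q))


rng = random.Random(0)                       # seeded only to make runs repeatable
domain = ["a", "b", "c", "d"]
data = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
reports = [grr_perturb(v, domain, 1.0, rng) for v in data]
print(choose_mechanism(1.0, len(domain)))
print(grr_estimate(reports, domain, 1.0))
```

Individual estimates can be negative or exceed 1 because the correction is linear, but they always sum to 1.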
Step 4: from the unbiased estimate of the true frequency obtained for deviation calculation, obtain an unbiased estimate of the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment.
The server uses $\hat{c}_{t,1}$ to compute dis, an unbiased estimate of the deviation between the true frequency $c_t$ and the statistic $r_{t-1}$ released at the previous adjacent moment; the deviation is measured by the squared distance.
The squared distance between the true frequency $c_t$ at the current moment and the value $r_{t-1} = r_l$ released at the previous adjacent moment is denoted $dis^*$:
$$dis^* = \frac{1}{d}\sum_{k=1}^{d}\left(c_t[k] - r_l[k]\right)^2$$
This value cannot be computed directly because $c_t$ is unknown. Instead, the estimate $\hat{c}_{t,1}[k]$ obtained in the previous step is substituted for $c_t[k]$, and the result is corrected by the variance $\mathrm{Var}(\hat{c}_{t,1}[k])$. Under local differential privacy, the unbiased estimate dis of $dis^*$ is therefore computed as:
$$dis = \frac{1}{d}\sum_{k=1}^{d}\left(\hat{c}_{t,1}[k] - r_l[k]\right)^2 - \frac{1}{d}\sum_{k=1}^{d}\mathrm{Var}(\hat{c}_{t,1}[k])$$
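The variance correction works because, for an unbiased estimate $\hat{c}$ of $c$ and a fixed $r$, $E[(\hat{c}-r)^2] = (c-r)^2 + \mathrm{Var}(\hat{c})$. A minimal sketch of the estimator follows; the name `deviation_estimate` is an assumption, and the mean variance of the first-round estimate is passed in rather than derived, so the sketch does not depend on a particular perturbation method.

```python
def deviation_estimate(est1, prev_release, mean_var1):
    """Unbiased estimate of the mean squared distance to the last release.

    est1         : first-round unbiased frequency estimate (length d)
    prev_release : value released at the previous adjacent moment (length d)
    mean_var1    : average variance of est1 over the d components
    """
    d = len(est1)
    mean_sq = sum((c - r) ** 2 for c, r in zip(est1, prev_release)) / d
    return mean_sq - mean_var1


print(deviation_estimate([0.5, 0.3, 0.2], [0.4, 0.4, 0.2], mean_var1=0.001))
print(deviation_estimate([0.4, 0.4, 0.2], [0.4, 0.4, 0.2], mean_var1=0.001))
```

The result can be negative when the observed change is smaller than the noise level, in which case it can never exceed the (positive) mean square error and the approximation strategy follows.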
Step 5: with the release-strategy budget at the current moment, randomly perturb the raw data of all users to obtain an unbiased estimate of the true frequency for strategy selection.
First assume that, with the allocated budget $\varepsilon_{t,2}$, random perturbation is performed again in the same way as in step 3 to obtain an unbiased estimate $\hat{c}_{t,2}$ of the true frequency $c_t$.
With the release-strategy budget $\varepsilon_{t,2}$ at the current moment, local data are processed by the adaptively selected random perturbation algorithm. The raw value $v_t^j$ of each user is perturbed into a value $\tilde{v}_t^j$ in the domain $\Omega$ of all possible values of the $N$ users, and the perturbed value $\tilde{v}_t^j$ is sent to the server, where $j = 1, 2, \ldots, N$. From the frequencies of the collected perturbed values, the server computes an unbiased estimate $\hat{c}_{t,2}$ of the true frequency $c_t$.
Step 6: process the unbiased estimate of the true frequency for strategy selection to obtain its mean square error.
The mean square error of the estimate is
$$err = \frac{1}{d}\sum_{k=1}^{d}\mathrm{Var}(\hat{c}_{t,2}[k])$$
Further, under the GRR perturbation method, err is computed as
$$err = \frac{d-2+e^{\varepsilon_{t,2}}}{N\,(e^{\varepsilon_{t,2}}-1)^2} + \frac{d-2}{N\,d\,(e^{\varepsilon_{t,2}}-1)}$$
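Because the true frequencies sum to 1, averaging the standard GRR variance $\frac{q(1-q)}{N(p-q)^2} + \frac{f_k(1-p-q)}{N(p-q)}$ over the $d$ values removes the dependence on the unknown $f_k$, so err can be evaluated before any user reports. The sketch below assumes GRR with $p = e^{\varepsilon}/(e^{\varepsilon}+d-1)$ and $q = 1/(e^{\varepsilon}+d-1)$; the function name is hypothetical.

```python
import math


def grr_mean_variance(eps, n, d):
    """Average variance of the d GRR frequency estimates for n users."""
    e = math.exp(eps)
    return (d - 2 + e) / (n * (e - 1) ** 2) + (d - 2) / (n * d * (e - 1))


# More absorbed budget means a smaller expected error for the same users.
for eps_t2 in (0.25, 0.5, 1.0):
    print(eps_t2, grr_mean_variance(eps_t2, n=10_000, d=10))
```

This is why absorbing skipped shares helps: a larger release-strategy budget lowers err, so a perturbation release becomes both more likely to be chosen and more accurate.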
Step 7: Judge whether the unbiased estimate dis is larger than the mean square error err. If so, select the perturbation strategy and output the unbiased estimate of the true frequency for strategy selection; if not, select the approximation strategy and output the value released at the previous adjacent moment as an approximation of the current release value, without consuming any privacy budget.
When dis is larger than err, the perturbation strategy is selected: the original data are randomly perturbed with the privacy budget ε_{t,2} for strategy selection at the current moment to obtain an unbiased estimate of the true frequency c_t,
Figure BDA0003488408420000108
which is released as the statistical release value at the current moment,
Figure BDA0003488408420000109
and the moment l at which the perturbation strategy was last selected is set to the current moment t;
when dis is less than or equal to err, the approximation strategy is selected: the value r_{t-1} = r_l released at the previous adjacent moment is used as an approximation of the current release value, so that r_t = r_{t-1} is released and no privacy budget is consumed, i.e. ε_{t,2} = 0.
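The decision in step 7 amounts to a single comparison. The Python sketch below is illustrative only: the function name select_strategy and the idea of passing the fresh estimate as a callable (so that users are only asked to perturb their data when the perturbation strategy is actually chosen) are assumptions, not details taken from the description.

```python
def select_strategy(dis, err, collect_estimate, previous_release, eps_t2):
    """Return (strategy, release value, privacy budget consumed) for one moment.

    dis              -- unbiased estimate of the squared distance to the last release
    err              -- mean square error expected from perturbing with eps_t2
    collect_estimate -- hypothetical callable returning the fresh unbiased estimate
    previous_release -- the value r_{t-1} released at the previous adjacent moment
    """
    if dis > err:
        # The data drifted more than the noise would cost: publish a fresh estimate.
        return "perturbation", list(collect_estimate()), eps_t2
    # Otherwise republish the previous value and spend nothing.
    return "approximation", list(previous_release), 0.0
```

For example, select_strategy(0.02, 0.005, lambda: [0.6, 0.4], [0.5, 0.5], 0.5) would choose the perturbation strategy, whereas swapping the first two arguments would republish [0.5, 0.5] at zero cost.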
Referring to fig. 2, the specific process of dynamic privacy budget allocation, in which the infinite data stream local privacy protection algorithm based on dynamic budget allocation allocates the overall privacy budget dynamically over a sliding time window, is described by way of example. Set w = 3 and ε = 3. The total budget is divided into 2w parts and is split between two tasks at each moment. The perturbation strategy is selected at the first two moments, so the pre-allocated budget does not change. At t = 3 the approximation strategy is selected, so the budget pre-allocated to release-strategy selection,
Figure BDA0003488408420000111
is saved rather than actually consumed, and ε_{3,2} = 0. At t = 4 the budget available for release-strategy selection becomes
Figure BDA0003488408420000112
which absorbs the budget saved at the previous moment; after the judgment, however, the approximation strategy is again selected and no budget is consumed. At t = 5 the perturbation strategy is selected and absorbs the budgets saved at moments 3 and 4, so that ε_{5,2} = 1.5. Because this moment consumes the budget absorbed from two other moments, the next two moments directly select the approximation strategy. The purpose is to ensure that the total privacy budget consumed within any sliding window never exceeds ε throughout the process, as can be clearly seen in fig. 2.
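The walk-through above can be replayed with a short simulation. This Python sketch follows the allocation rule as it is stated in claim 3, with two assumptions: the operator missing from the cancelled budget is read as a subtraction, ε_N = ε_{l,2} - ε/(2w), which is the reading that reproduces the example, and the initial state is taken as l = 0 with ε_{0,2} = ε/(2w). The function name allocate_budgets and the scripted list of decisions are hypothetical; in the method itself the decision comes from comparing dis with err.

```python
def allocate_budgets(total_eps, w, wants_perturbation):
    """Replay the release-strategy budget over moments t = 1, 2, ...

    wants_perturbation[t-1] says whether the dis > err test would pick the
    perturbation strategy at moment t. Returns (available, consumed) lists.
    """
    unit = total_eps / (2 * w)    # one of the 2w equal parts
    last_t, last_eps = 0, unit    # assumed initial state: nothing to cancel
    available, consumed = [], []
    for t, wants in enumerate(wants_perturbation, start=1):
        cancelled = last_eps - unit                 # eps_N
        eps_a = (t - last_t) * unit - cancelled     # eps_A
        eps_t2 = min(eps_a, total_eps / 2)
        available.append(eps_t2)
        if wants and eps_t2 > 0:
            consumed.append(eps_t2)                 # perturbation strategy
            last_t, last_eps = t, eps_t2
        else:
            consumed.append(0.0)                    # approximation strategy
    return available, consumed


available, consumed = allocate_budgets(
    3.0, 3, [True, True, False, False, True, True, True, True])
print(consumed)  # [0.5, 0.5, 0.0, 0.0, 1.5, 0.0, 0.0, 0.5]
```

With w = 3 and ε = 3 the available budget is 0.5, 0.5, 0.5, 1.0 and 1.5 at moments 1 to 5, then -0.5 and 0.0 at moments 6 and 7, which forces the approximation strategy there even though perturbation was requested. No three consecutive moments consume more than ε/2 = 1.5 for release-strategy selection.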
Referring to fig. 3(a) and fig. 3(b), frequency statistics experiments are performed with the method of the present invention (LBA) and the reference method (LBU) on the synthetic data set LNS and the real data set Taxi, respectively, and the figures plot how the average relative error between the released statistical sequence obtained by each method and the true statistical sequence varies with the total privacy budget ε. The synthetic data set LNS is generated by c_t = c_{t-1} + N(0, Q) with Q = 500; its user population is N = 100000, the size of the users' value domain is d = 2, and it covers T = 800 time points in total. The real data set Taxi has N = 44240, d = 5 and T = 886 time points in total. In the reference method (LBU), the data are randomly perturbed with a privacy budget of ε/w at every moment, and the perturbed data are then aggregated to compute the frequency estimate. The sliding window size in the experiments on both data sets is w = 20, and ε ranges from 0.5 to 2.5. As fig. 3 shows, the average relative error of both methods decreases as the privacy budget increases, which is mainly a consequence of the random perturbation used to achieve local differential privacy. The method of the present invention nevertheless has a lower average relative error at every privacy budget, and its advantage is more pronounced on the Taxi data set. The smaller the privacy budget, the greater the advantage of the method of the present invention over the reference method; this also indicates that a smaller budget leads to a larger frequency estimation error, in which case the perturbation strategy should be used less often to preserve a certain utility.
Referring to fig. 4(a) and fig. 4(b), frequency statistics experiments are performed with the method of the present invention (LBA) and the reference method (LBU) on the synthetic data set LNS and the real data set Taxi, respectively, and the figures compare how the average relative error between the released statistical sequence obtained by each method and the true statistical sequence varies with the sliding window size w. The total privacy budget is set to ε = 2.0, and w ranges from 10 to 50. As fig. 4 shows, the average relative error of both methods increases with the sliding window size, mainly because the larger w is, the smaller the privacy budget allotted to each moment, so the estimation error naturally increases. The method of the present invention nevertheless has a lower average relative error at every sliding window size, and its advantage is more pronounced on the Taxi data set. The larger w is, the greater the advantage of the method of the present invention over the reference method, which again indicates that the perturbation strategy should be used less often to preserve utility when the budget is small.
Referring to fig. 5, the ROC curves for outlier detection obtained with the method of the present invention (LBA) and the reference method (LBU) on the real data set Taxi are compared. Outlier detection, also called event monitoring, here means fixing a threshold, using the frequency estimation sequence obtained by each method to judge whether the true frequency sequence exceeds the threshold, and checking how often that judgment is correct. The experiment selects one state value in the user value domain of the Taxi data set, with the total privacy budget set to ε = 1.0, the sliding window size set to w = 30, and the threshold set to 0.75 × (max(c) - min(c)) + min(c). As fig. 5 shows, the method of the present invention is more accurate: its AUC of 0.979 exceeds the 0.898 of LBU, which indicates that the method of the present invention not only has a smaller error but also retains more of the underlying information in the original sequence.
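The event-monitoring check can be sketched as follows. This Python example is illustrative only: the function names, the strict "greater than" comparison and the sample numbers are assumptions, and it computes a single true-positive/false-positive pair rather than the full ROC curve of fig. 5.

```python
def event_threshold(true_seq, ratio=0.75):
    """Threshold 0.75 * (max(c) - min(c)) + min(c) over the true sequence."""
    lo, hi = min(true_seq), max(true_seq)
    return ratio * (hi - lo) + lo


def detection_rates(true_seq, released_seq, ratio=0.75):
    """Compare events flagged on the released sequence with events in the true one."""
    thr = event_threshold(true_seq, ratio)
    tp = fp = fn = tn = 0
    for c, r in zip(true_seq, released_seq):
        actual, flagged = c > thr, r > thr
        if actual and flagged:
            tp += 1
        elif actual:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    fpr = fp / (fp + tn) if fp + tn else 0.0   # false positive rate
    return thr, tpr, fpr


# Made-up counts for one state value: true sequence vs. a released sequence.
thr, tpr, fpr = detection_rates([10, 20, 90, 30, 80], [12, 75, 88, 25, 65])
```

Sweeping the decision threshold applied to the released sequence and recording each (fpr, tpr) pair would trace out an ROC curve.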
Referring to fig. 6, the present invention discloses a system for real-time privacy protection of infinite data streams based on dynamic budget allocation, comprising:
a processing module, configured to set the length of the time window and the total privacy budget, and to process them to obtain the privacy budget for deviation calculation and the privacy budget for release-strategy selection, respectively;
a privacy budget acquisition module, configured to acquire the privacy budget for release-strategy selection at the current moment, based on the pre-allocated privacy budget for release-strategy selection and on absorbing the privacy budget saved between the previous moment at which the perturbation strategy was selected and the current moment;
a first random perturbation module, configured to randomly perturb the original data of all users based on the privacy budget for deviation calculation, to obtain an unbiased estimate of the true frequency for deviation calculation;
an unbiased estimate acquisition module, configured to obtain, from the unbiased estimate of the true frequency for deviation calculation, an unbiased estimate of the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment;
a second random perturbation module, configured to randomly perturb the original data of all users based on the privacy budget for release-strategy selection at the current moment, to obtain an unbiased estimate of the true frequency for strategy selection;
a mean square error acquisition module, configured to process the unbiased estimate of the true frequency for strategy selection to obtain its mean square error;
a judging module, configured to judge whether the unbiased estimate of the deviation is larger than the mean square error; if so, to select the perturbation strategy and output the unbiased estimate of the true frequency for strategy selection; if not, to select the approximation strategy and output the value released at the previous adjacent moment as an approximation of the current release value, without consuming any privacy budget.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method for real-time privacy protection of infinite data streams based on dynamic budget allocation, characterized by comprising the following steps:
step 1: setting the length of a time window and the total privacy budget, and processing them to obtain the privacy budget for deviation calculation and the privacy budget for release-strategy selection, respectively;
step 2: acquiring the privacy budget for release-strategy selection at the current moment, based on the pre-allocated privacy budget for release-strategy selection and on absorbing the privacy budget saved between the previous moment at which the perturbation strategy was selected and the current moment;
step 3: based on the privacy budget for deviation calculation, randomly perturbing the original data of all users to obtain an unbiased estimate of the true frequency for deviation calculation;
step 4: based on the unbiased estimate of the true frequency for deviation calculation, obtaining an unbiased estimate of the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment;
step 5: based on the privacy budget for release-strategy selection at the current moment, randomly perturbing the original data of all users to obtain an unbiased estimate of the true frequency for strategy selection;
step 6: processing the unbiased estimate of the true frequency for strategy selection to obtain its mean square error;
step 7: judging whether the unbiased estimate of the deviation is larger than the mean square error; if so, selecting the perturbation strategy and outputting the unbiased estimate of the true frequency for strategy selection; if not, selecting the approximation strategy and outputting the value released at the previous adjacent moment as an approximation of the current release value, without consuming any privacy budget.
2. The method according to claim 1, wherein processing the length of the time window and the total privacy budget to obtain the privacy budget for deviation calculation and the privacy budget for release-strategy selection specifically comprises:
setting the length w of the time window and the total privacy budget ε; dividing the total privacy budget equally into 2w parts, so that privacy budget is always used in multiples of ε/(2w); and pre-allocating, at each moment, a privacy budget of ε/(2w) to deviation calculation and to release-strategy selection respectively, so that at the current moment t the privacy budget for deviation calculation is ε_{t,1} = ε/(2w) and the privacy budget for release-strategy selection is ε_{t,2} = ε/(2w).
3. The method according to claim 2, wherein acquiring the privacy budget for release-strategy selection at the current moment, based on the pre-allocated privacy budget for release-strategy selection and on absorbing the privacy budget saved between the previous moment at which the perturbation strategy was selected and the current moment, specifically comprises:
the privacy budget for release-strategy selection absorbs the privacy budget saved from the previous moment l at which the perturbation strategy was selected up to the current moment t; the privacy budget to be cancelled on account of moment l is ε_N = ε_{l,2} - ε/(2w); the sum of the absorbed privacy budget and the pre-allocated budget is ε_A = (t - l)·ε/(2w) - ε_N; and the privacy budget ε_{t,2} for strategy selection at moment t is:
ε_{t,2} = min{ε_A, ε/2}
wherein ε_{l,2} is the privacy budget used at the previous moment l at which the perturbation strategy was selected.
4. The method according to claim 3, wherein randomly perturbing the original data of all users based on the privacy budget for deviation calculation, to obtain the unbiased estimate of the true frequency for deviation calculation, specifically comprises:
based on the privacy budget for deviation calculation, processing the local data with an adaptively selected random perturbation algorithm, whereby the original data of each user
Figure FDA0003488408410000021
is perturbed into one of the possible values in the value domain of the N users
Figure FDA0003488408410000022
and the perturbed value
Figure FDA0003488408410000023
is sent to the server, where j = 1, 2, …, N; from the frequencies of the collected perturbed values, the server applies a correction to obtain the first unbiased estimate of the true frequency c_t:
Figure FDA0003488408410000024
5. The method according to claim 4, wherein obtaining, from the unbiased estimate of the true frequency for deviation calculation, the unbiased estimate of the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment specifically comprises:
the server uses
Figure FDA0003488408410000025
to calculate the unbiased estimate dis of the difference between the true frequency c_t and the statistic r_{t-1} released at the previous adjacent moment, the difference being measured as a squared distance.
6. The method according to claim 5, wherein randomly perturbing the original data of all users based on the privacy budget for release-strategy selection at the current moment, to obtain the unbiased estimate of the true frequency for strategy selection, specifically comprises:
based on the privacy budget for release-strategy selection at the current moment, processing the local data with an adaptive random perturbation algorithm, whereby the original data of each user
Figure FDA0003488408410000026
is perturbed into one of the possible values in the value domain of the N users
Figure FDA0003488408410000027
and the perturbed value
Figure FDA0003488408410000028
is sent to the server, where j = 1, 2, …, N; from the frequencies of the collected perturbed values, the server applies a correction to obtain the second unbiased estimate of the true frequency c_t:
Figure FDA0003488408410000029
7. The method according to claim 6, wherein processing the unbiased estimate of the true frequency for strategy selection to obtain its mean square error specifically comprises:
calculating the mean square error err as follows:
Figure FDA0003488408410000031
wherein d is the number of possible states of all users.
8. The method according to claim 7, wherein judging whether the unbiased estimate of the deviation is larger than the mean square error, selecting the perturbation strategy and outputting the unbiased estimate of the true frequency for strategy selection if so, and selecting the approximation strategy and outputting the value released at the previous adjacent moment as an approximation of the current release value without consuming any privacy budget if not, specifically comprises:
when the unbiased estimate of the deviation is larger than the mean square error, selecting the perturbation strategy, and randomly perturbing the original data with the privacy budget ε_{t,2} for strategy selection at the current moment to obtain an unbiased estimate of the true frequency c_t,
Figure FDA0003488408410000032
releasing it as the statistical release value at the current moment,
Figure FDA0003488408410000033
and setting the moment l at which the perturbation strategy was last selected to the current moment t;
when the unbiased estimate of the deviation is less than or equal to the mean square error, selecting the approximation strategy, using the value r_{t-1} = r_l released at the previous adjacent moment as an approximation of the current release value, and releasing the statistical release value r_t = r_{t-1} at the current moment without consuming any privacy budget, i.e. ε_{t,2} = 0.
9. A system for real-time privacy protection of infinite data streams based on dynamic budget allocation, comprising:
a processing module, configured to set the length of the time window and the total privacy budget, and to process them to obtain the privacy budget for deviation calculation and the privacy budget for release-strategy selection, respectively;
a privacy budget acquisition module, configured to acquire the privacy budget for release-strategy selection at the current moment, based on the pre-allocated privacy budget for release-strategy selection and on absorbing the privacy budget saved between the previous moment at which the perturbation strategy was selected and the current moment;
a first random perturbation module, configured to randomly perturb the original data of all users based on the privacy budget for deviation calculation, to obtain an unbiased estimate of the true frequency for deviation calculation;
an unbiased estimate acquisition module, configured to obtain, from the unbiased estimate of the true frequency for deviation calculation, an unbiased estimate of the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment;
a second random perturbation module, configured to randomly perturb the original data of all users based on the privacy budget for release-strategy selection at the current moment, to obtain an unbiased estimate of the true frequency for strategy selection;
a mean square error acquisition module, configured to process the unbiased estimate of the true frequency for strategy selection to obtain its mean square error;
a judging module, configured to judge whether the unbiased estimate of the deviation is larger than the mean square error; if so, to select the perturbation strategy and output the unbiased estimate of the true frequency for strategy selection; if not, to select the approximation strategy and output the value released at the previous adjacent moment as an approximation of the current release value, without consuming any privacy budget.
CN202210098965.3A 2022-01-25 2022-01-25 Real-time privacy protection method and system for unlimited data stream based on dynamic budget allocation Active CN114417423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210098965.3A CN114417423B (en) 2022-01-25 2022-01-25 Real-time privacy protection method and system for unlimited data stream based on dynamic budget allocation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210098965.3A CN114417423B (en) 2022-01-25 2022-01-25 Real-time privacy protection method and system for unlimited data stream based on dynamic budget allocation

Publications (2)

Publication Number Publication Date
CN114417423A true CN114417423A (en) 2022-04-29
CN114417423B CN114417423B (en) 2024-05-17

Family

ID=81279782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210098965.3A Active CN114417423B (en) 2022-01-25 2022-01-25 Real-time privacy protection method and system for unlimited data stream based on dynamic budget allocation

Country Status (1)

Country Link
CN (1) CN114417423B (en)

Also Published As

Publication number Publication date
CN114417423B (en) 2024-05-17

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant