CN114417423A

CN114417423A - Infinite data stream real-time privacy protection method and system based on dynamic budget allocation

Info

Publication number: CN114417423A
Application number: CN202210098965.3A
Authority: CN
Inventors: 杨树森; 任雪斌; 赵鹏; 石亮
Original assignee: Hangzhou Cumulus Technology Co ltd
Current assignee: Hangzhou Cumulus Technology Co ltd
Priority date: 2022-01-25
Filing date: 2022-01-25
Publication date: 2022-04-29
Anticipated expiration: 2042-01-25
Also published as: CN114417423B

Abstract

The invention discloses a method and system for real-time privacy protection of an infinite data stream based on dynamic budget allocation. The method sets a time-window length and a total privacy budget and derives from them the privacy budgets for deviation calculation and for release-strategy selection. At each moment, the currently available budget is allocated first. Users locally perturb their raw data using one part of the budget, the server aggregates the perturbed data and estimates the deviation between the statistic at the current moment and the value released at the previous moment, and a release strategy is then selected by comparing this deviation with the error that would result from estimating the statistic with the other part of the budget. The invention provides stronger local differential privacy protection on the user side and improves the utility of the released data by balancing the real-time change of the data stream against the estimation error. It has a wide range of application scenarios and good practical performance, and its process is simple, easy to implement and highly extensible.

Description

Infinite data stream real-time privacy protection method and system based on dynamic budget allocation

Technical Field

The invention belongs to the field of privacy protection, and particularly relates to an infinite data stream real-time privacy protection method and system based on dynamic budget allocation.

Background

In recent years, with the rapid development of the Internet of Things and 5G, large numbers of smart devices have entered people's work and daily lives and constantly interact with people and their surroundings. The massive data they generate is widely collected, aggregated, stored, processed and analyzed, creating great value for society and substantially improving living standards, for example through traffic-flow monitoring and personalized service recommendation. However, because these devices are closely tied to people, the data they generate inevitably contains a great deal of personal information, such as GPS trajectories, health-monitoring data and browsing records. Careless use and release of such data carries a high risk of privacy disclosure, may endanger individuals' rights, interests and safety, and causes unease and alarm. In particular, when an individual participates in a sensing system, sensitive data about that individual are monitored over long periods and released as statistics. Compared with traditional static data, a data stream is continuous, unbounded, real-time and temporally correlated, and these characteristics make privacy protection of data streams considerably harder. Privacy protection is therefore both urgent and necessary in systems that monitor sensitive data and release statistics in real time.

Researchers have proposed many approaches to privacy protection. Differential privacy, which rigorously defines the strength of privacy protection and is simple to implement, has become a popular privacy model in recent years. For settings in which the data collector is not trusted, researchers proposed local differential privacy, under which each user's data is perturbed locally before being uploaded to the server; this technique has been deployed by companies such as Google, Apple and Samsung. However, existing local differential privacy techniques target only static scenarios. For the problem of protecting privacy in the real-time statistical release of infinite data streams in sensing systems, existing solutions all design privacy-preserving real-time release systems for the centralized setting. They do not protect the original local data streams directly: the raw data are still collected centrally by the data collector, which remains a serious risk. It is therefore important to design real-time local privacy protection for infinite data streams, and improving the utility of the released data is an equally valuable goal.

Disclosure of Invention

The purpose of the invention is to solve the above problems of the prior art by providing a method and system for real-time privacy protection of an infinite data stream based on dynamic budget allocation, which improves the degree of privacy protection for data streams in a sensing system, achieves real-time, high-utility statistical release of private data, and is highly extensible.

To achieve this purpose, the invention adopts the following technical solution:

The infinite data stream real-time privacy protection method based on dynamic budget allocation comprises the following steps:

step 1: setting the length of a time window and the total privacy budget, and deriving from them the privacy budget for deviation calculation and the privacy budget for release-strategy selection;

step 2: obtaining the privacy budget for release-strategy selection at the current moment from the pre-allocated release-strategy budget plus the privacy budget saved between the moment the perturbation strategy was last selected and the current moment;

step 3: based on the deviation-calculation privacy budget, randomly perturbing the raw data of all users to obtain an unbiased estimate of the true frequency for deviation calculation;

step 4: based on this unbiased estimate, obtaining an unbiased estimate of the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment;

step 5: based on the release-strategy privacy budget at the current moment, randomly perturbing the raw data of all users to obtain an unbiased estimate of the true frequency for strategy selection;

step 6: computing the mean square error of the unbiased estimate obtained in step 5;

step 7: judging whether the unbiased deviation estimate is greater than the mean square error; if so, selecting the perturbation strategy and outputting the unbiased estimate obtained in step 5; if not, selecting the approximation strategy and outputting the value released at the previous adjacent moment as an approximation of the current release value, without consuming any privacy budget.

The invention is further improved in that:

Deriving the privacy budgets for deviation calculation and for release-strategy selection from the time-window length and the total privacy budget specifically comprises:

setting the length w of the time window and the total privacy budget ε, dividing the total budget equally into 2w parts so that privacy budget is always used in multiples of ε/(2w), and pre-allocating ε/(2w) to each of the two tasks, deviation calculation and release-strategy selection, at every moment; that is, at the current moment t the deviation-calculation budget is ε_{t,1} = ε/(2w) and the release-strategy budget is ε_{t,2} = ε/(2w).

Obtaining the release-strategy privacy budget at the current moment, from the pre-allocated release-strategy budget and the privacy budget saved since the perturbation strategy was last selected, specifically comprises:

the release-strategy budget absorbs the privacy budget saved between the moment l at which the perturbation strategy was last selected and the current moment t. The pre-allocated budget cancelled by the selection at moment l is ε_N = ε_{l,2} - ε/(2w), and the sum of the absorbed budget and the pre-allocated budget is ε_A = (t - l)·ε/(2w) - ε_N. The privacy budget for release-strategy selection at moment t is then:

ε_{t,2} = min{ε_A, ε/2}

where ε_{l,2} is the privacy budget consumed at moment l, when the perturbation strategy was last selected.
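As an illustration of step 2, the budget rule above can be transcribed into a small function. This is a minimal sketch, not the inventors' implementation; the function name `release_budget` and its parameters are hypothetical, and returning 0 when ε_A ≤ 0 follows the detailed embodiment's rule that no budget is available at such moments.

```python
def release_budget(t: int, l: int, eps_l2: float, w: int, eps: float) -> float:
    """Budget epsilon_{t,2} available for release-strategy selection at time t.

    t:      current time
    l:      time at which the perturbation strategy was last selected
    eps_l2: release budget consumed at time l (epsilon_{l,2})
    w, eps: window length and total privacy budget per window
    """
    unit = eps / (2 * w)                 # one pre-allocated share, eps/(2w)
    eps_n = eps_l2 - unit                # shares cancelled by the selection at time l
    eps_a = (t - l) * unit - eps_n       # absorbed plus pre-allocated budget
    if eps_a <= 1e-12:                   # no budget left (small tolerance for float round-off)
        return 0.0
    return min(eps_a, eps / 2)           # never more than eps/2 per window
```

With w = 3, ε = 3 and the last perturbation at l = 2 with ε_{2,2} = 0.5, the function gives 0.5, 1.0 and 1.5 at t = 3, 4 and 5; after a perturbation at l = 5 with ε_{5,2} = 1.5 it gives 0 at t = 6 and 7 and 0.5 at t = 8.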

Randomly perturbing the raw data of all users based on the deviation-calculation privacy budget to obtain the unbiased estimate of the true frequency for deviation calculation specifically comprises:

based on the deviation-calculation budget ε_{t,1}, each user j (j = 1, 2, …, N) processes its local data with an adaptively selected randomized perturbation algorithm, perturbing its raw value into a value from the value domain shared by all N users, and sends the perturbed value to the server; from the frequencies of the collected perturbed values, the server corrects the counts into an unbiased estimate ĉ_{t,1} of the true frequency c_t.

Obtaining the unbiased estimate of the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment, based on the unbiased estimate of the true frequency for deviation calculation, specifically comprises:

the server uses ĉ_{t,1} to compute an unbiased estimate dis of the difference between the true frequency c_t and the previously released value r_{t-1}, where the difference is measured as a squared distance.

Randomly perturbing the raw data of all users based on the release-strategy privacy budget at the current moment to obtain the unbiased estimate of the true frequency for strategy selection specifically comprises:

based on the release-strategy budget ε_{t,2} at the current moment, each user j (j = 1, 2, …, N) processes its local data with the adaptively selected randomized perturbation algorithm, perturbing its raw value into a value from the value domain shared by all N users, and sends the perturbed value to the server; from the frequencies of the collected perturbed values, the server corrects the counts into an unbiased estimate ĉ_{t,2} of the true frequency c_t.

Computing the mean square error of the unbiased estimate of the true frequency for strategy selection specifically comprises:

the mean square error err is calculated as follows:

where d is the number of possible states of a user.

Judging whether the unbiased deviation estimate is greater than the mean square error, selecting the perturbation strategy and outputting the unbiased estimate for strategy selection if so, and otherwise selecting the approximation strategy and outputting the previous release value as an approximation of the current release value without consuming privacy budget, specifically comprises:

when dis > err, the perturbation strategy is selected: the raw data are randomly perturbed with the release-strategy budget ε_{t,2} to obtain the unbiased estimate ĉ_{t,2} of the true frequency c_t, this estimate is released as the statistic at the current moment, r_t = ĉ_{t,2}, and the moment of the last perturbation-strategy selection is updated to the current moment, l = t;

when dis ≤ err, the approximation strategy is selected: the previous release value r_{t-1} = r_l is used as an approximation of the current value and released as r_t = r_{t-1}, no privacy budget is consumed, and ε_{t,2} = 0.
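Steps 1 to 7 can be combined into a per-moment publication loop. The following is a minimal simulation sketch under assumptions the text does not fix: GRR is used throughout (the adaptive switch to OUE is omitted), frequencies are normalized to fractions, err is taken as the approximate per-value GRR variance, and budgets are tracked as whole shares of ε/(2w) so that floating-point drift cannot break the window constraint. All function names are hypothetical. Because err can be computed analytically from ε_{t,2}, d and N, the sketch lets users perturb with ε_{t,2} only when the perturbation strategy is actually chosen.

```python
import math
import numpy as np

def grr_estimate(values, d, eps, rng):
    """Users perturb their values with GRR; the server returns unbiased frequency estimates."""
    n = len(values)
    e = math.exp(eps)
    p, q = e / (e + d - 1), 1.0 / (e + d - 1)         # keep prob., prob. of each other value
    keep = rng.random(n) < p
    other = (values + rng.integers(1, d, size=n)) % d  # uniform over the other d-1 values
    reports = np.where(keep, values, other)
    observed = np.bincount(reports, minlength=d) / n
    return (observed - q) / (p - q)

def grr_var(n, d, eps):
    """Approximate per-value variance of the GRR estimate (small true frequencies)."""
    e = math.exp(eps)
    return (e + d - 2) / (n * (e - 1) ** 2)

def publish(stream, d, w, eps, seed=0):
    """Release one frequency vector per moment; also return the release budget spent."""
    rng = np.random.default_rng(seed)
    unit = eps / (2 * w)                 # one share of budget, eps/(2w)
    releases, spent = [], []
    l, m_l = None, 0                     # last perturbation time and shares it consumed
    for t, values in enumerate(stream, start=1):
        n = len(values)
        if l is None:                    # first moment: always perturb with one share
            shares = 1
            r = grr_estimate(values, d, unit, rng)
        else:
            r_prev = releases[-1]
            avail = (t - l) - (m_l - 1)  # eps_A measured in shares
            shares = min(avail, w) if avail > 0 else 0   # cap at eps/2 = w shares
            if shares:
                c1 = grr_estimate(values, d, unit, rng)  # steps 3-4, budget eps_{t,1}
                dis = float(np.mean((c1 - r_prev) ** 2)) - grr_var(n, d, unit)
                err = grr_var(n, d, shares * unit)       # step 6
                if dis <= err:
                    shares = 0           # step 7: approximation strategy
            r = grr_estimate(values, d, shares * unit, rng) if shares else r_prev
        if shares:
            l, m_l = t, shares
        releases.append(r)
        spent.append(shares * unit)
    return releases, spent
```

When ε_A ≤ 0 the loop skips steps 3 to 7 entirely, matching the rule that such moments release r_{t-1} directly.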

An infinite data stream real-time privacy protection system based on dynamic budget allocation, comprising:

a processing module, configured to set the length of the time window and the total privacy budget and to derive from them the privacy budgets for deviation calculation and for release-strategy selection;

a privacy budget acquisition module, configured to obtain the release-strategy privacy budget at the current moment from the pre-allocated release-strategy budget and the privacy budget saved since the perturbation strategy was last selected;

a first random perturbation module, configured to randomly perturb the raw data of all users based on the deviation-calculation privacy budget and obtain an unbiased estimate of the true frequency for deviation calculation;

an unbiased deviation estimate acquisition module, configured to obtain, from that unbiased estimate, an unbiased estimate of the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment;

a second random perturbation module, configured to randomly perturb the raw data of all users based on the release-strategy privacy budget at the current moment and obtain an unbiased estimate of the true frequency for strategy selection;

a mean square error acquisition module, configured to compute the mean square error of the unbiased estimate for strategy selection;

a judging module, configured to judge whether the unbiased deviation estimate is greater than the mean square error; if so, to select the perturbation strategy and output the unbiased estimate for strategy selection; if not, to select the approximation strategy and output the value released at the previous adjacent moment as an approximation of the current release value, without consuming privacy budget.

Compared with the prior art, the invention has the following beneficial effects:

By setting the time-window length and the total privacy budget and deriving from them the budgets for deviation calculation and release-strategy selection, the invention protects the sensitive information contained in the data stream over any continuous time period of that length, strengthening conventional privacy protection. At the same time, without accessing the raw data, it analyzes how the released data stream changes by estimating both the deviation between the statistic at the current moment and the statistic at the previous adjacent moment and the error that would arise from spending privacy budget to estimate the statistic, and adjusts the release strategy accordingly, which substantially improves the utility of the released results.

Drawings

In order to more clearly explain the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

FIG. 1 is a flow chart of a method of an embodiment of the present invention;

FIG. 2 is an exemplary diagram of a dynamic privacy allocation mechanism;

FIG. 3 compares the average relative error of the method of the present invention and the baseline method as a function of the total privacy budget ε on different data sets; fig. 3(a) shows results on the synthetic data set LNS and fig. 3(b) on the real data set Taxi;

FIG. 4 compares the average relative error of the method of the present invention and the baseline method as a function of the sliding window length w on different data sets; fig. 4(a) shows results on the synthetic data set LNS and fig. 4(b) on the real data set Taxi;

FIG. 5 compares the ROC curves obtained by the method of the present invention and the baseline method for outlier detection on the real data set Taxi;

fig. 6 is a block diagram of an infinite data stream real-time privacy protection system based on dynamic budget allocation according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

In the description of the embodiments of the present invention, it should be noted that if the terms "upper", "lower", "horizontal", "inner", etc. are used for indicating the orientation or positional relationship based on the orientation or positional relationship shown in the drawings or the orientation or positional relationship which is usually arranged when the product of the present invention is used, the description is merely for convenience and simplicity, and the indication or suggestion that the referred device or element must have a specific orientation, be constructed and operated in a specific orientation, and thus, cannot be understood as limiting the present invention. Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.

Furthermore, the term "horizontal", if present, does not mean that the component is required to be absolutely horizontal, but may be slightly inclined. For example, "horizontal" merely means that the direction is more horizontal than "vertical" and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.

In the description of the embodiments of the present invention, it should be further noted that, unless otherwise explicitly stated or limited, the terms "disposed," "mounted," and "connected" should be interpreted broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; direct or indirect through an intermediate medium; or an internal connection between two elements. The specific meanings of these terms in the present invention can be understood by those skilled in the art according to the specific situation.

The invention is described in further detail below with reference to the accompanying drawings:

referring to fig. 1, the invention discloses a method for real-time privacy protection of an infinite data stream based on dynamic budget allocation, comprising the following steps:

step 1: setting the length of the time window and the total privacy budget, and deriving from them the privacy budgets for deviation calculation and for release-strategy selection.

Take a real-time aggregate statistics release scenario as an example. Suppose that in a sensing system containing N users, a central server aggregates in real time the statistics of the same sensing target and forms a frequency distribution sequence R = (r_1, …, r_t, …). The state value of every user belongs to Ω = {ω_1, ω_2, …, ω_d}, with d = |Ω|, and r_t = ⟨r_t[1], r_t[2], …, r_t[d]⟩ contains the frequency estimate of each value. The goal is to protect the local data sequences of all users.

Privacy budget is allocated over each sliding time window as a whole. The window length w and the total privacy budget ε are set according to actual conditions and requirements; ε also represents the strength of privacy protection. To balance the privacy-budget consumption of the different tasks at different moments, before the algorithm runs the total budget is first distributed evenly over the w moments of a window and then split evenly between the two tasks, deviation calculation and release-strategy selection, so that each task receives ε/(2w) at each moment. The perturbation strategy is always selected for release at the first moment of the algorithm's operation.

step 2: obtaining the privacy budget for release-strategy selection at the current moment.

Following the initialization settings, the deviation-calculation budget at the current moment t remains ε_{t,1} = ε/(2w). The budget for release-strategy selection may absorb the privacy budget saved between the moment l at which the perturbation strategy was last selected and the current moment. After budget is absorbed, the same amount of budget pre-allocated to the strategy-selection task at later moments must be cancelled, so that the privacy budget consumed within any sliding window never exceeds ε. Accordingly, the budget cancelled by the selection at moment l is ε_N = ε_{l,2} - ε/(2w), the sum of the absorbed and pre-allocated budgets is ε_A = (t - l)·ε/(2w) - ε_N, and the privacy budget available for release-strategy selection at moment t is:

ε_{t,2} = min{ε_A, ε/2}

Note that when ε_A ≤ 0, this expression gives ε_{t,2} ≤ 0. Such moments are exactly those whose advance allocation has been cancelled, that is, moments at which no budget is available. Therefore, when ε_A ≤ 0, the approximation strategy is selected directly and r_{t-1} is released, i.e., r_t = r_{t-1}, ε_{t,2} is recorded as 0, and the remaining steps are skipped.

step 3: based on the deviation-calculation privacy budget, processing the local data with an adaptively selected randomized perturbation algorithm, so that each user's raw value is perturbed into a value from the value domain shared by all N users.

First, according to the allocated budget ε_{t,1} and the number d of possible user state values, one of the two randomized perturbation methods currently popular in local differential privacy, GRR or OUE, is selected adaptively: GRR is selected when, for the given ε_{t,1} and d, it yields the smaller variance in frequency estimation; otherwise OUE is selected. Taking GRR as an example, the idea is that for any private value, the user uploads the true value to the server with probability p, and with probability q = 1 - p uploads a value selected at random from the remaining values of the domain.
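The exact selection condition is not legible in this text. The sketch below assumes the usual criterion of comparing the approximate variances of the two frequency oracles for small true frequencies, (e^ε + d - 2)/(N(e^ε - 1)²) for GRR and 4e^ε/(N(e^ε - 1)²) for OUE; under that assumption GRR is preferred exactly when d < 3e^ε + 2. The function names are hypothetical.

```python
import math

def grr_var(eps: float, d: int, n: int) -> float:
    """Approximate variance of a GRR frequency estimate (small true frequency)."""
    e = math.exp(eps)
    return (e + d - 2) / (n * (e - 1) ** 2)

def oue_var(eps: float, n: int) -> float:
    """Approximate variance of an OUE frequency estimate (small true frequency)."""
    e = math.exp(eps)
    return 4 * e / (n * (e - 1) ** 2)

def choose_mechanism(eps: float, d: int, n: int = 1) -> str:
    """Return the mechanism with the smaller estimation variance for budget eps and domain size d."""
    return "GRR" if grr_var(eps, d, n) < oue_var(eps, n) else "OUE"

def choose_by_threshold(eps: float, d: int) -> str:
    """The same decision written as the closed-form threshold d < 3e^eps + 2."""
    return "GRR" if d < 3 * math.exp(eps) + 2 else "OUE"
```

For example, a binary domain (d = 2) always favours GRR, while a large domain at a small per-moment budget such as ε_{t,1} = 0.1 favours OUE.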

The perturbation mechanism M has the following specific form:

After collecting all perturbed data, the server first counts the frequencies of the perturbed values and, based on these, corrects them into an unbiased estimate of the true frequency c_t = ⟨c_t[1], …, c_t[d]⟩.

Under the GRR perturbation method, the estimate and its variance are calculated as follows:

where k = 1, 2, …, d and f_k denotes the true frequency of ω_k.
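A minimal simulation of GRR perturbation and the server-side correction is sketched below. It assumes the standard GRR parametrization, p = e^ε/(e^ε + d - 1) for reporting the true value and q = 1/(e^ε + d - 1) for reporting each specific other value, so that the total probability of reporting some other value is 1 - p; note that q here is the per-value probability, not 1 - p as in the prose above. The estimator (observed - q)/(p - q) and its variance q(1 - q)/(N(p - q)²) + f_k(1 - p - q)/(N(p - q)) are the standard GRR results and stand in for the formulas missing from this text. Function names are hypothetical.

```python
import math
import numpy as np

def grr_params(eps, d):
    """p: probability of reporting the true value; q: probability of each specific other value."""
    e = math.exp(eps)
    return e / (e + d - 1), 1.0 / (e + d - 1)

def grr_perturb(values, d, eps, rng):
    """Client side: keep each value with prob. p, else report a uniformly chosen other value."""
    p, _ = grr_params(eps, d)
    keep = rng.random(len(values)) < p
    other = (values + rng.integers(1, d, size=len(values))) % d
    return np.where(keep, values, other)

def grr_estimate(reports, d, eps):
    """Server side: correct observed report frequencies into unbiased estimates."""
    p, q = grr_params(eps, d)
    observed = np.bincount(reports, minlength=d) / len(reports)
    return (observed - q) / (p - q)

def grr_variance(f, n, d, eps):
    """Variance of the estimate for a value with true frequency f among n users."""
    p, q = grr_params(eps, d)
    return q * (1 - q) / (n * (p - q) ** 2) + f * (1 - p - q) / (n * (p - q))

rng = np.random.default_rng(0)
values = rng.choice(3, size=200_000, p=[0.5, 0.3, 0.2])   # synthetic raw data, d = 3
est = grr_estimate(grr_perturb(values, 3, 1.0, rng), 3, 1.0)
print(est)   # entries close to 0.5, 0.3 and 0.2
```

Because the observed frequencies sum to 1 and 1 - dq = p - q, the corrected estimates also sum to exactly 1, although individual entries may be negative.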

step 4: obtaining the unbiased estimate of the deviation between the statistic at the current moment and the statistic released at the previous adjacent moment, based on the unbiased estimate of the true frequency for deviation calculation.

The server uses the squared distance to measure the deviation between the true frequency c_t at the current moment and the value released at the previous adjacent moment, r_{t-1} = r_l, denoted dis*. This value cannot be computed directly because c_t is unknown, so the estimate ĉ_{t,1} obtained in the previous step is substituted for c_t[k] and corrected using its variance. An estimate of dis* can thus be obtained while satisfying local differential privacy, as follows:
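Because ĉ_{t,1} is unbiased, E[(ĉ_{t,1}[k] - r_{t-1}[k])²] = (c_t[k] - r_{t-1}[k])² + Var[ĉ_{t,1}[k]], so subtracting the variance removes the bias that the noise adds to the squared distance. The sketch below assumes dis* is the squared distance averaged over the d values and that this variance correction is the one intended; `unbiased_sq_distance` is a hypothetical name, and Gaussian noise stands in for the perturbation in the Monte Carlo check.

```python
import numpy as np

def unbiased_sq_distance(c_hat, r_prev, var):
    """Unbiased estimate of (1/d) * sum_k (c_t[k] - r_prev[k])**2.

    c_hat:  unbiased estimate of c_t (extra leading axes are averaged over)
    r_prev: previously released vector r_{t-1}
    var:    variance of c_hat, scalar or per value
    """
    c_hat, r_prev = np.asarray(c_hat, float), np.asarray(r_prev, float)
    return float(np.mean((c_hat - r_prev) ** 2 - var))

# Monte Carlo illustration: average the estimate over many noisy copies of c_t.
rng = np.random.default_rng(0)
c_true, r_prev, sigma = np.array([0.3, 0.7]), np.array([0.2, 0.8]), 0.05
noisy = c_true + rng.normal(0.0, sigma, size=(100_000, 2))
print(unbiased_sq_distance(noisy, r_prev, sigma ** 2))   # close to the true value 0.01
print(float(np.mean((noisy - r_prev) ** 2)))             # uncorrected: close to 0.0125
```

A single estimate at one moment can be negative; only its expectation equals dis*.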

step 5: it is first assumed that, with the allocated privacy budget ε_{t,2}, the raw data are randomly perturbed again by the same method as in step 3, each user's raw value being perturbed into a value from the value domain shared by all N users, yielding an unbiased estimate ĉ_{t,2} of the true frequency c_t.

step 6: computing the estimated mean square error err of ĉ_{t,2}.

Based on the GRR perturbation method, err is calculated as follows:
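The exact err formula is missing from this text. One plausible reading, consistent with the statement that err depends on d, is the mean square error of the unbiased GRR estimate averaged over the d values: the per-value variance is linear in f_k, so averaging over k and using Σ_k f_k = 1 replaces f_k by 1/d and removes the unknown true frequencies. err can then be computed before any data are perturbed. This is an assumption rather than the patent's formula, and the function names are hypothetical.

```python
import math

def grr_params(eps: float, d: int):
    """GRR probabilities: p keeps the true value, q reports one specific other value."""
    e = math.exp(eps)
    return e / (e + d - 1), 1.0 / (e + d - 1)

def grr_value_var(f: float, n: int, d: int, eps: float) -> float:
    """Variance of the unbiased GRR estimate for a value whose true frequency is f."""
    p, q = grr_params(eps, d)
    return q * (1 - q) / (n * (p - q) ** 2) + f * (1 - p - q) / (n * (p - q))

def grr_mean_mse(n: int, d: int, eps: float) -> float:
    """Per-value MSE averaged over the domain; sum_k f_k = 1 turns the average f into 1/d."""
    return grr_value_var(1.0 / d, n, d, eps)
```

Under this reading, step 7 compares dis with grr_mean_mse(N, d, ε_{t,2}).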

step 7: when dis > err, the perturbation strategy is selected, ĉ_{t,2} is released as the statistic at the current moment, r_t = ĉ_{t,2}, and l is set to t;

when dis ≤ err, the approximation strategy is selected: the value released at the previous adjacent moment, r_{t-1} = r_l, is used as an approximation of the current value and released as r_t = r_{t-1}, without consuming privacy budget, i.e., ε_{t,2} = 0.

Referring to fig. 2, the infinite data stream local privacy protection algorithm based on dynamic budget allocation allocates privacy budget dynamically over a sliding time window as a whole; the following example illustrates the process. With w = 3 and ε = 3, the total budget is divided into 2w = 6 parts of ε/(2w) = 0.5, one part for each of the two tasks at each moment. The perturbation strategy is selected at the first two moments, so the pre-allocated budgets do not change. At t = 3 the approximation strategy is selected, so the 0.5 pre-allocated to release-strategy selection is saved rather than consumed, and ε_{3,2} = 0. At t = 4 the budget available for release-strategy selection becomes 1.0, absorbing the budget saved at the previous moment, but the approximation strategy is again selected and no budget is consumed. At t = 5 the perturbation strategy is selected and absorbs the budgets saved at times 3 and 4, so ε_{5,2} = 1.5. Because this moment consumes the budget of two additional moments, the approximation strategy is selected directly at the next two moments. This ensures that the total privacy budget consumed within any sliding window never exceeds ε throughout the process, as fig. 2 clearly shows.
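The fig. 2 example can be replayed with a short script. This is a hypothetical sketch: it takes the outcome of the dis > err comparison at each moment as a given input (`wants_perturb`), tracks budget in whole shares of ε/(2w), and reports for each moment the budget available for release-strategy selection and the budget actually consumed.

```python
def budget_trace(wants_perturb, w, eps):
    """Return (available, consumed) release budgets for each moment t = 1, 2, ..."""
    unit = eps / (2 * w)
    l, m_l = None, 0                  # last perturbation time and the shares it used
    trace = []
    for t, wants in enumerate(wants_perturb, start=1):
        avail = 1 if l is None else (t - l) - (m_l - 1)   # eps_A in shares
        shares = min(avail, w) if avail > 0 else 0        # capped at eps/2
        perturb = shares > 0 and (l is None or wants)     # first moment always perturbs
        trace.append((shares * unit, shares * unit if perturb else 0.0))
        if perturb:
            l, m_l = t, shares
    return trace

# Fig. 2 setting: w = 3, eps = 3. The comparison would also favour perturbation
# at t = 6 and 7, but no budget is left there.
for t, (avail, used) in enumerate(budget_trace([True, True, False, False, True, True, True], 3, 3.0), 1):
    print(t, avail, used)
```

The printed rows are (0.5, 0.5), (0.5, 0.5), (0.5, 0.0), (1.0, 0.0), (1.5, 1.5), (0.0, 0.0), (0.0, 0.0) for t = 1 to 7, matching the walkthrough above.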

Referring to figs. 3(a) and 3(b), frequency statistics experiments were performed with the method of the present invention (LBA) and the baseline method (LBU) on the synthetic data set LNS and the real data set Taxi, and the average relative error between the released statistical sequence and the true statistical sequence of each method is plotted against the total privacy budget ε. The synthetic data set LNS is generated by c_t = c_{t-1} + N(0, Q) with Q = 500; it has N = 100000 users, a user value domain of size d = 2, and T = 800 time points in total. The real data set Taxi has N = 44240, d = 5 and T = 886 time points. In the baseline method (LBU), users perturb their data at every moment with a privacy budget of ε/w, and the server aggregates the perturbed data to compute the frequency estimates. On both data sets the sliding window size is w = 20 and ε ranges from 0.5 to 2.5. Fig. 3 shows that the average relative error of both methods decreases as the privacy budget increases, mainly because of the randomized perturbation used to achieve local differential privacy. The method of the present invention, however, achieves a lower average relative error at every privacy budget, its advantage is more pronounced on the Taxi data set, and the smaller the privacy budget, the larger its advantage over the baseline. This also indicates that a smaller budget leads to larger frequency-estimation error, in which case the perturbation strategy should be used less often to preserve utility.

Referring to figs. 4(a) and 4(b), frequency statistics experiments were performed with the method of the present invention (LBA) and the baseline method (LBU) on the synthetic data set LNS and the real data set Taxi, and the average relative error between the released and true statistical sequences of each method is plotted against the sliding window size w. The total privacy budget is ε = 2.0 and w ranges from 10 to 50. Fig. 4 shows that the average relative error of both methods increases with the window size, mainly because a larger w leaves a smaller privacy budget for each moment, which naturally increases estimation error. The method of the present invention nevertheless achieves a lower average relative error at every window size, its advantage is more pronounced on the Taxi data set, and the larger w is, the larger its advantage over the baseline. This again indicates that when the budget is small, the perturbation strategy should be used less often to preserve utility.

Referring to fig. 5, the ROC curves obtained by the method of the present invention (LBA) and the baseline method (LBU) on the real data set Taxi are compared. Outlier detection, also called event monitoring, here means fixing a threshold, using the frequency estimation sequence produced by each method to judge whether the true frequency sequence exceeds the threshold, and checking whether these judgments are correct. The experiment uses one state value from the user value domain of the Taxi data set, with total privacy budget ε = 1.0, sliding window size w = 30 and threshold 0.75 × (max(c) - min(c)) + min(c). Fig. 5 shows that the method of the present invention is more accurate: its AUC is 0.979, compared with 0.898 for LBU, indicating that it not only has smaller error but also preserves more of the underlying information in the original sequence.

Referring to fig. 6, the present invention discloses an infinite data stream real-time privacy protection system based on dynamic budget allocation, which includes:

The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for protecting the real-time privacy of an infinite data stream based on dynamic budget allocation, characterized by comprising the following steps:

2. The method according to claim 1, wherein processing the length of the time window and the total privacy budget to obtain the privacy budget for deviation calculation and the privacy budget for release-policy selection, respectively, is specifically:

setting the length w of the time window and the total privacy budget ε, and dividing the total privacy budget equally into 2w parts so that privacy budgets are used in multiples of $\varepsilon/(2w)$; at each moment, a budget of $\varepsilon/(2w)$ is allocated to the deviation calculation and another $\varepsilon/(2w)$ to release-policy selection, i.e., at the current moment t the privacy budget for deviation calculation is $\varepsilon_{t,1} = \varepsilon/(2w)$ and the privacy budget for release-policy selection is $\varepsilon_{t,2} = \varepsilon/(2w)$.
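The equal split means that the deviation-calculation shares alone add up to exactly ε/2 over any w consecutive moments, leaving the other ε/2 for release-policy selection. The sketch below, with hypothetical helper names, computes the largest amount spent in any window of w consecutive moments for a given per-moment spending list; it is an accounting aid, not part of the claimed method.

```python
def max_window_spend(spent, w):
    """Largest total budget spent over any w consecutive moments."""
    if len(spent) < w:
        return sum(spent)
    window = sum(spent[:w])
    best = window
    for i in range(w, len(spent)):
        # slide the window one moment forward
        window += spent[i] - spent[i - w]
        best = max(best, window)
    return best


eps, w = 2.0, 10
unit = eps / (2 * w)              # eps_{t,1} = eps_{t,2} = eps/(2w)
deviation_spend = [unit] * 100    # eps_{t,1} is spent at every moment
print(max_window_spend(deviation_spend, w))  # about eps/2 = 1.0
```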

3. The method according to claim 2, wherein obtaining the privacy budget for release-policy selection at the current moment by absorbing, into the pre-allocated release-policy budget, the privacy budget saved between the most recent moment at which the perturbation strategy was selected and the current moment is specifically:

the release-policy budget absorbs the privacy budget saved from moment $l$, the most recent moment at which the perturbation strategy was selected, up to the current moment $t$. The privacy budget nullified because of the selection of the perturbation strategy at moment $l$ is $\varepsilon_N = \varepsilon_{l,2} - \varepsilon/(2w)$, and the sum of the absorbed privacy budget and the pre-allocated budget is $\varepsilon_A = (t-l)\,\varepsilon/(2w) - \varepsilon_N$. The privacy budget $\varepsilon_{t,2}$ for policy selection at moment $t$ is:

$$\varepsilon_{t,2} = \min\{\varepsilon_A, \varepsilon/2\}$$

where $\varepsilon_{l,2}$ is the release-policy privacy budget spent at moment $l$.
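A minimal sketch of the absorption rule follows. It assumes the nullified budget is $\varepsilon_N = \varepsilon_{l,2} - \varepsilon/(2w)$, that is, the excess that moment $l$ took beyond its own share; the function name and the reading of a non-positive result as "no budget available" are assumptions, not taken from the source.

```python
def release_budget(eps, w, t, l, eps_l2):
    """Release-policy budget eps_{t,2} at moment t under the absorption rule.

    eps    -- total privacy budget
    w      -- time-window length
    l      -- most recent moment at which the perturbation strategy was chosen
    eps_l2 -- release-policy budget spent at moment l
    A non-positive return value is read here (assumption) as "no budget left".
    """
    unit = eps / (2 * w)            # pre-allocated share per moment
    eps_n = eps_l2 - unit           # budget nullified after moment l (assumed form)
    eps_a = (t - l) * unit - eps_n  # absorbed plus pre-allocated budget
    return min(eps_a, eps / 2)      # never more than half the total budget


# eps = 2.0, w = 10, so each moment is pre-allocated 0.1
print(release_budget(2.0, 10, t=3, l=0, eps_l2=0.1))   # ~0.3: three shares absorbed
print(release_budget(2.0, 10, t=2, l=0, eps_l2=0.5))   # ~-0.2: still repaying moment l
print(release_budget(2.0, 10, t=50, l=0, eps_l2=0.1))  # 1.0: capped at eps/2
```

In the second call, moment $l$ absorbed five shares (0.5), so four borrowed shares must be repaid before $\varepsilon_A$ turns positive again; a natural reading is that the approximation strategy is used until then, although the claims do not state this explicitly.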

4. The method according to claim 3, wherein randomly perturbing the original data of all users with the privacy budget for deviation calculation to obtain an unbiased estimate of the true frequency for the deviation calculation is specifically:

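each of the $N$ users $j = 1, 2, \ldots, N$ perturbs its original value, with the privacy budget $\varepsilon_{t,1}$, into one of all the possible values in the value domain and sends the perturbed value to the server; the server counts the frequency of each value and obtains an unbiased estimate $\hat{c}_{t,1}$ of the true frequency $c_t$.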
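The source does not name the local perturbation mechanism, so the sketch below swaps in generalized randomized response (GRR), a standard frequency oracle for local differential privacy, purely as an illustration. Each user keeps its value with probability $p = e^{\varepsilon}/(e^{\varepsilon}+d-1)$ and otherwise reports one of the other $d-1$ values uniformly; the server inverts this with $\hat{c}[k] = (n_k/N - q)/(p - q)$, where $q = 1/(e^{\varepsilon}+d-1)$, which is unbiased for the true relative frequency. Function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def grr_probs(d, eps):
    """Keep probability p and per-other-value probability q of GRR (assumed mechanism)."""
    e = math.exp(eps)
    return e / (e + d - 1), 1.0 / (e + d - 1)


def grr_perturb(value, d, eps, rng):
    """User side: perturb one value from the domain {0, ..., d-1}."""
    p, _ = grr_probs(d, eps)
    if rng.random() < p:
        return value
    other = rng.randrange(d - 1)     # uniform over the d-1 other values
    return other if other < value else other + 1


def grr_estimate(reports, d, eps):
    """Server side: unbiased estimate of each value's relative frequency."""
    p, q = grr_probs(d, eps)
    n = len(reports)
    counts = Counter(reports)
    return [(counts[k] / n - q) / (p - q) for k in range(d)]


rng = random.Random(2022)
d, eps_t1, n_users = 4, 1.0, 20000
true_values = [rng.choice([0, 0, 1, 2]) for _ in range(n_users)]   # toy data
reports = [grr_perturb(v, d, eps_t1, rng) for v in true_values]
c_hat_t1 = grr_estimate(reports, d, eps_t1)
print([round(x, 3) for x in c_hat_t1])
```

Because $p + (d-1)q = 1$, the estimates always sum to one, though individual entries (such as value 3 here, which no user holds) can come out slightly negative.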

5. The method according to claim 4, wherein obtaining, from the unbiased estimate of the true frequency for the deviation calculation, an unbiased estimate of the deviation between the statistic at the current moment and the statistical release value at the previous adjacent moment is specifically:

the server uses
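Because the deviation formula is not reproduced here, the following is only a plausible sketch. It assumes the deviation is measured as the mean squared difference over the d values and that generalized randomized response (GRR) is the local mechanism. Since $\hat{c}_{t,1}$ is unbiased, $E[(\hat{c}_{t,1}[k] - r_{t-1}[k])^2] = (c_t[k] - r_{t-1}[k])^2 + \mathrm{Var}[\hat{c}_{t,1}[k]]$, so subtracting the average variance removes the noise term. The variance used is the standard GRR approximation $(e^{\varepsilon}+d-2)/(N(e^{\varepsilon}-1)^2)$, which drops a term depending on the unknown true frequency, so the result is only approximately unbiased. Function names are hypothetical.

```python
import math


def grr_variance(n, d, eps):
    """Approximate variance of one GRR frequency estimate (assumed mechanism)."""
    e = math.exp(eps)
    return (e + d - 2) / (n * (e - 1) ** 2)


def deviation_estimate(c_hat_t1, r_prev, n, eps_t1):
    """Estimate of (1/d) * sum_k (c_t[k] - r_prev[k])**2 from the noisy c_hat_t1."""
    d = len(c_hat_t1)
    raw = sum((a - b) ** 2 for a, b in zip(c_hat_t1, r_prev)) / d
    # remove the expected contribution of the perturbation noise
    return raw - grr_variance(n, d, eps_t1)


print(deviation_estimate([0.5, 0.5], [0.0, 1.0], n=100, eps_t1=2.0))
```

When the noisy estimate happens to match the previous release exactly, the result is negative, which simply signals that the observed difference is within the noise level.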

6. The method according to claim 5, wherein randomly perturbing the original data of all users with the privacy budget for release-policy selection at the current moment to obtain an unbiased estimate of the true frequency for policy selection is specifically:

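each of the $N$ users $j = 1, 2, \ldots, N$ perturbs its original value, with the privacy budget $\varepsilon_{t,2}$, into one of all the possible values in the value domain and sends the perturbed value to the server; the server counts the frequency of each value and obtains an unbiased estimate $\hat{c}_{t,2}$ of the true frequency $c_t$.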

7. The method according to claim 6, wherein processing the unbiased estimate of the true frequency for policy selection to obtain the mean square error of that unbiased estimate is specifically:

the mean square error err is calculated by the following formula:

where d is the number of possible states (values) of the users' data.
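The formula for err is not reproduced either. Assuming generalized randomized response (GRR) as the local mechanism, one natural reading (an assumption, consistent with the role of d stated above) is the variance of $\hat{c}_{t,2}$ averaged over the d values; with GRR every value has the same approximate variance $(e^{\varepsilon}+d-2)/(N(e^{\varepsilon}-1)^2)$, so the average collapses to that single expression. The sketch, with a hypothetical function name, shows how a larger absorbed budget lowers err.

```python
import math


def release_mse(n, d, eps_t2):
    """Mean square error err of the release estimate under GRR (assumed).

    Averages the approximate per-value variance over the d possible states;
    with GRR all d variances are equal, so the average is that one value.
    """
    if eps_t2 <= 0:
        return math.inf   # assumption: no budget means no usable estimate
    e = math.exp(eps_t2)
    return (e + d - 2) / (n * (e - 1) ** 2)


n_users, d = 10000, 4
for eps_t2 in (0.1, 0.3, 1.0):   # e.g. one, three and ten shares of eps/(2w) = 0.1
    print(eps_t2, release_mse(n_users, d, eps_t2))
```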

8. The method according to claim 7, wherein determining whether the unbiased deviation estimate is larger than the mean square error and, if so, selecting the perturbation strategy and outputting the unbiased estimate of the true frequency for policy selection, or, if not, selecting the approximation strategy, outputting the release value at the previous adjacent moment as an approximation of the release value at the current moment, and consuming no privacy budget, is specifically:

when the unbiased deviation estimate is larger than the mean square error, the perturbation strategy is selected: the original data are randomly perturbed with the current moment's policy-selection privacy budget $\varepsilon_{t,2}$ to obtain an unbiased estimate $\hat{c}_{t,2}$ of the true frequency $c_t$, and this estimate is released as the statistical release value at the current moment, $r_t = \hat{c}_{t,2}$;

when the unbiased deviation estimate is less than or equal to the mean square error, the approximation strategy is selected: the release value at the previous adjacent moment, $r_{t-1} = r_l$, is used as an approximation of the release value at the current moment, so the statistical release value $r_t = r_{t-1}$ is released at the current moment without consuming any privacy budget, i.e., $\varepsilon_{t,2} = 0$.
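The two branches above can be combined into one release step, sketched below. The sketch assumes generalized randomized response (GRR) as the perturbation mechanism and uses its standard approximate variance as err; `estimate_fn` is a hypothetical callback standing in for user-side perturbation with budget $\varepsilon_{t,2}$ plus server-side estimation, and treating a non-positive $\varepsilon_{t,2}$ as forcing the approximation strategy is an interpretation rather than something the claims state.

```python
import math
from typing import Callable, List, Tuple


def release_mse(n: int, d: int, eps_t2: float) -> float:
    """Approximate GRR error err (assumed mechanism); infinite without budget."""
    if eps_t2 <= 0:
        return math.inf
    e = math.exp(eps_t2)
    return (e + d - 2) / (n * (e - 1) ** 2)


def release_step(dis: float, n: int, eps_t2: float, r_prev: List[float],
                 estimate_fn: Callable[[float], List[float]]) -> Tuple[List[float], float]:
    """Return (r_t, budget actually spent) following the perturb/approximate rule."""
    err = release_mse(n, len(r_prev), eps_t2)
    if dis > err:
        # perturbation strategy: spend eps_t2 and publish the fresh estimate
        return estimate_fn(eps_t2), eps_t2
    # approximation strategy: republish r_{t-1}, spend nothing (eps_{t,2} = 0)
    return list(r_prev), 0.0


def fresh(eps: float) -> List[float]:
    """Hypothetical stand-in for a real perturbed-and-estimated frequency vector."""
    return [0.40, 0.30, 0.20, 0.10]


r_prev = [0.25, 0.25, 0.25, 0.25]
print(release_step(0.05, 10000, 0.3, r_prev, fresh))   # large deviation: perturb
print(release_step(0.0, 10000, 0.3, r_prev, fresh))    # no change: approximate
```

Updating $l$ and $\varepsilon_{l,2}$ for the next budget computation is left to the caller; in this sketch they would change only when the perturbation branch is taken.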

9. An infinite data stream real-time privacy protection system based on dynamic budget allocation, comprising: