CN112231749A

CN112231749A - Distributed single-dimensional time sequence data real-time privacy protection publishing method with consistency

Info

Publication number: CN112231749A
Application number: CN202011097748.XA
Authority: CN
Inventors: 任雪斌; 王舒阳; 杨树森; 杨新宇; 姚向华; 闫雯雯
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2020-10-14
Filing date: 2020-10-14
Publication date: 2021-01-15
Anticipated expiration: 2040-10-14
Also published as: CN112231749B

Abstract

The invention discloses a consistent, distributed, real-time privacy-preserving publishing method for single-dimensional time-series data, and belongs to the field of privacy protection. The method adjusts the sampling frequency through an adaptive sampling strategy and, at each sampling time, perturbs the sampled data with the Laplace mechanism. A single-dimensional perception-information correction strategy based on Kalman consensus filtering then performs distributed posterior estimation on the data to be published by each node at the sampling time, while at non-sampling times the prior prediction is published directly. The result is a real-time publishing method for distributed single-dimensional time-series perception data that does not depend on a central server, provides a differential privacy guarantee together with consistency, and meets both the real-time requirement of dynamic data publishing and the consistency requirement of distributed data publishing. The method performs well on real data and can be used in distributed dynamic perception data publishing systems such as distributed electricity-load monitoring in smart grids and disease monitoring applications.

Description

Distributed single-dimensional time sequence data real-time privacy protection publishing method with consistency

Technical Field

The invention belongs to the field of privacy protection, and in particular relates to a consistent, distributed, real-time privacy-preserving publishing method for single-dimensional time-series data.

Background

With the rapid development of Internet of Things technology and continuous improvement in hardware, production and daily life are becoming increasingly digital, intelligent and networked. Smart devices pervade daily life and continuously sense data closely related to people and their surroundings. Collecting, publishing and mining large volumes of such user perception data yields comprehensive information and knowledge and enables more specialized and personalized services. For example, aggregating and publishing users' browsing histories, shopping records, and book and film ratings supports better recommendation schemes; aggregating and publishing environmental information or location traces collected by smartphones supports digital urban planning, intelligent traffic services and the like. The publication of such aggregated perception data is therefore of great value to businesses, research institutions and governments alike.

However, while the aggregation and publication of perception data brings services and convenience, it also raises unprecedented privacy concerns. The data contain a large amount of sensitive personal information, such as medical conditions, consumption habits and location traces, which may be leaked when the data are published. With the rapid development of data mining and machine learning, data that appear safe may still reveal private information indirectly. Research has shown that a mobile phone user can be re-identified by combining traffic information, timing information and social network information from the mobile network. Studies have further demonstrated that an individual user can be uniquely identified through correlation analysis of several attributes other than the ID, so that merely deleting personal identifiers (such as name and ID) is not sufficient to protect private information. The privacy protection problem in data publishing applications is therefore important and cannot be ignored.

Most existing privacy-preserving publishing techniques target centralized perception data aggregation systems and address the publication of static data sets. Rapidly developing Internet of Things applications and increasingly serious privacy leakage place higher demands on privacy-preserving publishing. On the one hand, data mining has in recent years been applied ever more widely to time-series sensing systems, for example traffic flow monitoring, real-time financial analysis, epidemic monitoring and smart-grid electricity-load monitoring. By mining and analyzing dynamically updated and published time-series aggregate data, these applications return results in real time and thus provide the most accurate information currently available. On the other hand, as Internet of Things technology develops, the system scale and data scale of sensing applications keep growing, and centralized aggregation systems can no longer meet the demands of huge data volumes and extensive information sharing. More and more applications, including many sensing applications based on dynamic time-series data, therefore adopt a distributed architecture for data aggregation and publication. In distributed dynamic time-series data publishing, however, the data contain rich temporal correlation; once an attacker exploits it, more information can be mined than the data alone reveal, and more privacy is exposed. Meanwhile, distributed data publishing generally has to keep the data published by the nodes coordinated and consistent, which requires communication and data sharing among nodes. These frequent and complex operations give attackers more opportunities and further aggravate the threat of privacy exposure. The privacy protection problem in distributed dynamic time-series perception data publishing is therefore both more severe and more urgent.

Disclosure of Invention

For privacy-preserving publication of distributed time-series perception data, existing work cannot effectively satisfy the consistency requirement of distributed dynamic data publishing while also improving the utility of the continuously published data. The invention therefore aims to provide a consistent, distributed, real-time privacy-preserving publishing method for single-dimensional time-series data, targeting the most widely used case of single-dimensional, finite time-series perception data. The method achieves real-time, high-utility and consistent publication of private data in a decentralized distributed sensing system. It can be used in distributed dynamic perception data publishing systems such as distributed electricity-load monitoring in smart grids and disease monitoring applications.

In order to achieve the purpose, the invention is realized by adopting the following technical scheme:

A consistent, distributed, real-time privacy-preserving publishing method for single-dimensional time-series data, in which the sampling frequency is adjusted through an adaptive sampling strategy; at each sampling time the sampled data are perturbed with the Laplace mechanism, after which a single-dimensional perception-information correction strategy based on Kalman consensus filtering performs distributed posterior estimation on the data to be published by each node; at non-sampling times the prior prediction is published directly. The resulting real-time publishing method for distributed single-dimensional time-series perception data does not depend on a central server, provides a differential privacy guarantee and consistency, and is used in distributed dynamic perception data publishing systems such as distributed electricity-load monitoring in smart grids and disease monitoring applications. The method specifically comprises the following steps:

1) Data modeling: taking a statistical aggregation publishing scenario as an example, define the real-time privacy-preserving publishing problem for distributed single-dimensional time-series data, and model the interaction among the network of sensing nodes;

2) Initialization: preset the maximum number of samples M and the maximum privacy budget ε according to actual conditions and requirements, and allocate the privacy budget ε_k to each sampling point in advance according to the average allocation principle; the following steps are all performed locally at each node;

3) Perception data sampling: each node determines from its latest sampling interval whether to sample the input time-series data at the current time k;

4) Perception data perturbation: a sampling node i perturbs its true perception data x_i(k) with the Laplace algorithm (LPA) to obtain a perturbed value z_i(k) that satisfies the differential privacy guarantee; a non-sampling node directly takes the data it published at the previous time step as its perturbed observation at the current time;

5) Perception information interaction: every node, including non-sampling nodes, locally runs the Kalman consensus filtering local information update algorithm (KCF-update) to compute the prior estimate x̂_i^-(k) of node i, broadcasts the information message_i(k) to all of its neighbor nodes, and at the same time receives the broadcasts of its neighbors; the exchanged information does not involve the true sampled data of any node;

6) Perception information correction: a sampling node corrects its local perturbed data using the prior information and the perception information of its neighbors, computing and publishing the posterior estimate of the sensing target at the current time k with the Kalman consensus filtering single-dimensional distributed estimation algorithm (KCF-correction); a non-sampling node directly publishes the prior estimate for that time step and completes the iteration;

7) Perception update feedback: a sampling node feeds the error between its prior and posterior estimates back to a PID controller, updates the sampling interval with the adaptive sampling algorithm and completes the iteration; if the number of samples exceeds the maximum number of samples M, the system stops sampling.

In a further improvement of the invention, step 1) is carried out as follows. Assume a distributed sensing system with n nodes in which every sensing node performs real-time statistical aggregation of the same sensing target, forming the distributed single-dimensional perception data sequence of the i-th node, X_i = {x_i(1), …, x_i(k), …, x_i(T)}, k = 1, …, T, where x_i(k) is the perception data of node i at time k, i.e. node i's aggregate over all individual users within its aggregation range, and T is the length of the time series. Let the actual statistical sequence of the sensing target be R = {r(1), …, r(k), …, r(T)}, where r(k) is the actual statistic of the sensing target at discrete time k. The theoretical relationship between the perception data x_i(k) of node i at time k and the actual target data r(k) can be expressed as

x_i(k) = H_i·r(k)

where H_i is the perception coefficient of node i. In a distributed sensing system without a central aggregation server, achieving comprehensive, accurate and consistent dynamic publication at every node requires each node to communicate and cooperate with other nodes during privacy-preserving processing and publication. Each node establishes connections only with certain adjacent nodes, according to communication conditions, for local interaction, and does not interact pairwise with every other node in the system. The interaction among the network of sensing nodes is modeled by an undirected graph G = (V, E), where V = {1, 2, …, n} is the node set and E is the edge set; any two nodes that can communicate with each other are joined by an edge and are neighbor nodes. To reduce the network communication load, it is assumed that at each time step a sensing node exchanges information only with its neighbor nodes, and that the model graph G is connected. Because the perception data of a node may be attacked during interaction, and privacy and security among nodes are not transparent, each node applies privacy-preserving processing to its perception data locally before exchanging information with other nodes. z_i(k) denotes the perturbed data obtained by applying privacy protection to the perception data x_i(k) of node i. Finally, the published sequence of each node is denoted O_i = {o_i(1), …, o_i(k), …, o_i(T)}, i = 1, …, n.
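As an illustration of the network model G = (V, E), the following minimal Python sketch stores the undirected graph as neighbor sets and checks the connectivity assumption with a breadth-first search. The function names and the example edge list are hypothetical and are not taken from the patent.

```python
from collections import deque

def build_neighbors(n, edges):
    """Return {node: set of neighbor nodes} for an undirected graph on nodes 1..n."""
    neighbors = {i: set() for i in range(1, n + 1)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)  # undirected: each edge is usable in both directions
    return neighbors

def is_connected(neighbors):
    """Breadth-first search from an arbitrary node; True if every node is reached."""
    if not neighbors:
        return True
    start = next(iter(neighbors))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in neighbors[node] - seen:
            seen.add(nb)
            queue.append(nb)
    return len(seen) == len(neighbors)

# Hypothetical 4-node ring: each node exchanges messages with two neighbors only.
ring = build_neighbors(4, [(1, 2), (2, 3), (3, 4), (4, 1)])
connected = is_connected(ring)
```

In this sketch, ring[i] plays the role of the neighbor set N_i that node i broadcasts to in the later steps.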

In a further improvement of the invention, step 2) is carried out as follows: the maximum number of samples M and the maximum privacy budget ε are preset according to actual conditions and requirements, and the privacy budget is allocated evenly in advance, so that each sampling point receives ε_k = ε/M.
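A short worked form of the even allocation may help. It assumes that the average allocation principle means ε_k = ε/M, and it relies on the sequential composition property of differential privacy, which is a standard result rather than something stated in this document: the budgets of the M perturbed releases add up.

```latex
\varepsilon_k = \frac{\varepsilon}{M},
\qquad
\sum_{k=1}^{M} \varepsilon_k = M \cdot \frac{\varepsilon}{M} = \varepsilon .
```

For instance, with illustrative values ε = 1 and M = 50, each sampling point would receive ε_k = 0.02, and the 50 perturbed samples together stay within the total budget ε = 1.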

In a further improvement of the invention, step 3) is carried out as follows: if the current time k equals the latest sampling time sp_i of node i and the current number of samples n is less than the maximum number of samples M, the input time-series data are sampled at the current time k, the current time is a sampling point, and the number of samples is incremented, n = n + 1. Otherwise, the input time-series data are not sampled at the current time k, and the current time is a non-sampling point.
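The sampling decision can be sketched in a few lines of Python. The function and argument names are hypothetical; the sample counter is named samples_taken here to avoid confusion with the number of nodes, which the text also calls n.

```python
def should_sample(k, next_sample_time, samples_taken, max_samples):
    """Return True if time step k is a sampling point for this node."""
    return k == next_sample_time and samples_taken < max_samples

# Hypothetical node state: next sampling time sp_i = 5, 3 of at most 10 samples used.
samples_taken = 3
decisions = []
for k in range(4, 7):
    if should_sample(k, next_sample_time=5, samples_taken=samples_taken, max_samples=10):
        samples_taken += 1          # counts the sample taken at this step
        decisions.append((k, "sample"))
    else:
        decisions.append((k, "skip"))
```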

In a further improvement of the invention, step 4) is carried out as follows: for the i-th node that is to be sampled at time k, Laplace noise calibrated to the sensitivity Δ_f and the privacy budget ε_k is added to its perception data x_i(k) to obtain the perturbed data z_i(k), i.e. z_i(k) = x_i(k) + Lap(0, Δ_f/ε_k), where the sensitivity Δ_f is usually computed from the query function of the practical application. A node that is not sampled directly takes the data it published at the previous time step as its perturbed observation at the current time.
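The perturbation z_i(k) = x_i(k) + Lap(0, Δ_f/ε_k) can be illustrated with the Python standard library alone. This is a minimal sketch: the helper names are hypothetical, and the Laplace draw uses the fact that the difference of two independent unit exponentials, scaled by b, follows Lap(0, b).

```python
import random

def laplace_noise(scale, rng=random):
    """Draw one sample from Lap(0, scale) as the difference of two exponentials."""
    # If E1, E2 ~ Exp(1) are independent, scale * (E1 - E2) is Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def perturb(x, sensitivity, eps_k, rng=random):
    """Return z = x + Lap(0, sensitivity / eps_k)."""
    if eps_k <= 0:
        raise ValueError("eps_k must be positive")
    return x + laplace_noise(sensitivity / eps_k, rng)

rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
z = perturb(120.0, sensitivity=1.0, eps_k=0.02, rng=rng)
```

With Δ_f = 1 and ε_k = 0.02 the noise scale is 50, which shows why a small per-sample budget makes the later filtering and correction steps valuable.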

In a further improvement of the invention, step 5) is carried out as follows. At time k, every node locally runs the Kalman consensus filtering local information update algorithm and computes the prior estimate of the i-th node through the process model,

x̂_i^-(k) = A·x̂_i(k-1)

where A is the transfer (state transition) coefficient and x̂_i(k-1) is the posterior estimate computed by the Kalman consensus estimator of node i at the previous time step. Node i then computes its information vector u_i(k) and matrix U_i, in which R_j denotes the observation noise variance of node j and H_i the perception coefficient of node i. The information that node i exchanges consists of the prior estimate x̂_i^-(k) at time k, the information vector u_i(k) and the matrix U_i, packaged in a standardized form as message_i(k). Node i broadcasts message_i(k) to its neighbor nodes j ∈ N_i and at the same time receives the broadcasts of all its neighbors. Finally, node i fuses its own information with that of its neighbors j ∈ N_i to obtain the fused perception data y_i(k) and the fused variance S_i(k).
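The local update and fusion can be sketched for the single-dimensional case as follows. The expressions for u_i(k), U_i, y_i(k) and S_i(k) are assumptions: the sketch uses the standard information-form quantities of the Kalman consensus filter (u = H·z/R, U = H²/R, and sums over the node and its neighbors), which may differ from the exact formulas of the patent. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Message:
    prior: float   # prior estimate at time k
    u: float       # information vector (a scalar in the single-dimensional case)
    U: float       # information matrix (a scalar in the single-dimensional case)

def local_update(A, prev_posterior, z, H, R):
    """Build a node's broadcast message from its own quantities (assumed scalar forms)."""
    prior = A * prev_posterior      # process-model prediction
    u = H * z / R                   # assumed: u = H^T R^-1 z
    U = H * H / R                   # assumed: U = H^T R^-1 H
    return Message(prior, u, U)

def fuse(own, neighbor_msgs):
    """Sum the information of the node itself and of its neighbors."""
    msgs = [own] + list(neighbor_msgs)
    y = sum(m.u for m in msgs)      # fused perception data
    S = sum(m.U for m in msgs)      # fused variance term
    return y, S

# Two hypothetical nodes with perturbed observations z = 103 and z = 99.
own = local_update(A=1.0, prev_posterior=100.0, z=103.0, H=1.0, R=2.0)
nbr = local_update(A=1.0, prev_posterior=98.0, z=99.0, H=1.0, R=2.0)
y, S = fuse(own, [nbr])
```

Note that each message is built from the perturbed observation z_i(k) only, in line with the statement that the exchanged information does not involve the true sampled data.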

In a further improvement of the invention, step 6) is carried out as follows. At a sampling node, node i corrects its local data using the prior information x̂_i^-(k) and the neighbor perception information, namely the fused perception data y_i(k) and the fused variance S_i(k), to obtain an accurate posterior estimate x̂_i(k) at time k. First, the parameters of node i at time k are computed: the Kalman gain M_i, the consistency gain γ and the estimation error variance P_i. The Kalman gain is M_i(k) = (P_i^-1 + S_i(k))^-1. The consistency gain γ is determined by a relatively small constant β > 0. The estimation error variance is P_i = A·M_i(k-1)·A^T + Q, where A is the transfer coefficient and Q is the variance of the process noise in the state equation, which reflects the variation of the sensing target itself and is obtained from historical data. The posterior estimate x̂_i(k) is then computed from these parameters and the fused information.

At a non-sampling node, the prior estimate is published directly, namely

x̂_i^-(k) = A·x̂_i(k-1)

where A is the transfer coefficient and x̂_i(k-1) is the posterior estimate computed by the Kalman consensus estimator of node i at the previous time step. Meanwhile, the other nodes compute their local information and posterior estimates synchronously, so that all nodes achieve accurate distributed estimation.
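A scalar sketch of the correction step is given below. Only M_i(k) = (P_i^-1 + S_i(k))^-1 and P_i = A·M_i(k-1)·A^T + Q come from the text; the form of the consistency gain, γ = β/(1 + |P_i|), and the posterior update with a neighbor-consensus term are assumptions borrowed from one common formulation of the Kalman consensus filter and may not match the patent exactly. Function names are hypothetical.

```python
def kcf_correct(prior, P, y, S, neighbor_priors, beta=0.01):
    """One scalar Kalman-consensus correction; returns (posterior, M)."""
    M = 1.0 / (1.0 / P + S)                      # Kalman gain, as given in the text
    gamma = beta / (1.0 + abs(P))                # assumed form of the consistency gain
    consensus = sum(pj - prior for pj in neighbor_priors)
    # Assumed update: innovation term plus a pull toward the neighbors' priors.
    posterior = prior + M * (y - S * prior) + gamma * P * consensus
    return posterior, M

def next_error_variance(A, M, Q):
    """P_i = A * M_i(k-1) * A^T + Q in the scalar case."""
    return A * M * A + Q

post, M = kcf_correct(prior=100.0, P=1.0, y=101.0, S=1.0, neighbor_priors=[98.0])
P_next = next_error_variance(A=1.0, M=M, Q=0.5)
```

With these illustrative inputs the innovation moves the estimate up by 0.5 and the consensus term pulls it down by 0.01 toward the neighbor's lower prior, giving a posterior of 100.49.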

In a further improvement of the invention, step 7) is carried out as follows. Because real-time publication of dynamic data lacks prior knowledge of the time series, dynamic changes in the data must be detected so that the sampling frequency can be adjusted in real time. A filtering strategy is introduced that measures how the data are changing from the deviation between the prior and posterior estimates. The feedback error at the n-th sampling time k_n (0 < k_n < T) is defined from the prior estimate x̂_i^-(k_n) and the posterior estimate x̂_i(k_n) at time k_n, together with a parameter δ that prevents a zero divisor and is usually set to 1 in statistical aggregation scenarios. Since the posterior estimate is assumed to approach the actual data of the sensing target while the prior estimate is determined by a fixed process model, an increasing feedback error indicates that the actual target data are changing rapidly. The error is fed back to a controller, which detects it and shortens the sampling interval accordingly, so that the sampling frequency is adjusted dynamically.

The most common feedback controller, the PID controller, is adopted to measure sampling performance. The output error Δ of the PID controller is computed from the feedback error, where C_p, C_i and C_d are the proportional, integral and derivative control gains respectively, and T_i is the integration time of node i. Based on the above analysis of the adaptive sampling strategy, the sampling interval at time k_n is updated using the predefined parameters θ and ξ, where θ determines the magnitude of change of the sampling interval and ξ is the set point of the sampling process. sp_i, the latest sampling time of node i, is then updated according to the new sampling interval.

If, however, the number of samples n exceeds the maximum number of samples M, the system stops sampling and the latest sampling time sp_i is no longer updated.
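The feedback loop can be sketched as below. The concrete formulas are assumptions modeled on the FAST framework that this document compares against: a relative feedback error |posterior - prior| / max(|posterior|, δ), a PID output built from the latest error, the mean of the last T_i errors and the error slope, and an interval update max(1, I + θ·(1 - e^((Δ - ξ)/ξ))). The patent's own formulas may differ, and all function names are hypothetical.

```python
import math

def feedback_error(prior, posterior, delta=1.0):
    """Relative gap between prior and posterior; delta guards against a zero divisor."""
    return abs(posterior - prior) / max(abs(posterior), delta)

def pid_output(errors, times, Cp, Ci, Cd, Ti):
    """PID output from the error history at the sampling times (assumed form)."""
    proportional = Cp * errors[-1]
    window = errors[-Ti:]                                 # last Ti feedback errors
    integral = Ci * sum(window) / Ti
    derivative = 0.0
    if len(errors) >= 2:
        derivative = Cd * (errors[-1] - errors[-2]) / (times[-1] - times[-2])
    return proportional + integral + derivative

def next_interval(interval, pid, theta, xi):
    """Shrink the interval when pid > xi, grow it when pid < xi, never below 1."""
    return max(1.0, interval + theta * (1.0 - math.exp((pid - xi) / xi)))

# Hypothetical history: feedback errors 0.05 and 0.30 at sampling times 10 and 14.
pid = pid_output([0.05, 0.30], [10, 14], Cp=0.9, Ci=0.1, Cd=0.0, Ti=5)
new_interval = next_interval(4.0, pid, theta=10.0, xi=0.2)
```

In this example the error has jumped above the set point ξ = 0.2, so the interval collapses to its minimum of 1 and the node samples again at the next time step.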

The invention has at least the following beneficial technical effects:

In the consistent, distributed, real-time privacy-preserving publishing method for single-dimensional time-series data provided by the invention, the sampling frequency is adjusted through an adaptive sampling strategy, and the sampled data are perturbed with the Laplace mechanism at each sampling time. A single-dimensional perception-information correction strategy based on Kalman consensus filtering then performs distributed posterior estimation on the data to be published by each node at the sampling time, while the prior prediction is published directly at non-sampling times. The result is a real-time publishing method for distributed single-dimensional time-series perception data that does not depend on a central server and provides a differential privacy guarantee and consistency. It meets the real-time requirement of dynamic data publishing and the consistency requirement of distributed data publishing, and achieves real-time, high-utility and consistent data publication. It can be used in distributed dynamic perception data publishing systems such as distributed electricity-load monitoring in smart grids and disease monitoring applications.

Drawings

FIG. 1 is a distributed dynamic awareness data publishing process.

FIG. 2 is a flow chart of the method of the present invention.

Fig. 3(a) is a graph of the Average Relative Error between the published statistical sequence and the actual statistical sequence, for the present invention and the FAST method on the real data set Unemploy, as a function of the maximum privacy budget ε.

Fig. 3(b) is a graph of the Average Consensus Error between the published sequence and the average of the data published by all nodes, for the present invention and the FAST method on the real data set Unemploy, as a function of the maximum privacy budget ε.

Fig. 4(a) is a graph of the Average Relative Error between the published statistical sequence and the actual statistical sequence, for the present invention and the FAST method on the real data set Flu, as a function of the maximum privacy budget ε.

Fig. 4(b) is a graph of the Average Consensus Error between the published sequence and the average of the data published by all nodes, for the present invention and the FAST method on the real data set Flu, as a function of the maximum privacy budget ε.

Detailed Description

The invention is described in further detail below with reference to the figures and specific embodiments.

Referring to fig. 1, in distributed dynamic perception data publishing each server node aggregates the perception data of the individual users within its local range and shares it interactively with the other nodes, so that the distributed nodes can publish global data simultaneously and consistently for further analysis and application. However, interactive sharing among distributed server nodes can threaten data privacy, and the published aggregate data may be subject to linkage attacks that disclose the privacy of individual users. For example, in a distributed electricity-load monitoring system of a smart grid, each distributed station aggregates the real-time electricity consumption of users in a different area and cooperates with other stations to obtain comprehensive, efficient and accurate aggregate data, supporting services such as intelligent grid-wide scheduling, real-time pricing policies and accurate grid-wide load forecasting. Yet once the interaction between stations is intercepted, or an untrusted station exists, the aggregate data are exposed, and an attacker can go on to infer private details such as users' daily behavior and living habits from their load curves. Similarly, in disease monitoring applications, multiple hospitals count patients with various diseases in real time and share this information with other hospitals to obtain comprehensive statistics for scientific research or public services (such as intelligent monitoring and prevention of infectious diseases or epidemics across regions). Because unprotected aggregate healthcare data are likely to reveal the conditions of individual patients, and a patient's illness is highly sensitive private information, it is important to apply the necessary privacy protection to the aggregate data before publication.

Referring to fig. 2, the distributed single-dimensional time series data real-time privacy protection publishing method with consistency provided by the invention includes the following steps:

1) Data modeling: assume a distributed sensing system with n nodes in which every sensing node performs real-time statistical aggregation of the same sensing target (for example, the daily number of patients with an infectious disease, or the total electricity consumption per minute of all users in a given area), forming the distributed single-dimensional perception data sequence of the i-th node, X_i = {x_i(1), …, x_i(k), …, x_i(T)}, k = 1, …, T. Here x_i(k) is the perception data of node i at time k, i.e. node i's aggregate over all individual users within its aggregation range, and T is the length of the time series. Let the actual statistical sequence of the sensing target be R = {r(1), …, r(k), …, r(T)}, where r(k) is the actual statistic of the sensing target at discrete time k. The theoretical relationship between the perception data x_i(k) of node i and the actual target data r(k) can be expressed as

x_i(k) = H_i·r(k)

where H_i is the perception coefficient of node i. The goal of the privacy-preserving publishing method is to publish, in real time during the dynamic aggregation process of the distributed nodes, statistics of the sensing target that are as accurate as possible while protecting the privacy of individual users.

Because the perception data of a single node are often neither comprehensive nor accurate enough, the data published by distributed nodes are often inconsistent. In a distributed sensing system without a central aggregation server, achieving comprehensive, accurate and consistent dynamic publication at every node therefore requires each node to communicate and cooperate with other nodes during privacy-preserving processing and publication. Each node only needs to establish connections with certain adjacent nodes, according to communication conditions, for local interaction, and does not need to interact pairwise with every other node in the system. The interaction among the network of sensing nodes is modeled by an undirected graph G = (V, E), where V = {1, 2, …, n} is the node set and E is the edge set. Any two nodes that can communicate with each other are joined by an edge and are neighbor nodes. To reduce the network communication load, it is assumed that at each time step a sensing node exchanges information only with its neighbor nodes, and that the model graph G is connected. Because the perception data of a node may be attacked during interaction, and privacy and security among nodes are not transparent, each node must apply privacy-preserving processing to its perception data locally before exchanging information with other nodes. z_i(k) denotes the perturbed data obtained by applying privacy protection to the perception data x_i(k) of node i. Finally, the published sequence of each node is denoted O_i = {o_i(1), …, o_i(k), …, o_i(T)}, i = 1, …, n.

2) Initialization: the maximum number of samples M and the maximum privacy budget ε are preset according to actual conditions and requirements, and the privacy budget is allocated evenly in advance, so that each sampling point receives ε_k = ε/M.

3) Perception data sampling: if the current time k equals the latest sampling time sp_i of node i and the current number of samples n is less than the maximum number of samples M, the input time-series data are sampled at the current time k, the current time is a sampling point, and the number of samples is incremented, n = n + 1. Otherwise, the input time-series data are not sampled at the current time k, and the current time is a non-sampling point.

4) Perception data perturbation: for the i-th node that is to be sampled at time k, Laplace noise calibrated to the sensitivity Δ_f and the privacy budget ε_k is added to its perception data x_i(k) to obtain the perturbed data z_i(k), i.e. z_i(k) = x_i(k) + Lap(0, Δ_f/ε_k), where the sensitivity Δ_f is usually computed from the query function of the practical application; for counting statistics, for example, it is usually 1. A node that is not sampled directly takes the data it published at the previous time step as its perturbed observation at the current time.

5) Perception information interaction: at time k, every node locally runs the Kalman consensus filtering local information update algorithm and computes the prior estimate of the i-th node through the process model,

x̂_i^-(k) = A·x̂_i(k-1)

where A is the transfer (state transition) coefficient and x̂_i(k-1) is the posterior estimate computed by the Kalman consensus estimator of node i at the previous time step.

Next, the information vector u_i(k) and matrix U_i of the i-th node are computed, in which R_j denotes the observation noise variance of node j and H_i the perception coefficient of node i. The information that node i exchanges consists of the prior estimate x̂_i^-(k) at time k, the information vector u_i(k) and the matrix U_i, packaged in a standardized form as message_i(k). Node i broadcasts message_i(k) to its neighbor nodes j ∈ N_i and at the same time receives the broadcasts of all its neighbors. Finally, node i fuses its own information with that of its neighbors j ∈ N_i to obtain the fused perception data y_i(k) and the fused variance S_i(k).

6) Perception information correction: at a sampling node, node i corrects its local data using the prior information x̂_i^-(k) and the neighbor perception information, namely the fused perception data y_i(k) and the fused variance S_i(k), to obtain an accurate posterior estimate x̂_i(k) at time k. First, the parameters of node i at time k are computed: the Kalman gain M_i, the consistency gain γ and the estimation error variance P_i. The Kalman gain is M_i(k) = (P_i^-1 + S_i(k))^-1. The consistency gain γ is determined by a relatively small constant β > 0. The estimation error variance is P_i = A·M_i(k-1)·A^T + Q, where A is the transfer coefficient and Q is the variance of the process noise in the state equation, which reflects the variation of the sensing target itself and can be obtained from historical data. The posterior estimate x̂_i(k) is then computed from these parameters and the fused information.

At a non-sampling node, the prior estimate is published directly, namely

x̂_i^-(k) = A·x̂_i(k-1)

where A is the transfer coefficient and x̂_i(k-1) is the posterior estimate computed by the Kalman consensus estimator of node i at the previous time step. Meanwhile, the other nodes compute their local information and posterior estimates synchronously, so that all nodes achieve accurate distributed estimation.

7) Perception update feedback: because prior knowledge of the time-series data is lacking in the real-time publishing of dynamic data, dynamic changes in the data must be detected in order to adjust the sampling frequency in real time. The introduced filtering strategy measures how the data is changing from the deviation between the prior estimate and the posterior estimate, and the feedback error at the n-th sampling time k_n (0 < k_n < T),

is defined as

where

is the prior estimate at time k_n, and

is the posterior estimate at time k_n. The parameter δ is set to prevent the divisor from being 0 and is usually taken as 1 in the case of statistical aggregation. Assume that the posterior estimate

approaches the actual data of the perception target, while the prior estimate

is determined by a fixed process model. It can therefore be inferred that if the feedback error

increases, the actual target data is changing rapidly. In that case the error is fed back to the controller, which detects it and shortens the sampling interval accordingly, so that the sampling frequency is adjusted dynamically.
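The feedback error formula itself did not survive in this text. The sketch below assumes the relative-deviation form used in FAST-style adaptive sampling, |posterior − prior| / max(posterior, δ); the function name and this exact form are assumptions consistent with the description of δ as a guard against a zero divisor.

```python
def feedback_error(x_prior, x_post, delta=1.0):
    """Assumed feedback error at a sampling point.

    The deviation between prior and posterior estimates is normalized by the
    posterior, with delta keeping the divisor away from 0.
    """
    return abs(x_post - x_prior) / max(x_post, delta)
```

With a prior of 90 and a posterior of 100 this gives 0.1; with a posterior below δ, such as 0.5 against a prior of 0, the divisor falls back to δ = 1 and the error is 0.5.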

The most common feedback controller, the PID (proportional, integral, and derivative) controller, is used to measure sampling performance. Based on the feedback error

the output error Δ of the PID controller is

where C_p, C_i, and C_d are the proportional, integral, and derivative control gains, respectively, and T_i is the integration time. Based on the above analysis of the adaptive sampling strategy, the sampling interval at time k_n

is updated by the formula

where θ and ξ are predefined parameters: θ determines the magnitude of change of the sampling interval, and ξ is the set point of the sampling process. sp_i is the latest sampling time of node i, which must be updated according to the sampling interval

that is,
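The PID and sampling-interval formulas are missing from this text. As a hedged illustration, the sketch below assumes the forms used in FAST-style adaptive sampling: Δ = C_p·E_n + C_i·(sum of the last T_i errors)/T_i + C_d·(E_n − E_{n−1})/(k_n − k_{n−1}), and a new interval of max(1, I + θ(1 − e^{(Δ−ξ)/ξ})). The function names, the default gain values, and the rounding to whole time steps are assumptions.

```python
import math


def pid_output(errors, times, Cp=0.9, Ci=0.1, Cd=0.0, Ti=5):
    """Assumed PID output from the feedback errors at past sampling points.

    errors: feedback errors at the sampling points so far (latest last)
    times:  the corresponding sampling times k_1, ..., k_n
    """
    proportional = errors[-1]
    integral = sum(errors[-Ti:]) / Ti  # average over the integration window
    if len(errors) >= 2:
        derivative = (errors[-1] - errors[-2]) / (times[-1] - times[-2])
    else:
        derivative = 0.0
    return Cp * proportional + Ci * integral + Cd * derivative


def next_interval(interval, delta_pid, theta=10.0, xi=0.1):
    """Assumed interval update: shrink when the PID output exceeds the set point xi."""
    new_interval = interval + theta * (1.0 - math.exp((delta_pid - xi) / xi))
    return max(1, round(new_interval))  # never sample more often than every step
```

Under these assumptions a PID output equal to ξ leaves the interval unchanged, a larger output shortens it (down to 1), and a smaller output lengthens it. The next sampling time could then be taken as sp_i plus the new interval, which is also an assumption.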

Referring to fig. 3(a) to (b) and fig. 4(a) to (b), the superiority of the present invention in data protection is analyzed as follows:

FIGS. 3(a)-(b) compare the method of the present invention (PropDP) with the FAST method on the real data set Unemploy as the maximum privacy budget ε varies. Fig. 3(a) plots, for PropDP and FAST, the average relative error (Average Relative Error) between the published statistical sequence and the actual statistical sequence as a function of the maximum privacy budget ε. Fig. 3(b) plots, for PropDP and FAST, the average consistency error (Average Consensus Error) between the published sequences and the mean of the data published by all nodes as a function of the maximum privacy budget ε.

In fig. 3(a), as the privacy budget increases, i.e., as the privacy level decreases, the average relative error of the two mechanisms on the different data sets decreases to different degrees. The FAST mechanism also adopts an adaptive sampling strategy and a filtering approach, which ensures the real-time performance and high utility of privacy-protected published data. However, FAST employs an ordinary Kalman filter, which is only suitable for independent real-time publishing from a single data source and cannot achieve consistent publishing across the nodes of a distributed sensing system. For comparison, FAST is applied here in the distributed sensing system with each node executing the method independently. The average relative error is generally used to measure how closely the output sequence tracks the input sequence; it is a general index that does not depend on the specific application field. The smaller the average relative error, the closer the published sequence is to the actual sequence, that is, the better the utility of the published data; conversely, the larger the average relative error, the worse the utility. As the privacy budget increases, the error introduced by the Laplace perturbation decreases, yet the average relative error of the PropDP mechanism remains much lower than that of FAST, so PropDP has higher utility than FAST at the same privacy level. The utility gain of PropDP on the Unemploy data set remains large even when the privacy budget is large. The reason is that the values in the Unemploy data set are generally small, so even when the privacy budget is close to 1 the error caused by the Laplace perturbation is large relative to the data; in this case the PropDP method has a greater advantage and improves utility more markedly. Likewise, when the privacy budget is close to 0.1 and the perturbation error is large, the average relative error of the PropDP mechanism is much lower than that of the FAST mechanism across all data sets.

In fig. 3(b), as the privacy budget increases, i.e., as the privacy level decreases, the average consistency error of both mechanisms on the different data sets decreases to different degrees. The average consistency error (Average Consensus Error) measures the consistency of the privacy-protected data published by the distributed nodes. It is defined as follows

that is, the deviation between the distributed published sequences o_i(k) and the average value o_average(k), where o_average(k) is the mean of the data published by all n nodes, i.e.
o_average(k) = (1/n) Σ_{i=1}^{n} o_i(k)
The smaller the average consistency error, the better the consistency among the published sequences of the sensing nodes; the larger it is, the worse the consistency.
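The exact definition of the average consistency error is missing from this text. As an illustration only, the sketch below assumes it is the mean absolute deviation of every node's published value from the cross-node mean o_average(k), averaged over all nodes and time steps; the function name and the absence of any normalization are assumptions.

```python
def average_consistency_error(published):
    """Assumed average consistency error.

    published: list of per-node sequences, published[i][k] = o_i(k);
               all sequences have the same length T.
    """
    n = len(published)
    T = len(published[0])
    total = 0.0
    for k in range(T):
        # Mean of the data published by all nodes at time k.
        o_avg = sum(published[i][k] for i in range(n)) / n
        total += sum(abs(published[i][k] - o_avg) for i in range(n))
    return total / (n * T)
```

Identical sequences on every node give an error of 0, which matches the statement that a smaller value means better consistency among the nodes.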

Because the inconsistency among the data published by the nodes is mainly caused by the Laplace perturbation, the larger the privacy budget, the smaller the perturbation noise and the better the consistency among the nodes. The average consistency error of the PropDP mechanism is much lower than that of the FAST mechanism at both lower and higher privacy budgets. It can be seen that, at the same privacy level, the PropDP mechanism publishes data much more consistently than the FAST mechanism.

FIGS. 4(a)-(b) compare the method of the present invention (PropDP) with the FAST method on the real data set Flu as the maximum privacy budget ε varies. Fig. 4(a) plots, for PropDP and FAST, the average relative error (Average Relative Error) between the published statistical sequence and the actual statistical sequence as a function of the maximum privacy budget ε. Fig. 4(b) plots, for PropDP and FAST, the average consistency error (Average Consensus Error) between the published sequences and the mean of the data published by all nodes as a function of the maximum privacy budget ε.

In fig. 4(a), as the privacy budget increases, i.e., as the privacy level decreases, the average relative error of PropDP and FAST decreases to different degrees. Here the average relative error curve of the PropDP mechanism lies closer to that of the FAST mechanism, but it is still generally lower. It can be seen that the PropDP mechanism has higher utility than the FAST mechanism at the same privacy level.

In fig. 4(b), as the privacy budget increases, i.e., as the privacy level decreases, the average consistency error of both mechanisms decreases to different degrees. The average consistency error of the PropDP mechanism is much lower than that of the FAST mechanism at both lower and higher privacy budgets. It can be seen that, at the same privacy level, the PropDP mechanism publishes data more consistently than the FAST mechanism.

Claims

1. A distributed single-dimensional time-series data real-time privacy protection publishing method with consistency, characterized in that the sampling frequency is adjusted through an adaptive sampling strategy, the sampled data is perturbed through a Laplace mechanism at each sampling time, distributed posterior estimation is then carried out on the published data of each node at the sampling time through a single-dimensional perception information correction strategy based on Kalman consistency filtering, and prior prediction data is published directly at non-sampling times, thereby realizing a real-time publishing method for distributed single-dimensional time-series perception data that is independent of a central server and has differential privacy guarantee and consistency, for use in distributed dynamic perception data publishing systems such as a distributed power consumption load monitoring system of a smart grid and disease monitoring applications, the method specifically comprising the following steps:

2. The method for real-time privacy-preserving publication of distributed single-dimensional time-series data with consistency according to claim 1, wherein the specific operations of step 1) are as follows: assuming that, in a distributed sensing system with n nodes, each sensing node performs real-time statistical aggregation of the same sensing target information to form the distributed single-dimensional sensing data sequence of the i-th node X_i = {x_i(1), …, x_i(k), …, x_i(T)}, k = 1, …, T, where x_i(k) is the sensing data of node i at time k, namely the information aggregated by node i over all individual users within its aggregation range, and T is the length of the time series; assuming that the actual statistical sequence of the sensing target information is R = {r(1), …, r(k), …, r(T)}, where r(k) is the actual statistical value of the sensing target at discrete time k, the theoretical relationship between the sensing data x_i(k) of node i at time k and the actual data r(k) of the sensing target can be expressed as

x_i(k)＝H_i·r(k)

wherein H_i relates the sensing data of node i to the actual data of the sensing target; in order to achieve comprehensive, accurate, and consistent dynamic information publishing at every node of a distributed sensing system without a central aggregation server, each node communicates and cooperates with other nodes during privacy protection processing and publishing; each node is connected to certain neighboring nodes according to communication conditions and interacts locally, rather than interacting pairwise with all other nodes in the sensing system; an undirected graph (V, E) is used to model the interaction of the sensing node network, where V = {1, 2, …, n} is the node set and E is the edge set, and every two nodes that can communicate with each other are connected by an edge, i.e., they are neighbor nodes; in order to reduce the network communication load, each node in the distributed system is assumed to exchange information only with its neighbor nodes; because the sensing data of each node may be attacked during interaction and privacy security between nodes is not transparent, each node performs privacy protection processing on its sensing data locally before exchanging information with other nodes, and z_i(k) denotes the perturbed data obtained from the sensing data x_i(k) of node i after privacy protection; finally, the distributed published sequence of each node is denoted O_i = {o_i(1), …, o_i(k), …, o_i(T)}, i = 1, …, n.

3. The method for real-time privacy-preserving publication of distributed single-dimensional time-series data with consistency according to claim 2, wherein the specific operations of step 2) are as follows: presetting the maximum number of samples M and the maximum privacy budget ε according to actual conditions and requirements, wherein the privacy budget

4. The method for real-time privacy-preserving publication of distributed single-dimensional time-series data with consistency according to claim 3, wherein the specific operations of step 3) are as follows: if the current time k is equal to the latest sampling time sp_i of node i and the current number of samples n is less than the maximum number of samples M, the input time-series data is sampled at the current time k, the current time is a sampling point, and the number of samples is updated to n = n + 1; otherwise, the input time-series data is not sampled at the current time k, and the current time is a non-sampling point.

5. The method for real-time privacy-preserving publication of distributed single-dimensional time-series data with consistency according to claim 4, wherein the specific operations of step 4) are as follows: for the i-th node that needs to sample at time k, Laplace noise calibrated by the sensitivity Δ_f and the privacy budget ε_k is added to the sensing data x_i(k) to obtain the perturbed data z_i(k), i.e. z_i(k) = x_i(k) + Lap(0, Δ_f/ε_k), wherein the sensitivity Δ_f is usually calculated from the query function in the practical application; a node that does not need to sample directly takes the data published at the previous time as the perturbed observation data at the current time.
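The sampling decision of step 3) and the perturbation of step 4) can be sketched together as follows. This is an illustration under stated assumptions, not the claimed implementation: the function names, the argument layout, and the way Laplace noise is drawn (as the difference of two exponential variates, which yields a Laplace(0, scale) sample) are choices made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw Lap(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def observe(k, x_k, prev_published, sp_i, n_samples, max_samples,
            sensitivity, eps_k, rng):
    """Return (z_i(k), updated sample count, whether k was a sampling point)."""
    if k == sp_i and n_samples < max_samples:
        # Sampling point: z_i(k) = x_i(k) + Lap(0, sensitivity / eps_k).
        z = x_k + laplace_noise(sensitivity / eps_k, rng)
        return z, n_samples + 1, True
    # Non-sampling point: reuse the data published at the previous time.
    return prev_published, n_samples, False
```

For instance, `observe(3, 10.0, 9.5, sp_i=5, n_samples=0, max_samples=10, sensitivity=1.0, eps_k=0.1, rng=random.Random(0))` is a non-sampling point and returns the previously published value 9.5 without spending any privacy budget.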

6. The method for real-time privacy-preserving publication of distributed single-dimensional time-series data with consistency according to claim 5, wherein the specific operations of step 5) are as follows: using the local information update algorithm of Kalman consistency filtering, every node i locally calculates its prior estimate at time k through the process model

where A is the transfer coefficient,

and the normalized forms of the information vector u_i(k) and the matrix U_i are

Fusion variance

7. The method for real-time privacy-preserving publication of distributed single-dimensional time-series data with consistency according to claim 6, wherein the specific operations of step 6) are as follows: at a sampling point, using the prior information

namely:

at a non-sampling point, the prior estimate is published directly

that is,

where A is the transfer coefficient,

8. The method for real-time privacy-preserving publication of distributed single-dimensional time-series data with consistency according to claim 7, wherein the specific operations of step 7) are as follows: because prior knowledge of the time-series data is lacking in the real-time publishing of dynamic data, dynamic changes in the data are detected in order to adjust the sampling frequency in real time; a filtering strategy is introduced to measure how the data is changing from the deviation between the prior estimate and the posterior estimate, and the feedback error at the n-th sampling time k_n (0 < k_n < T) is then defined