CN116489636A - Personalized differential privacy protection method under cloud-edge cooperative scene - Google Patents

Personalized differential privacy protection method under cloud-edge cooperative scene

Info

Publication number
CN116489636A
CN116489636A
Authority
CN
China
Prior art keywords
privacy
data
user
class
personalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310443160.2A
Other languages
Chinese (zh)
Inventor
任爽
张鑫云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN202310443160.2A priority Critical patent/CN116489636A/en
Publication of CN116489636A publication Critical patent/CN116489636A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W12/00: Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/02: Protecting privacy or anonymity, e.g. protecting personally identifiable information [PII]
    • H04W12/60: Context-dependent security
    • H04W12/67: Risk-dependent, e.g. selecting a security level depending on risk profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a personalized differential privacy protection method for cloud-edge cooperative scenarios, built on a personalized local differential privacy protection model based on cloud-edge cooperation. On the edge computing side, the model provides personalized services for users while satisfying each user's personalized local differential privacy protection requirements. On the cloud computing center side, in order to solve the problem of high estimation error caused by the reduction of per-level privacy data samples under personalized differential privacy, a privacy data derivation algorithm is adopted, improving the utilization of cloud-collected data and the accuracy of estimation without weakening users' personalized differential privacy.

Description

Personalized differential privacy protection method under cloud-edge cooperative scene
Technical Field
The invention relates to the technical field of mobile edge computing, in particular to a personalized differential privacy protection method in a cloud-edge cooperative scenario.
Background
The main objective of differential privacy protection is to protect the private information contained in the data collected from users; its data collection modes are mainly centralized differential privacy and localized differential privacy. The essence of differential privacy protection is to perturb the original data by adding noise while keeping the output of the perturbed data statistically similar to that of the original data. Common differential privacy preserving mechanisms include the Laplace mechanism, the Gaussian mechanism, the exponential mechanism, and randomized response. Differential privacy mechanisms are now widely used in mobile edge computing scenarios.
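As a concrete illustration of the randomized-response idea mentioned above (a minimal background sketch, not the patented method; function names are chosen for illustration), each user reports their true bit with probability e^ε/(e^ε + 1) and the flipped bit otherwise, and the collector debiases the noisy frequency:

import math
import random

def randomized_response(bit: int, epsilon: float) -> int:
    # Report the true bit with probability e^eps / (e^eps + 1), else flip it.
    keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < keep else 1 - bit

def debias_frequency(reports: list[int], epsilon: float) -> float:
    # Unbiased estimate of the true frequency of 1s from the noisy reports.
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)  # Pr[report = truth]
    q = 1.0 - p                                        # Pr[report = flipped]
    observed = sum(reports) / len(reports)
    return (observed - q) / (p - q)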
Existing differential privacy protection models for mobile edge computing are usually designed around only two endpoints, the user and a third-party data collection center, ignoring the effect that the transmission delay of data from the user to the data center has on personalized services. Moreover, current differential privacy models generally assign the same differential privacy level to data of the same type or attribute and analyze the statistical characteristics of the data under that premise. In reality, however, users often prefer to decide for themselves how strongly their data is protected; for example, some users are willing to lower their privacy protection level in exchange for better personalized service quality. With privacy data at different protection levels, it is then difficult for the data collection center to make full use of the data when analyzing the statistical properties of the population.
Disclosure of Invention
The embodiment of the invention provides a personalized differential privacy protection method in a cloud-edge cooperative scene, which is used for solving the technical problems in the prior art.
In order to achieve the above purpose, the present invention adopts the following technical scheme.
A personalized differential privacy protection method under cloud-edge cooperative scene comprises the following steps:
s1, acquiring a user data set with user-defined privacy classes by constructing a local differential privacy protection model;
s2, based on the user data set, performing a data derivation operation through a data derivation algorithm combining unary encoding of a local differential privacy protection model and personalized privacy data derivation, to obtain a privacy data set with a privacy class;
s3, grouping the data of the user data set and the privacy data set with the privacy class, and encoding the data of each group to obtain an estimation result of a user privacy histogram;
the estimation result of the user privacy histogram is used for differential privacy data transmission of the edge node.
Preferably, step S1 comprises:
For N users, let the true binary vector of each user u be X_u, the privacy class of each user u be τ, the privacy budget corresponding to privacy class τ be ε_τ, the privacy budget set be ε, the number of edge nodes be k, and the privacy protection level of the i-th edge node be l_i.
Let T = {t_1, t_2, ..., t_k} be a discrete and finite data attribute value domain, in which user u holds its own data t ∈ T and the differential privacy budget ε_u required for protected transmission, where ε_u ∈ ε = {ε_1, ..., ε_n}, u = 1, 2, ..., n.
Preferably, step S2 includes:
Based on the value t_i ∈ T = {t_1, t_2, ..., t_k} of user u in the user data set and the privacy protection class τ, initialize a k-dimensional all-zero vector X_u = [0, 0, ..., 0]_k;
Set the i-th position of X_u to 1, obtaining X_u[i] = 1;
Perturb X_u position by position according to
Pr[Y_u[j] = 1] = 1/2 if X_u[j] = 1, and Pr[Y_u[j] = 1] = 1/(e^{ε_τ} + 1) if X_u[j] = 0,
obtaining the noised privacy data Y_u of user u at class τ, where Pr is an abbreviation of Probability;
Group the privacy data in the user data set according to the corresponding privacy class and construct the privacy data set G = [G_1, G_2, ..., G_τ, ..., G_m], adding all privacy data of privacy class τ in the user data set to the sub-data set G_τ;
Initialize G_τ^+ = G_τ; for τ = 1 to m and r = τ + 1 to m, repeat this sub-step to obtain the derived privacy data aggregate G^+ = [G_1^+, ..., G_m^+] of each privacy class;
Based on the sub-data set G_τ, obtain the privacy data set G_τ^+ with privacy class τ through DR algorithm calculation.
Preferably, obtaining the privacy data set G_τ^+ with privacy class τ from the sub-data set G_τ through DR algorithm calculation comprises the following sub-steps:
E1, determine the lower bound i (infimum) of the privacy classes present in the user privacy data set Z_u and assign the privacy data of class i to a first intermediate temporary variable Z_sup;
E2, determine the upper bound j (supremum) of the privacy classes present in the user privacy data set Z_u;
E3, if the lower bound i and the upper bound j of the privacy class are equal, execute sub-step E4; otherwise execute sub-step E8;
E4, add the privacy data Z^τ with privacy class τ to G_τ^+ and Z_u via G_τ^+.add(Z^τ) and Z_u.add(Z^τ);
E5, if the upper bound j of the privacy class is the empty set, execute sub-steps E6 to E8;
E6, let d traverse the k positions of the data from 1, cyclically executing the next sub-step;
E7, generate the d-th position of the derived data Z^τ: it takes the value Z_sup[d] with a probability determined by the privacy budget ε_τ of privacy class τ and the privacy budget ε_i corresponding to privacy class i, and the value 1 − Z_sup[d] with the complementary probability;
E8, if the lower bound i and the upper bound j of the privacy class are not equal and j is not null, execute sub-steps E9 to E11;
E9, assign the privacy data of privacy class j to a second intermediate temporary variable Z_inf;
E10, let d traverse the k positions of the data from 1, cyclically executing the following sub-steps;
E11, if Z_sup[d] = Z_inf[d], execute sub-step E12; otherwise execute sub-step E13;
E12, the d-th position of Z^τ takes the value Z_sup[d] with a probability determined by the privacy budgets of the classes involved, and the value 1 − Z_sup[d] with the complementary probability;
E13, the d-th position of Z^τ takes the value Z_sup[d] with a probability determined by the privacy budgets of the classes involved, and the value Z_inf[d] with the complementary probability;
E14, add the privacy data Z^τ with privacy class τ obtained by the above calculation to G_τ^+ and Z_u respectively, obtaining the privacy class-τ privacy data set G_τ^+.
Preferably, step S3 includes:
Perform grouping statistics on the user data set and the privacy-class data sets G_τ^+ to obtain the number of samples n_τ^+ of class τ after data derivation, the user value domain T = {t_1, t_2, ..., t_k}, and the actual data distribution P = [p_1, p_2, ..., p_k] of the users, where p_k denotes the fraction of samples whose data is t_k among the n_τ^+ current samples, and C[k] denotes the number of samples whose k-th bit is 1;
Compute the unbiased estimate of P[j], j = 1, ..., k, by the formula
P̂[j] = ( C[j]/n_τ^+ − 1/(e^{ε_τ} + 1) ) / ( 1/2 − 1/(e^{ε_τ} + 1) );
Obtain the histogram estimation result with privacy protection level τ,
P̂ = [ P̂[1], P̂[2], ..., P̂[k] ];
Compute the mean square error of the histogram estimation result with privacy protection level τ,
MSE_τ = (1/k) · Σ_{j=1}^{k} ( P̂[j] − P[j] )².
According to the technical scheme provided by the embodiment of the invention, the personalized differential privacy protection method under the cloud-edge cooperative scene provides a personalized local differential privacy protection model based on cloud-edge cooperation. On the edge computing side, it provides personalized services for users while satisfying their personalized local differential privacy protection requirements. On the cloud computing center side, to solve the problem of high estimation error caused by the reduction of per-level privacy data samples under personalized differential privacy, a privacy data derivation algorithm is adopted, improving the utilization of cloud-collected data and the accuracy of estimation without weakening users' personalized differential privacy. The method provided by the invention has the following beneficial effects:
the personalized local differential privacy protection method based on cloud-edge collaboration adds a personalized design on top of the local differential privacy model, so that in a cloud-edge collaboration scenario users can handle data of different privacy protection levels more flexibly on the edge side and experience more personalized service quality, while on the cloud computing center side data of different privacy protection levels are counted effectively and estimation errors are reduced;
the invention adopts a personalized local differential privacy protection scheme whose privacy budget is determined by the user, giving the user a more personalized privacy protection experience;
the method adopts OUE encoding, an optimized unary encoding scheme with smaller variance, so the derived data produced by the OUE-based data derivation scheme OUE-DRPP has a smaller estimation error, and the mean square error of its histogram estimation is lower than that of the DRPP data derivation scheme.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a process flow diagram of a personalized differential privacy protection method in a cloud-edge cooperative scene provided by the invention;
fig. 2 is a schematic diagram of a personalized local differential privacy protection model based on cloud edge collaboration in the personalized differential privacy protection method under a cloud edge collaboration scene;
fig. 3 is a schematic diagram of the histogram estimation flow, i.e., the frequency estimation algorithm of the differential privacy protection model, in the personalized differential privacy protection method under a cloud-edge cooperative scene;
fig. 4 is a data derivation algorithm example of a personalized differential privacy protection method under a cloud-edge cooperative scene provided by the invention;
fig. 5 is a graph comparing the mean square error results of the personalized privacy data derivation algorithm DRPP and the OUE-based DRPP algorithm OUE-DRPP under different privacy classes on the Adult data set, in a preferred embodiment of the personalized differential privacy protection method under a cloud-edge cooperative scene.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For ease of understanding the embodiments of the invention, reference will now be made to several specific embodiments illustrated in the accompanying drawings; these should in no way be taken to limit the embodiments of the invention.
The invention provides a personalized differential privacy protection method under a cloud-edge cooperative scene, which solves problems such as the inability of differential privacy protection models in current mobile edge computing scenarios to provide personalized differential privacy protection services locally, and the high estimation errors caused by the reduction of per-level privacy data samples under personalized differential privacy.
Referring to fig. 1, the invention provides a personalized differential privacy protection method under cloud-edge cooperative scene, which comprises the following steps:
s1, acquiring a user data set with user-defined privacy classes by constructing a local differential privacy protection model;
s2, based on the user data set, performing a data derivation operation through a data derivation algorithm combining unary encoding of a local differential privacy protection model and personalized privacy data derivation, to obtain a privacy data set with a privacy class;
s3, grouping the data of the user data set and the privacy data set with the privacy class, and encoding the data of each group to obtain an estimation result of the user privacy histogram.
The estimation result of the user privacy histogram is used for differential privacy data transmission of the edge node.
The invention provides a personalized local differential privacy protection model based on cloud-edge collaboration, shown in fig. 2. It is assumed that users within the range of the same edge computing node need the same privacy protection level, while users within the ranges of different edge nodes need different privacy protection levels. In the cloud-edge collaboration scenario, the edge computing node, being geographically closer to the user side and having considerable computing power, can provide more personalized and faster services for the user population within its range. Each user sends its protected data to the edge computing node covering it according to the local privacy protection budget; the edge computing node immediately analyzes the collected data to obtain statistical characteristics that better match the user data within the node. These statistical results exist only locally at the edge node, and personalized service is provided to the users according to them.
Cloud computing centers have more powerful computing capacity and are often used to statistically analyze the data of a wide range of users. However, what the cloud computing center collects is protected data generated under different privacy protection levels. Such data cannot be used directly for statistical analysis, because mixing levels yields particularly large statistical errors that make the data useless. Data protected at a low privacy level, though, contains the information needed at high privacy levels: high-privacy-level data can be derived from low-privacy-level data, increasing the sample size of the higher privacy level and thereby reducing the estimation error of the statistics.
The personalized differential privacy protection method based on cloud edge cooperation mainly comprises the following three steps: data collection, data derivation, histogram estimation. The specific implementation process is as follows:
step 1: and (5) data collection. Under the cloud-edge cooperative scene, the protection degree of different user groups on own data is different, some users hope to reduce the security and exchange better service quality, and the users pay more attention to privacy security. This requires a personalized local differential privacy protection model (Personalized Local Differential Privacy, PLDP) to be built for each user.
Assume that there are N users in the cloud-edge cooperative scenario. The privacy data set of each user u is Z_u, its privacy class is τ, the privacy budget corresponding to each privacy class is ε_τ, ε is the privacy budget set, and the noised privacy data of user u at class τ is Y_u. The number of edge nodes is k, and the privacy protection level of the i-th edge node is l_i.
The PLDP may be described as follows: let T = {t_1, t_2, ..., t_k} be a discrete and finite data attribute value domain, in which user u holds its own data t ∈ T and a required differential privacy budget ε_u, where ε_u ∈ ε = {ε_1, ..., ε_n}, u = 1, 2, ..., n. If for any t, t' ∈ T of user u the randomized algorithm M satisfies the condition
Pr[M(t) = z] ≤ e^{ε_u} · Pr[M(t') = z] for every output z ∈ Z_u,
then the randomized algorithm M satisfies ε_u-personalized local differential privacy for user u, where Z_u contains all privacy data of user u.
Unlike conventional local differential privacy (Local Differential Privacy, LDP), which sets one global privacy protection budget for all users, the differential privacy budget in PLDP is determined by user u, meaning that control over privacy is handed back to the user. In addition, personalized differential privacy has the properties of sequential composition and post-processing invariance.
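For completeness, the two named properties in their standard form (these statements are quoted from the general differential privacy literature as an aid to the reader; the source text only names them):

% Sequential composition: if mechanisms M_1, ..., M_r satisfy
% eps_1-, ..., eps_r-PLDP for user u, their joint release satisfies
% (eps_1 + ... + eps_r)-PLDP:
\Pr\big[(M_1(t),\dots,M_r(t)) = z\big]
  \le e^{\sum_{i=1}^{r}\varepsilon_i}\cdot\Pr\big[(M_1(t'),\dots,M_r(t')) = z\big]
% Post-processing invariance: for any data-independent (randomized) mapping f,
% f(M(t)) satisfies the same eps_u-PLDP as M(t).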
Step 2: data derivation. The edge computing node not only serves the users under it, but should also provide the collected privacy data and the personalized privacy protection level within the node to the cloud computing center. The privacy protection levels of users under the same edge node are the same, so no problem of fusing privacy data of different privacy levels arises there. However, the cloud computing center needs to collect user data from different edge computing nodes to obtain data statistics over a larger range, and the privacy data gathered at the cloud computing center cannot be statistically exploited effectively because their protection levels differ.
One feasible solution to this problem is a data volume expansion policy based on a data derivation technique, which generates privacy data of multiple levels from the single-level data sent by a user, increasing the data volume of a given privacy level without decreasing that of other levels and stably improving the final estimation accuracy.
In personalized privacy data derivation (Data Recycle with Personalized Privacy, DRPP), the calculation of the derived data is closely related to the perturbation mode of the encoding. The present invention uses optimized unary encoding (Optimized Unary Encoding, OUE), an optimized variant of unary encoding whose variance is smaller than that of other encoding schemes. OUE is used in combination with DRPP to yield the data derivation algorithm OUE-DRPP. The OUE-DRPP data derivation scheme enables the cloud computing center to effectively compute statistics over collected data with different privacy protection levels.
Step 3: histogram estimation, as shown in fig. 3. Histogram estimation groups the collected data, counts the number of data items falling into each group, and presents the distribution as a histogram. The data first needs to be encoded; common encoding methods include hash-based Bloom encoding, histogram encoding, and unary encoding, chosen according to the requirements of the differential privacy class and the data transmission volume. In general, encoding makes each data item fall into its group according to the configured grouping ranges; the data is then randomly perturbed, the per-group counts of the output data are tallied, and the counts are corrected to obtain an unbiased estimate of the real data quantity of each group.
The personalized differential privacy protection model obtained through the steps provides personalized services for users on the premise of meeting the personalized local differential privacy protection requirements of the users on the edge computing side; on the cloud computing center side, the utilization rate of data collected by the cloud and the accuracy of estimation are improved under the condition that the personalized differential privacy of a user is not damaged.
In a preferred embodiment, the data acquisition in step S1 specifically includes the following procedures:
For N users, let the true binary vector of each user u be X_u, the privacy class of each user u be τ, the privacy budget corresponding to privacy class τ be ε_τ, the privacy budget set be ε, the number of edge nodes be k, and the privacy protection level of the i-th edge node be l_i.
Let T = {t_1, t_2, ..., t_k} be a discrete and finite data attribute value domain, in which user u holds its own data t ∈ T and the differential privacy budget ε_u required for protected transmission, where ε_u ∈ ε = {ε_1, ..., ε_n}, u = 1, 2, ..., n.
Further, the specific process of data derivation in step S2 is as follows:
Based on the value t_i ∈ T = {t_1, t_2, ..., t_k} of user u in the user data set and the privacy protection class τ, a k-dimensional all-zero vector X_u = [0, 0, ..., 0]_k is obtained by initialization;
the i-th position of X_u is set to 1, obtaining X_u[i] = 1;
X_u is then perturbed position by position according to
Pr[Y_u[j] = 1] = 1/2 if X_u[j] = 1, and Pr[Y_u[j] = 1] = 1/(e^{ε_τ} + 1) if X_u[j] = 0,
yielding the noised privacy data Y_u of user u at class τ. Here Pr is an abbreviation of Probability; the formula means that when the j-th position of X_u is 1, the j-th position of the perturbed vector Y_u is 1 with probability 1/2, and when the j-th position of X_u is 0, the j-th position of Y_u is 1 with probability 1/(e^{ε_τ} + 1).
The privacy data in the user data set are grouped according to the corresponding privacy class, and the privacy data set G = [G_1, G_2, ..., G_τ, ..., G_m] is constructed, all privacy data of privacy class τ in the user data set being added to the sub-data set G_τ.
After the initialization G_τ^+ = G_τ, the derivation sub-step is repeated for τ = 1 to m and r = τ + 1 to m, yielding the derived privacy data aggregate G^+ = [G_1^+, ..., G_m^+] of each privacy class. This G^+ is the collection of the derived privacy data sets of all privacy classes: in the subsequent histogram estimation, data of whichever privacy level is needed can be taken out of G^+. For example, the privacy data set G_τ^+ with privacy class τ obtained through the DR algorithm corresponds exactly to the τ-th entry of G^+.
Based on the sub-data set G_τ, the privacy-class data set G_τ^+ is obtained through DR algorithm calculation.
Further, the specific process of histogram estimation is as follows:
Grouping statistics are performed on the user data set and the privacy-class data sets G_τ^+ to obtain the number of samples n_τ^+ of class τ after data derivation.
The value domain of the users is T = {t_1, t_2, ..., t_k} and the actual data distribution of the users is P = [p_1, p_2, ..., p_k], where p_k denotes the fraction of samples whose data is t_k among the n_τ^+ current samples, and C[k] denotes the number of samples whose k-th bit is 1.
The unbiased estimate of P[j], j = 1, ..., k, is computed by the formula
P̂[j] = ( C[j]/n_τ^+ − 1/(e^{ε_τ} + 1) ) / ( 1/2 − 1/(e^{ε_τ} + 1) ).
The histogram estimation result with privacy protection level τ is
P̂ = [ P̂[1], P̂[2], ..., P̂[k] ].
The mean square error of the histogram estimation result with privacy protection level τ is computed as
MSE_τ = (1/k) · Σ_{j=1}^{k} ( P̂[j] − P[j] )².
The invention also provides an embodiment for exemplarily displaying the execution process and the processing effect of the method provided by the invention.
In the invention, the personalized differential privacy protection model based on cloud-edge collaboration is shown in fig. 2. Assume that there are N users in the cloud-edge cooperative scenario; the true binary vector of each user u is X_u and its privacy data set is Z_u; the privacy class is τ, the privacy budget of each privacy class is ε_τ, ε is the privacy budget set, and the noised privacy data of user u at class τ is Y_u. The number of edge nodes is k, and the privacy protection level of the i-th edge node is l_i. The histogram estimation result of the original data is P, and the histogram estimation result of the privacy data is P̂.
The personalized local differential privacy protection model PLDP may be described as follows: let T = {t_1, t_2, ..., t_k} be a discrete and finite data attribute value domain, in which user u holds its own data t ∈ T and a required differential privacy budget ε_u, where ε_u ∈ ε = {ε_1, ..., ε_n}, u = 1, 2, ..., n. If for any t, t' ∈ T of user u the randomized algorithm M satisfies the condition
Pr[M(t) = z] ≤ e^{ε_u} · Pr[M(t') = z] for every output z ∈ Z_u,
then the randomized algorithm M satisfies ε_u-personalized local differential privacy for user u, where Z_u contains all privacy data of user u.
After data collection with the local differential privacy protection model is completed, personalized differential protection based on OUE encoding is further designed, and a histogram estimation scheme is provided for the edge nodes, thereby offering users more personalized and faster service.
First, the OUE encoding process of the data is introduced. The user data value domain T = {t_1, t_2, ..., t_k} is usually discrete and finite, but it may also be a bounded continuous range, in which case the continuous data must first be discretized. Since the ultimate goal of data collection is to effectively estimate the data distribution histogram, taking the length of a histogram cell interval as the discretization step of the value domain affects neither the validity of the data nor the accuracy of the estimation. For simplicity and without loss of generality, only the case of discrete user values is discussed here.
First, for any edge node, assume that the privacy protection level used by all users within its collection range is τ; algorithm 1 is employed for the privacy-protected output of any user within the node range.
The OUE encoding can be shown to satisfy ε_τ-PLDP: for any two inputs t_1, t_2 ∈ T of user u and any output y, with p = 1/2 and q = 1/(e^{ε_τ} + 1),
Pr[Y = y | t_1] / Pr[Y = y | t_2] ≤ ( p(1 − q) ) / ( (1 − p)q ) = e^{ε_τ}.
The OUE encoding thus fulfils ε_τ-personalized local differential privacy for any user u.
Table 1: OUE coding flow based on personalized local differential privacy
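The body of Table 1 is not reproduced here; the following sketch (a hedged reconstruction, with oue_perturb a name chosen for illustration) implements the OUE perturbation consistent with the probabilities given above: the user's value is one-hot encoded, each 1-bit is reported as 1 with probability 1/2, and each 0-bit is reported as 1 with probability 1/(e^{ε_τ} + 1).

import math
import random

def oue_perturb(value_index: int, k: int, eps_tau: float) -> list[int]:
    # OUE perturbation of a one-hot encoded user value (illustrative helper).
    # value_index: position i of the user's value t_i in the domain T (0-based).
    # k:           size of the value domain |T|.
    # eps_tau:     privacy budget of the user's privacy class tau.
    p = 0.5                                  # Pr[report 1 | true bit is 1]
    q = 1.0 / (math.exp(eps_tau) + 1.0)      # Pr[report 1 | true bit is 0]
    x = [0] * k
    x[value_index] = 1                       # one-hot encoding X_u
    return [1 if random.random() < (p if bit == 1 else q) else 0 for bit in x]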
In the cloud computing center, privacy data of a single privacy protection level are not sufficient to effectively estimate the overall data characteristics, because the collected privacy data are dispersed across different differential privacy levels. Therefore OUE and DRPP are combined: the OUE-DRPP data derivation algorithm can derive data at other privacy classes from the data a user provides at one privacy class, increasing the data volume of a given privacy class without reducing that of other classes and stably improving the final estimation accuracy.
Algorithm 2 gives the flow of the OUE-DRPP data derivation algorithm.
Table 2: OUE-DRPP data derivation algorithm flow
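The body of Table 2 is likewise not reproduced here. As a rough, hedged sketch of the OUE-DRPP outer loop described in the text (all names are illustrative; the dr_derive routine is sketched after Table 3 below, and the patent's DR routine additionally maintains each user's derived set Z_u, which is omitted in this simplification), the cloud groups the collected reports by privacy class and derives class-τ versions from every looser class:

from collections import defaultdict

def oue_drpp(reports, budgets):
    # reports: list of (privacy_class, bit_vector) pairs collected by the cloud.
    # budgets: mapping from privacy class to its privacy budget epsilon.
    # Returns g_plus: each class tau mapped to its original plus derived reports.
    groups = defaultdict(list)               # G = [G_1, ..., G_m]
    for tau, vec in reports:
        groups[tau].append(vec)
    g_plus = {tau: list(vecs) for tau, vecs in groups.items()}  # G_tau^+ = G_tau
    for tau in groups:
        for r in groups:
            # Only a looser class (larger budget) can be recycled into a
            # stricter class tau (smaller budget).
            if budgets[r] <= budgets[tau]:
                continue
            for vec in groups[r]:
                g_plus[tau].append(dr_derive(vec, budgets[r], budgets[tau]))
    return g_plus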
In the DRPP algorithm, the cloud computing center first groups the collected privacy data according to privacy class and establishes a privacy data set for each privacy class, recorded as G = [G_1, G_2, ..., G_τ, ..., G_m], adding all privacy data with privacy class τ into the data set G_τ. In addition, each derived privacy-class data set is recorded as G_τ^+, and the aggregate of the derived sets as G^+ = [G_1^+, ..., G_m^+].
Algorithm 3 gives the flow of the data derivation algorithm DR, and fig. 4 gives a specific example of the DR process:
Suppose the privacy data Z^{L2} corresponding to privacy level L2 must be derived first. Since the user's data set contains only data of privacy level L4, which is looser than L2, the privacy data corresponding to level L2 can be generated directly by lines 8-10 of the DR algorithm, and the derived version is added to the user's privacy data set Z_u. If the next step is to derive the privacy data Z^{L3} corresponding to level L3, the user's privacy data set now contains privacy data of levels L4 and L2, so the data of both the L4 and the L2 privacy levels, Z^{L4} and Z^{L2}, must be used; the data Z^{L3} with privacy level L3 is derived according to lines 13-20 of the DR algorithm and added to the user privacy data set Z_u.
Table 3: flow of DR algorithm
In Table 3, the first intermediate temporary variable Z_sup temporarily stores the privacy data of privacy class i, where i is the infimum of the privacy classes present in the user privacy data set Z_u. Z_inf is defined similarly with the supremum j. They are therefore referred to in this embodiment as the first and the second intermediate temporary variable.
In the table, the j and d in [j] and [d] are vector position indices; the particular letter is immaterial, and any letter (such as the i, j, k, d used in other steps) may represent it.
After the privacy data with privacy class τ are obtained through the data derivation algorithm, the OUE-encoded data still meet the ε_τ-PLDP requirement; similarly, the derived data samples of the other privacy classes can be proved to fulfil the requirements of PLDP.
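The row-by-row body of Table 3 and the exact flip-probability expressions of sub-steps E7, E12 and E13 are not reproduced in this text. The sketch below is therefore only a hedged skeleton of the single-source case (sub-steps E5-E7): as a stand-in for the missing expressions it uses the standard composition of symmetric binary randomized responses, which may differ from the patent's OUE-specific formulas; dr_derive, keep_prob and flip_keep are illustrative names.

import math
import random

def keep_prob(eps: float) -> float:
    # Keep-probability of a symmetric binary randomized response with budget eps.
    return math.exp(eps) / (math.exp(eps) + 1.0)

def flip_keep(eps_sup: float, eps_tau: float) -> float:
    # Probability of keeping a reported bit so that a report generated under the
    # looser budget eps_sup becomes distributed as one under the stricter budget
    # eps_tau. NOTE: standard symmetric-RR composition, used here only as a
    # stand-in for the patent's (unreproduced) OUE-specific expressions.
    a_sup, a_tau = keep_prob(eps_sup), keep_prob(eps_tau)
    return (a_sup + a_tau - 1.0) / (2.0 * a_sup - 1.0)

def dr_derive(z_sup: list[int], eps_sup: float, eps_tau: float) -> list[int]:
    # Single-source derivation (cf. sub-steps E5-E7): derive a class-tau report
    # bit by bit from one report z_sup collected under the looser budget eps_sup.
    alpha = flip_keep(eps_sup, eps_tau)
    return [bit if random.random() < alpha else 1 - bit for bit in z_sup]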
Next, the collected data are subjected to histogram estimation at the cloud, as shown in fig. 3. Owing to the requirement of differential privacy protection, only users with relaxed privacy protection requirements can provide valid additional data. When the cloud collects the privacy data, let n_τ denote the number of samples provided by users with privacy protection level τ; the number of class-τ samples after data derivation, n_τ^+, is then n_τ plus the samples derived from looser privacy classes. The value domain of the users is still T = {t_1, t_2, ..., t_k}, and the actual data distribution of the users is P = [p_1, p_2, ..., p_k], where p_k denotes the fraction of samples whose data is t_k among the n_τ^+ current samples. The data statistics result is C = [C[1], ..., C[k]], where C[k] denotes the number of samples whose k-th bit is "1". The unbiased estimate of P[j], j = 1, ..., k, is then
P̂[j] = ( C[j]/n_τ^+ − 1/(e^{ε_τ} + 1) ) / ( 1/2 − 1/(e^{ε_τ} + 1) ).
The histogram estimation result with privacy protection level τ is
P̂ = [ P̂[1], P̂[2], ..., P̂[k] ].
The estimation error of the histogram estimation is typically evaluated with the mean square error (Mean-Square Error, MSE), defined as
MSE = (1/k) · Σ_{j=1}^{k} ( P̂[j] − P[j] )²,
where P̂ denotes the histogram estimated from the privacy data and P = [p_1, p_2, ..., p_k] the distribution of the real data. The mean square error of the histogram estimate at any privacy protection level τ follows by substituting the class-τ estimate; for OUE perturbation its expected value is approximately 4e^{ε_τ} / ( n_τ^+ · (e^{ε_τ} − 1)² ).
the factors which are decisive for the histogram estimation are the privacy protection level tau, the number of samples after the expansion of the privacy level tau and the value range of the user data, which can be obtained by means of the analysis of the mean square error.
In order to evaluate the effectiveness of the proposed model, simulation experiments were performed on the Adult data set, one of the most popular data sets in the UCI repository. The system environment of the simulation experiments is an Intel(R) Core(TM) i5-7300HQ CPU, 2.9 GHz, 64.0 GB RAM, Ubuntu 18.04; the simulation tool is JetBrains PyCharm 2019.2.2 x64.
First, in order to verify the effectiveness of OUE encoding in improving the algorithm, the invention sets 10 personalized differential privacy protection levels L1-L10, where L1 denotes the highest level (smallest privacy protection budget) and L10 the lowest level (largest privacy protection budget). The privacy protection budget of level L1 is 0.1, and the budget of each subsequent privacy level increases by 0.1, so the privacy protection budgets corresponding to these 10 levels are 0.1 to 1.0. The simulation experiments assume that the 10 privacy classes are chosen uniformly by the users, i.e., the number of users at each privacy class is equal.
Fig. 5 compares the mean square error of DRPP and OUE-DRPP at the 10 privacy protection levels. The experimental results show that the mean square error of the OUE-DRPP algorithm is better than that of the DRPP algorithm at every privacy protection level, because OUE obtains its perturbation probabilities by minimizing the encoding variance, so the noise variance it generates is smaller than that of plain unary encoding. The data reflected in fig. 5 thus show that the OUE encoding adopted by the invention effectively reduces the mean square error of the histogram statistics and improves the accuracy of the personalized local differential privacy model.
On the other hand, among the 10 configured privacy classes, the mean square error of the histogram estimation is smallest at level L7. This demonstrates that, once samples are expanded by data derivation, the mean square error of histogram estimation across the different privacy classes necessarily has a minimum; by first locating the privacy protection level corresponding to that minimum and then expanding only the data of that level in the derivation algorithm, the amount of computation can be reduced considerably.
In summary, the personalized differential privacy protection method under the cloud-edge cooperative scene provides a personalized local differential privacy protection model based on cloud-edge cooperation, which designs personalized differential protection based on optimized unary encoding and provides a histogram estimation scheme for the edge nodes. When counting data of different privacy classes, a personalized privacy data derivation algorithm is used, improving the estimation accuracy and offering users more personalized and faster service. The method provided by the invention has the following beneficial effects:
according to the personalized local differential privacy protection method based on cloud-edge collaboration, personalized design is carried out on the basis of a local differential privacy model, so that a user has more flexible treatment on data with different privacy protection levels at the edge side under a cloud-edge collaboration scene, and more personalized service quality is experienced; on the cloud computing center side, data with different privacy protection levels are effectively counted, and estimation errors are reduced;
the invention adopts a personalized local differential privacy protection scheme, and the privacy budget of the scheme is determined by the user, so that the user has more personalized privacy protection experience;
the method adopts a OUE coding mode as an optimization scheme of unitary coding, and has smaller variance, so that derived data obtained by a OUE coding-based data derived scheme OUE-DRPP has smaller estimation error, and the mean square error obtained by histogram estimation is lower than that of the DRPP data derived scheme.
Those of ordinary skill in the art will appreciate that: the drawing is a schematic diagram of one embodiment and the modules or flows in the drawing are not necessarily required to practice the invention.
From the above description of embodiments, it will be apparent to those skilled in the art that the present invention may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present invention.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus or system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part. The apparatus and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (5)

1. A personalized differential privacy protection method under a cloud-edge cooperative scene, characterized by comprising the following steps:
s1, acquiring a user data set with user-defined privacy classes by constructing a local differential privacy protection model;
s2, based on the user data set, performing a data derivation operation through a data derivation algorithm combining the unary encoding of the local differential privacy protection model with personalized privacy data derivation, to obtain a privacy data set with a privacy class;
s3, grouping the data of the user data set and the privacy data set with privacy class, and encoding the data of each group to obtain an estimation result of a user privacy histogram;
the estimation result of the user privacy histogram is used for differential privacy data transmission of the edge node.
2. The method according to claim 1, wherein step S1 comprises:
for N users, letting the true binary vector of each user u be X_u, the privacy class of each user u be τ, the privacy budget corresponding to privacy class τ be ε_τ, the privacy budget set be ε, the number of edge nodes be k, and the privacy protection level of the i-th edge node be l_i;
letting T = {t_1, t_2, ..., t_k} be a discrete and finite data attribute value domain, in which user u holds its own data t ∈ T and the differential privacy budget ε_u required for protected transmission, wherein ε_u ∈ ε = {ε_1, ..., ε_n}, u = 1, 2, ..., n.
3. The method according to claim 2, wherein step S2 comprises:
based on the value t_i ∈ T = {t_1, t_2, ..., t_k} of user u in the user data set and the privacy protection class τ, initializing a k-dimensional all-zero vector X_u = [0, 0, ..., 0]_k;
setting the i-th position of X_u to 1, obtaining X_u[i] = 1;
perturbing X_u position by position according to
Pr[Y_u[j] = 1] = 1/2 if X_u[j] = 1, and Pr[Y_u[j] = 1] = 1/(e^{ε_τ} + 1) if X_u[j] = 0,
to obtain the noised privacy data Y_u of user u at class τ, wherein Pr is an abbreviation of Probability;
grouping the privacy data in the user data set according to the corresponding privacy class and constructing a privacy data set G = [G_1, G_2, ..., G_τ, ..., G_m], all privacy data of privacy class τ in the user data set being added to the sub-data set G_τ;
initializing G_τ^+ = G_τ and, for τ = 1 to m and r = τ + 1 to m, repeating this sub-step to obtain the derived privacy data aggregate G^+ = [G_1^+, ..., G_m^+] of each privacy class;
based on the sub-data set G_τ, obtaining the privacy data set G_τ^+ with privacy class τ through DR algorithm calculation.
4. The method according to claim 3, wherein obtaining the privacy data set G_τ^+ with privacy class τ from the sub-data set G_τ through DR algorithm calculation comprises the following sub-steps:
E1, determining the lower bound i (infimum) of the privacy classes present in the user privacy data set Z_u and assigning the privacy data of class i to a first intermediate temporary variable Z_sup;
E2, determining the upper bound j (supremum) of the privacy classes present in the user privacy data set Z_u;
E3, if the lower bound i and the upper bound j of the privacy class are equal, executing sub-step E4; otherwise executing sub-step E8;
E4, adding the privacy data Z^τ with privacy class τ to G_τ^+ and Z_u via G_τ^+.add(Z^τ) and Z_u.add(Z^τ);
E5, if the upper bound j of the privacy class is the empty set, executing sub-steps E6 to E8;
E6, letting d traverse the k positions of the data from 1, cyclically executing the next sub-step;
E7, generating the d-th position of the derived data Z^τ: it takes the value Z_sup[d] with a probability determined by the privacy budget ε_τ of privacy class τ and the privacy budget ε_i corresponding to privacy class i, and the value 1 − Z_sup[d] with the complementary probability;
E8, if the lower bound i and the upper bound j of the privacy class are not equal and j is not null, executing sub-steps E9 to E11;
E9, assigning the privacy data of privacy class j to a second intermediate temporary variable Z_inf;
E10, letting d traverse the k positions of the data from 1, cyclically executing the following sub-steps;
E11, if Z_sup[d] = Z_inf[d], executing sub-step E12; otherwise executing sub-step E13;
E12, the d-th position of Z^τ taking the value Z_sup[d] with a probability determined by the privacy budgets of the classes involved, and the value 1 − Z_sup[d] with the complementary probability;
E13, the d-th position of Z^τ taking the value Z_sup[d] with a probability determined by the privacy budgets of the classes involved, and the value Z_inf[d] with the complementary probability;
E14, adding the privacy data Z^τ with privacy class τ obtained by the above calculation to G_τ^+ and Z_u respectively, obtaining the privacy class-τ privacy data set G_τ^+.
5. The method according to claim 4, wherein step S3 comprises:
performing grouping statistics on the user data set and the privacy-class data sets G_τ^+ to obtain the number of samples n_τ^+ of class τ after data derivation, the user value domain T = {t_1, t_2, ..., t_k}, and the actual data distribution P = [p_1, p_2, ..., p_k] of the users, wherein p_k denotes the fraction of samples whose data is t_k among the n_τ^+ current samples, and C[k] denotes the number of samples whose k-th bit is 1;
computing the unbiased estimate of P[j], j = 1, ..., k, by the formula
P̂[j] = ( C[j]/n_τ^+ − 1/(e^{ε_τ} + 1) ) / ( 1/2 − 1/(e^{ε_τ} + 1) );
obtaining the histogram estimation result with privacy protection level τ,
P̂ = [ P̂[1], P̂[2], ..., P̂[k] ];
and computing the mean square error of the histogram estimation result with privacy protection level τ,
MSE_τ = (1/k) · Σ_{j=1}^{k} ( P̂[j] − P[j] )².
CN202310443160.2A 2023-04-21 2023-04-21 Personalized differential privacy protection method under cloud-edge cooperative scene Pending CN116489636A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310443160.2A CN116489636A (en) 2023-04-21 2023-04-21 Personalized differential privacy protection method under cloud-edge cooperative scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310443160.2A CN116489636A (en) 2023-04-21 2023-04-21 Personalized differential privacy protection method under cloud-edge cooperative scene

Publications (1)

Publication Number Publication Date
CN116489636A true CN116489636A (en) 2023-07-25

Family

ID=87218961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310443160.2A Pending CN116489636A (en) 2023-04-21 2023-04-21 Personalized differential privacy protection method under cloud-edge cooperative scene

Country Status (1)

Country Link
CN (1) CN116489636A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019056573A1 (en) * 2017-09-25 2019-03-28 深圳大学 Differential privacy-based system and method for collaborative web quality-of-service prediction for privacy protection
CN115879152A (en) * 2022-12-05 2023-03-31 湖北工业大学 Self-adaptive privacy protection method, device and system based on minimum mean square error criterion

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019056573A1 (en) * 2017-09-25 2019-03-28 深圳大学 Differential privacy-based system and method for collaborative web quality-of-service prediction for privacy protection
CN115879152A (en) * 2022-12-05 2023-03-31 湖北工业大学 Self-adaptive privacy protection method, device and system based on minimum mean square error criterion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
秦德松: "基于差分隐私的云边协同隐私保护机制", 中国优秀硕士学位论文全文数据库(电子期刊), pages 4 *

Similar Documents

Publication Publication Date Title
CN113127931B (en) Federal learning differential privacy protection method for adding noise based on Rayleigh divergence
CN111669366B (en) Localized differential private data exchange method and storage medium
Bobbio et al. Analysis of large scale interacting systems by mean field method
WO2021259357A1 (en) Privacy-preserving asynchronous federated learning for vertical partitioned data
CN103455842B (en) Credibility measuring method combining Bayesian algorithm and MapReduce
Algamal et al. Developing a Liu‐type estimator in beta regression model
Yin et al. GANs based density distribution privacy-preservation on mobility data
Zhan et al. Anomaly detection in dynamic systems using weak estimators
Lei et al. Adaptive multiple non-negative matrix factorization for temporal link prediction in dynamic networks
KR102086936B1 (en) User data sharing method and device
CN115879152A (en) Self-adaptive privacy protection method, device and system based on minimum mean square error criterion
Asi et al. Element level differential privacy: The right granularity of privacy
CN114648092A (en) Personalized federal learning acceleration method and device
Jiang et al. Multi-block-single-probe variance reduced estimator for coupled compositional optimization
Dong et al. PADP-FedMeta: A personalized and adaptive differentially private federated meta learning mechanism for AIoT
Tu et al. Byzantine-robust distributed sparse learning for M-estimation
Lee et al. Beyond the signs: Nonparametric tensor completion via sign series
Li et al. Adaptive low-precision training for embeddings in click-through rate prediction
CN116361759B (en) Intelligent compliance control method based on quantitative authority guidance
CN117407921A (en) Differential privacy histogram release method and system based on must-connect and don-connect constraints
Shirvani et al. On enhancing the object migration automaton using the pursuit paradigm
Ahani et al. A feature weighting and selection method for improving the homogeneity of regions in regionalization of watersheds
CN116489636A (en) Personalized differential privacy protection method under cloud-edge cooperative scene
CN116776155A (en) Model training method, device, equipment and medium based on federal learning
CN110768825A (en) Service flow prediction method based on network big data analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination