CN116611101A - Differential privacy track data protection method based on interactive query - Google Patents

Differential privacy track data protection method based on interactive query

Info

Publication number
CN116611101A
CN116611101A (application CN202310204635.2A)
Authority
CN
China
Prior art keywords: track data, differential privacy, privacy, epsilon, noise
Prior art date
Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number
CN202310204635.2A
Other languages
Chinese (zh)
Inventor
王国军
冯光辉
彭滔
邢萧飞
陈淑红
李雨婷
Current Assignee
Guangzhou University
Original Assignee
Guangzhou University
Priority date
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN202310204635.2A
Publication of CN116611101A

Classifications

    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes (G Physics > G06 Computing; Calculating or Counting > G06F Electric digital data processing > G06F 21/62 Protecting access to data via a platform)
    • G06F 16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries (G06F 16/20 Information retrieval of structured data > G06F 16/245 Query processing)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of private trajectory data protection, and discloses a differential privacy trajectory data protection method based on interactive query. The differential privacy trajectory data protection model adds noise to reduce the sensitivity of query results, so that an intruder cannot detect the addition or removal of trajectory records from the output information, thereby securing users' private trajectory data in the dataset. Using ε-DP, the attacker's query algorithm is modeled as f: D → R^d; the whole process from an input trajectory dataset to an output real-valued vector supports a randomized mechanism, where D and D′ are two neighboring trajectory datasets. Compared with the prior art, the method discovers the correlation information hidden between trajectory datasets through independence processing based on differential privacy association rules, deeply compresses the trajectory data with a shared prefix tree, and removes redundant trajectory data, improving trajectory data processing efficiency.

Description

Differential privacy track data protection method based on interactive query
Technical Field
The invention relates to the technical field of private trajectory data protection, and in particular to a differential privacy trajectory data protection method based on interactive query.
Background
With the continuous development of science and technology and the arrival of the information age, the application scope of big data is gradually expanding, and the Internet makes daily life convenient and fast. The demand for data queries in daily applications keeps growing, so strengthening personal information security when users access the network is an urgent problem. The most basic daily data query is the linear query, which includes non-interactive and interactive queries. Interactive queries respond quickly and can be processed online in real time, but a query generates a large amount of trajectory data and a large volume of text to be processed; protecting differential privacy trajectory data then requires long processing times and is extremely inefficient.
In existing work, several "master models" built from different datasets and dependent on sensitive data are established through a "black box" model and encrypted; by transfer learning on the master models, "free models" unrelated to the sensitive data are obtained, which require no protection. An attacker collecting information can then only query the unimportant free models and cannot obtain private data. However, the prior art still has the following drawbacks:
(1) An information-hierarchy privacy protection mechanism is adopted, classifying users and access rights with a classification tree algorithm; however, it cannot solve the rapid consumption of the privacy budget during interactive information queries.
(2) Existing algorithms ignore the data-availability requirements of users of different levels during interactive information queries, so the processed data has low availability and poor information value.
Disclosure of Invention
(I) Technical problems to be solved
Aiming at the deficiencies of the prior art, the invention provides a differential privacy trajectory data protection method based on interactive query, which has the advantages of high efficiency and high speed and solves the problems of low efficiency, low data availability and poor information value.
(II) Technical scheme
To achieve high efficiency and high speed, the invention provides the following technical scheme: a differential privacy trajectory data protection method based on interactive query, comprising the following steps:
s1, redundant data deleting method based on differential privacy association rule
The differential privacy track data protection model is characterized in that the increase and decrease of track data records cannot be found out from output information by an intruder by increasing the sensitivity of noise reduction query results, so that the security of the user privacy track data information in the data set is ensured;
using epsilon-DP, the attacker's query algorithm uses f: D-R d The whole process from input track data set to output real number vector D supports a random mechanism, D and D' are two adjacent track data sets, if f is the differential privacy track data set of any one outputThe method meets the requirement of the formula (1), and shows that the algorithm meets epsilon-DP, epsilon is privacy budget in the formula (1), and the smaller epsilon is, the smaller the frequency of attributes among track data sets is after the adjacent track data sets are processed, the more similar the differential privacy effect is proved, and an attacker can hardly judge which track data set the differential privacy track data to be stolen is specifically in, so that the track data protection degree is increased.
Pr[f(D) ∈ S] ≤ e^ε × Pr[f(D′) ∈ S]    (1)
To determine the useful information to be protected in a trajectory dataset and reduce the number of times the original trajectory data is extracted, the method proposes a data independence processing technique based on the relevance between pieces of information, and judges whether trajectory data is independent information through differential privacy association rules;
calculating the global sensitivity of the trajectory dataset if f: D-R d ,Δf=max D,D′||f(D)-f(D′)|| 1 This function is called the global sensitivity of f;
judging whether the information in the current network is irrelevant information or not through the global sensitivity degree, carrying out deep compression on a track data set by using a method of sharing prefixes, constructing an FP-tree, then highlighting a high-frequency item set by using a frequent item growth method according to the FP-tree, reducing the excavation time consumption, hiding a lot of relevant information in the track data set, finding out the relevant modes among the relevant information by using an irrelevant processing model, and removing redundant track data in the interactive query process in the mode, wherein the method for removing the redundant track data is divided into 6 stages.
S2. Matrix decomposition based on composition properties
The method uses the sequential and parallel composability of differential privacy to determine the differential requirements of the trajectory data, and establishes a parallel gradient-descent matrix decomposition model based on a low-rank mechanism and the alternating direction method of multipliers, increasing the efficiency of differential privacy matrix decomposition and the speed of the whole algorithm;
the differential privacy track data protection model reduces query result sensitivity by adding controlled noise. The more the sub-algorithm and complex function need to use the differential privacy budget, the more the noise accumulation is, and exceeding a certain level will cause track data distortion, reduce track data availability, and in some queries, (ε) i The value of delta-DP is strictly required, and for this purpose, the method selects a model with less limitation to reduce noise;
let (∈, δ) -DP be the differential privacy constraint, when 0 < "When delta is less than 1, the algorithm f on any two adjacent track data sets outputs the random outputSatisfying the requirement of formula (2), then f is said to satisfy (∈, δ) -DP.
Pr[f(D) ∈ S] ≤ e^ε × Pr[f(D′) ∈ S] + δ    (2)
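A standard mechanism that satisfies (ε, δ)-DP rather than pure ε-DP is the Gaussian mechanism. It is not named in the text, so the sketch below is illustrative only; it uses the classic noise calibration σ ≥ Δf·√(2 ln(1.25/δ))/ε, valid for 0 < ε < 1.

```python
import math
import random

def gaussian_mechanism(true_value: float, sensitivity: float,
                       epsilon: float, delta: float) -> float:
    # Classic (epsilon, delta)-DP calibration for 0 < epsilon < 1:
    # sigma >= sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return true_value + random.gauss(0.0, sigma)
```

Relaxing ε-DP to (ε, δ)-DP in this way allows noise whose tails decay faster than Laplace noise, which is one reading of the "less restrictive model" chosen above.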
(ε, δ)-DP can be used only when the contents of the trajectory dataset are uncorrelated. If this condition cannot be satisfied, then even when differential privacy is achieved, an information thief can still obtain differential privacy trajectory data from the data correlations, and another privacy definition must be applied: ε-Pufferfish privacy. Let f: D → R^d be a randomly queryable algorithm and (S, Q, Θ) a privacy framework, where Q ⊆ S × S is the set of secret trajectory data pairs and Θ is the set of trajectory data distributions. If for any distribution θ ∈ Θ, any trajectory data pair (S_i, S_j) ∈ Q with Pr(S_i | θ) ≠ 0 and Pr(S_j | θ) ≠ 0, and any output ω ∈ Range(f), f meets the requirement of formula (3), then f satisfies ε-Pufferfish privacy in (S, Q, Θ).
e ≤Pr[f(X)=ω/S i ,θ]/Pr[f(X)=ω/S j ,θ]≤e ε (3)
In the formula, Θ captures both the attacker's maximum trajectory data attack capability and the correlation description of the trajectory data. Formula (3) states that for any S_i, S_j in the secret trajectory dataset Q, after ε-Pufferfish privacy is applied the output results show no obvious gap; this requirement is less strict than that of ε-DP. Here Θ represents the set of all trajectory data distributions, S the set of all trajectory datasets, and Q the Cartesian product within S; ε-DP is a special case of ε-Pufferfish privacy.
After the trajectory data differential requirement is defined, the load matrix established from the initial result is pruned: an irrelevant load matrix is obtained through trajectory data independence processing, and then decomposed. Using the low-rank mechanism, the gradients of the decomposed matrices B and G with respect to the decomposition matrix F are defined as
B = (βWFᵀ + πFᵀ)(βFFᵀ + I)⁻¹    (4)
The parallel gradient-descent matrix decomposition algorithm decomposes W into several matrices according to the matrix characteristics and computes them on separate nodes, accelerating differential privacy trajectory data protection. The operation steps are as follows:
(1) Establish an initial load matrix according to the user's query requirement;
(2) Delete redundant trajectory data from the initial load matrix according to the trajectory data independence processing model to obtain an irrelevant load matrix;
(3) Decompose the irrelevant load matrix W_{u×r} generated in the second step into z component parts, where u denotes the number of rows, r the number of columns, and z the number of distributed system nodes. The decomposition is computed with a distributed map process and a cloud-computing reduce process, finally obtaining B and F;
s3, self-adaptive noise track data protection based on differential privacy noise mechanism
For track data needing differential privacy protection, an exponential mechanism and a Laplace mechanism are most commonly applied, the exponential mechanism is commonly applied to discrete data protection, track data output is limited according to a scoring function principle, the numerical data protection is applied to the Laplace mechanism, global sensitivity can influence the noise amount added in a differential privacy model, and the query result is added with data conforming to Laplace [14] The track data safety is increased, and differential privacy track data protection is realized;
laplace mechanism: f: D-R is an arbitrary function, if the random algorithm A accords with the formula (6), the A accords with E-DP;
A(D)=f(D)+Lap(Δf/ε) (6)
in the formula, Lap(Δf/ε) is the Laplace noise to be added; the noise amount grows with the global sensitivity and with smaller ε;
the most important step of the exponential mechanism is to establish a scoring function u(D, r) (r ∈ O), where r is an output item of the output domain O;
exponential mechanism: u: D × O → R is a scoring function for the trajectory dataset D; if the random algorithm A conforms to formula (7), A satisfies ε-DP;
Pr[A(D) = r] ∝ exp(ε·u(D, r)/(2Δu))    (7)
where Δu is the maximum output value of the scoring function, i.e., the global sensitivity. Analysis shows that the score of an output is proportional to the probability of that output being selected;
differential privacy protection of trajectory data according to the noise mechanism rules requires adding Laplace-distributed noise to reduce the sensitivity of query results. By the noise-addition rule, adding a large amount of noise to highly sensitive trajectory data reduces the usability of the information, so an appropriate noise amount must be chosen. The choice of ε in the differential privacy model determines the trajectory data protection effect: the smaller the chosen ε, the safer the user information, but the larger the added noise; the larger the chosen ε, the smaller the added noise, but the worse the differential privacy protection effect and the lower the information security. Therefore, a reasonable ε value found by combining formulas (2) and (3) can balance the added noise amount against the differential privacy trajectory data protection effect;
the method selects the self-adaptive noise model, and combines under the premise of considering the user use permission (the higher the user permission level is, the closer the E is to the maximum value)And (3) determining a reasonable value of the added noise under the condition. The whole process is described as follows:
(1) By passing throughCondition limiting epsilon maximum value, selecting according to user authority levelTaking a proper epsilon privacy budget;
(2) Add noise to the trajectory dataset D and to the load-matrix decomposition result F, respectively, using the Laplace mechanism;
(3) Restore the irrelevant-attribute trajectory data deleted by the trajectory data independence processing model;
(4) Feed the query result back to the user.
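The adaptive-noise steps above can be sketched as follows. This is a minimal illustration under assumptions not fixed by the text: five permission levels and a linear mapping from level to ε, so a higher level yields an ε closer to the allowed maximum and hence less noise.

```python
import math
import random

def select_epsilon(level: int, eps_max: float, num_levels: int = 5) -> float:
    # Step (1): higher permission level -> epsilon closer to eps_max
    level = max(1, min(level, num_levels))
    return eps_max * level / num_levels

def laplace(scale: float) -> float:
    # Sample Lap(0, scale) via the inverse CDF (symmetric, so sign is free)
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def answer_query(true_value: float, sensitivity: float,
                 level: int, eps_max: float) -> float:
    # Steps (2) and (4): add Laplace noise calibrated to the selected
    # budget, then return the noisy result to the user
    eps = select_epsilon(level, eps_max)
    return true_value + laplace(sensitivity / eps)
```

The `num_levels` value and the linear level-to-ε rule are assumptions for illustration; the patent only states that ε approaches its maximum as the permission level rises.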
Preferably, in step S1, "global" means the maximum range of trajectory data output by the query algorithm applied to the neighboring trajectory datasets D and D′, measured by the L₁ distance between the two. It is determined by f and not influenced by the trajectory dataset; for a fixed ε, the global sensitivity is proportional to the noise amount added in f and to the privacy degree of the trajectory data.
Preferably, in step S1, the 6 stages are: (1) scan the trajectory dataset to obtain the frequency of each attribute, and sort the attributes from high to low frequency into a descending list; (2) set m as the minimum support frequency and remove all values smaller than m from the descending list; (3) insert the descending list into a prefix tree, with the first-occurring nodes forming a linked list; this process builds the FP-tree; (4) integrate the FP-tree using the trajectory data independence processing model; (5) judge whether each leaf node's path is single: if not, return to the previous step and rebuild the prefix set on each path to generate a brand-new FP-tree; if so, delete the leaf node and directly generate the prefix path set; (6) finally obtain the interactive query association pattern, i.e., the obtained prefix path set, completing the removal of redundant trajectory data.
Preferably, in step S2, sequential composition: let {A_1, A_2, …, A_n} be n algorithms, D a specified trajectory dataset, and A_i any one of the algorithms. If on the trajectory dataset D each A_i satisfies ε_i-DP, then the sequential composition {A_1, A_2, …, A_n} satisfies (Σᵢ ε_i)-DP.
Preferably, in step S2, parallel composability [10]: let {D_1, D_2, …, D_n} be n disjoint subsets of the trajectory dataset D, with the algorithms {A_1, A_2, …, A_n} acting on the respective subsets. If each A_i satisfies ε_i-DP, then the parallel composition {A_1, A_2, …, A_n} satisfies (maxᵢ ε_i)-DP.
Preferably, in step S2, formula (2) can be explained as follows: if f satisfies ε-DP with probability (1 − δ), then (ε, δ)-DP can be regarded as approximate differential privacy.
Preferably, in step S2, in formula (5), W is the load matrix and β a load-matrix coefficient; the result of B is related only to F.
(III) Beneficial effects
Compared with the prior art, the invention provides a differential privacy track data protection method based on interactive query, which has the following beneficial effects:
1. Compared with the prior art, the differential privacy trajectory data protection method based on interactive query discovers the correlation information hidden between trajectory datasets through independence processing with differential privacy association rules, deeply compresses the trajectory data with the shared-prefix-tree method, and removes redundant trajectory data, improving trajectory data processing efficiency.
2. The method defines the trajectory data differential requirement, prunes the load matrix established from the initial result, obtains an irrelevant load matrix through trajectory data independence processing, and then decomposes it. The gradients of the decomposed matrices are defined using a low-rank mechanism, and the parallel gradient-descent matrix decomposition algorithm splits the matrix into several matrices computed on separate nodes, increasing the differential privacy trajectory data protection speed.
3. The method limits the maximum value of the privacy budget with the adaptive noise model, adds reasonable Laplace-mechanism noise to the differential privacy trajectory data according to the user's permission-level value, realizes differential privacy trajectory data protection, and responds to the user's query.
Drawings
FIG. 1 is a system protection model;
FIG. 2 is a step diagram of removing redundant trace data during an interactive query;
fig. 3 is a matrix decomposition flow diagram.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the invention.
Referring to fig. 1-3, the present invention provides a technical solution: a differential privacy track data protection method based on interactive query comprises the following steps:
s1, redundant data deleting method based on differential privacy association rule
Differential privacy track data protection model [6] The sensitivity of the query result is reduced by increasing noise, so that an intruder cannot find out the increase or decrease of track data records from output information, and the security of user privacy track data information in the data set is ensured.
Using ε-DP, the attacker's query algorithm is modeled as f: D → R^d. The whole process from an input trajectory dataset to an output real-valued vector supports a randomized mechanism, with D and D′ two neighboring trajectory datasets. If for any output set S f meets the requirement of formula (1), the algorithm satisfies ε-DP. In formula (1), ε is the privacy budget: the smaller ε is, the smaller the difference in attribute frequencies between the processed neighboring trajectory datasets, the closer the differential privacy effects, and the harder it is for an attacker to judge which trajectory dataset the targeted differential privacy trajectory data is in, increasing the degree of trajectory data protection.
Pr[f(D) ∈ S] ≤ e^ε × Pr[f(D′) ∈ S]    (1)
To determine the useful trajectory data information to be protected in a trajectory dataset and reduce the number of times the original trajectory data is extracted, the method proposes a data independence processing technique based on the relevance between pieces of information, and judges whether trajectory data is irrelevant information through differential privacy association rules.
Calculate the global sensitivity of the trajectory dataset: if f: D → R^d, then Δf = max_{D,D′} ‖f(D) − f(D′)‖₁ is called the global sensitivity of f. "Global" means the maximum range of trajectory data output by the query algorithm on the neighboring trajectory datasets D and D′, measured by the L₁ distance between them; it is determined by f and not influenced by the trajectory dataset. For a fixed ε, the global sensitivity is proportional to the noise amount added in f and to the privacy degree of the trajectory data.
Whether the information in the current network is irrelevant is judged through the global sensitivity, and the trajectory dataset is deeply compressed using the shared-prefix method to construct the FP-tree. Then, based on the FP-tree, high-frequency itemsets are highlighted with frequent-pattern growth, reducing mining time. Much correlation information is hidden in the trajectory dataset, and the correlation patterns between items can be found using the independence processing model; in this way redundant trajectory data in the interactive query process is removed. The whole process is shown in fig. 2; the method for removing redundant trajectory data is divided into 6 stages.
1) The trace data set is scanned to obtain the frequency of each attribute, and the attribute is sorted from high frequency to low frequency to obtain a descending list.
2) Setting m as the minimum supporting frequency, and removing all values smaller than m according to the descending order table.
3) The descending list is put into the prefix tree, and the first occurring nodes form a linked list, and the process builds the FP-tree.
4) And integrating the FP-tree by using a track data independence processing model.
5) Judge whether the path of each leaf node is single: if not, return to the previous step and rebuild the prefix set on each path to generate a brand-new FP-tree; if so, delete the leaf node and directly generate the prefix path set.
6) Finally, the interactive query association pattern, i.e., the obtained prefix path set, is obtained, completing the removal of redundant trajectory data.
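Stages 1) through 3) above can be sketched as a minimal shared-prefix (FP-tree-style) compression; all names below are hypothetical, not from the patent. Attributes are counted, low-support items pruned, and the remaining items inserted in descending-frequency order so that common prefixes are shared.

```python
from collections import Counter

class PrefixNode:
    def __init__(self):
        self.children = {}   # attribute -> PrefixNode
        self.count = 0       # number of trajectories passing through this node

def build_fp_tree(trajectories, min_support=1):
    # Stage 1: attribute frequencies, used for descending-order sorting
    freq = Counter(item for t in trajectories for item in t)
    root = PrefixNode()
    for t in trajectories:
        # Stage 2: drop items whose support is below the minimum m
        items = [i for i in t if freq[i] >= min_support]
        # Stage 3: insert in descending-frequency order to share prefixes
        items.sort(key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, PrefixNode())
            node.count += 1
    return root
```

Sorting every trajectory by the same global frequency order is what lets frequent attributes collapse onto shared prefix paths, which is the compression effect the text attributes to the shared prefix tree.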
S2, matrix decomposition method based on combination property
The amount of trajectory data in interactive queries is large and the query rounds are many, which can reduce the degree of differential privacy trajectory data protection. The method uses the sequential and parallel composability of differential privacy to determine the differential requirement of the trajectory data, and establishes a parallel gradient-descent matrix decomposition model based on a low-rank mechanism and the alternating direction method of multipliers, increasing the efficiency of differential privacy matrix decomposition and the speed of the whole algorithm.
Sequential composability: let {A_1, A_2, …, A_n} be n algorithms, D a specified trajectory dataset, and A_i any one algorithm. If on D each A_i satisfies ε_i-DP, then the sequential composition {A_1, A_2, …, A_n} satisfies (Σᵢ ε_i)-DP.
Parallel composability: let {D_1, D_2, …, D_n} be n disjoint subsets of the trajectory dataset D, with the algorithms {A_1, A_2, …, A_n} acting on the respective subsets. If each A_i satisfies ε_i-DP, then the parallel composition {A_1, A_2, …, A_n} satisfies (maxᵢ ε_i)-DP.
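The two composition properties amount to a simple budget-accounting rule, sketched below (an illustration, not from the patent): budgets add up when the same dataset is queried repeatedly, while the worst per-subset budget dominates when disjoint subsets are queried.

```python
def sequential_budget(epsilons):
    # Sequential composition on the same dataset: budgets add up
    return sum(epsilons)

def parallel_budget(epsilons):
    # Parallel composition on disjoint subsets: the worst epsilon dominates
    return max(epsilons)
```

This is why splitting the load matrix across disjoint node-local blocks (as in the parallel decomposition below) is cheaper in privacy budget than running every sub-algorithm on the full dataset.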
The differential privacy trajectory data protection model reduces query result sensitivity by adding controlled noise. The more sub-algorithms and complex functions consume the differential privacy budget, the more noise [11] accumulates; beyond a certain level the trajectory data is distorted and its availability reduced. In some queries the values of (ε_i, δ)-DP are strictly constrained, so the method selects a less restrictive model to reduce the noise amount.
Let (ε, δ)-DP be the differential privacy constraint. When 0 < δ < 1, if on any two neighboring trajectory datasets the random output of algorithm f over any output set S satisfies formula (2), then f is said to satisfy (ε, δ)-DP.
Pr[f(D) ∈ S] ≤ e^ε × Pr[f(D′) ∈ S] + δ    (2)
Formula (2) can be explained as follows: if f satisfies ε-DP with probability (1 − δ), then (ε, δ)-DP can be regarded as approximate differential privacy.
(ε, δ)-DP can be used only when the contents of the trajectory dataset are uncorrelated; if this condition cannot be met, then even when differential privacy is achieved, an information thief can still obtain differential privacy trajectory data from the data correlations. In that case another privacy definition must be applied: ε-Pufferfish privacy. Let f: D → R^d be a randomly queryable algorithm and (S, Q, Θ) a privacy framework, where Q ⊆ S × S is the set of secret trajectory data pairs and Θ is the set of trajectory data distributions. If for any distribution θ ∈ Θ, any trajectory data pair (S_i, S_j) ∈ Q with Pr(S_i | θ) ≠ 0 and Pr(S_j | θ) ≠ 0, and any output ω ∈ Range(f), f meets the requirement of formula (3), then f satisfies ε-Pufferfish privacy in (S, Q, Θ).
e ≤Pr[f(X)=ω/S i ,θ]/Pr[f(X)=ω/S j ,θ]≤e ε (3)
In the formula, Θ captures both the attacker's maximum trajectory data attack capability and the correlation description of the trajectory data. Formula (3) states that for any S_i, S_j in the secret trajectory dataset Q, after ε-Pufferfish privacy is applied the output results show no obvious gap; this requirement is less strict than that of ε-DP. Here Θ represents the set of all trajectory data distributions, S the set of all trajectory datasets, and Q the Cartesian product within S; ε-DP is a special case of ε-Pufferfish privacy.
After the trajectory data differential requirement is defined, the load matrix established from the initial result is pruned: an irrelevant load matrix is obtained through trajectory data independence processing, and then decomposed. Using the low-rank mechanism, the gradients of the decomposed matrices B and G with respect to the decomposition matrix F are defined as
B = (βWFᵀ + πFᵀ)(βFFᵀ + I)⁻¹    (4)
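As a sanity check on the closed-form update (4), the 1×1 (scalar) case can be written directly: here W, F, β, π are plain numbers, so the transpose and matrix inverse reduce to ordinary arithmetic. This is an illustrative reduction, not the patent's implementation.

```python
def update_b_scalar(beta: float, pi: float, w: float, f: float) -> float:
    # Scalar instance of B = (βWFᵀ + πFᵀ)(βFFᵀ + I)⁻¹
    return (beta * w * f + pi * f) / (beta * f * f + 1.0)
```

The denominator βFFᵀ + I is always invertible for β ≥ 0 (positive in the scalar case), which is what makes the closed-form update well defined.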
In formula (5), W is the load matrix and β a load-matrix coefficient; the result of B is related only to F. The parallel gradient-descent matrix decomposition algorithm decomposes W into several matrices according to the matrix characteristics and computes them on each node, accelerating differential privacy trajectory data protection. The operation steps are as follows:
1) Establish an initial load matrix according to the user's query requirement.
2) Delete redundant trajectory data from the initial load matrix according to the trajectory data independence processing model to obtain an irrelevant load matrix.
3) Decompose the irrelevant load matrix W_{u×r} generated in the second step into z component parts, where u denotes the number of rows, r the number of columns, and z the number of distributed system nodes. The decomposition is computed with a distributed map process and a cloud-computing reduce process, finally obtaining B and F; the specific process is shown in fig. 3.
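Step 3)'s partition of W_{u×r} into z blocks for the map phase can be sketched as follows; the helper is hypothetical, not from the patent.

```python
def split_rows(w, z):
    # Partition the u-by-r matrix (a list of rows) into z row blocks,
    # one per distributed node; block sizes differ by at most one row
    u = len(w)
    base, extra = divmod(u, z)
    blocks, start = [], 0
    for i in range(z):
        size = base + (1 if i < extra else 0)
        blocks.append(w[start:start + size])
        start += size
    return blocks
```

Each node would then run its local decomposition in the map phase, with the reduce phase combining the per-block results into B and F.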
S3, self-adaptive noise track data protection based on differential privacy noise mechanism
For trajectory data needing differential privacy protection, the exponential mechanism [12] and the Laplace mechanism [13] are most commonly applied. The exponential mechanism is usually applied to discrete data protection, limiting the trajectory data output according to the scoring-function principle. The Laplace mechanism is applied to numerical data protection; the global sensitivity influences the amount of noise added in the differential privacy model, and adding Laplace-distributed noise [14] to the query result increases trajectory data security and realizes differential privacy trajectory data protection.
Laplace mechanism: f: D → R is an arbitrary function; if the random algorithm A conforms to formula (6), then A satisfies ε-DP;
A(D)=f(D)+Lap(Δf/ε) (6)
In the formula, Lap(Δf/ε) is the Laplace noise to be added; the noise amount grows as the global sensitivity Δf grows and as ε shrinks;
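Formula (6) can be sketched in Python with numpy; the counting-query example and its sensitivity value are assumptions for illustration, not part of the patent:

```python
import numpy as np

def laplace_mechanism(f_D, sensitivity, epsilon, rng=None):
    """A(D) = f(D) + Lap(Δf/ε): perturb the true answer with Laplace noise
    whose scale grows with the global sensitivity and shrinks with ε."""
    rng = rng or np.random.default_rng()
    return f_D + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query over a track data set has global sensitivity 1.
rng = np.random.default_rng(0)
true_count = 42
noisy_small_eps = [laplace_mechanism(true_count, 1.0, 0.1, rng) for _ in range(2000)]
noisy_large_eps = [laplace_mechanism(true_count, 1.0, 2.0, rng) for _ in range(2000)]
# Smaller ε ⇒ wider noise spread (the std of Lap(b) is sqrt(2)*b).
```

Comparing the two samples shows the trade-off stated above: ε = 0.1 yields far noisier answers than ε = 2.0 for the same sensitivity.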
The most important step of the exponential mechanism is to establish a scoring function u(D, r) (r ∈ O), where r is an output item of the output domain O;
Exponential mechanism: u: (D × O) → R is a scoring function for the track data set D; if the random algorithm A outputs r ∈ O with probability conforming to formula (7), then A satisfies ε-DP;
Pr[A(D)=r] ∝ exp(ε·u(D,r)/(2Δu)) (7)
where Δu is the maximum change of the scoring function, i.e., its global sensitivity. Higher-scoring outputs are thus exponentially more likely to be selected.
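A minimal Python sketch of the standard exponential mechanism (the candidate set and scores are invented for illustration):

```python
import numpy as np
from collections import Counter

def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng=None):
    """Standard exponential mechanism: pick r ∈ O with probability
    proportional to exp(ε·u(D, r) / (2·Δu))."""
    rng = rng or np.random.default_rng()
    logits = epsilon * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()                # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Discrete outputs with scores u(D, r); a higher score makes the
# corresponding output exponentially more likely.
rng = np.random.default_rng(0)
picks = Counter(
    exponential_mechanism(["A", "B", "C"], [1, 5, 2], epsilon=10.0,
                          sensitivity=1.0, rng=rng)
    for _ in range(200)
)
```

With a large ε the highest-scoring candidate dominates; shrinking ε flattens the distribution toward uniform, trading utility for privacy.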
Differential privacy protection is carried out on track data according to the noise mechanism rules: noise following the Laplace distribution must be added to reduce the sensitivity of query results. By the noise addition rule, adding a large amount of noise to highly sensitive track data reduces information usability, so an appropriate noise amount must be selected. The choice of ε in the differential privacy model determines the track data protection effect: the smaller the selected ε value, the safer the user information, but the larger the amount of added noise; the larger the selected ε value, the less noise is needed, but the weaker the differential privacy protection and the lower the information security. Therefore formulas (2) and (3) are combined to find a reasonable ε value that balances the added noise amount against the differential privacy protection effect.
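The role of ε can be checked numerically: for a sensitivity-1 counting query under the Laplace mechanism, the ratio of output densities between two adjacent track data sets never exceeds e^ε, which is exactly the ε-DP guarantee. A sketch with illustrative counts:

```python
import numpy as np

def lap_pdf(x, mu, scale):
    """Density of the Laplace-perturbed output centred at the true answer mu."""
    return np.exp(-np.abs(x - mu) / scale) / (2.0 * scale)

eps = 0.5
count_D, count_D_adj = 100, 101      # adjacent track data sets, Δf = 1
scale = 1.0 / eps                    # Lap(Δf/ε)
xs = np.linspace(80, 120, 1001)
ratio = lap_pdf(xs, count_D, scale) / lap_pdf(xs, count_D_adj, scale)
# The output-density ratio stays within [e^-ε, e^ε] for every output value,
# so no single noisy answer reveals which adjacent data set was queried.
```

Shrinking ε tightens the band [e^-ε, e^ε] toward 1 (stronger indistinguishability) while widening the noise scale Δf/ε, matching the trade-off described above.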
The method adopts an adaptive noise model and, taking the user's access permission into account (the higher the user's permission level, the closer ε may approach its maximum value), determines a reasonable amount of added noise under a bounding condition on ε. The whole process is as follows:
1) Bound the maximum value of ε by the limiting condition, and select an appropriate privacy budget ε according to the user's permission level;
2) Add noise to the track data set D and the load matrix decomposition result F, respectively, using the Laplace mechanism.
3) Restore the irrelevant-attribute track data deleted according to the track data independence processing model.
4) Feed the query result back to the user.
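Steps 1)-4) can be sketched as follows. The mapping from permission level to privacy budget (EPS_MAX, LEVEL_EPS) is an assumed example standing in for the bounding condition on ε; restoring the deleted independent attributes is represented only schematically:

```python
import numpy as np

# Assumed mapping from user permission level to privacy budget: higher
# levels get an ε closer to the maximum (hence less noise), per step 1).
EPS_MAX = 1.0
LEVEL_EPS = {1: 0.1 * EPS_MAX, 2: 0.5 * EPS_MAX, 3: EPS_MAX}

def answer_query(track_counts, F, level, sensitivity=1.0, rng=None):
    """Sketch of the adaptive-noise pipeline: choose ε by permission level,
    add Laplace noise to both the track data summary D and the factor
    matrix F, and return the perturbed results for feedback to the user."""
    rng = rng or np.random.default_rng()
    eps = LEVEL_EPS[level]                               # step 1)
    scale = sensitivity / eps
    noisy_counts = track_counts + rng.laplace(0.0, scale, track_counts.shape)
    noisy_F = F + rng.laplace(0.0, scale, F.shape)       # step 2)
    # Steps 3)-4): deleted independent attributes would be restored here
    # before feedback; this sketch simply returns the noisy pair.
    return noisy_counts, noisy_F

rng = np.random.default_rng(0)
counts = np.array([5.0, 8.0, 3.0])
F = np.zeros((2, 3))
noisy_counts, noisy_F = answer_query(counts, F, level=3, rng=rng)
```

A level-3 user receives the full budget EPS_MAX and hence the least noise, while a level-1 user gets a tenth of the budget and correspondingly noisier answers.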
Compared with the prior art, this scheme uses the correlation processing of the differential privacy association rules to find the correlation information hidden between track data sets, deeply compresses the track data with the shared prefix tree method, removes redundant track data, and improves track data processing efficiency.
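The shared-prefix compression can be illustrated with a minimal sketch: a simplified FP-tree built as a nested dict, without the header linked list, over invented transactions and an invented support threshold:

```python
from collections import Counter

def frequency_order(transactions, min_support):
    """Shared-prefix preprocessing: count attribute frequencies, drop
    attributes below the minimum support m, and reorder each transaction
    by descending frequency (ties broken alphabetically)."""
    freq = Counter(item for t in transactions for item in t)
    kept = {i for i, c in freq.items() if c >= min_support}
    rank = {i: k for k, i in enumerate(
        sorted(kept, key=lambda i: (-freq[i], i)))}
    return [sorted((i for i in t if i in kept), key=rank.get)
            for t in transactions]

def shared_prefix_tree(ordered):
    """Insert reordered transactions into a nested-dict prefix tree so
    common prefixes are stored once (a simplified FP-tree)."""
    root = {}
    for t in ordered:
        node = root
        for item in t:
            node = node.setdefault(item, {})
    return root

# Invented example: four track records over attributes a-d, support m = 2.
txns = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
ordered = frequency_order(txns, min_support=2)
tree = shared_prefix_tree(ordered)   # "d" is pruned; the "a b" prefix is shared
```

Sorting by descending frequency before insertion is what makes prefixes shared: the three records starting with "a" collapse onto a single branch, which is the compression the scheme relies on.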
After the track data difference requirement is defined, the load matrix established from the initial result is pruned, the uncorrelated load matrix is obtained through track data independence processing, and it is then decomposed. The low-rank mechanism is used to derive the gradient of the decomposed matrices, and the parallel gradient descent matrix decomposition algorithm splits the matrix into several sub-matrices computed on separate nodes, increasing the speed of differential privacy track data protection.
The adaptive noise model bounds the maximum privacy budget, takes a value according to the user's permission level, and adds a reasonable amount of noise to the differential privacy track data through the Laplace mechanism, realizing differential privacy track data protection and responding to the user's query.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. The differential privacy track data protection method based on the interactive query is characterized by comprising the following steps of:
s1, redundant data deleting method based on differential privacy association rule
The differential privacy track data protection model adds noise to reduce the sensitivity of query results, so that an intruder cannot tell from the output information whether a track data record has been added or removed, thereby ensuring the security of the users' private track data information in the data set;
Using ε-DP, the attacker's query algorithm is f: D → R^d, mapping an input track data set to an output real-number vector, with the whole process supporting a randomized mechanism. Let D and D′ be two adjacent track data sets. If for any output set S the function f meets the requirement of formula (1), the algorithm satisfies ε-DP. In formula (1), ε is the privacy budget: the smaller ε is, the closer the attribute frequencies of the processed adjacent track data sets become, proving the stronger the differential privacy effect; an attacker can then hardly judge which track data set the targeted differential privacy track data belongs to, which increases the degree of track data protection.
Pr[f(D)∈S] ≤ e^ε × Pr[f(D′)∈S] (1)
In order to determine useful information of track data to be protected in a track data set and reduce the frequency of extracting original track data, the method provides a data independence processing technology based on the relevance among information, and judges whether the track data is independence information or not through a differential privacy relevance rule;
Calculate the global sensitivity of the track data set: for f: D → R^d, Δf = max_{D,D′} ||f(D) − f(D′)||_1 is called the global sensitivity of f;
Whether information in the current network is irrelevant is judged through the global sensitivity. The track data set is deeply compressed using the shared-prefix method to construct an FP-tree; high-frequency item sets are then extracted from the FP-tree by the frequent-pattern growth method, reducing mining time. Much correlated information is hidden in the track data set; the independence processing model finds the correlation patterns among this information and in this way removes redundant track data during interactive query. The removal of redundant track data is divided into 6 stages.
S2, matrix decomposition method based on combination property
The method uses the sequence and parallel combinability of differential privacy to determine the differential requirements of the track data, establishes a parallel gradient descent matrix decomposition model according to a low-rank mechanism and an alternate direction multiplication method, increases the differential privacy matrix decomposition efficiency and improves the speed of the whole algorithm;
The differential privacy track data protection model reduces the sensitivity of query results by adding controlled noise; the more sub-algorithms and complex functions consume the differential privacy budget, the more noise accumulates, and beyond a certain level this distorts the track data and reduces its availability; in some queries the value of (ε, δ)-DP is strictly constrained, so the method selects a less restrictive model to reduce noise;
Let (ε, δ)-DP be the differential privacy constraint. When 0 < δ < 1, if for any two adjacent track data sets the randomized algorithm f satisfies the requirement of formula (2) for any output set S, then f is said to satisfy (ε, δ)-DP.
Pr[f(D)∈S] ≤ e^ε × Pr[f(D′)∈S] + δ (2)
The contents of a track data set are not necessarily uncorrelated; when this assumption fails, (ε, δ)-DP cannot guarantee differential privacy, and an information thief can still obtain differential privacy track data from the data correlations. In this case another privacy definition must be applied: ε-Pufferfish privacy. Let f: D → R^d be a randomly queryable algorithm and (S, Q, Θ) a privacy framework, where Q ⊆ S × S is the secret track data set and Θ is the set of track data distributions. If for any distribution θ ∈ Θ, any track data pair (S_i, S_j) ∈ Q with Pr(S_i | θ) ≠ 0 and Pr(S_j | θ) ≠ 0, and any output ω ∈ Range(f), f meets the requirement of formula (3), then f satisfies ε-Pufferfish privacy in (S, Q, Θ).
e^(−ε) ≤ Pr[f(X)=ω | S_i, θ] / Pr[f(X)=ω | S_j, θ] ≤ e^ε (3)
In the formula, Θ describes both the attacker's maximum attack capability on the track data and the correlations of the track data. Formula (3) states that for any S_i, S_j in the secret track data set Q, the output shows no obvious gap after the change, so the requirement is less strict than that of ε-DP. Here ε-DP is the special case of ε-Pufferfish privacy in which Θ contains all track data distributions, S contains all track data sets, and Q is the Cartesian product over S;
After the track data difference requirement is defined, the load matrix established according to the initial result is pruned: the uncorrelated load matrix is obtained through track data independence processing, and the uncorrelated load matrix is then decomposed. Using the low-rank mechanism, the gradient of the decomposition objective with respect to the factor matrix F yields the closed-form update for the factor matrix B:
B=(βWF^T+πF^T)(βFF^T+I)^(-1) (4)
The parallel gradient descent matrix decomposition algorithm decomposes W into several sub-matrices according to its structure and computes them on separate nodes, accelerating differential privacy track data protection; the operation steps are as follows:
(1) Establish an initial load matrix according to the query requirement of the user;
(2) Delete redundant track data from the initial load matrix according to the track data independence processing model to obtain the uncorrelated load matrix;
(3) Decompose the uncorrelated load matrix W_{u×r} generated in step (2) into z parts, where u denotes the number of rows, r the number of columns, and z the number of distributed-system nodes; compute the decomposition using a map process of distributed computing and a reduce process of cloud computing, finally obtaining B and F;
s3, self-adaptive noise track data protection based on differential privacy noise mechanism
For track data needing differential privacy protection, the exponential mechanism and the Laplace mechanism are most commonly applied: the exponential mechanism is commonly applied to discrete data protection, restricting the track data output according to the scoring-function principle; the Laplace mechanism is applied to numerical data protection, where the global sensitivity influences the amount of noise added in the differential privacy model, and Laplace-distributed noise is added to the query result, increasing track data security and realizing differential privacy track data protection;
Laplace mechanism: f: D → R is an arbitrary function; if the random algorithm A conforms to formula (6), then A satisfies ε-DP;
A(D)=f(D)+Lap(Δf/ε) (6)
In the formula, Lap(Δf/ε) is the Laplace noise to be added; the noise amount grows as the global sensitivity Δf grows and as ε shrinks;
The most important step of the exponential mechanism is to establish a scoring function u(D, r) (r ∈ O), where r is an output item of the output domain O;
Exponential mechanism: u: (D × O) → R is a scoring function for the track data set D; if the random algorithm A outputs r ∈ O with probability conforming to formula (7), then A satisfies ε-DP;
Pr[A(D)=r] ∝ exp(ε·u(D,r)/(2Δu)) (7)
where Δu is the maximum change of the scoring function, i.e., the global sensitivity; higher-scoring outputs are thus exponentially more likely to be selected;
Differential privacy protection is applied to the track data according to the noise mechanism rules: noise following the Laplace distribution must be added to reduce the sensitivity of the query result; by the noise addition rule, adding a large amount of noise to highly sensitive track data reduces information usability, so an appropriate noise amount must be selected; the choice of ε in the differential privacy model determines the track data protection effect: the smaller the selected ε value, the safer the user information but the larger the amount of added noise; the larger the selected ε value, the less noise is needed, but the weaker the differential privacy protection and the lower the information security; therefore formulas (2) and (3) are combined to find a reasonable ε value that balances the added noise amount against the differential privacy protection effect;
The method adopts an adaptive noise model and, taking the user's access permission into account (the higher the user's permission level, the closer ε may approach its maximum value), determines a reasonable amount of added noise under a bounding condition on ε; the whole process is as follows:
(1) Bound the maximum value of ε by the limiting condition, and select an appropriate privacy budget ε according to the user's permission level;
(2) Respectively adding noise into the track data set D and the load matrix decomposition result F by using a Laplace mechanism;
(3) Restoring irrelevant attribute track data deleted according to the track data independence processing model;
(4) Feed the query result back to the user.
2. The method of claim 1, wherein in step S1 the global sensitivity is the maximum range over which the query algorithm's output can vary between the adjacent track data sets D and D′, measured by the L_1 distance between the two outputs; it is determined by f and not influenced by the track data set, and with ε held constant the global sensitivity is proportional to the amount of noise added in f and also proportional to the privacy degree of the track data.
3. The method for protecting differential privacy track data based on interactive query as claimed in claim 1, wherein in step S1 the 6 stages are: (1) scan the track data set to obtain the frequency of each attribute, and sort the attributes from high to low frequency to obtain a descending list; (2) set m as the minimum support frequency, and remove all values smaller than m according to the descending list; (3) place the descending list into a prefix tree, the first-occurring nodes forming a linked list; this process builds the FP-tree; (4) integrate the FP-tree using the track data independence processing model; (5) judge whether each leaf node's path is single: if not, return to the previous step and reconstruct the prefix set on each path to generate a new FP-tree; if so, delete the leaf node to directly generate the prefix-path set; (6) finally obtain the interactive-query association pattern, namely the resulting prefix-path set, completing the removal of redundant track data.
4. The method according to claim 1, wherein in step S2 the sequential combination is: {A_1, A_2, …, A_n} are n algorithms, D is a specified track data set, and A_i is any one of the algorithms; if A_i on the track data set D satisfies ε_i-DP, then {A_1, A_2, …, A_n} is a sequential combination satisfying (Σ_i ε_i)-DP.
5. The method for protecting differential privacy track data based on interactive query as claimed in claim 1, wherein in step S2 the parallel combinability is: {D_1, D_2, …, D_n} are n disjoint subsets of the track data set D, and the algorithms {A_1, A_2, …, A_n} act on the respective subsets; if each A_i satisfies ε_i-DP, then {A_1, A_2, …, A_n} is a parallel combination satisfying (max ε_i)-DP.
6. The method of claim 1, wherein in step S2, formula (2) may be interpreted as: if f satisfies ε-DP with probability (1 − δ), then (ε, δ)-DP can be regarded as approximate differential privacy.
7. The method of claim 1, wherein in formula (4), W is the load matrix, β is the load matrix coefficient, and the result of B is related to F only.
CN202310204635.2A 2023-03-03 2023-03-03 Differential privacy track data protection method based on interactive query Pending CN116611101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310204635.2A CN116611101A (en) 2023-03-03 2023-03-03 Differential privacy track data protection method based on interactive query

Publications (1)

Publication Number Publication Date
CN116611101A true CN116611101A (en) 2023-08-18

Family

ID=87673480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310204635.2A Pending CN116611101A (en) 2023-03-03 2023-03-03 Differential privacy track data protection method based on interactive query

Country Status (1)

Country Link
CN (1) CN116611101A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131608A (en) * 2020-08-03 2020-12-25 辽宁工业大学 Classification tree difference privacy protection method meeting LKC model
CN112131608B (en) * 2020-08-03 2024-01-26 辽宁工业大学 Classification tree differential privacy protection method meeting LKC model
CN117235800A (en) * 2023-10-27 2023-12-15 重庆大学 Data query protection method of personalized privacy protection mechanism based on secret specification
CN117235800B (en) * 2023-10-27 2024-05-28 重庆大学 Data query protection method of personalized privacy protection mechanism based on secret specification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination