CN116611101A - Differential privacy track data protection method based on interactive query - Google Patents

Differential privacy track data protection method based on interactive query

Info

Publication number
CN116611101A
CN116611101A (application CN202310204635.2A)
Authority
CN
China
Prior art keywords: track data, differential privacy, privacy, epsilon, noise
Prior art date
Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number
CN202310204635.2A
Other languages
Chinese (zh)
Inventor
王国军
冯光辉
彭滔
邢萧飞
陈淑红
李雨婷
Current Assignee
Guangzhou University
Original Assignee
Guangzhou University
Priority date
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN202310204635.2A
Publication of CN116611101A

Classifications

    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes (G Physics > G06 Computing; Calculating or Counting > G06F Electric digital data processing > G06F 21/62 Protecting access to data via a platform)
    • G06F 16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries (G06F 16/20 Information retrieval of structured data > G06F 16/245 Query processing)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of private trajectory data protection, and discloses a differential privacy trajectory data protection method based on interactive query. The differential privacy trajectory data protection model adds noise to reduce the sensitivity of query results, so that an intruder cannot detect the addition or removal of trajectory records from the output information, thereby securing users' private trajectory data in the dataset. Using ε-DP, the attacker's query algorithm is modeled as f: D → R^d; the whole process from an input trajectory dataset to an output real-valued vector supports a randomized mechanism, where D and D′ are two neighboring trajectory datasets. Compared with the prior art, the method discovers the correlation information hidden between trajectory datasets through independence processing based on differential privacy association rules, deeply compresses the trajectory data with a shared prefix tree, and removes redundant trajectory data, improving trajectory data processing efficiency.

Description

Differential privacy track data protection method based on interactive query
Technical Field
The invention relates to the technical field of private trajectory data protection, and in particular to a differential privacy trajectory data protection method based on interactive query.
Background
With the continuous development of science and technology and the arrival of the information age, the application scope of big data is gradually expanding, and the Internet makes daily life convenient and fast. The demand for data queries in daily applications keeps growing, so strengthening personal information security when users access the network is an urgent problem. The most basic daily data query is the linear query, which includes non-interactive and interactive queries. Interactive queries respond quickly and can be processed online in real time, but a query generates a large amount of trajectory data and a large volume of text to be processed; protecting differential privacy trajectory data then requires long processing times and is extremely inefficient.
In existing work, several "master models" built from different datasets and dependent on sensitive data are established through a "black box" model and encrypted; by transfer learning on the master models, "free models" unrelated to the sensitive data are obtained, which require no protection. An attacker collecting information can then only query the unimportant free models and cannot obtain private data. However, the prior art still has the following drawbacks:
(1) An information-hierarchy privacy protection mechanism is adopted, classifying users and access rights with a classification tree algorithm; however, it cannot solve the rapid consumption of the privacy budget during interactive information queries.
(2) Existing algorithms ignore the data-availability requirements of users of different levels during interactive information queries, so the processed data has low availability and poor information value.
Disclosure of Invention
(I) Technical problems to be solved
Aiming at the deficiencies of the prior art, the invention provides a differential privacy trajectory data protection method based on interactive query, which has the advantages of high efficiency and high speed and solves the problems of low efficiency, low data availability and poor information value.
(II) Technical scheme
To achieve high efficiency and high speed, the invention provides the following technical scheme: a differential privacy trajectory data protection method based on interactive query, comprising the following steps:
s1, redundant data deleting method based on differential privacy association rule
The differential privacy track data protection model is characterized in that the increase and decrease of track data records cannot be found out from output information by an intruder by increasing the sensitivity of noise reduction query results, so that the security of the user privacy track data information in the data set is ensured;
using epsilon-DP, the attacker's query algorithm uses f: D-R d The whole process from input track data set to output real number vector D supports a random mechanism, D and D' are two adjacent track data sets, if f is the differential privacy track data set of any one outputThe method meets the requirement of the formula (1), and shows that the algorithm meets epsilon-DP, epsilon is privacy budget in the formula (1), and the smaller epsilon is, the smaller the frequency of attributes among track data sets is after the adjacent track data sets are processed, the more similar the differential privacy effect is proved, and an attacker can hardly judge which track data set the differential privacy track data to be stolen is specifically in, so that the track data protection degree is increased.
Pr[f(D) ∈ S] ≤ e^ε × Pr[f(D′) ∈ S]    (1)
To determine the useful information to be protected in a trajectory dataset and reduce the number of times the original trajectory data is extracted, the method proposes a data independence processing technique based on the relevance between pieces of information, and judges whether trajectory data is independent information through differential privacy association rules;
calculating the global sensitivity of the trajectory dataset if f: D-R d ,Δf=max D,D′||f(D)-f(D′)|| 1 This function is called the global sensitivity of f;
judging whether the information in the current network is irrelevant information or not through the global sensitivity degree, carrying out deep compression on a track data set by using a method of sharing prefixes, constructing an FP-tree, then highlighting a high-frequency item set by using a frequent item growth method according to the FP-tree, reducing the excavation time consumption, hiding a lot of relevant information in the track data set, finding out the relevant modes among the relevant information by using an irrelevant processing model, and removing redundant track data in the interactive query process in the mode, wherein the method for removing the redundant track data is divided into 6 stages.
S2. Matrix decomposition based on composition properties
The method uses the sequential and parallel composability of differential privacy to determine the differential requirements of the trajectory data, and establishes a parallel gradient-descent matrix decomposition model based on a low-rank mechanism and the alternating direction method of multipliers, increasing the efficiency of differential privacy matrix decomposition and the speed of the whole algorithm;
the differential privacy track data protection model reduces query result sensitivity by adding controlled noise. The more the sub-algorithm and complex function need to use the differential privacy budget, the more the noise accumulation is, and exceeding a certain level will cause track data distortion, reduce track data availability, and in some queries, (ε) i The value of delta-DP is strictly required, and for this purpose, the method selects a model with less limitation to reduce noise;
let (∈, δ) -DP be the differential privacy constraint, when 0 < "When delta is less than 1, the algorithm f on any two adjacent track data sets outputs the random outputSatisfying the requirement of formula (2), then f is said to satisfy (∈, δ) -DP.
Pr[f(D) ∈ S] ≤ e^ε × Pr[f(D′) ∈ S] + δ    (2)
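A standard mechanism that satisfies (ε, δ)-DP rather than pure ε-DP is the Gaussian mechanism. It is not named in the text, so the sketch below is illustrative only; it uses the classic noise calibration σ ≥ Δf·√(2 ln(1.25/δ))/ε, valid for 0 < ε < 1.

```python
import math
import random

def gaussian_mechanism(true_value: float, sensitivity: float,
                       epsilon: float, delta: float) -> float:
    # Classic (epsilon, delta)-DP calibration for 0 < epsilon < 1:
    # sigma >= sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return true_value + random.gauss(0.0, sigma)
```

Relaxing ε-DP to (ε, δ)-DP in this way allows noise whose tails decay faster than Laplace noise, which is one reading of the "less restrictive model" chosen above.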
(ε, δ)-DP can be used only when the contents of the trajectory dataset are uncorrelated. If this condition cannot be satisfied, then even when differential privacy is achieved, an information thief can still obtain differential privacy trajectory data from the data correlations, and another privacy definition must be applied: ε-Pufferfish privacy. Let f: D → R^d be a randomly queryable algorithm and (S, Q, Θ) a privacy framework, where Q ⊆ S × S is the set of secret trajectory data pairs and Θ is the set of trajectory data distributions. If for any distribution θ ∈ Θ, any trajectory data pair (S_i, S_j) ∈ Q with Pr(S_i | θ) ≠ 0 and Pr(S_j | θ) ≠ 0, and any output ω ∈ Range(f), f meets the requirement of formula (3), then f satisfies ε-Pufferfish privacy in (S, Q, Θ).
e ≤Pr[f(X)=ω/S i ,θ]/Pr[f(X)=ω/S j ,θ]≤e ε (3)
In the formula, Θ captures both the attacker's maximum trajectory data attack capability and the correlation description of the trajectory data. Formula (3) states that for any S_i, S_j in the secret trajectory dataset Q, after ε-Pufferfish privacy is applied the output results show no obvious gap; this requirement is less strict than that of ε-DP. Here Θ represents the set of all trajectory data distributions, S the set of all trajectory datasets, and Q the Cartesian product within S; ε-DP is a special case of ε-Pufferfish privacy.
After the trajectory data differential requirement is defined, the load matrix established from the initial result is pruned: an irrelevant load matrix is obtained through trajectory data independence processing, and then decomposed. Using the low-rank mechanism, the gradients of the decomposed matrices B and G with respect to the decomposition matrix F are defined as
B = (βWFᵀ + πFᵀ)(βFFᵀ + I)⁻¹    (4)
The parallel gradient-descent matrix decomposition algorithm decomposes W into several matrices according to the matrix characteristics and computes them on separate nodes, accelerating differential privacy trajectory data protection. The operation steps are as follows:
(1) Establish an initial load matrix according to the user's query requirement;
(2) Delete redundant trajectory data from the initial load matrix according to the trajectory data independence processing model to obtain an irrelevant load matrix;
(3) Decompose the irrelevant load matrix W_{u×r} generated in the second step into z component parts, where u denotes the number of rows, r the number of columns, and z the number of distributed system nodes. The decomposition is computed with a distributed map process and a cloud-computing reduce process, finally obtaining B and F;
s3, self-adaptive noise track data protection based on differential privacy noise mechanism
For track data needing differential privacy protection, an exponential mechanism and a Laplace mechanism are most commonly applied, the exponential mechanism is commonly applied to discrete data protection, track data output is limited according to a scoring function principle, the numerical data protection is applied to the Laplace mechanism, global sensitivity can influence the noise amount added in a differential privacy model, and the query result is added with data conforming to Laplace [14] The track data safety is increased, and differential privacy track data protection is realized;
laplace mechanism: f: D-R is an arbitrary function, if the random algorithm A accords with the formula (6), the A accords with E-DP;
A(D)=f(D)+Lap(Δf/ε) (6)
in the formula, Lap(Δf/ε) is the Laplace noise to be added; the noise amount grows with the global sensitivity and with smaller ε;
the most important step of the exponential mechanism is to establish a scoring function u(D, r) (r ∈ O), where r is an output item of the output domain O;
exponential mechanism: u: D × O → R is a scoring function for the trajectory dataset D; if the random algorithm A conforms to formula (7), A satisfies ε-DP;
Pr[A(D) = r] ∝ exp(ε·u(D, r)/(2Δu))    (7)
where Δu is the maximum output value of the scoring function, i.e., the global sensitivity. Analysis shows that the score of an output is proportional to the probability of that output being selected;
differential privacy protection of trajectory data according to the noise mechanism rules requires adding Laplace-distributed noise to reduce the sensitivity of query results. By the noise-addition rule, adding a large amount of noise to highly sensitive trajectory data reduces the usability of the information, so an appropriate noise amount must be chosen. The choice of ε in the differential privacy model determines the trajectory data protection effect: the smaller the chosen ε, the safer the user information, but the larger the added noise; the larger the chosen ε, the smaller the added noise, but the worse the differential privacy protection effect and the lower the information security. Therefore, a reasonable ε value found by combining formulas (2) and (3) can balance the added noise amount against the differential privacy trajectory data protection effect;
the method selects the self-adaptive noise model, and combines under the premise of considering the user use permission (the higher the user permission level is, the closer the E is to the maximum value)And (3) determining a reasonable value of the added noise under the condition. The whole process is described as follows:
(1) By passing throughCondition limiting epsilon maximum value, selecting according to user authority levelTaking a proper epsilon privacy budget;
(2) Add noise to the trajectory dataset D and to the load-matrix decomposition result F, respectively, using the Laplace mechanism;
(3) Restore the irrelevant-attribute trajectory data deleted by the trajectory data independence processing model;
(4) Feed the query result back to the user.
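The adaptive-noise steps above can be sketched as follows. This is a minimal illustration under assumptions not fixed by the text: five permission levels and a linear mapping from level to ε, so a higher level yields an ε closer to the allowed maximum and hence less noise.

```python
import math
import random

def select_epsilon(level: int, eps_max: float, num_levels: int = 5) -> float:
    # Step (1): higher permission level -> epsilon closer to eps_max
    level = max(1, min(level, num_levels))
    return eps_max * level / num_levels

def laplace(scale: float) -> float:
    # Sample Lap(0, scale) via the inverse CDF (symmetric, so sign is free)
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def answer_query(true_value: float, sensitivity: float,
                 level: int, eps_max: float) -> float:
    # Steps (2) and (4): add Laplace noise calibrated to the selected
    # budget, then return the noisy result to the user
    eps = select_epsilon(level, eps_max)
    return true_value + laplace(sensitivity / eps)
```

The `num_levels` value and the linear level-to-ε rule are assumptions for illustration; the patent only states that ε approaches its maximum as the permission level rises.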
Preferably, in step S1, "global" means the maximum range of trajectory data output by the query algorithm applied to the neighboring trajectory datasets D and D′, measured by the L₁ distance between the two. It is determined by f and not influenced by the trajectory dataset; for a fixed ε, the global sensitivity is proportional to the noise amount added in f and to the privacy degree of the trajectory data.
Preferably, in step S1, the 6 stages are: (1) scan the trajectory dataset to obtain the frequency of each attribute, and sort the attributes from high to low frequency into a descending list; (2) set m as the minimum support frequency and remove all values smaller than m from the descending list; (3) insert the descending list into a prefix tree, with the first-occurring nodes forming a linked list; this process builds the FP-tree; (4) integrate the FP-tree using the trajectory data independence processing model; (5) judge whether each leaf node's path is single: if not, return to the previous step and rebuild the prefix set on each path to generate a brand-new FP-tree; if so, delete the leaf node and directly generate the prefix path set; (6) finally obtain the interactive query association pattern, i.e., the obtained prefix path set, completing the removal of redundant trajectory data.
Preferably, in step S2, sequential composition: let {A_1, A_2, …, A_n} be n algorithms, D a specified trajectory dataset, and A_i any one of the algorithms. If on the trajectory dataset D each A_i satisfies ε_i-DP, then the sequential composition {A_1, A_2, …, A_n} satisfies (Σᵢ ε_i)-DP.
Preferably, in step S2, parallel composability [10]: let {D_1, D_2, …, D_n} be n disjoint subsets of the trajectory dataset D, with the algorithms {A_1, A_2, …, A_n} acting on the respective subsets. If each A_i satisfies ε_i-DP, then the parallel composition {A_1, A_2, …, A_n} satisfies (maxᵢ ε_i)-DP.
Preferably, in step S2, formula (2) can be explained as follows: if f satisfies ε-DP with probability (1 − δ), then (ε, δ)-DP can be regarded as approximate differential privacy.
Preferably, in step S2, in formula (5), W is the load matrix and β a load-matrix coefficient; the result of B is related only to F.
(III) Beneficial effects
Compared with the prior art, the invention provides a differential privacy track data protection method based on interactive query, which has the following beneficial effects:
1. Compared with the prior art, the differential privacy trajectory data protection method based on interactive query discovers the correlation information hidden between trajectory datasets through independence processing with differential privacy association rules, deeply compresses the trajectory data with the shared-prefix-tree method, and removes redundant trajectory data, improving trajectory data processing efficiency.
2. The method defines the trajectory data differential requirement, prunes the load matrix established from the initial result, obtains an irrelevant load matrix through trajectory data independence processing, and then decomposes it. The gradients of the decomposed matrices are defined using a low-rank mechanism, and the parallel gradient-descent matrix decomposition algorithm splits the matrix into several matrices computed on separate nodes, increasing the differential privacy trajectory data protection speed.
3. The method limits the maximum value of the privacy budget with the adaptive noise model, adds reasonable Laplace-mechanism noise to the differential privacy trajectory data according to the user's permission-level value, realizes differential privacy trajectory data protection, and responds to the user's query.
Drawings
FIG. 1 is a system protection model;
FIG. 2 is a step diagram of removing redundant trace data during an interactive query;
fig. 3 is a matrix decomposition flow diagram.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the invention.
Referring to fig. 1-3, the present invention provides a technical solution: a differential privacy track data protection method based on interactive query comprises the following steps:
s1, redundant data deleting method based on differential privacy association rule
Differential privacy track data protection model [6] The sensitivity of the query result is reduced by increasing noise, so that an intruder cannot find out the increase or decrease of track data records from output information, and the security of user privacy track data information in the data set is ensured.
Using ε-DP, the attacker's query algorithm is modeled as f: D → R^d. The whole process from an input trajectory dataset to an output real-valued vector supports a randomized mechanism, with D and D′ two neighboring trajectory datasets. If for any output set S f meets the requirement of formula (1), the algorithm satisfies ε-DP. In formula (1), ε is the privacy budget: the smaller ε is, the smaller the difference in attribute frequencies between the processed neighboring trajectory datasets, the closer the differential privacy effects, and the harder it is for an attacker to judge which trajectory dataset the targeted differential privacy trajectory data is in, increasing the degree of trajectory data protection.
Pr[f(D) ∈ S] ≤ e^ε × Pr[f(D′) ∈ S]    (1)
To determine the useful trajectory data information to be protected in a trajectory dataset and reduce the number of times the original trajectory data is extracted, the method proposes a data independence processing technique based on the relevance between pieces of information, and judges whether trajectory data is irrelevant information through differential privacy association rules.
Calculate the global sensitivity of the trajectory dataset: if f: D → R^d, then Δf = max_{D,D′} ‖f(D) − f(D′)‖₁ is called the global sensitivity of f. "Global" means the maximum range of trajectory data output by the query algorithm on the neighboring trajectory datasets D and D′, measured by the L₁ distance between them; it is determined by f and not influenced by the trajectory dataset. For a fixed ε, the global sensitivity is proportional to the noise amount added in f and to the privacy degree of the trajectory data.
Whether the information in the current network is irrelevant is judged through the global sensitivity, and the trajectory dataset is deeply compressed using the shared-prefix method to construct the FP-tree. Then, based on the FP-tree, high-frequency itemsets are highlighted with frequent-pattern growth, reducing mining time. Much correlation information is hidden in the trajectory dataset, and the correlation patterns between items can be found using the independence processing model; in this way redundant trajectory data in the interactive query process is removed. The whole process is shown in fig. 2; the method for removing redundant trajectory data is divided into 6 stages.
1) The trace data set is scanned to obtain the frequency of each attribute, and the attribute is sorted from high frequency to low frequency to obtain a descending list.
2) Setting m as the minimum supporting frequency, and removing all values smaller than m according to the descending order table.
3) The descending list is put into the prefix tree, and the first occurring nodes form a linked list, and the process builds the FP-tree.
4) And integrating the FP-tree by using a track data independence processing model.
5) Judge whether the path of each leaf node is single: if not, return to the previous step and rebuild the prefix set on each path to generate a brand-new FP-tree; if so, delete the leaf node and directly generate the prefix path set.
6) Finally, the interactive query association pattern, i.e., the obtained prefix path set, is obtained, completing the removal of redundant trajectory data.
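Stages 1) through 3) above can be sketched as a minimal shared-prefix (FP-tree-style) compression; all names below are hypothetical, not from the patent. Attributes are counted, low-support items pruned, and the remaining items inserted in descending-frequency order so that common prefixes are shared.

```python
from collections import Counter

class PrefixNode:
    def __init__(self):
        self.children = {}   # attribute -> PrefixNode
        self.count = 0       # number of trajectories passing through this node

def build_fp_tree(trajectories, min_support=1):
    # Stage 1: attribute frequencies, used for descending-order sorting
    freq = Counter(item for t in trajectories for item in t)
    root = PrefixNode()
    for t in trajectories:
        # Stage 2: drop items whose support is below the minimum m
        items = [i for i in t if freq[i] >= min_support]
        # Stage 3: insert in descending-frequency order to share prefixes
        items.sort(key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, PrefixNode())
            node.count += 1
    return root
```

Sorting every trajectory by the same global frequency order is what lets frequent attributes collapse onto shared prefix paths, which is the compression effect the text attributes to the shared prefix tree.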
S2, matrix decomposition method based on combination property
The amount of trajectory data in interactive queries is large and the query rounds are many, which can reduce the degree of differential privacy trajectory data protection. The method uses the sequential and parallel composability of differential privacy to determine the differential requirement of the trajectory data, and establishes a parallel gradient-descent matrix decomposition model based on a low-rank mechanism and the alternating direction method of multipliers, increasing the efficiency of differential privacy matrix decomposition and the speed of the whole algorithm.
Sequential composability: let {A_1, A_2, …, A_n} be n algorithms, D a specified trajectory dataset, and A_i any one algorithm. If on D each A_i satisfies ε_i-DP, then the sequential composition {A_1, A_2, …, A_n} satisfies (Σᵢ ε_i)-DP.
Parallel composability: let {D_1, D_2, …, D_n} be n disjoint subsets of the trajectory dataset D, with the algorithms {A_1, A_2, …, A_n} acting on the respective subsets. If each A_i satisfies ε_i-DP, then the parallel composition {A_1, A_2, …, A_n} satisfies (maxᵢ ε_i)-DP.
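The two composition properties amount to a simple budget-accounting rule, sketched below (an illustration, not from the patent): budgets add up when the same dataset is queried repeatedly, while the worst per-subset budget dominates when disjoint subsets are queried.

```python
def sequential_budget(epsilons):
    # Sequential composition on the same dataset: budgets add up
    return sum(epsilons)

def parallel_budget(epsilons):
    # Parallel composition on disjoint subsets: the worst epsilon dominates
    return max(epsilons)
```

This is why splitting the load matrix across disjoint node-local blocks (as in the parallel decomposition below) is cheaper in privacy budget than running every sub-algorithm on the full dataset.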
The differential privacy trajectory data protection model reduces query result sensitivity by adding controlled noise. The more sub-algorithms and complex functions consume the differential privacy budget, the more noise [11] accumulates; beyond a certain level the trajectory data is distorted and its availability reduced. In some queries the values of (ε_i, δ)-DP are strictly constrained, so the method selects a less restrictive model to reduce the noise amount.
Let (ε, δ)-DP be the differential privacy constraint. When 0 < δ < 1, if on any two neighboring trajectory datasets the random output of algorithm f over any output set S satisfies formula (2), then f is said to satisfy (ε, δ)-DP.
Pr[f(D) ∈ S] ≤ e^ε × Pr[f(D′) ∈ S] + δ    (2)
Formula (2) can be explained as follows: if f satisfies ε-DP with probability (1 − δ), then (ε, δ)-DP can be regarded as approximate differential privacy.
(ε, δ)-DP can be used only when the contents of the trajectory dataset are uncorrelated; if this condition cannot be met, then even when differential privacy is achieved, an information thief can still obtain differential privacy trajectory data from the data correlations. In that case another privacy definition must be applied: ε-Pufferfish privacy. Let f: D → R^d be a randomly queryable algorithm and (S, Q, Θ) a privacy framework, where Q ⊆ S × S is the set of secret trajectory data pairs and Θ is the set of trajectory data distributions. If for any distribution θ ∈ Θ, any trajectory data pair (S_i, S_j) ∈ Q with Pr(S_i | θ) ≠ 0 and Pr(S_j | θ) ≠ 0, and any output ω ∈ Range(f), f meets the requirement of formula (3), then f satisfies ε-Pufferfish privacy in (S, Q, Θ).
e ≤Pr[f(X)=ω/S i ,θ]/Pr[f(X)=ω/S j ,θ]≤e ε (3)
In the formula, Θ captures both the attacker's maximum trajectory data attack capability and the correlation description of the trajectory data. Formula (3) states that for any S_i, S_j in the secret trajectory dataset Q, after ε-Pufferfish privacy is applied the output results show no obvious gap; this requirement is less strict than that of ε-DP. Here Θ represents the set of all trajectory data distributions, S the set of all trajectory datasets, and Q the Cartesian product within S; ε-DP is a special case of ε-Pufferfish privacy.
After the trajectory data differential requirement is defined, the load matrix established from the initial result is pruned: an irrelevant load matrix is obtained through trajectory data independence processing, and then decomposed. Using the low-rank mechanism, the gradients of the decomposed matrices B and G with respect to the decomposition matrix F are defined as
B = (βWFᵀ + πFᵀ)(βFFᵀ + I)⁻¹    (4)
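As a sanity check on the closed-form update (4), the 1×1 (scalar) case can be written directly: here W, F, β, π are plain numbers, so the transpose and matrix inverse reduce to ordinary arithmetic. This is an illustrative reduction, not the patent's implementation.

```python
def update_b_scalar(beta: float, pi: float, w: float, f: float) -> float:
    # Scalar instance of B = (βWFᵀ + πFᵀ)(βFFᵀ + I)⁻¹
    return (beta * w * f + pi * f) / (beta * f * f + 1.0)
```

The denominator βFFᵀ + I is always invertible for β ≥ 0 (positive in the scalar case), which is what makes the closed-form update well defined.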
In formula (5), W is the load matrix and β a load-matrix coefficient; the result of B is related only to F. The parallel gradient-descent matrix decomposition algorithm decomposes W into several matrices according to the matrix characteristics and computes them on each node, accelerating differential privacy trajectory data protection. The operation steps are as follows:
1) Establish an initial load matrix according to the user's query requirement.
2) Delete redundant trajectory data from the initial load matrix according to the trajectory data independence processing model to obtain an irrelevant load matrix.
3) Decompose the irrelevant load matrix W_{u×r} generated in the second step into z component parts, where u denotes the number of rows, r the number of columns, and z the number of distributed system nodes. The decomposition is computed with a distributed map process and a cloud-computing reduce process, finally obtaining B and F; the specific process is shown in fig. 3.
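Step 3)'s partition of W_{u×r} into z blocks for the map phase can be sketched as follows; the helper is hypothetical, not from the patent.

```python
def split_rows(w, z):
    # Partition the u-by-r matrix (a list of rows) into z row blocks,
    # one per distributed node; block sizes differ by at most one row
    u = len(w)
    base, extra = divmod(u, z)
    blocks, start = [], 0
    for i in range(z):
        size = base + (1 if i < extra else 0)
        blocks.append(w[start:start + size])
        start += size
    return blocks
```

Each node would then run its local decomposition in the map phase, with the reduce phase combining the per-block results into B and F.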
S3, self-adaptive noise track data protection based on differential privacy noise mechanism
For trajectory data needing differential privacy protection, the exponential mechanism [12] and the Laplace mechanism [13] are most commonly applied. The exponential mechanism is usually applied to discrete data protection, limiting the trajectory data output according to the scoring-function principle. The Laplace mechanism is applied to numerical data protection; the global sensitivity influences the amount of noise added in the differential privacy model, and adding Laplace-distributed noise [14] to the query result increases trajectory data security and realizes differential privacy trajectory data protection.
Laplace mechanism: f: D → R is an arbitrary function; if the random algorithm A conforms to formula (6), then A satisfies ε-DP;
A(D)=f(D)+Lap(Δf/ε) (6)
In the formula, Lap(Δf/ε) is the Laplace noise to be added; the noise amount grows as the global sensitivity Δf grows and as ε shrinks;
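Formula (6) can be sketched in Python with numpy; the counting-query example and its sensitivity value are assumptions for illustration, not part of the patent:

```python
import numpy as np

def laplace_mechanism(f_D, sensitivity, epsilon, rng=None):
    """A(D) = f(D) + Lap(Δf/ε): perturb the true answer with Laplace noise
    whose scale grows with the global sensitivity and shrinks with ε."""
    rng = rng or np.random.default_rng()
    return f_D + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query over a track data set has global sensitivity 1.
rng = np.random.default_rng(0)
true_count = 42
noisy_small_eps = [laplace_mechanism(true_count, 1.0, 0.1, rng) for _ in range(2000)]
noisy_large_eps = [laplace_mechanism(true_count, 1.0, 2.0, rng) for _ in range(2000)]
# Smaller ε ⇒ wider noise spread (the std of Lap(b) is sqrt(2)*b).
```

Comparing the two samples shows the trade-off stated above: ε = 0.1 yields far noisier answers than ε = 2.0 for the same sensitivity.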
The most important step of the exponential mechanism is to establish a scoring function u(D, r) (r ∈ O), where r is an output item of the output domain O;
Exponential mechanism: u: (D × O) → R is a scoring function for the track data set D; if the random algorithm A outputs r ∈ O with probability conforming to formula (7), then A satisfies ε-DP;
Pr[A(D)=r] ∝ exp(ε·u(D,r)/(2Δu)) (7)
where Δu is the maximum change of the scoring function, i.e., its global sensitivity. Higher-scoring outputs are thus exponentially more likely to be selected.
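A minimal Python sketch of the standard exponential mechanism (the candidate set and scores are invented for illustration):

```python
import numpy as np
from collections import Counter

def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng=None):
    """Standard exponential mechanism: pick r ∈ O with probability
    proportional to exp(ε·u(D, r) / (2·Δu))."""
    rng = rng or np.random.default_rng()
    logits = epsilon * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()                # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Discrete outputs with scores u(D, r); a higher score makes the
# corresponding output exponentially more likely.
rng = np.random.default_rng(0)
picks = Counter(
    exponential_mechanism(["A", "B", "C"], [1, 5, 2], epsilon=10.0,
                          sensitivity=1.0, rng=rng)
    for _ in range(200)
)
```

With a large ε the highest-scoring candidate dominates; shrinking ε flattens the distribution toward uniform, trading utility for privacy.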
Differential privacy protection is carried out on track data according to the noise mechanism rules: noise following the Laplace distribution must be added to reduce the sensitivity of query results. By the noise addition rule, adding a large amount of noise to highly sensitive track data reduces information usability, so an appropriate noise amount must be selected. The choice of ε in the differential privacy model determines the track data protection effect: the smaller the selected ε value, the safer the user information, but the larger the amount of added noise; the larger the selected ε value, the less noise is needed, but the weaker the differential privacy protection and the lower the information security. Therefore formulas (2) and (3) are combined to find a reasonable ε value that balances the added noise amount against the differential privacy protection effect.
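The role of ε can be checked numerically: for a sensitivity-1 counting query under the Laplace mechanism, the ratio of output densities between two adjacent track data sets never exceeds e^ε, which is exactly the ε-DP guarantee. A sketch with illustrative counts:

```python
import numpy as np

def lap_pdf(x, mu, scale):
    """Density of the Laplace-perturbed output centred at the true answer mu."""
    return np.exp(-np.abs(x - mu) / scale) / (2.0 * scale)

eps = 0.5
count_D, count_D_adj = 100, 101      # adjacent track data sets, Δf = 1
scale = 1.0 / eps                    # Lap(Δf/ε)
xs = np.linspace(80, 120, 1001)
ratio = lap_pdf(xs, count_D, scale) / lap_pdf(xs, count_D_adj, scale)
# The output-density ratio stays within [e^-ε, e^ε] for every output value,
# so no single noisy answer reveals which adjacent data set was queried.
```

Shrinking ε tightens the band [e^-ε, e^ε] toward 1 (stronger indistinguishability) while widening the noise scale Δf/ε, matching the trade-off described above.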
The method adopts an adaptive noise model and, taking the user's access permission into account (the higher the user's permission level, the closer ε may approach its maximum value), determines a reasonable amount of added noise under a bounding condition on ε. The whole process is as follows:
1) Bound the maximum value of ε by the limiting condition, and select an appropriate privacy budget ε according to the user's permission level;
2) Add noise to the track data set D and the load matrix decomposition result F, respectively, using the Laplace mechanism.
3) Restore the irrelevant-attribute track data deleted according to the track data independence processing model.
4) Feed the query result back to the user.
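Steps 1)-4) can be sketched as follows. The mapping from permission level to privacy budget (EPS_MAX, LEVEL_EPS) is an assumed example standing in for the bounding condition on ε; restoring the deleted independent attributes is represented only schematically:

```python
import numpy as np

# Assumed mapping from user permission level to privacy budget: higher
# levels get an ε closer to the maximum (hence less noise), per step 1).
EPS_MAX = 1.0
LEVEL_EPS = {1: 0.1 * EPS_MAX, 2: 0.5 * EPS_MAX, 3: EPS_MAX}

def answer_query(track_counts, F, level, sensitivity=1.0, rng=None):
    """Sketch of the adaptive-noise pipeline: choose ε by permission level,
    add Laplace noise to both the track data summary D and the factor
    matrix F, and return the perturbed results for feedback to the user."""
    rng = rng or np.random.default_rng()
    eps = LEVEL_EPS[level]                               # step 1)
    scale = sensitivity / eps
    noisy_counts = track_counts + rng.laplace(0.0, scale, track_counts.shape)
    noisy_F = F + rng.laplace(0.0, scale, F.shape)       # step 2)
    # Steps 3)-4): deleted independent attributes would be restored here
    # before feedback; this sketch simply returns the noisy pair.
    return noisy_counts, noisy_F

rng = np.random.default_rng(0)
counts = np.array([5.0, 8.0, 3.0])
F = np.zeros((2, 3))
noisy_counts, noisy_F = answer_query(counts, F, level=3, rng=rng)
```

A level-3 user receives the full budget EPS_MAX and hence the least noise, while a level-1 user gets a tenth of the budget and correspondingly noisier answers.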
Compared with the prior art, this scheme uses the correlation processing of the differential privacy association rules to find the correlation information hidden between track data sets, deeply compresses the track data with the shared prefix tree method, removes redundant track data, and improves track data processing efficiency.
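The shared-prefix compression can be illustrated with a minimal sketch: a simplified FP-tree built as a nested dict, without the header linked list, over invented transactions and an invented support threshold:

```python
from collections import Counter

def frequency_order(transactions, min_support):
    """Shared-prefix preprocessing: count attribute frequencies, drop
    attributes below the minimum support m, and reorder each transaction
    by descending frequency (ties broken alphabetically)."""
    freq = Counter(item for t in transactions for item in t)
    kept = {i for i, c in freq.items() if c >= min_support}
    rank = {i: k for k, i in enumerate(
        sorted(kept, key=lambda i: (-freq[i], i)))}
    return [sorted((i for i in t if i in kept), key=rank.get)
            for t in transactions]

def shared_prefix_tree(ordered):
    """Insert reordered transactions into a nested-dict prefix tree so
    common prefixes are stored once (a simplified FP-tree)."""
    root = {}
    for t in ordered:
        node = root
        for item in t:
            node = node.setdefault(item, {})
    return root

# Invented example: four track records over attributes a-d, support m = 2.
txns = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
ordered = frequency_order(txns, min_support=2)
tree = shared_prefix_tree(ordered)   # "d" is pruned; the "a b" prefix is shared
```

Sorting by descending frequency before insertion is what makes prefixes shared: the three records starting with "a" collapse onto a single branch, which is the compression the scheme relies on.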
After the track data difference requirement is defined, the load matrix established from the initial result is pruned, the uncorrelated load matrix is obtained through track data independence processing, and it is then decomposed. The low-rank mechanism is used to derive the gradient of the decomposed matrices, and the parallel gradient descent matrix decomposition algorithm splits the matrix into several sub-matrices computed on separate nodes, increasing the speed of differential privacy track data protection.
The adaptive noise model bounds the maximum privacy budget, takes a value according to the user's permission level, and adds a reasonable amount of noise to the differential privacy track data through the Laplace mechanism, realizing differential privacy track data protection and responding to the user's query.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. The differential privacy track data protection method based on the interactive query is characterized by comprising the following steps of:
s1, redundant data deleting method based on differential privacy association rule
The differential privacy track data protection model adds noise to reduce the sensitivity of query results, so that an intruder cannot tell from the output information whether a track data record has been added or removed, thereby ensuring the security of the users' private track data information in the data set;
Using ε-DP, the attacker's query algorithm is f: D → R^d, mapping an input track data set to an output real-number vector, with the whole process supporting a randomized mechanism. Let D and D′ be two adjacent track data sets. If for any output set S the function f meets the requirement of formula (1), the algorithm satisfies ε-DP. In formula (1), ε is the privacy budget: the smaller ε is, the closer the attribute frequencies of the processed adjacent track data sets become, proving the stronger the differential privacy effect; an attacker can then hardly judge which track data set the targeted differential privacy track data belongs to, which increases the degree of track data protection.
Pr[f(D)∈S] ≤ e^ε × Pr[f(D′)∈S] (1)
In order to determine useful information of track data to be protected in a track data set and reduce the frequency of extracting original track data, the method provides a data independence processing technology based on the relevance among information, and judges whether the track data is independence information or not through a differential privacy relevance rule;
Calculate the global sensitivity of the track data set: for f: D → R^d, Δf = max_{D,D′} ||f(D) − f(D′)||_1 is called the global sensitivity of f;
Whether information in the current network is irrelevant is judged through the global sensitivity. The track data set is deeply compressed using the shared-prefix method to construct an FP-tree; high-frequency item sets are then extracted from the FP-tree by the frequent-pattern growth method, reducing mining time. Much correlated information is hidden in the track data set; the independence processing model finds the correlation patterns among this information and in this way removes redundant track data during interactive query. The removal of redundant track data is divided into 6 stages.
S2, matrix decomposition method based on combination property
The method uses the sequence and parallel combinability of differential privacy to determine the differential requirements of the track data, establishes a parallel gradient descent matrix decomposition model according to a low-rank mechanism and an alternate direction multiplication method, increases the differential privacy matrix decomposition efficiency and improves the speed of the whole algorithm;
The differential privacy track data protection model reduces the sensitivity of query results by adding controlled noise; the more sub-algorithms and complex functions consume the differential privacy budget, the more noise accumulates, and beyond a certain level this distorts the track data and reduces its availability; in some queries the value of (ε, δ)-DP is strictly constrained, so the method selects a less restrictive model to reduce noise;
Let (ε, δ)-DP be the differential privacy constraint. When 0 < δ < 1, if for any two adjacent track data sets the randomized algorithm f satisfies the requirement of formula (2) for any output set S, then f is said to satisfy (ε, δ)-DP.
Pr[f(D)∈S] ≤ e^ε × Pr[f(D′)∈S] + δ (2)
The contents of a track data set are not necessarily uncorrelated; when this assumption fails, (ε, δ)-DP cannot guarantee differential privacy, and an information thief can still obtain differential privacy track data from the data correlations. In this case another privacy definition must be applied: ε-Pufferfish privacy. Let f: D → R^d be a randomly queryable algorithm and (S, Q, Θ) a privacy framework, where Q ⊆ S × S is the secret track data set and Θ is the set of track data distributions. If for any distribution θ ∈ Θ, any track data pair (S_i, S_j) ∈ Q with Pr(S_i | θ) ≠ 0 and Pr(S_j | θ) ≠ 0, and any output ω ∈ Range(f), f meets the requirement of formula (3), then f satisfies ε-Pufferfish privacy in (S, Q, Θ).
e^(−ε) ≤ Pr[f(X)=ω | S_i, θ] / Pr[f(X)=ω | S_j, θ] ≤ e^ε (3)
In the formula, Θ describes both the attacker's maximum attack capability on the track data and the correlations of the track data. Formula (3) states that for any S_i, S_j in the secret track data set Q, the output shows no obvious gap after the change, so the requirement is less strict than that of ε-DP. Here ε-DP is the special case of ε-Pufferfish privacy in which Θ contains all track data distributions, S contains all track data sets, and Q is the Cartesian product over S;
After the track data difference requirement is defined, the load matrix established according to the initial result is pruned: the uncorrelated load matrix is obtained through track data independence processing, and the uncorrelated load matrix is then decomposed. Using the low-rank mechanism, the gradient of the decomposition objective with respect to the factor matrix F yields the closed-form update for the factor matrix B:
B=(βWF^T+πF^T)(βFF^T+I)^(-1) (4)
The parallel gradient descent matrix decomposition algorithm decomposes W into several sub-matrices according to its structure and computes them on separate nodes, accelerating differential privacy track data protection; the operation steps are as follows:
(1) Establish an initial load matrix according to the query requirement of the user;
(2) Delete redundant track data from the initial load matrix according to the track data independence processing model to obtain the uncorrelated load matrix;
(3) Decompose the uncorrelated load matrix W_{u×r} generated in step (2) into z parts, where u denotes the number of rows, r the number of columns, and z the number of distributed-system nodes; compute the decomposition using a map process of distributed computing and a reduce process of cloud computing, finally obtaining B and F;
s3, self-adaptive noise track data protection based on differential privacy noise mechanism
For track data needing differential privacy protection, the exponential mechanism and the Laplace mechanism are most commonly applied: the exponential mechanism is commonly applied to discrete data protection, restricting the track data output according to the scoring-function principle; the Laplace mechanism is applied to numerical data protection, where the global sensitivity influences the amount of noise added in the differential privacy model, and Laplace-distributed noise is added to the query result, increasing track data security and realizing differential privacy track data protection;
Laplace mechanism: f: D → R is an arbitrary function; if the random algorithm A conforms to formula (6), then A satisfies ε-DP;
A(D)=f(D)+Lap(Δf/ε) (6)
In the formula, Lap(Δf/ε) is the Laplace noise to be added; the noise amount grows as the global sensitivity Δf grows and as ε shrinks;
The most important step of the exponential mechanism is to establish a scoring function u(D, r) (r ∈ O), where r is an output item of the output domain O;
Exponential mechanism: u: (D × O) → R is a scoring function for the track data set D; if the random algorithm A outputs r ∈ O with probability conforming to formula (7), then A satisfies ε-DP;
Pr[A(D)=r] ∝ exp(ε·u(D,r)/(2Δu)) (7)
where Δu is the maximum change of the scoring function, i.e., the global sensitivity; higher-scoring outputs are thus exponentially more likely to be selected;
Differential privacy protection is applied to the track data according to the noise mechanism rules: noise following the Laplace distribution must be added to reduce the sensitivity of the query result; by the noise addition rule, adding a large amount of noise to highly sensitive track data reduces information usability, so an appropriate noise amount must be selected; the choice of ε in the differential privacy model determines the track data protection effect: the smaller the selected ε value, the safer the user information but the larger the amount of added noise; the larger the selected ε value, the less noise is needed, but the weaker the differential privacy protection and the lower the information security; therefore formulas (2) and (3) are combined to find a reasonable ε value that balances the added noise amount against the differential privacy protection effect;
The method adopts an adaptive noise model and, taking the user's access permission into account (the higher the user's permission level, the closer ε may approach its maximum value), determines a reasonable amount of added noise under a bounding condition on ε; the whole process is as follows:
(1) Bound the maximum value of ε by the limiting condition, and select an appropriate privacy budget ε according to the user's permission level;
(2) Respectively adding noise into the track data set D and the load matrix decomposition result F by using a Laplace mechanism;
(3) Restoring irrelevant attribute track data deleted according to the track data independence processing model;
(4) Feed the query result back to the user.
2. The method of claim 1, wherein in step S1 the global sensitivity is the maximum range over which the query algorithm's output can vary between the adjacent track data sets D and D′, measured by the L_1 distance between the two outputs; it is determined by f and not influenced by the track data set, and with ε held constant the global sensitivity is proportional to the amount of noise added in f and also proportional to the privacy degree of the track data.
3. The method for protecting differential privacy track data based on interactive query as claimed in claim 1, wherein in step S1 the 6 stages are: (1) scan the track data set to obtain the frequency of each attribute, and sort the attributes from high to low frequency to obtain a descending list; (2) set m as the minimum support frequency, and remove all values smaller than m according to the descending list; (3) place the descending list into a prefix tree, the first-occurring nodes forming a linked list; this process builds the FP-tree; (4) integrate the FP-tree using the track data independence processing model; (5) judge whether each leaf node's path is single: if not, return to the previous step and reconstruct the prefix set on each path to generate a new FP-tree; if so, delete the leaf node to directly generate the prefix-path set; (6) finally obtain the interactive-query association pattern, namely the resulting prefix-path set, completing the removal of redundant track data.
4. The method according to claim 1, wherein in step S2 the sequential combination is: {A_1, A_2, …, A_n} are n algorithms, D is a specified track data set, and A_i is any one of the algorithms; if A_i on the track data set D satisfies ε_i-DP, then {A_1, A_2, …, A_n} is a sequential combination satisfying (Σ_i ε_i)-DP.
5. The method for protecting differential privacy track data based on interactive query as claimed in claim 1, wherein in step S2 the parallel combinability is: {D_1, D_2, …, D_n} are n disjoint subsets of the track data set D, and the algorithms {A_1, A_2, …, A_n} act on the respective subsets; if each A_i satisfies ε_i-DP, then {A_1, A_2, …, A_n} is a parallel combination satisfying (max ε_i)-DP.
6. The method of claim 1, wherein in step S2, formula (2) may be interpreted as: if f satisfies ε-DP with probability (1 − δ), then (ε, δ)-DP can be regarded as approximate differential privacy.
7. The method of claim 1, wherein in formula (4), W is the load matrix, β is the load matrix coefficient, and the result of B is related to F only.
CN202310204635.2A 2023-03-03 2023-03-03 Differential privacy track data protection method based on interactive query Pending CN116611101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310204635.2A CN116611101A (en) 2023-03-03 2023-03-03 Differential privacy track data protection method based on interactive query

Publications (1)

Publication Number Publication Date
CN116611101A true CN116611101A (en) 2023-08-18

Family

ID=87673480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310204635.2A Pending CN116611101A (en) 2023-03-03 2023-03-03 Differential privacy track data protection method based on interactive query

Country Status (1)

Country Link
CN (1) CN116611101A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131608A (en) * 2020-08-03 2020-12-25 辽宁工业大学 Classification tree difference privacy protection method meeting LKC model
CN112131608B (en) * 2020-08-03 2024-01-26 辽宁工业大学 Classification tree differential privacy protection method meeting LKC model
CN117235800A (en) * 2023-10-27 2023-12-15 重庆大学 Data query protection method of personalized privacy protection mechanism based on secret specification
CN117235800B (en) * 2023-10-27 2024-05-28 重庆大学 Data query protection method of personalized privacy protection mechanism based on secret specification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination