CN108280366B - Batch linear query method based on differential privacy - Google Patents
- Publication number
- CN108280366B (application CN201810042656.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- matrix
- attribute
- decomposition
- query
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2141—Access rights, e.g. capability lists, access control lists, access tables, access matrices
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A batch linear query method based on differential privacy comprises the following steps: step 1: querying an original data set R to obtain a data query result set M; step 2: sorting the attribute frequencies of R in descending order, screening out the attributes whose frequency is not greater than a minimum support, and discarding those attributes and their corresponding data; performing data independence processing on the attributes whose frequency is greater than the minimum support to obtain an irrelevant data set D; step 3: establishing an initial load matrix from M, establishing a data-independent load matrix W on that basis, and decomposing W in parallel using a parallel gradient descent matrix decomposition technique to obtain a first matrix B and a second matrix L of the complete decomposition result of W; step 4: performing adaptive noise addition based on differential privacy, adding Laplace noise to L and D, and restoring the discarded attributes and data to obtain a noisy query result data set S; and step 5: returning S to the user.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a batch linear query method based on differential privacy.
Background
With the development of the Internet, society has entered the big data era. Batch linear query is the most common operation when processing big data, but the query scale is very large, the query process is complicated, and performance is low. In addition, sensitive information is easily leaked when big data is used, and query precision (data availability) and the degree of privacy protection cannot both be guaranteed.
Prior-art algorithms cannot simultaneously guarantee algorithm performance, query precision, and the degree of privacy protection for batch linear queries. In terms of performance, existing algorithms have high complexity and are not suitable for large-scale batch linear queries. In terms of query precision, existing algorithms add noise to the query results and try to minimize the amount of noise required in order to optimize precision; however, when the query sequence is given arbitrarily by the user, the computational overhead these mechanisms need to find the optimal noise distribution is very large and grows exponentially with the data dimension, so they cannot be used on large data sets. In terms of privacy protection, existing algorithms do not relate the amount of added noise to user authority, so they cannot ensure that an appropriate amount of noise is added for users with different authorities: for a high-authority user, too much noise causes large interference and reduces query precision; for a low-authority user, too little noise gives insufficient privacy protection.
Disclosure of Invention
The present invention is made to solve the above problems, and an object of the present invention is to provide a batch linear query method based on differential privacy.
The invention provides a batch linear query method based on differential privacy, characterized by comprising the following steps:
Step 1: querying an original data set R to obtain a data query result set M.
Step 2: arranging the attribute frequencies of the original data set R in descending order, setting a minimum support, screening out the attributes whose frequency is not greater than the minimum support, and discarding those attributes and their corresponding data; for the attributes whose frequency is greater than the minimum support, using an FP-tree to obtain the associated attributes of the data and then performing data independence processing, to obtain an irrelevant data set D whose attribute frequencies are greater than the minimum support.
Step 3: establishing an initial load matrix from the data query result set M, establishing a data-independent load matrix W on the basis of the initial load matrix using the attribute correlations from step 2, and decomposing the data-independent load matrix W in parallel using a parallel gradient descent matrix decomposition technique to obtain a first matrix B and a second matrix L of the complete decomposition result of W.
Step 4: performing adaptive noise addition based on differential privacy, adding Laplace noise to the second matrix L and to the irrelevant data set D, and restoring the attributes discarded in step 2 (those whose frequency is not greater than the minimum support) together with their corresponding data, to obtain a noisy query result data set S.
Step 5: returning the noisy query result data set S to the user.
The batch linear query method based on differential privacy provided by the invention may also have the following feature: the data independence processing based on relevance analysis in step 2 comprises the following steps:
Step 2-1: scanning the original data set R to obtain the frequency of each attribute in R, and sorting the attributes by frequency in descending order to obtain a descending attribute frequency list.
Step 2-2: setting a minimum support and, according to the descending attribute frequency list, removing the attributes whose frequency is not greater than the minimum support together with their corresponding data.
Step 2-3: storing the remaining original data set R', from which the attributes with frequency not greater than the minimum support and their corresponding data have been removed, in a prefix tree to form an FP-tree, and establishing a linked list for nodes that appear for the first time.
Step 2-4: mining the FP-tree with the FP-growth algorithm to extract association patterns.
Step 2-5: determining whether the leaf nodes lie on a single path; if so, removing the leaf nodes, generating the set of prefix paths, and proceeding to step 2-6; if not, generating the set of prefix paths of each path to form a new FP-tree and returning to step 2-4.
Step 2-6: taking the set of prefix paths generated in step 2-5 and defining it as the associated attributes of the data.
Step 2-7: performing data independence processing, removing redundant data by using the relevance of the attributes.
The batch linear query method based on differential privacy provided by the invention may also have the following feature: the parallel gradient descent matrix decomposition in step 3 comprises the following steps:
Step 3-1: generating an initial load matrix according to the user's query requirements, based on the data query result set M obtained in step 1.
Step 3-2: converting the initial load matrix into a data-independent load matrix W according to the associated attributes of the data obtained by the relevance-analysis-based data independence processing in step 2.
Step 3-3: running the Map process. The data-independent load matrix W is decomposed as W = BL, where B is the first matrix of the decomposition result and L is the second matrix; B has m rows and n columns, L has n rows and r columns, m is the number of query records, r is the maximum query attribute scale, and n is the number of nodes. The first matrix B and the gradient of the second matrix L are calculated by formulas (1) and (2):
$$B = (\beta W L^{T} + \pi L^{T})(\beta L L^{T} + I)^{-1} \qquad (1)$$
[Formula (2) is not reproduced in this text.]
In formulas (1) and (2), T denotes transposition; β is a positive penalty factor that must be initialized; I is the identity matrix; π is the Lagrangian multiplier.
The matrix decomposition algorithm is executed in parallel: B is decomposed into n matrices $B_1, B_2, \dots, B_i, \dots, B_n$, where $B_i$ is the decomposition matrix of B at the i-th node; L is decomposed by rows into n matrices $L_1, L_2, \dots, L_i, \dots, L_n$, where $L_i$ is the decomposition matrix of L at the i-th node. That is, the data-independent load matrix W is divided into n parts, each comprising one $B_i$ matrix and one $L_i$ matrix, where each part has m/n rows, m is the number of rows of W, n is the number of nodes in the distributed system, and the decomposition matrix of W at the i-th node is $W_i = B_i L_i$.
The Map process of distributed computing is introduced: the decomposed data set is accessed and each row of data is traversed, recording its row number a; the output key is the group number, a/n rounded to an integer, and the output value is the corresponding m/n rows of data.
The Combiner process is then carried out: the data in each group is aggregated to form the data to be processed, and the divided parts are distributed to the n nodes.
Step 3-4: running the Reduce process. At each node, the positive penalty factor β and the matrix norm τ of the difference between the data-independent load matrix W and the product of the decomposition matrices are calculated, with $\tau = \lVert W - B_i L_i \rVert$; β and τ are updated, and iteration stops when β > 1000 and τ < 0.001.
The Reduce process of distributed computing is introduced: B and L are distributed to the nodes, and the $B_i$, $L_i$, and group number a/n calculated by each node are written to the Reduce process of the cloud computation for integration; the $B_i$ and $L_i$ with the same group number are spliced by row number a, yielding the complete L and B.
The batch linear query method based on differential privacy provided by the invention may also have the following feature: the adaptive noise addition in step 4 comprises the following steps:
Step 4-1: calculating the upper bound of the privacy budget ε by formula (3), and selecting ε according to the user's authority.
[Formula (3) is not reproduced in this text.]
In formula (3), ε is the privacy budget; L is the second matrix of the decomposition result of the load matrix; ρ is a correlation coefficient in the range [-1, 1]; Δq is the sensitivity.
Step 4-2: adding Laplace noise satisfying ε to L and D using the Laplace noise mechanism.
Step 4-3: restoring the attributes discarded in step 2 (those whose frequency is not greater than the minimum support) together with their corresponding data.
Step 4-4: obtaining the noisy query result data set S.
The batch linear query method based on differential privacy provided by the invention may also have the following feature: the higher the user's authority, the closer the selected ε is to the upper bound, the weaker the privacy protection, and the higher the query precision; the lower the user's authority, the smaller the selected ε, the stronger the privacy protection, and the lower the query precision.
Action and Effect of the invention
Addressing the characteristics of batch linear queries, the method performs data independence processing based on relevance analysis to reduce redundant information, and improves query performance by processing with a parallel gradient descent matrix decomposition algorithm. In addition, building on a differential privacy protection algorithm, the method designs an adaptive noise-addition algorithm that takes user authority into account and generates a reasonable amount of noise, thereby achieving privacy protection. The batch linear query method based on differential privacy is therefore both an efficient linear query algorithm and a privacy protection algorithm that balances query precision against the degree of privacy protection.
Drawings
FIG. 1 is an overall flow diagram of a batch linear query method based on differential privacy in an embodiment of the invention;
FIG. 2 is a flow diagram of data independence processing based on relevance analysis in an embodiment of the invention;
FIG. 3 is a flow chart of a parallel gradient descent matrix decomposition in an embodiment of the invention; and
FIG. 4 is a flow chart of adaptive noise addition in an embodiment of the present invention.
Detailed Description
To make the technical means, inventive features, objectives, and effects of the present invention easy to understand, the following embodiment describes the batch linear query method based on differential privacy in detail with reference to the accompanying drawings.
FIG. 1 is an overall flowchart of a batch linear query method based on differential privacy in an embodiment of the present invention.
As shown in FIG. 1, the batch linear query method based on differential privacy of the present invention includes the following steps:
Step 1: querying an original data set R to obtain a data query result set M. The original data set R contains attributes and data, and its attributes may contain duplicates; the data query result set M likewise contains attributes and data, and its attributes may contain duplicates.
Step 2: setting a minimum support, screening out the attributes whose frequency is not greater than the minimum support, and discarding those attributes and their corresponding data; for the attributes whose frequency is greater than the minimum support, using an FP-tree to obtain the associated attributes of the data and then performing data independence processing, which removes the redundant attributes and adds the data of the redundant attributes to the data of the associated attributes, yielding an irrelevant data set D whose attribute frequencies are greater than the minimum support. The discarded attributes and their data are not processed further in step 2 and are restored in a later step.
FIG. 2 is a flow diagram of data independence processing based on relevance analysis in an embodiment of the invention.
The data independence processing based on relevance analysis in step 2 comprises the following steps:
Step 2-1: scanning the original data set R to obtain the frequency of each attribute in R, and sorting the attributes by frequency in descending order to obtain a descending attribute frequency list.
Step 2-2: setting the minimum support and, according to the descending attribute frequency list, removing the attributes whose frequency is not greater than the minimum support together with their data.
Step 2-3: storing the remaining original data set R', from which the attributes with frequency not greater than the minimum support have been removed, in a prefix tree data structure to form an FP-tree, and establishing a linked list for nodes that appear for the first time.
Step 2-4: mining the FP-tree with the FP-growth algorithm to extract association patterns.
Step 2-5: determining whether the leaf nodes lie on a single path; if so, removing the leaf nodes, generating the set of prefix paths, and proceeding to step 2-6; if not, generating the set of prefix paths of each path to form a new FP-tree and returning to step 2-4.
Step 2-6: taking the set of prefix paths generated in step 2-5 and defining it as the associated attributes of the data.
Step 2-7: performing data independence processing, removing redundant data by using the relevance of the attributes.
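Steps 2-1 to 2-3, and the prefix-path collection used in step 2-5, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: records are modeled as lists of attribute names, the "frequency" of an attribute is taken to be the number of records that contain it, and the names `FPNode`, `filter_and_sort`, `build_fp_tree`, and `prefix_paths` are hypothetical.

```python
from collections import Counter


class FPNode:
    """One prefix-tree node: an attribute, a count, and links to its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def filter_and_sort(records, min_support):
    """Steps 2-1 and 2-2: count frequencies and set aside the rare attributes."""
    freq = Counter(item for rec in records for item in set(rec))
    kept = {a: c for a, c in freq.items() if c > min_support}
    discarded = {a: c for a, c in freq.items() if c <= min_support}
    # Descending frequency; ties broken by name so the order is deterministic.
    order = sorted(kept, key=lambda a: (-kept[a], a))
    return order, discarded


def build_fp_tree(records, order):
    """Step 2-3: insert every filtered record into a prefix tree."""
    rank = {a: i for i, a in enumerate(order)}
    root = FPNode(None)
    header = {a: [] for a in order}  # per-attribute list of nodes
    for rec in records:
        path = sorted((a for a in set(rec) if a in rank), key=rank.get)
        node = root
        for a in path:
            child = node.children.get(a)
            if child is None:  # the node appears for the first time
                child = FPNode(a, node)
                node.children[a] = child
                header[a].append(child)
            child.count += 1
            node = child
    return root, header


def prefix_paths(header, item):
    """Collect the prefix path (and count) of every node that carries `item`."""
    paths = []
    for node in header[item]:
        path, cur = [], node.parent
        while cur is not None and cur.item is not None:
            path.append(cur.item)
            cur = cur.parent
        paths.append((path[::-1], node.count))
    return paths


records = [
    ["age", "zip", "disease"],
    ["age", "zip"],
    ["age", "disease"],
    ["age", "zip", "income"],
]
order, discarded = filter_and_sort(records, min_support=1)
root, header = build_fp_tree(records, order)
```

In this toy data set `income` occurs only once, so it is set aside for later restoration, while the remaining attributes are inserted into the tree in descending frequency order.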
Step 3: establishing an initial load matrix from the query result set M, and establishing a data-independent load matrix W on the basis of the initial load matrix using the data correlations from step 2. The data-independent load matrix W is then decomposed in parallel using a parallel gradient descent matrix decomposition technique to obtain a first matrix B and a second matrix L of the complete decomposition result of W.
FIG. 3 is a flow diagram of a parallel gradient descent matrix decomposition technique in an embodiment of the invention.
The parallel gradient descent matrix decomposition technology in the step 3 comprises the following steps:
Step 3-1: generating an initial load matrix according to the user's query requirements, based on the query result set M obtained in step 1. The initial load matrix is a matrix containing only attributes, and the attributes may contain duplicates.
Step 3-2: according to the associated attributes of the data obtained by the relevance-analysis-based data independence processing, transforming the initial load matrix to eliminate the correlated data and generate the data-independent load matrix W. The data-independent load matrix W is a matrix containing attributes, with no duplicates.
Step 3-3: running the Map process:
The data-independent load matrix W is decomposed as W = BL, where B is the first matrix of the decomposition result and L is the second matrix; B has m rows and n columns, L has n rows and r columns, m is the number of query records, r is the maximum query attribute scale, and n is the number of nodes.
The first matrix B of the decomposition result and the gradient of the second matrix L of the decomposition result are calculated by formulas (1) and (2):
$$B = (\beta W L^{T} + \pi L^{T})(\beta L L^{T} + I)^{-1} \qquad (1)$$
[Formula (2) is not reproduced in this text.]
In formulas (1) and (2), G represents a function of L; T denotes transposition; β is a positive penalty factor that must be initialized; I is the identity matrix; π is the Lagrangian multiplier.
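As a worked check, formula (1) has the form obtained by setting to zero the B-derivative of an augmented Lagrangian for the constraint W = BL. The objective J below is an assumption introduced only for this illustration; it is not stated in the text, and formula (2) itself does not appear in it. Under that assumption, the gradient with respect to L would take the form shown in the last line.

```latex
\begin{aligned}
J(B, L) &= \tfrac{1}{2}\operatorname{tr}(B^{T}B)
          + \langle \pi,\, W - BL \rangle
          + \tfrac{\beta}{2}\,\lVert W - BL \rVert_F^{2} \\
\frac{\partial J}{\partial B} &= B - \pi L^{T} - \beta\,(W - BL)\,L^{T} = 0
  \;\Longrightarrow\;
  B\,(\beta L L^{T} + I) = \beta W L^{T} + \pi L^{T} \\
\frac{\partial J}{\partial L} &= -B^{T}\pi - \beta B^{T}(W - BL)
  = \beta B^{T} B L - \beta B^{T} W - B^{T}\pi
\end{aligned}
```

Right-multiplying the second line by $(\beta L L^{T} + I)^{-1}$ gives exactly formula (1), which is why this objective is a plausible reading; whether the last line coincides with the patent's G cannot be confirmed from the text.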
The matrix decomposition algorithm is executed in parallel: B is decomposed into n matrices $B_1, B_2, \dots, B_i, \dots, B_n$, where $B_i$ is the decomposition matrix of B at the i-th node; L is decomposed by rows into n matrices $L_1, L_2, \dots, L_i, \dots, L_n$, where $L_i$ is the decomposition matrix of L at the i-th node. That is, the data-independent load matrix W is divided into n parts, each comprising one $B_i$ matrix and one $L_i$ matrix, where each part has m/n rows, m is the number of rows of W, n is the number of nodes in the distributed system, and the decomposition matrix of W at the i-th node is $W_i = B_i L_i$.
The Map process of distributed computing is introduced: the decomposed data set is accessed and each row of data is traversed, recording its row number a; the output key is the group number, a/n rounded to an integer, and the output value is the corresponding m/n rows of data.
The Combiner process is then carried out: the data in each group is aggregated to form the data to be processed, and the divided parts are distributed to the n nodes.
Step 3-4: running the Reduce process:
At each node, the positive penalty factor β and the matrix norm τ of the difference between the data-independent load matrix W and the product of the decomposition matrices are calculated, with $\tau = \lVert W - B_i L_i \rVert$; β and τ are updated, and iteration stops when β > 1000 and τ < 0.001.
The Reduce process of distributed computing is introduced: B and L are distributed to the nodes, and the $B_i$, $L_i$, and group number a/n calculated by each node are written to the Reduce process of the cloud computation for integration.
The $B_i$ and $L_i$ with the same group number are spliced by row number a, yielding the complete L and B.
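The grouping, per-node work, and splicing described in steps 3-3 and 3-4 can be sketched on a single machine as follows. This is an illustration under stated assumptions rather than the patent's distributed implementation: the group key is computed as `a // rows_per_part` (so that exactly n groups of about m/n rows result) instead of the a/n given in the text, each simulated node multiplies its rows of B by the full L, the norm is taken to be the Frobenius norm, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def map_rows(matrix, n_nodes):
    """Map: emit (group number, (row number a, row)) for every row."""
    rows_per_part = math.ceil(len(matrix) / n_nodes)  # about m/n rows per node
    for a, row in enumerate(matrix):
        yield a // rows_per_part, (a, row)


def combine(pairs):
    """Combiner: aggregate the rows that share a group number."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_splice(groups):
    """Reduce: splice the per-node blocks back together by row number a."""
    tagged = [item for block in groups.values() for item in block]
    return [row for _, row in sorted(tagged, key=lambda t: t[0])]


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(r, c)) for c in zip(*y)] for r in x]


def residual_norm(w, b, l):
    """tau = ||W - B L||, here with the Frobenius norm (an assumption)."""
    bl = matmul(b, l)
    return math.sqrt(sum((wi - pi) ** 2
                         for wr, pr in zip(w, bl)
                         for wi, pi in zip(wr, pr)))


def should_stop(beta, tau, beta_max=1000, tau_min=0.001):
    """Stopping rule from the text: beta > 1000 and tau < 0.001."""
    return beta > beta_max and tau < tau_min


B = [[1, 0], [0, 1], [1, 1], [2, 0]]
L = [[1, 2], [3, 4]]
W = matmul(B, L)

groups = combine(map_rows(B, n_nodes=2))
# Each simulated node turns its rows of B into the matching rows of W.
node_results = {k: [(a, matmul([row], L)[0]) for a, row in block]
                for k, block in groups.items()}
W_rebuilt = reduce_splice(node_results)
tau = residual_norm(W, B, L)
```

Because rows carry their row number a through every stage, the splice step restores the original order regardless of the order in which node results arrive.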
Step 4: performing adaptive noise addition based on differential privacy, adding Laplace noise to the second matrix L of the decomposition result of the data-independent load matrix W and to the irrelevant data set D pointed to by the attributes whose frequency is greater than the minimum support.
In this embodiment, the amount of added noise is not specified in advance; instead, the privacy budget ε is chosen according to a formula in combination with the user's authority. The higher the user's authority, the closer the selected ε is to the upper bound, the weaker the privacy protection, and the higher the query precision (data availability); the lower the user's authority, the smaller the selected ε, the stronger the privacy protection, and the lower the query precision (data availability).
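The link between ε and precision can be made explicit. Assuming the standard Laplace mechanism (the text only says the noise "satisfies ε"), the noise η added to a query with sensitivity Δq has scale b = Δq/ε:

```latex
\eta \sim \mathrm{Lap}(b), \qquad
b = \frac{\Delta q}{\varepsilon}, \qquad
p(\eta) = \frac{1}{2b}\exp\!\left(-\frac{|\eta|}{b}\right), \qquad
\operatorname{Var}(\eta) = 2b^{2} = \frac{2\,\Delta q^{2}}{\varepsilon^{2}}
```

Doubling ε therefore halves the noise scale and quarters its variance, which is why a larger ε gives higher query precision and weaker privacy protection.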
Fig. 4 is a flow chart of adaptive noise addition in an embodiment of the present invention.
The adaptive noise addition in step 4 comprises the following steps:
Step 4-1: calculating the upper bound of the privacy budget ε by formula (3), and selecting ε according to the user's authority.
[Formula (3) is not reproduced in this text.]
In formula (3), ε is the privacy budget, used to measure the degree of privacy protection; L is the second matrix of the decomposition result of the load matrix; ρ is a correlation coefficient in the range [-1, 1]; Δq is the sensitivity.
Step 4-2: adding Laplace noise satisfying ε to L and D using the Laplace noise mechanism to achieve differential privacy protection. Because the result of the batch linear query is WD = BLD, and B changes as L changes, differential privacy protection can be achieved simply by adding Laplace noise satisfying ε to L and D.
Step 4-3: restoring the attributes, and their data, that were discarded in step 2 because their frequency was not greater than the minimum support. Because the data removed in step 2 already contains its attributes, restoration only requires adding that data directly to the noisy result set WD, which contains both attributes and data.
Step 4-4: combining the noisy WD with the restored attributes and their data to form the noisy query result data set S. S is a set consisting of the noisy WD together with the restored attributes (those with frequency not greater than the minimum support) and their data. Here W is the data-independent load matrix (a matrix containing only attributes, without duplicates); D is the irrelevant data set pointed to by the attributes whose frequency is greater than the minimum support, which contains no attributes and is part of the initial query result set M (M contains the attributes and data of all query results); L is the second matrix of the decomposition result of W. Since W = BL and B varies with L, W varies with L.
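Steps 4-1 to 4-4 can be sketched as follows. Several pieces are assumptions made only for illustration: the upper bound of ε is passed in as a parameter because formula (3) is not available here, the linear mapping from an authority level in [0, 1] to ε is hypothetical, the noise scale Δq/ε is that of the standard Laplace mechanism, and the noise is applied to a small dictionary standing in for the entries of WD.

```python
import random


def choose_epsilon(authority, eps_upper, eps_floor=0.01):
    """Map an authority level in [0, 1] to a budget in [eps_floor, eps_upper].

    The linear mapping is hypothetical; the text only requires that higher
    authority selects an epsilon closer to the upper bound.
    """
    if not 0.0 <= authority <= 1.0:
        raise ValueError("authority must lie in [0, 1]")
    return eps_floor + authority * (eps_upper - eps_floor)


def laplace_sample(scale, rng):
    """Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_noise(values, sensitivity, epsilon, rng):
    """Step 4-2: add Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return [v + laplace_sample(scale, rng) for v in values]


def restore(noisy, discarded):
    """Steps 4-3 and 4-4: merge noisy results with the data set aside in step 2."""
    result = dict(noisy)
    result.update(discarded)
    return result


rng = random.Random(0)                      # fixed seed for repeatability
answers = {"age": 4.0, "zip": 3.0}          # stand-in for the entries of WD
eps = choose_epsilon(authority=0.9, eps_upper=1.0)
noisy = dict(zip(answers, add_noise(answers.values(), 1.0, eps, rng)))
S = restore(noisy, {"income": 1.0})
```

Following the text, the discarded data is merged back without noise; only the entries derived from L and D are perturbed.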
Step 5: returning the noisy query result data set S obtained in step 4-4 to the user.
Actions and effects of the embodiment
Addressing the characteristics of batch linear queries, the embodiment reduces redundant information through data independence processing and improves query performance by computing with a parallel gradient descent matrix decomposition algorithm. In addition, building on a differential privacy protection algorithm, the embodiment designs an adaptive noise-addition algorithm that takes user authority into account and generates a reasonable amount of noise, thereby achieving privacy protection. The batch linear query method based on differential privacy of the embodiment is therefore both an efficient linear query algorithm and a privacy protection algorithm that balances query precision against the degree of privacy protection.
Further, to balance query precision against the degree of privacy protection, the embodiment does not specify the amount of added noise in advance; instead, it selects the privacy budget ε according to a formula in combination with the user's authority, so that a reasonable amount of noise is added adaptively, which reduces noise interference, improves query precision, and preserves the usefulness of the data. A different ε can be selected for users with different authorities, so that query precision and the degree of differential privacy protection are tied to the user's authority.
The above embodiments are preferred examples of the present invention, and are not intended to limit the scope of the present invention.
Claims (5)
1. A batch linear query method based on differential privacy is characterized by comprising the following steps:
step 1: querying an original data set R to obtain a data query result set M;
step 2: arranging the attribute frequencies of the original data set R in descending order, setting a minimum support, screening out the attributes whose frequency is not greater than the minimum support, and discarding those attributes and their corresponding data; for the attributes whose frequency is greater than the minimum support, using an FP-tree to obtain the associated attributes of the data and then performing data independence processing, to obtain an irrelevant data set D whose attribute frequencies are greater than the minimum support;
step 3: establishing an initial load matrix from the data query result set M, establishing a data-independent load matrix W on the basis of the initial load matrix using the attribute correlations from step 2, and decomposing the data-independent load matrix W in parallel using a parallel gradient descent matrix decomposition technique to obtain a first matrix B and a second matrix L of the complete decomposition result of the data-independent load matrix W;
step 4: performing adaptive noise addition based on differential privacy, adding Laplace noise to the second matrix L of the decomposition result and to the irrelevant data set D, and restoring the attributes discarded in step 2, whose frequency is not greater than the minimum support, together with their corresponding data, to obtain a noisy query result data set S;
step 5: returning the noisy query result data set S to the user,
wherein the first matrix B of the decomposition result and the gradient of the second matrix L of the decomposition result are calculated by formulas (1) and (2):
$$B = (\beta W L^{T} + \pi L^{T})(\beta L L^{T} + I)^{-1} \qquad (1)$$
[Formula (2) is not reproduced in this text.]
in formulas (1) and (2), T denotes transposition; β is a positive penalty factor that must be initialized; I is the identity matrix; π is the Lagrangian multiplier.
2. The batch linear query method based on differential privacy of claim 1, wherein:
wherein the data independence process based on relevance analysis in step 2 comprises the following steps:
step 2-1: scanning the original data set R to obtain the frequency of each attribute in the original data set R, and sorting the attributes by frequency in descending order to obtain a descending attribute frequency list;
step 2-2: setting a minimum support and, according to the descending attribute frequency list, removing the attributes whose frequency is not greater than the minimum support together with their corresponding data;
step 2-3: storing the remaining original data set R', from which the attributes with frequency not greater than the minimum support and their corresponding data have been removed, in a prefix tree to form an FP-tree, and establishing a linked list for nodes that appear for the first time;
step 2-4: mining the FP-tree with the FP-growth algorithm to extract association patterns;
step 2-5: determining whether the leaf nodes lie on a single path; if so, removing the leaf nodes, generating the set of prefix paths, and proceeding to step 2-6; if not, generating the set of prefix paths of each path to form a new FP-tree and returning to step 2-4;
step 2-6: taking the set of prefix paths generated in step 2-5 and defining it as the associated attributes of the data;
step 2-7: performing data independence processing, removing redundant data by using the relevance of the attributes.
3. The batch linear query method based on differential privacy of claim 1, wherein:
wherein the parallel gradient descent matrix decomposition in step 3 comprises the following steps:
step 3-1: generating an initial load matrix according to the query requirement of a user based on the data query result set M obtained in the step 1;
step 3-2: converting the initial load matrix into a data-independent load matrix W according to the relevance attribute of the data obtained by the relevance processing of the data based on the relevance analysis in the step 2;
step 3-3: running the Map process:
decomposing the data-independent load matrix W as W = BL, where B is the first matrix of the decomposition result and L is the second matrix of the decomposition result, the B matrix has m rows and n columns, the L matrix has n rows and r columns, m represents the number of query records, r represents the maximum query attribute scale, and n represents the number of nodes,
performing the matrix decomposition algorithm in parallel: decomposing B into n matrices B_1, B_2, …, B_i, …, B_n, where B_i represents the decomposition matrix of B at the i-th node; decomposing L by rows into n matrices L_1, L_2, …, L_i, …, L_n, where L_i represents the decomposition matrix of L at the i-th node; that is, the data-independent load matrix W is divided into n parts, each part comprising one B_i matrix and one L_i matrix, where the number of rows in each part is m/n, m is the number of rows of W, n is the number of nodes in the distributed system, and the decomposition matrix of W at the i-th node is expressed as W_i = B_i L_i,
introducing the Map process of distributed computing: first accessing the decomposed data set and traversing each row of data while recording the row number a; then outputting, as the key, the group number obtained by rounding a/n, and, as the value, the corresponding m/n rows of data,
performing the Combiner process: aggregating the data within each group to form the data to be processed, and distributing the divided parts to the n nodes,
step 3-4: running the Reduce process:
calculating at each node the positive penalty factor β and the matrix norm τ of the difference between the data-independent load matrix W and the product of its decomposition matrices, where τ is calculated by the formula τ = ‖W − B_i L_i‖, and updating β and τ, the iteration stopping when β > 1000 and τ < 0.001,
using the Reduce process of distributed computing: distributing B and L to the nodes, each node calculating B_i and L_i together with the group number a/n, and the Reduce process written for the cloud computing platform performing the integration,
splicing the B_i and L_i having the same group number by row number a, thereby obtaining the complete L and B.
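The row grouping, per-node residual and final splicing of steps 3-3 and 3-4 can be sketched in plain Python. This is a simplified illustration under several assumptions, not the claimed implementation: only B is partitioned by rows while L is kept whole on every node, the norm is taken to be the Frobenius norm, the group key is computed as a // ceil(m/n) so that each of the n groups holds about m/n consecutive rows, and the gradient descent update of B, L and β is omitted. All function names are hypothetical.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def residual_norm(W, B, L):
    """tau = ||W - B L||, taken here as the Frobenius norm (an assumption)."""
    BL = matmul(B, L)
    return math.sqrt(sum((w - p) ** 2
                         for w_row, p_row in zip(W, BL)
                         for w, p in zip(w_row, p_row)))

def should_stop(beta, tau):
    """Stopping rule stated in step 3-4."""
    return beta > 1000 and tau < 0.001

def map_rows(matrix, n_nodes):
    """Map: emit (group number, (row number a, row)) for every row."""
    rows_per_node = math.ceil(len(matrix) / n_nodes)
    for a, row in enumerate(matrix):
        yield a // rows_per_node, (a, row)

def combine(pairs):
    """Combiner: gather the rows that share a group number."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups

def reduce_splice(groups):
    """Reduce: splice the groups back together, ordered by row number a."""
    tagged = [item for key in groups for item in groups[key]]
    return [row for _, row in sorted(tagged, key=lambda item: item[0])]

# Toy data: W is built as B L, so every per-node residual is zero.
B = [[1, 0], [0, 1], [1, 1], [2, 0]]
L = [[1, 2], [3, 4]]
W = matmul(B, L)
n_nodes = 2

B_groups = combine(map_rows(B, n_nodes))
W_groups = combine(map_rows(W, n_nodes))
taus = {}
for key in B_groups:
    B_i = [row for _, row in B_groups[key]]
    W_i = [row for _, row in W_groups[key]]
    taus[key] = residual_norm(W_i, B_i, L)  # residual at node `key`

B_full = reduce_splice(B_groups)
```

In a real MapReduce job the three functions would run on separate nodes; they are chained in one process here only to keep the example self-contained.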
4. The batch linear query method based on differential privacy of claim 1, wherein:
the adaptive noise adding in step 4 comprises the following steps:
step 4-1: calculating the upper bound of the privacy budget ε, and selecting ε according to the authority of the user,
where, in formula (3), ε is the privacy budget; L is the second matrix of the decomposition result of the data-independent load matrix; ρ is a correlation coefficient in the range [−1, 1]; and Δq is the sensitivity;
step 4-2: adding Laplace noise satisfying the privacy budget ε to L and D by means of the Laplace mechanism;
step 4-3: restoring the attributes that were discarded in step 2 because their frequency did not exceed the minimum support, together with the data corresponding to those attributes;
step 4-4: obtaining the noisy query result data set S.
5. The batch linear query method based on differential privacy of claim 4, wherein:
the higher the authority of the user, the closer the selected ε value is to the upper bound, the lower the degree of privacy protection, and the higher the query precision; the lower the authority of the user, the smaller the selected ε value, the higher the degree of privacy protection, and the lower the query precision.
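The relationship between user authority, ε and the added noise can be illustrated with a short sketch. It assumes the upper bound of ε has already been computed and is passed in as `epsilon_upper`, that authority is expressed as a number in (0, 1], and that ε is chosen as a simple fraction of the upper bound; these choices and all function names are hypothetical. The noise scale Δq/ε is the standard Laplace mechanism.

```python
import random

def select_epsilon(authority, epsilon_upper):
    """Pick epsilon as a fraction of the upper bound (assumed rule).

    authority is assumed to lie in (0, 1]; higher authority gives an
    epsilon closer to the upper bound, hence less noise.
    """
    if not 0 < authority <= 1:
        raise ValueError("authority must be in (0, 1]")
    return authority * epsilon_upper

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def add_laplace_noise(matrix, sensitivity, epsilon, rng):
    """Add independent Laplace noise of scale sensitivity/epsilon to each entry."""
    scale = sensitivity / epsilon
    return [[x + laplace_noise(scale, rng) for x in row] for row in matrix]

L = [[1.0, 2.0], [3.0, 4.0]]   # toy stand-in for the second decomposition matrix
epsilon_upper = 0.8            # assumed to be given; not derived here
delta_q = 1.0                  # assumed sensitivity

eps_high = select_epsilon(1.0, epsilon_upper)    # high-authority user
eps_low = select_epsilon(0.25, epsilon_upper)    # low-authority user

# The same seed is used for both runs so that only the scale differs.
noisy_high = add_laplace_noise(L, delta_q, eps_high, random.Random(0))
noisy_low = add_laplace_noise(L, delta_q, eps_low, random.Random(0))
```

With identical random draws, the low-authority result deviates from L by a factor of eps_high / eps_low more than the high-authority result, which matches the trade-off between privacy protection and query precision stated in claim 5.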
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810042656.8A CN108280366B (en) | 2018-01-17 | 2018-01-17 | Batch linear query method based on differential privacy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108280366A (en) | 2018-07-13 |
CN108280366B (en) | 2021-10-01 |
Family
ID=62803867
Family Applications (1)
Application Number | Status | Title | Priority Date | Filing Date |
---|---|---|---|---|
CN201810042656.8A, CN108280366B (en) | Active | Batch linear query method based on differential privacy | 2018-01-17 | 2018-01-17 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108280366B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108763954B (en) * | 2018-05-17 | 2022-03-01 | 西安电子科技大学 | Linear regression model multidimensional Gaussian difference privacy protection method and information security system |
CN111914285B (en) * | 2020-06-09 | 2022-06-17 | 深圳大学 | Geographic distributed graph calculation method and system based on differential privacy |
CN111475854B (en) * | 2020-06-24 | 2020-10-20 | 支付宝(杭州)信息技术有限公司 | Collaborative computing method and system for protecting data privacy of two parties |
CN112818386B (en) * | 2021-01-20 | 2021-11-12 | 海南大学 | DIKW-mode-crossing typed private information resource differential protection method and system |
CN112507710B (en) * | 2021-02-05 | 2021-05-25 | 支付宝(杭州)信息技术有限公司 | Method and device for estimating word frequency in differential privacy protection data |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8627488B2 (en) * | 2011-12-05 | 2014-01-07 | At&T Intellectual Property I, L.P. | Methods and apparatus to anonymize a dataset of spatial data |
CN104050267B (en) * | 2014-06-23 | 2017-10-03 | 中国科学院软件研究所 | The personalized recommendation method and system of privacy of user protection are met based on correlation rule |
CN104537025B (en) * | 2014-12-19 | 2017-10-10 | 北京邮电大学 | Frequent episodes method for digging |
CN107092837A (en) * | 2017-04-25 | 2017-08-25 | 华中科技大学 | A kind of Mining Frequent Itemsets and system for supporting difference privacy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||