CN112560094A - Dual optimization-based high-availability graph data privacy protection method - Google Patents

Info

Publication number
CN112560094A
Authority
CN
China
Prior art keywords
graph data
availability
privacy protection
data
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011509745.2A
Other languages
Chinese (zh)
Inventor
宋甫元
秦拯
欧露
刘羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202011509745.2A
Publication of CN112560094A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9038 Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention provides a high-availability graph data privacy protection method based on dual optimization. It mainly comprises: analyzing the characteristics of the node element set of graph data and proposing a graph data availability quantification model based on the mean square error of the query function; proposing a graph data differential privacy protection method that follows the Laplace mechanism and showing that the method satisfies the definition of differential privacy; and proposing a graph data availability optimization method based on dual optimization, which uses the Lagrange multiplier method and the Hessian matrix test to determine whether a feasible solution is a saddle point, thereby ensuring high availability of the graph data. The method protects private information during graph data publishing while keeping the graph data highly available, so that sensitive information and data in the graph can be published securely and usably.

Description

Dual optimization-based high-availability graph data privacy protection method
Technical Field
The invention relates to the field of graph data security and privacy protection, in particular to a high-availability graph data privacy protection method based on dual optimization.
Background
With the rapid development of social networks and the Internet of Things, more and more Internet application data takes the form of graph structures. For example, Facebook has about 2.5 billion users, and the associations among these users form a huge graph-structured data set. As graph-structured data grows in scale and its applications become more widespread, privacy leakage from large-scale graph data has become a serious challenge. In large-scale graph data, nodes and edges contain rich personal privacy information, such as user identification numbers, addresses, contact details, and relationships between users. Once a user's sensitive information is leaked, it can cause serious economic loss to users and enterprises, and may even pose a serious threat to the country and society. It is therefore necessary to study privacy protection techniques for large-scale graph data to ensure the security of graph data during publishing.
To address privacy protection for large-scale graph data, researchers have designed various encryption mechanisms to protect the private information of graph data during publishing and querying. However, these conventional encryption mechanisms mainly target relational data, such as text information, whereas graph data is characterized by complex structure, large volume, and strong correlation. On the one hand, conventional encryption algorithms offer limited functionality and struggle to protect the privacy of complex graph data. On the other hand, they are built on heavyweight cryptographic constructions with high computational complexity, so they cannot provide efficient services over encrypted graph data and can hardly meet the efficiency requirements of graph data applications. In addition, conventional encryption greatly hinders numerical computation on graph data, so that properties that hold for plaintext graph data no longer apply. Research on high-availability graph data privacy protection methods therefore has considerable significance and application value.
Because differential privacy has rigorous mathematical guarantees and can protect private information when statistical data is published, it is now widely used in data security and privacy protection. In addition, differential privacy determines both the degree of privacy protection and the data availability through the privacy budget, which makes it possible to optimize the availability of graph data under a privacy guarantee. When statistical information about graph data is published, an attacker may infer sensitive information in the source data through a differential attack. Noise therefore needs to be added to the nodes or edges of the graph data so that the data is randomly perturbed and an attacker cannot distinguish, from the available information, the graph data before and after the noise is added, thereby protecting the privacy of the graph data.
Conventional graph data privacy protection methods have low availability, can only protect a single type of data, cannot support large-scale graph data with complex structure, and cannot protect the association relationships between graph data nodes. Compared with these methods, the dual optimization-based graph data privacy protection method proposed here offers higher availability and a higher degree of privacy protection. In addition, the method combines the Lagrange multiplier method and the Hessian matrix test to construct an optimization model based on the privacy budget, and achieves an optimal balance between security and availability during graph data publishing.
Disclosure of Invention
Purpose of the invention: the invention aims to solve the problem of privacy leakage during graph data publishing and provides a dual optimization-based high-availability graph data privacy protection method, which mainly comprises the following three parts:
Content 1: analyzing the characteristics of the node element set of graph data and proposing a graph data availability quantification model based on the mean square error of the query function;
Content 2: proposing a graph data differential privacy protection method that follows the Laplace mechanism, to protect the privacy of graph data;
Content 3: proposing a graph data availability optimization method based on the Lagrange multiplier method, and using the Hessian matrix test to determine whether a feasible solution is a saddle point of the objective function, thereby ensuring high availability of the graph data.
The specific contents of the dual optimization-based high-availability graph data privacy protection method are as follows:
Content 1: A data availability quantification model based on the mean square error of the query function.
Assume that each graph data node contains user attribute information such as address, age, and contact information. During graph data collection, the user generates a query request about statistical data, $x = (x_1, x_2, \ldots, x_n)$, and a corresponding query function $q(x) = (q(x_1), q(x_2), \ldots, q(x_n))$, where $x_i$ ($i \in [1, n]$) denotes a user attribute in the query request and $q(x_i)$ denotes the query function that computes a statistic of that attribute, such as a sum, an average, or a threshold. The mean square error (MSE) of the query function can thus be expressed as
[Equation image BDA0002846043060000031: mean square error of the query function; not reproduced in the source text]
A data availability quantification model is designed based on the mean square error of the query function, and constraints are set according to the noise distribution of differential privacy protection so that the privacy budget can be optimized. The specific steps are as follows:
Step (a): Extract the attribute values and attribute features of the graph data node element set.
Step (b): Decompose the common feature subgraphs with a dynamic programming algorithm, and process the feature subgraphs in parallel.
Step (c): Compute the mean square error of the query function from the query request and the query function.
Step (d): Design the data availability quantification model based on the mean square error of the query function.
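Step (c) can be illustrated with a minimal sketch. The patent's exact MSE formula is only available as an image, so the sketch assumes the common definition, the mean of the squared differences between the perturbed and the true query answers. The function name `query_mse` and the sample values are hypothetical and do not come from the patent.

```python
from typing import Sequence


def query_mse(true_answers: Sequence[float], noisy_answers: Sequence[float]) -> float:
    """Mean square error between true and perturbed query answers.

    Assumed definition (not taken from the patent's equation image):
    MSE = (1/n) * sum_i (noisy_i - true_i)^2
    """
    if len(true_answers) != len(noisy_answers):
        raise ValueError("answer vectors must have the same length")
    if not true_answers:
        raise ValueError("at least one query answer is required")
    n = len(true_answers)
    return sum((t - p) ** 2 for t, p in zip(true_answers, noisy_answers)) / n


# Hypothetical statistics q(x_1), q(x_2), q(x_3), e.g. a sum, a mean and a count.
true_q = [120.0, 34.5, 7.0]
noisy_q = [121.0, 33.5, 9.0]
mse = query_mse(true_q, noisy_q)  # squared errors 1, 1 and 4, averaged over 3
```

Keeping the MSE as a separate function makes it easy to swap in the patent's own formula once it is known, without touching the rest of a pipeline.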
Content 2: A graph data differential privacy protection method that follows the Laplace mechanism.
As shown in the design diagram of FIG. 1, to protect the private information of the node element set of the graph data, the user first adds noise $\gamma$ that follows the Laplace mechanism to the node element set, i.e. $\gamma \sim \mathrm{Lap}(0, \lambda)$, where the scale parameter $\lambda$ of the Laplace distribution is given by
[Equation image BDA0002846043060000032: scale parameter of the Laplace distribution; not reproduced in the source text. The standard Laplace mechanism uses $\lambda = \Delta q / \varepsilon$.]
and the sensitivity is $\Delta q = \lVert q_1 - q_2 \rVert_1$. The user then publishes the noise-added graph data to a cloud server, which provides storage and query services for the graph data. The cloud server has ample storage space and strong computing power. On the one hand, it provides graph data storage services for data providers; on the other hand, it provides graph data query services for querying users.
After the cloud server receives a statistical request about the graph data from a querying user, it performs the statistical computation on the node element set of the graph data it stores and returns the resulting statistic to the querying user.
Under the Laplace distribution-based graph data differential privacy protection method, for any pair of query requests $x_i, x_j \in x$ and any output set $Q \subseteq Q(x)$, the probabilities of the statistical results before and after random perturbation remain close, i.e. $\Pr[q(x_i) \in Q] \le e^{\varepsilon} \Pr[q(x_j) \in Q]$ holds. Here $\Pr[\cdot]$ denotes the probability of privacy disclosure risk, and the privacy budget $\varepsilon$ represents the strength of privacy protection: the smaller $\varepsilon$ is, the stronger the protection. The detailed steps of the differential privacy protection process are as follows:
Step (a): Collect the node element set information and compute the statistic of each attribute in the nodes.
Step (b): Add noise that follows the Laplace distribution to the statistics of the node element set.
Step (c): Compute the statistical function $q(x_i)$ requested by the querying user and return the result to the querying user.
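Steps (a) to (c) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the standard Laplace mechanism with scale sensitivity / epsilon, draws noise by inverse-CDF sampling, and uses hypothetical names (`laplace_noise`, `perturb_statistics`) and sample statistics.

```python
import math
import random
from typing import List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Lap(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Guard against log(0) at the boundary u = -0.5.
    magnitude = -scale * math.log(max(1.0 - 2.0 * abs(u), 1e-300))
    return math.copysign(magnitude, u)


def perturb_statistics(stats: Sequence[float], sensitivity: float,
                       epsilon: float, rng: random.Random) -> List[float]:
    """Add independent Laplace noise to each statistic.

    Assumption: scale = sensitivity / epsilon (standard Laplace mechanism).
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    return [s + laplace_noise(scale, rng) for s in stats]


# Hypothetical per-attribute statistics of a node element set.
rng = random.Random(0)
node_stats = [120.0, 34.5, 7.0]
released = perturb_statistics(node_stats, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A seeded `random.Random` is used only to make the sketch reproducible; a deployment would need a cryptographically suitable noise source. Note also that releasing several statistics, each with budget epsilon, consumes the sum of those budgets under sequential composition.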
By adding noise that follows the Laplace distribution, the method achieves $\varepsilon$-differential privacy. The probability density function of the Laplace distribution $\mathrm{Lap}(0, \lambda)$ is
$$f(\gamma) = \frac{1}{2\lambda} \exp\left(-\frac{|\gamma|}{\lambda}\right)$$
The analysis is as follows:
[Equation image BDA0002846043060000042: derivation of the differential privacy bound; not reproduced in the source text]
That is, $\Pr[x_i] \le \exp(\varepsilon) \Pr[x_j]$ holds, so noise that follows the Laplace distribution ensures that the definition of differential privacy is satisfied. The privacy protection method proposed in this patent therefore achieves differential privacy protection of graph data.
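Because the patent's own derivation is only available as an image, the following is the standard textbook argument for a scalar query, offered as a sketch rather than as the authors' proof. It assumes two query requests $x_i$, $x_j$ with $|q(x_i) - q(x_j)| \le \Delta q$, a noise scale $\lambda = \Delta q / \varepsilon$, and writes $p_{x}(z)$ for the density of the released value $q(x) + \gamma$ at $z$ (notation introduced here for illustration).

```latex
\begin{align*}
\frac{p_{x_i}(z)}{p_{x_j}(z)}
  &= \frac{\frac{1}{2\lambda}\exp\left(-\frac{|z - q(x_i)|}{\lambda}\right)}
          {\frac{1}{2\lambda}\exp\left(-\frac{|z - q(x_j)|}{\lambda}\right)}
   = \exp\left(\frac{|z - q(x_j)| - |z - q(x_i)|}{\lambda}\right) \\
  &\le \exp\left(\frac{|q(x_i) - q(x_j)|}{\lambda}\right)
   \le \exp\left(\frac{\Delta q}{\lambda}\right)
   = \exp(\varepsilon)
\end{align*}
```

The first inequality is the triangle inequality. Integrating the pointwise bound over any output set $Q$ gives $\Pr[q(x_i) + \gamma \in Q] \le e^{\varepsilon} \Pr[q(x_j) + \gamma \in Q]$.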
Content 3: A graph data availability optimization method based on the Lagrange multiplier method and the Hessian matrix test.
To ensure that the graph data remains highly available after random perturbation, an availability optimization method is designed based on the Lagrange multiplier method and the Hessian matrix test to improve the availability of the graph data. Graph data availability can be determined from the mean square error of the query function $Q(x)$ before and after noise is added:
[Equation images BDA0002846043060000051 and BDA0002846043060000052: mean square error of the query function before and after adding noise; not reproduced in the source text]
The larger the mean square error of the query function, the higher the graph data availability. In addition, the noise added in this patent must follow the Laplace distribution, i.e.
[Equation image BDA0002846043060000053: Laplace distribution constraint on the noise; not reproduced in the source text]
The availability optimization problem of this patent can therefore be converted into maximizing the mean square error of the query function under the constraint that the noise follows the Laplace distribution. To solve this constrained optimization problem, the patent uses a dual optimization method to find the optimal solution of the objective function. The procedure is as follows:
Step (a): Introduce a Lagrange multiplier $\alpha$ and, according to the constraints, set up the objective function with the dual optimization method:
[Equation image BDA0002846043060000054: objective function $F(q, \alpha)$; not reproduced in the source text]
Step (b): Using the dual optimization idea, convert the optimization problem into finding the minimum of the objective function, i.e. $\min F(q, \alpha)$.
Step (c): Apply the Lagrange multiplier method: take the first-order partial derivatives of the objective function with respect to the variables $q_i$ and $\alpha$, set them to zero, and solve for the extreme points of the objective function. The noise corresponding to such an extreme point is the saddle point at which the mean square error of the query function takes its extreme value.
Step (d): Use the Hessian matrix test to determine whether the extremum obtained for the query function is a minimum. If it is not a minimum, discard the noise in that state and continue iterating until noise that yields a minimum is found; otherwise, stop the iteration.
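The first-order conditions of step (c) can be illustrated on a toy problem. The patent's objective function is only available as an image, so the sketch below substitutes a deliberately simple, hypothetical Lagrangian, $F(q, \alpha) = \sum_i q_i^2 + \alpha(\sum_i q_i - c)$, solves its stationarity conditions in closed form, and checks numerically that all first-order partial derivatives vanish. The function names and the constraint value `c` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def lagrangian(q: Sequence[float], alpha: float, c: float) -> float:
    """Toy Lagrangian F(q, alpha) = sum(q_i^2) + alpha * (sum(q_i) - c)."""
    return sum(v * v for v in q) + alpha * (sum(q) - c)


def stationary_point(n: int, c: float) -> Tuple[List[float], float]:
    """Closed-form solution of dF/dq_i = 2*q_i + alpha = 0 and sum(q_i) = c."""
    q = [c / n] * n
    alpha = -2.0 * c / n
    return q, alpha


def gradient(q: Sequence[float], alpha: float, c: float, h: float = 1e-6) -> List[float]:
    """Central-difference gradient of F with respect to (q_1, ..., q_n, alpha)."""
    point = list(q) + [alpha]
    grads = []
    for k in range(len(point)):
        up = point.copy()
        down = point.copy()
        up[k] += h
        down[k] -= h
        f_up = lagrangian(up[:-1], up[-1], c)
        f_down = lagrangian(down[:-1], down[-1], c)
        grads.append((f_up - f_down) / (2 * h))
    return grads


q_star, alpha_star = stationary_point(n=3, c=6.0)      # q = [2, 2, 2], alpha = -4
grad_at_star = gradient(q_star, alpha_star, c=6.0)      # every entry is close to zero
```

In the joint space of $q$ and $\alpha$ such a stationary point is a saddle of the Lagrangian, which matches the saddle point terminology used in the steps above; whether it is a minimum with respect to $q$ still has to be decided by a second-order test.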
Drawings
FIG. 1 is a design diagram of the dual optimization-based high-availability graph data privacy protection scheme in an embodiment of the present invention;
FIG. 2 is a flowchart of graph data publishing and querying based on dual optimization in an embodiment of the present invention.
Detailed Description
The invention provides a high-availability graph data privacy protection method based on dual optimization, which mainly comprises the following five steps:
Step (a): adding noise that follows the Laplace distribution;
Step (b): calculating the mean square error of the query function;
Step (c): solving the dual optimization problem;
Step (d): determining the extremum solution;
Step (e): executing the high-availability graph data privacy protection operation.
The implementation platform is Java and the operating system is Windows 10. The specific steps are as follows:
The first step: add noise that follows the Laplace distribution.
Compute several statistics of the node elements in the graph data, decompose the common feature subgraphs with a dynamic programming algorithm, and process the element sets of the different feature subgraphs in parallel. Noise that follows the Laplace distribution is then added to the node elements so that the node element set of the graph data is randomly perturbed. The aim is to resist differential attacks by attackers and protect the privacy of the graph data.
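The parallel computation over feature subgraphs can be sketched as below. The patent does not specify the decomposition algorithm's output, so the sketch assumes the subgraphs have already been produced and are represented simply as lists of one numeric node attribute (age); the subgraph names, the attribute, and the function `subgraph_statistics` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Dict, List


def subgraph_statistics(ages: List[float]) -> Dict[str, float]:
    """Compute a few statistics over one subgraph's node attribute values."""
    return {"count": float(len(ages)), "sum": float(sum(ages)), "mean": mean(ages)}


# Hypothetical feature subgraphs, each reduced to the ages of its nodes.
subgraphs = {
    "community_a": [23.0, 35.0, 41.0],
    "community_b": [19.0, 52.0],
}

# Executor.map preserves input order, so results can be zipped back to the keys.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = dict(zip(subgraphs, pool.map(subgraph_statistics, subgraphs.values())))
```

Threads are used here only to keep the sketch short. In CPython they mainly help with I/O-bound work; for CPU-bound statistics over large subgraphs, `ProcessPoolExecutor` offers the same `map` interface.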
The second step: calculate the mean square error of the query function.
The mean square error of the query function is computed from the query function requested by the querying user. It can be used to measure the degree of privacy protection: the larger the mean square error of the query function, the stronger the privacy protection. The second step therefore needs to find the maximum of the mean square error of the query function under the constraint that the noise follows the Laplace distribution.
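The link between the mean square error and the degree of privacy protection can be made concrete with a short worked equation. It is a sketch under two assumptions that are not confirmed by the source: the MSE is the average squared difference between the noisy and the true query answers, and each answer receives noise $\gamma_i \sim \mathrm{Lap}(0, \lambda)$ with $\lambda = \Delta q / \varepsilon$.

```latex
\begin{align*}
\mathbb{E}[\gamma_i] &= 0, \qquad
\operatorname{Var}(\gamma_i) = \mathbb{E}[\gamma_i^2] = 2\lambda^2, \\
\mathbb{E}[\mathrm{MSE}]
  &= \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[\gamma_i^2]
   = 2\lambda^2
   = \frac{2(\Delta q)^2}{\varepsilon^2}.
\end{align*}
```

Under these assumptions, halving the privacy budget $\varepsilon$ quadruples the expected mean square error, which is consistent with the statement that a larger mean square error corresponds to stronger privacy protection.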
The third step: solve the dual optimization problem.
The problem of maximizing the mean square error of the query function from the second step is converted into its dual problem, i.e. finding the maximum of the query function's mean square error becomes finding the minimum of the Lagrangian function, $\min F(q, \alpha)$. The saddle point of the objective function is selected using the property that a solution at which the first-order gradient of the Lagrangian function equals zero is a feasible solution.
The fourth step: determine the extremum solution.
First-order partial derivatives alone cannot decide whether the extremum of the Lagrangian function obtained in the third step is a minimum (that is, whether the extreme point is a saddle point). The method therefore uses the Hessian matrix test to establish that the objective function is bounded under the constraints. The Hessian is the square matrix formed by the second-order partial derivatives of the objective function with respect to its variables; if this matrix is positive definite, the feasible solution (q11) is a local minimum. Otherwise, the feasible solution is not a local minimum, and further iterations are needed, each checked with the Hessian matrix test.
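The positive-definiteness check at the heart of the Hessian matrix test can be sketched as follows. This is one common way to perform the check, not necessarily the one used by the authors: it attempts a Cholesky factorisation, which succeeds with strictly positive pivots exactly when a symmetric matrix is positive definite. The function name and the example matrices are hypothetical.

```python
import math
from typing import List, Sequence


def is_positive_definite(h: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """Return True if the symmetric matrix h is positive definite.

    Attempts a Cholesky factorisation h = L * L^T and fails as soon as a
    pivot is not strictly positive.
    """
    n = len(h)
    lower: List[List[float]] = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = h[i][i] - partial
                if pivot <= tol:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (h[i][j] - partial) / lower[j][j]
    return True


# Hessian of f(q1, q2) = q1^2 + q2^2: positive definite, so a local minimum.
hessian_minimum = [[2.0, 0.0], [0.0, 2.0]]
# Hessian of f(q1, q2) = q1^2 - q2^2: indefinite, so a saddle point.
hessian_saddle = [[2.0, 0.0], [0.0, -2.0]]

accept = is_positive_definite(hessian_minimum)
reject = is_positive_definite(hessian_saddle)
```

In an iterative procedure like the one described above, a candidate whose Hessian fails this check would be discarded and the search continued with the next candidate.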
The fifth step: execute the high-availability graph data privacy protection operation.
The saddle point at which the Lagrangian function takes its minimum is added as noise to the graph data node element set. The noise corresponding to the minimum point is added to each subgraph, and the noise-added subgraphs are published to the cloud server, which provides storage and query services for the graph data.

Claims (4)

1. A high-availability graph data privacy protection method based on dual optimization, characterized by comprising the following steps:
(1) providing a graph data availability quantification model based on the mean square error of the query function;
(2) providing a graph data differential privacy protection method that follows the Laplace mechanism;
(3) providing a dual optimization method for graph data availability based on the Lagrange multiplier method, and using the Hessian matrix test to determine whether a feasible solution is a saddle point of the objective function, thereby ensuring high availability of the graph data.
2. The graph data availability quantification model based on the mean square error of the query function according to claim 1, characterized in that: the mean square error (MSE) of the query function is calculated by analyzing the node element set characteristics of the feature subgraphs, the MSE being a measure of the degree of difference between the graph data before and after noise is added; the availability of the graph data is equivalent to the problem of finding the optimal solution under an unbiased estimate of the mean square error of the query function; the larger the mean square error of the query function, the better the availability of the graph data, and conversely, the smaller the mean square error, the worse the availability.
3. The graph data differential privacy protection method following the Laplace mechanism according to claim 1, characterized in that: after noise that follows the Laplace distribution is added, the statistics of the graph data node element set are indistinguishable, that is, an attacker cannot obtain the sensitive information of any record in the graph data set through a differential attack; in addition, the differential privacy protection technique based on the Laplace distribution can use the mean square error of the query function to measure both the degree of privacy protection and the data availability, so that security and availability are well balanced.
4. The graph data availability dual optimization method based on the Lagrange multiplier method according to claim 1, characterized in that: for the graph data after noise is added, high availability of the graph data under random perturbation can be ensured; by setting a constraint that follows the Laplace mechanism, the minimum of the Lagrangian dual function under that constraint is determined; the extremum problem over the feasible region is converted into a saddle point problem of the Lagrangian function by the dual optimization method; in addition, the Hessian matrix test is used to determine whether a feasible solution obtained from the first-order partial derivatives is a minimum point (i.e. a saddle point of the objective function), thereby determining the specific probability distribution function and probability density function of the graph data noise.
CN202011509745.2A 2020-12-18 2020-12-18 Dual optimization-based high-availability graph data privacy protection method Pending CN112560094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011509745.2A CN112560094A (en) 2020-12-18 2020-12-18 Dual optimization-based high-availability graph data privacy protection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011509745.2A CN112560094A (en) 2020-12-18 2020-12-18 Dual optimization-based high-availability graph data privacy protection method

Publications (1)

Publication Number Publication Date
CN112560094A (en) 2021-03-26

Family

ID=75031866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011509745.2A Pending CN112560094A (en) 2020-12-18 2020-12-18 Dual optimization-based high-availability graph data privacy protection method

Country Status (1)

Country Link
CN (1) CN112560094A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210326