CN117313135B

CN117313135B - Efficient reconfiguration personal privacy protection method based on attribute division

Info

Publication number: CN117313135B
Application number: CN202310965523.9A
Authority: CN
Inventors: 罗凯伦; 李睿; 阮华锋
Original assignee: Dongguan University of Technology
Current assignee: Dongguan University of Technology
Priority date: 2023-08-02
Filing date: 2023-08-02
Publication date: 2024-04-16
Anticipated expiration: 2043-08-02
Also published as: CN117313135A

Abstract

The invention relates to an efficiently reconstructible personal privacy protection method based on attribute division, comprising an inference set calculation step and an attribute segmentation step. The inference set calculation step comprises: computing inference sets for an input data set by a machine learning method, where each computed inference set is recursively input to the machine learning algorithm as a new sensitive attribute set, until the generated inference set tree converges or reaches a set number of layers. The attribute segmentation step comprises: determining the number of servers allowed to be leaked and the total number of servers, converting the privacy settings into corresponding constraint expressions, inputting the constraint expressions into a SAT solver, and obtaining the final attribute segmentation result from the SAT solver. Through the inference set calculation algorithm and the attribute segmentation algorithm, the invention achieves privacy protection and segmentation of sensitive attributes and improves the security of data privacy.

Description

Efficient reconfiguration personal privacy protection method based on attribute division

Technical Field

The invention relates to the technical field of network security, in particular to an efficiently reconstructible personal privacy protection method based on attribute division.

Background

Current research in vertical data partitioning is mainly focused on partitioning a data set into specific subsets according to given privacy constraints and meeting proposed privacy settings; these studies propose privacy settings based on a hypothetical threat model and satisfy given privacy settings by means of attribute segmentation computation methods, where privacy constraints are typically expressed in terms of sets of attributes.

Accordingly, in solving the privacy protection problem of vertical data partitioning, the prior art has the following problems:

1. Difficulty handling multiple servers: the prior art has studied only the case of two servers, but practical applications may involve more. With multiple servers, data segmentation and privacy protection become more complex, and collaboration and communication between the servers must be considered; the prior art gives no explicit solution for this.

2. Need for an additional trusted server: some prior art solutions rely on additional trusted servers to perform storage and query tasks in order to protect data privacy. However, introducing trusted servers increases the latency and usage cost of network queries, which may not be practical for large-scale data storage and query scenarios.

3. Dependence on data encryption or on the computing power of a trusted server: some prior art techniques use data encryption or the computing power of trusted servers for data storage and querying. This may limit data reconstruction and query flexibility, especially in scenarios that require frequent queries or data operations.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The invention aims to overcome the defects of the prior art, provides a personal privacy protection method based on attribute division and capable of being efficiently reconstructed, and solves the problems of the prior method.

The aim of the invention is achieved by the following technical scheme: an efficient reconfigurable personal privacy protection method based on attribute division comprises a calculation inference set step and an attribute segmentation step;

the step of computing the inference set comprises: computing inference sets for an input data set by a machine learning method, where each computed inference set is recursively input to the machine learning algorithm as a new sensitive attribute set, until the generated inference set tree converges or reaches a set number of layers;

the attribute segmentation step includes: determining the number of servers allowed to be leaked and the total number of servers, converting the privacy settings into corresponding constraint expressions, inputting the constraint expressions into a SAT solver, and obtaining the final attribute segmentation result from the SAT solver.

The step of calculating the inference set specifically comprises the following steps:

initializing an inference set tree T, traversing all subsets X of the attribute set, and calculating the inference set Y = L(X) of the attribute subset X through a machine learning algorithm L;

taking the inference set Y as the new sensitive attribute set S, namely S = Y, and judging whether an ending condition is met;

if the generated inference set Y is empty or the depth of the inference set tree T reaches the set number of layers h, ending; otherwise, adding the inference set Y to the inference set tree T as a child node;

repeating the above steps, continuing to traverse the next subset of the attribute set until all subsets of the attribute set have been traversed.

Before the inference set tree is initialized, a data set and a sensitive attribute set S = {S1, S2, ..., Sm} must also be input, where Si is a sensitive attribute and the data set contains the attribute set D = {A1, A2, ..., An}.
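The recursion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the machine learning algorithm L is replaced by a hypothetical predicate `infers(X, S)` that reports whether the attribute subset X infers the sensitive set S, and the names `Node`, `inference_sets`, `build_tree` and the toy rule table `RULES` are invented for the example.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Set

Attrs = FrozenSet[str]
Infers = Callable[[Attrs, Attrs], bool]  # hypothetical stand-in for the learner L


class Node:
    """One node of the inference set tree T."""

    def __init__(self, attrs: Attrs) -> None:
        self.attrs = attrs
        self.children: List["Node"] = []


def inference_sets(all_attrs: Attrs, sensitive: Attrs, infers: Infers) -> List[Attrs]:
    """Traverse every non-empty subset X outside `sensitive` and keep those that infer it.

    The traversal is exponential in the number of attributes, as in the text.
    """
    candidates = sorted(all_attrs - sensitive)
    found = []
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            x = frozenset(combo)
            if infers(x, sensitive):
                found.append(x)
    return found


def _expand(node: Node, all_attrs: Attrs, infers: Infers,
            max_depth: int, depth: int, seen: Set[Attrs]) -> None:
    # Ending condition: the layer limit h (here max_depth) has been reached.
    if depth >= max_depth:
        return
    # If no inference set exists (Y is empty), the loop body never runs.
    for y in inference_sets(all_attrs, node.attrs, infers):
        if y in seen:  # assumption: skip sets already on this path to avoid cycles
            continue
        child = Node(y)  # Y becomes a child node and the new sensitive set
        node.children.append(child)
        _expand(child, all_attrs, infers, max_depth, depth + 1, seen | {y})


def build_tree(all_attrs: Attrs, sensitive: Attrs, infers: Infers, max_depth: int) -> Node:
    root = Node(sensitive)
    _expand(root, all_attrs, infers, max_depth, 0, {sensitive})
    return root


# Toy data: a rule table replaces a trained model (assumption for illustration).
RULES = {
    frozenset({"disease"}): [frozenset({"zip", "birth"})],
    frozenset({"zip", "birth"}): [frozenset({"employer"})],
}


def toy_infers(x: Attrs, s: Attrs) -> bool:
    return x in RULES.get(s, [])


D = frozenset({"name", "zip", "birth", "disease", "employer"})
S = frozenset({"disease"})
tree = build_tree(D, S, toy_infers, max_depth=2)
```

With the toy rules, the root {disease} gets the child {zip, birth}, which in turn gets the child {employer}; lowering `max_depth` to 1 stops after the first layer.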

The attribute segmentation step specifically comprises the following steps:

determining the parameters in the privacy settings: determining the maximum number k of servers allowed to be leaked and the total number n of servers;

constructing the constraint expressions: converting the privacy settings into constraints of a SAT problem;

inputting the constructed constraint expression into a SAT solver, and obtaining an attribute segmentation scheme meeting the constraint by the SAT solver through solving the constraint expression;

and outputting an attribute segmentation scheme obtained by the SAT solver, and distributing each attribute to a corresponding server to realize personal privacy protection.

The constraints include:

constraint one: each attribute must be assigned to a certain server;

constraint two: negating every case that the privacy settings do not allow to occur;

constraint three: additional constraints set by the user themselves.
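The three constraints can be illustrated with a small CNF encoding. The sketch below rests on several assumptions that are not stated in the text: each attribute is placed on exactly one server (the at-most-one clauses are added for illustration), the privacy settings are given as a list of "forbidden" attribute sets that no k servers may jointly hold, and a tiny backtracking search stands in for a real SAT solver. The names `var`, `encode`, `solve` and `decode` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Set

Clause = List[int]  # DIMACS-style literals: positive = true, negative = false


def var(p: int, i: int, n: int) -> int:
    """Variable number for the atom 'attribute p is stored on server i'."""
    return p * n + i + 1


def encode(attrs: Sequence[str], n: int, k: int, forbidden: Sequence[Set[str]],
           extra: Sequence[Clause] = ()) -> List[Clause]:
    idx = {a: p for p, a in enumerate(attrs)}
    clauses: List[Clause] = []
    for p in range(len(attrs)):
        # Constraint one: every attribute is assigned to some server.
        clauses.append([var(p, i, n) for i in range(n)])
        # Assumption: an attribute is stored on at most one server.
        for i, j in combinations(range(n), 2):
            clauses.append([-var(p, i, n), -var(p, j, n)])
    # Constraint two: for every group of k servers and every forbidden set,
    # at least one attribute of the set must live outside the group.
    for group in combinations(range(n), k):
        outside = [i for i in range(n) if i not in group]
        for f in forbidden:
            clauses.append([var(idx[a], i, n) for a in sorted(f) for i in outside])
    # Constraint three: extra user-supplied clauses.
    clauses.extend(list(c) for c in extra)
    return clauses


def solve(clauses: List[Clause], num_vars: int) -> Optional[Dict[int, bool]]:
    """Naive backtracking search; a stand-in for a real SAT solver."""
    assignment: Dict[int, bool] = {}

    def consistent() -> bool:
        # A clause is falsified when all its literals are assigned and false.
        return not any(
            all(abs(l) in assignment and assignment[abs(l)] != (l > 0) for l in c)
            for c in clauses
        )

    def search(v: int) -> bool:
        if not consistent():
            return False
        if v > num_vars:
            return True
        for value in (True, False):
            assignment[v] = value
            if search(v + 1):
                return True
        del assignment[v]
        return False

    return dict(assignment) if search(1) else None


def decode(model: Dict[int, bool], attrs: Sequence[str], n: int) -> Dict[str, int]:
    return {a: i for p, a in enumerate(attrs) for i in range(n) if model[var(p, i, n)]}


ATTRS = ["name", "zip", "birth", "disease"]
FORBIDDEN = [{"zip", "birth"}, {"name", "disease"}]
N = 3
model = solve(encode(ATTRS, N, 1, FORBIDDEN), len(ATTRS) * N)
placement = decode(model, ATTRS, N) if model else None
```

With k = 1 the search finds a placement that separates each forbidden pair. With k = 2 and three servers the same input is unsatisfiable, because two attributes cannot be spread so that every pair of servers misses one of them.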

The personal privacy protection method further includes: setting a sensitive attribute set, an inference set and a security model before performing the step of calculating the inference set;

setting the sensitive attribute set includes: let D be the set of attributes in the data; the sensitive attribute set S is a subset of D, S ⊆ D;

setting the inference set includes: let D be the set of attributes in the data, let U ⊆ D be an attribute subset, and let L be a machine learning algorithm; if P(L(U) = S) > v, then U is an inference set of S, where P(L(U) = S) is the inference rate from U to S and c is the privacy protection strength;

setting the security model includes: let D be the set of attributes in the data, with sensitive attribute set S; D is distributed across the servers {X_i | i ∈ {1, …, n}}, and k servers are selected and their attributes combined, the combination representing a subset of D, where k < n.
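The inference rate and the security model can be made concrete with a short sketch. It is only an illustration under assumptions: the learner L is replaced by a majority-vote lookup table, the inference rate is estimated as accuracy on the same records used to build the table, and the function names `inference_rate`, `is_inference_set` and `leaks` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Set

Record = Dict[str, object]


def inference_rate(records: Sequence[Record], u: Iterable[str], s: Iterable[str]) -> float:
    """Estimate P(L(U) = S) with a majority-vote table as a stand-in learner."""
    u_cols, s_cols = sorted(u), sorted(s)
    table = defaultdict(Counter)
    for r in records:
        key = tuple(r[a] for a in u_cols)
        table[key][tuple(r[a] for a in s_cols)] += 1
    # The table predicts the most common S value for each U value.
    hits = sum(max(counts.values()) for counts in table.values())
    return hits / len(records)


def is_inference_set(records: Sequence[Record], u: Iterable[str], s: Iterable[str],
                     threshold: float) -> bool:
    """U is an inference set of S when the inference rate exceeds the threshold."""
    return inference_rate(records, u, s) > threshold


def leaks(servers: Sequence[Set[str]], inference_sets: Sequence[Set[str]], k: int) -> bool:
    """Return True if some k servers jointly hold a complete inference set."""
    for group in combinations(range(len(servers)), k):
        combined = set().union(*(servers[i] for i in group))
        if any(u <= combined for u in inference_sets):
            return True
    return False


# Toy records (invented for illustration).
RECORDS: List[Record] = [
    {"zip": "A", "birth": 1990, "job": "x", "disease": "flu"},
    {"zip": "A", "birth": 1991, "job": "x", "disease": "cold"},
    {"zip": "B", "birth": 1990, "job": "y", "disease": "cold"},
    {"zip": "B", "birth": 1991, "job": "y", "disease": "flu"},
]
SERVERS = [{"zip", "job"}, {"birth"}, {"disease"}]
```

In the toy records, zip and birth together determine the disease while zip alone does not, so with a threshold of 0.9 only {zip, birth} counts as an inference set. The placement in `SERVERS` is then safe when one server is leaked but not when two are.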

The invention has the following advantages: the method realizes privacy protection and segmentation of sensitive attributes through an inference set calculation algorithm and an attribute segmentation algorithm, improving the security of data privacy. By using a SAT solver and constraint expressions, it converts the attribute segmentation problem into the widely studied SAT problem, which provides an effective solution approach and can yield a better attribute segmentation scheme. Owing to efficient reconstruction, the average time consumption of query operations is reduced by 20% compared with conventional methods.

Drawings

FIG. 1 is a schematic flow chart of the present invention.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Accordingly, the following detailed description of the embodiments of the present application, provided in connection with the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application. The invention is further described below with reference to the accompanying drawings.

The invention relates to an efficiently reconstructible personal privacy protection method based on attribute division, which comprises an inference set calculation algorithm part and an attribute segmentation algorithm part. The inference set calculation algorithm part uses a machine learning method to calculate inference sets from an input data set. The attribute segmentation algorithm part converts the attribute segmentation problem into a Boolean satisfiability (SAT) problem: first, the number of servers allowed to be leaked and the total number of servers are determined, and the privacy definitions are converted into corresponding constraint expressions; these constraint expressions are then input into a SAT solver, which produces the final attribute segmentation scheme.

As shown in fig. 1, the following are specifically included:

the following settings were first made:

Then a data set and a sensitive attribute set are input: the input data set contains the attribute set D = {A1, A2, ..., An}, and the input sensitive attribute set is S = {S1, S2, ..., Sm}, where Si is a sensitive attribute.

S101: adopting a machine learning method, and carrying out inference set calculation through an input data set;

the method specifically comprises the following steps:

initializing an inference set tree T, traversing all subsets X of the attribute set, and calculating an inference set Y=L (X) of the attribute subset X through a machine learning algorithm L, wherein X is a subset of D;

repeating the above steps, continuing to traverse the next subset of the attribute set until all subsets of the attribute set have been traversed.

S102: based on the inference set, converting the privacy security definition into constraint expressions, inputting the constraint expressions into a SAT solver to calculate an attribute division scheme, and returning the result to the user;

the method specifically comprises the following steps:

determining the maximum number k of servers allowed to be leaked and the total number n of servers;

constructing a constraint expression, and converting the privacy setting into the constraint of the SAT problem, wherein the specific constraint comprises the following steps:

constraint one: each attribute must be assigned to a certain server, expressed as a propositional formula in which: V_pi is a propositional atom stating that attribute p is stored on server i; the symbol ¬ is logical NOT, so ¬V_pi states that attribute p is not stored on server i; the symbol ∨ is logical OR, so V_pi ∨ V_pj holds when V_pi or V_pj holds; the symbol ∧ is logical AND, so V_pi ∧ V_pj holds when V_pi and V_pj hold simultaneously; a is the number of attributes, so p ranges from 1 to a.

constraint two: negating every case that the privacy definition does not allow to occur;

constraint three: additional constraints that are user-defined.
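As a hedged illustration only, one way to write constraints one and two with the symbols defined above is shown below. The exact formulas of the invention are not given in this text, so these are assumptions: the first line says each attribute is stored on at least one of the n servers, the second (optional) line adds that no attribute is stored on two servers, and the third line negates the case in which a group K of k servers jointly holds every attribute of an inference set U.

```latex
% Constraint one (assumed form): every attribute p = 1..a is stored on some server i = 1..n
\bigwedge_{p=1}^{a} \; \bigvee_{i=1}^{n} V_{pi}

% Optional uniqueness (assumption): no attribute is stored on two different servers
\bigwedge_{p=1}^{a} \; \bigwedge_{1 \le i < j \le n} \left( \neg V_{pi} \lor \neg V_{pj} \right)

% Constraint two (assumed form): for each inference set U and each server group K with |K| = k,
% the servers in K must not jointly hold all attributes of U
\neg \bigwedge_{p \in U} \; \bigvee_{i \in K} V_{pi}
```

Under the uniqueness assumption, the third formula is equivalent to the single clause requiring that some attribute of U be stored on a server outside K, which is already in conjunctive normal form.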

The foregoing is merely a preferred embodiment of the invention. It is to be understood that the invention is not limited to the form disclosed herein, which is not to be construed as excluding other embodiments; the invention may be used in various other combinations, modifications and environments, and may be modified within the scope of the inventive concept described herein, through the above teachings or through the skill or knowledge of the relevant art. Modifications and variations that do not depart from the spirit and scope of the invention are intended to fall within the scope of the appended claims.

Claims

1. An efficient reconfigurable personal privacy protection method based on attribute division is characterized in that: the personal privacy protection method comprises a step of calculating an inference set and a step of dividing attributes;

the attribute segmentation step includes: determining the number of servers allowed to be leaked and the total number of servers, converting the privacy settings into corresponding constraint expressions, inputting the constraint expressions into a SAT solver, and obtaining the final attribute segmentation result from the SAT solver;

repeating the steps, and continuing to traverse the next subset of the attribute set until the traversal of all subsets in the attribute set is completed;

the attribute segmentation step specifically comprises the following steps:

2. The efficient reconfigurable personal privacy protection method based on attribute partitioning of claim 1, wherein: before the inference set tree is initialized, a data set and a sensitive attribute set S = {S1, S2, ..., Sm} must also be input, where Si is a sensitive attribute and the data set contains the attribute set D = {A1, A2, ..., An}.

3. The efficient reconfigurable personal privacy protection method based on attribute partitioning of claim 1, wherein: the constraints include:

constraint one: each attribute must be assigned to a certain server;

constraint two: negating every case that the privacy settings do not allow to occur;

constraint three: additional constraints set by the user themselves.

4. A method for efficient reconfigurable personal privacy protection based on attribute partitioning as in any of claims 1-3 wherein: the personal privacy protection method further includes: setting a sensitive attribute set, an inference set and a security model before performing the step of calculating the inference set;

setting the security model includes: let D be the set of attributes in the data, with sensitive attribute set S; D is distributed across the servers {X_i | i ∈ {1, …, n}}, and k servers are selected and their attributes combined, the combination representing a subset of D, where k < n.