CN117235800A - Data query protection method of personalized privacy protection mechanism based on secret specification - Google Patents
Data query protection method of personalized privacy protection mechanism based on secret specification
- Publication number
- CN117235800A CN202311416556.4A
- Authority
- CN
- China
- Prior art keywords
- query
- attribute
- sensitive
- mean
- median
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Storage Device Security (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a data query protection method of a personalized privacy protection mechanism based on secret specifications, which comprises the following steps: the access device sends a query instruction to the data storage device; the data storage device performs: analyzing the query instruction to obtain a query function and a query attribute; extracting a query attribute dataset; dividing the query attribute dataset into a sensitive subset and a non-sensitive subset based on the set of user secret specifications; obtaining a mean value query result of the query attribute dataset according to a pre-constructed Laplace mechanism based on a secret specification; obtaining a median query result of the query attribute dataset according to a pre-constructed exponential mechanism based on a secret specification; and issuing a mean query result and/or a median query result. The privacy protection scope and the protected entity are accurately defined through the secret specification, strict constraint that all attribute records of data are regarded as sensitive is avoided, less data distortion and more accurate data query results are provided, and balance of privacy protection and data utility is achieved.
Description
Technical Field
The invention relates to the technical field of data security, in particular to a data query protection method of a personalized privacy protection mechanism based on secret specifications.
Background
With the advent of the big data era, large amounts of sensitive data, including personal identification information, medical records and financial transactions, are stored in databases. To support research, business decisions and government policy making, the demand for statistical queries over these data is growing rapidly. Statistical queries such as the mean and the median are important tools for understanding data distributions and trends, and they provide key insight into data across many industries. For example, a researcher may need to calculate the average age of patients in a medical study, a financial institution may need the median credit score of its customers, or an analyst may need the average age or median annual income from a census dataset. These queries typically involve sensitive information and therefore require effective privacy protection.
Differential privacy (DP) has become the gold standard for privacy protection because of its provable guarantees and its robustness against adversaries with prior knowledge. It protects individual privacy while meeting data analysis requirements, and it can be applied to common data analysis operations such as mean and median queries to ensure that sensitive data are properly protected. Legal frameworks such as the Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA), the Family Educational Rights and Privacy Act (FERPA) and the European Union's General Data Protection Regulation (GDPR) aim to ensure that organizations and individuals adhere to principles of transparency, fairness and security when collecting, processing and sharing personal information. In addition, recent privacy legislation in California, including the California Consumer Privacy Act and the California Privacy Rights Act, strengthens individuals' control over their personal information and requires organizations to be transparent about how they collect and use data. A common goal of these privacy laws and standards is to protect individual privacy, to give individuals rights over how their data are used, and to let them manage and limit how their data are shared and processed. From the perspective of legislation and policy, users have the right to control their own privacy and exhibit personalized privacy requirements. This personalization is rooted in each person's cultural background, personal privacy preferences and social factors, and reflects the differences in privacy expectations among users.
However, when it comes to an individual's control over the sensitivity of their own data, traditional differential privacy tends to be too strict: it generally treats all data in the dataset that relate to an individual as inherently sensitive. In practice, because privacy preferences and attitudes differ between individuals, not all information related to an individual is regarded as sensitive or needs the same level of protection. Consider an intelligent building management system that processes a large amount of sensor data and personal information, including user location details and health indicators. Individuals may hold different views on the sensitivity of these attributes. Some users consider their location information sensitive but not their health data. Others treat health data as sensitive and location information as non-sensitive. Still others consider both attributes sensitive, or both non-sensitive. With traditional differential privacy data protection methods, users therefore cannot define their own secret specifications and the scope of protection cannot be delimited precisely, so the results returned by data queries are heavily distorted and have low utility.
A census dataset usually contains information such as age, gender, annual income, telephone number and health status. Information such as annual income, health status and age concerns users' personal privacy, and different users need different privacy settings. For example, some individuals may not want to disclose their exact annual income, while others may not want to disclose their age or health status, in order to prevent leaked information from being exploited by criminals or used for unsolicited promotion.
Disclosure of Invention
The invention aims to solve the technical problems existing in the prior art and provides a data query protection method and a census data set query protection method of a personalized privacy protection mechanism based on secret specifications.
To achieve the above object of the present invention, according to a first aspect of the present invention, there is provided a data query protection method of a personalized privacy protection mechanism based on a secret specification, comprising: the access device sends a query instruction to the data storage device; the data storage device performs: receiving and analyzing a query instruction to obtain a query function and a query attribute; extracting a query attribute dataset from the set dataset; acquiring a user secret specification set, and dividing a query attribute data set into a sensitive subset and a non-sensitive subset based on the user secret specification set; when the query function is a mean query function, a mean query result of the query attribute dataset is obtained according to a pre-constructed Laplacian mechanism based on a secret specification; when the query function is a median query function, a median query result of the query attribute dataset is obtained according to a pre-constructed exponential mechanism based on a secret specification; and issuing a mean query result and/or a median query result of the query attribute dataset to the access device.
The technical scheme has the following effects. It allows each user to set, through a secret specification, which attribute records are sensitive and which are not. This helps to define the scope of privacy protection and the protected entities precisely, avoids the strict constraint of traditional differential privacy methods that treat all attribute records as sensitive, makes privacy protection more flexible and personalized, and yields less data distortion and more accurate query results. A Laplace mechanism based on the secret specification (SSLM) is proposed and applied to mean queries, and an exponential mechanism based on the secret specification (SSEM) is proposed and applied to median queries. Both improve the accuracy of data analysis while minimizing data distortion, particularly when a large part of the data is non-sensitive, and thus balance privacy protection and data utility. Compared with state-of-the-art differential privacy mechanisms, by exploiting non-sensitive data the SSLM improves utility by a factor of about 14 for mean queries and the SSEM improves utility by a factor of about 6 for median queries.
To achieve the above object of the present invention, according to a second aspect of the present invention, there is provided a census data set query protection method, including: the access device sends a query instruction to the data storage device which stores the census data set; the data storage device performs: receiving and analyzing a query instruction to obtain a query function and a query attribute, wherein the query attribute comprises age and annual income; extracting a query attribute dataset from the census dataset; acquiring a user secret specification set, and dividing a query attribute data set into a sensitive subset and a non-sensitive subset based on the user secret specification set; when the query function is a mean query function, a mean query result of the query attribute dataset is obtained according to a pre-constructed Laplacian mechanism based on a secret specification; when the query function is a median query function, a median query result of the query attribute dataset is obtained according to a pre-constructed exponential mechanism based on a secret specification; and issuing a mean query result and/or a median query result of the query attribute dataset to the access device.
The technical scheme has the following effects. It allows each user to set, through a secret specification, which attribute records of the census dataset are sensitive and which are not. This helps to define the scope of privacy protection and the protected entities precisely, avoids the strict constraint of traditional differential privacy methods that treat all attribute records of the census dataset as sensitive, makes privacy protection more flexible and personalized, and yields less data distortion and more accurate query results. A Laplace mechanism based on the secret specification (SSLM) is proposed and applied to mean queries on the census dataset, and an exponential mechanism based on the secret specification (SSEM) is proposed and applied to median queries on the census dataset. Both improve query accuracy while minimizing data distortion, particularly when a significant part of the data is non-sensitive, and thus balance privacy protection and data utility. Compared with state-of-the-art differential privacy mechanisms, by exploiting non-sensitive data the SSLM improves utility by a factor of about 14 for mean queries and the SSEM improves utility by a factor of about 6 for median queries.
Drawings
FIG. 1 is a flow chart of a method for protecting data query of a personalized privacy protection mechanism based on secret specifications in a preferred embodiment of the invention;
FIG. 2 is a first example of the present invention calculating a median score function value;
FIG. 3 is a second example of the present invention calculating a median score function value;
FIG. 4 is a flow chart of a census dataset query protection method in another preferred embodiment of the present invention;
FIG. 5 is a graph showing the RMSE variation of SSLM in the mean query result as the scale of the non-sensitive attribute is changed in another preferred embodiment of the invention;
FIG. 6 is a graph showing the variation of the RMSE of the SSLM in the mean value query result as the privacy protection level is changed in another preferred embodiment of the present invention;
FIG. 7 is a graph showing the variation of RMSE for SSEM in median query results as the scale of non-sensitive attributes is varied in accordance with another preferred embodiment of the present invention;
FIG. 8 is a graph showing the variation of the RMSE of the SSEM in the median query result as the degree of privacy protection is changed in another preferred embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present application, it should be understood that the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present application and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present application.
In the description of the present application, unless otherwise specified and defined, it should be noted that the terms "mounted," "connected," and "coupled" are to be construed broadly, and may be, for example, mechanical or electrical, or may be in communication with each other between two elements, directly or indirectly through intermediaries, as would be understood by those skilled in the art, in view of the specific meaning of the terms described above.
From the perspective of legislation and policy, users have the right to control their own privacy and exhibit personalized privacy requirements. This personalization is rooted in each person's cultural background, personal privacy preferences and social factors, and reflects the differences in privacy expectations among users. Existing research on personalizing differential privacy mainly personalizes the privacy budget within the differential privacy (DP) framework: an individual specifies the strength of protection for their own data, but cannot specify which data are sensitive and need protection and which are not. When it comes to an individual's control over the sensitivity of their own data, traditional differential privacy is often too strict. It usually regards all data related to the individual in a dataset as inherently sensitive, which leads to large data distortion, increased computation cost and lower accuracy when the queried data are used for later analysis. For this reason, the present application integrates the secret specification into the differential privacy framework and introduces differential privacy based on the secret specification (SSDP).
The relevant definitions and mechanisms of the differential privacy based on Secret Specifications (SSDP) proposed by the present application are explained and illustrated below.
Let the set dataset be the dataset to be protected, written $D_0 = (r_1, \ldots, r_i, \ldots, r_n)$, where $n$ is the number of users, $i \in [1, n]$ is the user index, and $r_i$ is the record of user $u_i$ in the user set $U = \{u_1, \ldots, u_n\}$. Each record is a $k$-dimensional variable $r_i = (r_i^{A_1}, \ldots, r_i^{A_k})$ drawn from the record domain, where $k$ is the attribute dimension and $r_i^{A_j}$ is user $u_i$'s record for attribute $A_j$, with attribute index $j \in [1, k]$. Depending on the user's secret specification, an attribute record can be set as a sensitive record or a non-sensitive record; sensitive and non-sensitive records are variables and can take different record values. The set dataset may be a census dataset.
Definition one (secret specification): The secret specification of user $u_i$ is formally defined as a binary function $S_i: r_i \to \{0,1\}^k$, where $r_i$ is the single $k$-dimensional attribute record associated with user $u_i$. The function $S_i$ determines the sensitivity of each attribute $A_j$ ($1 \le j \le k$) of the record $r_i$. If the secret specification value of attribute $A_j$ is $S_i(A_j) = 1$, the record for attribute $A_j$ is regarded as a sensitive record; if $S_i(A_j) = 0$, the record for attribute $A_j$ is regarded as a non-sensitive record.
By Definition one, the secret specification given by each user divides that user's record (containing $k$ attributes) into two sub-records. Specifically, the function $S_i$ divides the attribute values in $r_i$ into a sensitive sub-record $r_i^{s}$ and a non-sensitive sub-record $r_i^{ns}$. In particular, when $k = 1$, i.e. each record is associated with only one attribute, the users' secret specification set $\mathcal{S}$ divides the set dataset $D_0$ into two subsets: the sensitive data subset $D_s$ and the non-sensitive data subset $D_{ns}$.
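The split described by Definition one can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary representation, the attribute names and the helper name `split_record` are assumptions made for the example and do not come from the patent.

```python
from typing import Dict, Tuple


def split_record(
    record: Dict[str, float], spec: Dict[str, int]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split one user's record into (sensitive, non_sensitive) sub-records.

    `spec` maps each attribute name to 1 (sensitive) or 0 (non-sensitive),
    playing the role of the binary function S_i (hypothetical encoding).
    """
    sensitive = {a: v for a, v in record.items() if spec[a] == 1}
    non_sensitive = {a: v for a, v in record.items() if spec[a] == 0}
    return sensitive, non_sensitive


# Hypothetical user who marks income and health as sensitive, age as not.
record = {"age": 42, "income": 58000, "health": 3}
spec = {"age": 0, "income": 1, "health": 1}
sens, non_sens = split_record(record, spec)
print(sens)      # {'income': 58000, 'health': 3}
print(non_sens)  # {'age': 42}
```

Two users with identical values can end up with different sub-records, because the split depends only on each user's own specification and not on the recorded values.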
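Definition two ($\mathcal{S}$-adjacent records): Let $\mathcal{S}$ denote the set of secret specifications associated with the users. A record $r_i$ of user $u_i$ and a record $r_i'$ are $\mathcal{S}$-adjacent if, for every $j$ with $S_i(A_j) = 0$, the two records hold the same value for attribute $A_j$; that is, they may differ only in attributes that $u_i$ marks as sensitive.
Definition three ($\mathcal{S}$-adjacent datasets): Dataset $D$ and dataset $D'$ are $\mathcal{S}$-adjacent if and only if they differ in exactly one record and the differing records are $\mathcal{S}$-adjacent.
Definition four ($\mathcal{S}$-sensitivity): Over all pairs of $\mathcal{S}$-adjacent datasets $D$ and $D'$, the global sensitivity of the query function $f$ is denoted $\Delta f_s$ and measured in the $L_1$ norm: $\Delta f_s = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$.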
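As a worked illustration of Definition four, consider a mean query over a single numerical attribute. The bounds $a$ and $b$ and the example figures below are assumptions introduced for this illustration; the patent does not state them here.

```latex
% Assumption: every record value lies in [a, b], and D, D' differ only in
% one sensitive record, r_i versus r_i'.
\[
  \bigl| f_{mean}(D) - f_{mean}(D') \bigr|
  = \frac{|r_i - r_i'|}{n}
  \le \frac{b - a}{n}
  \quad\Longrightarrow\quad
  \Delta f_s = \frac{b - a}{n}
  \quad \text{(when at least one record is sensitive).}
\]
% Example: ages bounded by a = 0 and b = 100 with n = 1000 records give
% \Delta f_s = 100 / 1000 = 0.1.
```

If no record is marked sensitive, there is no adjacent pair to compare, so no noise is needed for that query under this definition.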
Starting from users' privacy rights, the application personalizes privacy requirements by allowing users to specify independently the sensitivity of their own records, and proposes differential privacy based on the secret specification (Secret Specification-based Differential Privacy, SSDP).
Definition five (differential privacy based on secret specification, SSDP): Given the secret specification set $\mathcal{S}$, a randomized algorithm $M$ satisfies $\epsilon$-SSDP if, for any $\mathcal{S}$-adjacent datasets $D$ and $D'$ and any subset $O \subseteq \mathrm{Range}(M)$, the algorithm $M$ satisfies $\Pr[M(D) \in O] \le e^{\epsilon} \Pr[M(D') \in O]$.
Here $\mathrm{Range}(M)$ is the output space of the randomized algorithm $M$, $\Pr[M(D) \in O]$ is the probability that $M$ applied to dataset $D$ produces a result in $O$, and $\Pr[M(D') \in O]$ is the probability that $M$ applied to dataset $D'$ produces a result in $O$. $\epsilon$ is the preset degree of privacy protection, $\epsilon > 0$.
The main objective of the SSDP proposed by the present application is to protect the sensitive records in the set dataset. Its distinguishing feature is that the sensitivity of a record's attributes is determined by the user and is independent of the values recorded for those attributes. Changing the value of a sensitive record therefore does not affect its sensitivity, which yields a symmetric neighborhood relationship. As stated in Definition five, for the sensitive attributes of a record SSDP ensures the same level of privacy protection as traditional differential privacy (DP), and therefore protects against powerful attacks.
The application proposes a Laplace mechanism (Secret Specification-based Laplace Mechanism, SSLM) based on a secret specification as a basic mechanism for implementing a differential privacy SSDP scheme based on the secret specification, and applies it to mean value query of a database or a set of data sets.
The Laplace mechanism SSLM based on the secret specification is stated as follows:
Theorem one (Laplace mechanism SSLM based on secret specification): Given a function $f: D \to R$ and the users' secret specification set $\mathcal{S}$, the SSLM is $M(D) = f(D) + \mathrm{Lap}(\Delta f_s / \epsilon)$, where $\Delta f_s$ is the $\mathcal{S}$-sensitivity of $f$ and $R$ is the range of $f$; when $f$ is a median query function or a mean query function, $R$ is the set of real numbers. $\mathrm{Lap}(b)$ denotes noise drawn from the Laplace distribution with scale $b$.
Theorem two (Laplace mechanism SSLM based on secret specification): When the query function $f$ is a mean function, the SSLM, i.e. $M(D) = f(D) + \mathrm{Lap}(\Delta f_s / \epsilon)$, satisfies $\epsilon$-SSDP.
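A minimal Python sketch of a Laplace mechanism of this kind is given below, assuming the query value and its sensitivity have already been computed. The function names `laplace_noise` and `sslm` are hypothetical, and the sampler uses only the standard library (the difference of two independent exponential variables is Laplace distributed); it is not a hardened implementation for production privacy guarantees.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-centred Laplace distribution.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sslm(true_value: float, sensitivity: float, epsilon: float,
         rng: random.Random) -> float:
    """Return true_value + Lap(sensitivity / epsilon)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed only to make the example repeatable
print(sslm(true_value=40.0, sensitivity=0.1, epsilon=1.0, rng=rng))
```

A smaller sensitivity or a larger epsilon shrinks the noise scale, which is how restricting adjacency to sensitive records can translate into more accurate answers.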
The application also provides an exponential mechanism (Secret Specification-based Exponential Mechanism, SSEM) based on the secret specification as a basic mechanism for realizing the SSDP mechanism, and the exponential mechanism is applied to the median query of the database and the set data set.
Consider a query function $f: D \to O$, where the query function may be a median query function. The real-valued score function over the output space $O$ of $f$ can be expressed as $s(D, o) = -\min_{D^{*}:\, f(D^{*}) = o} \bigl|\{\, r \in D : r \text{ is sensitive and its value differs in } D^{*} \,\}\bigr|$, where $D^{*}$ denotes a dataset satisfying $f(D^{*}) = o$ that is formed by changing the record values of any number of sensitive records in the dataset $D$. The set inside $|\cdot|$ is the set of sensitive records that differ between $D$ and $D^{*}$, and its cardinality is the number of sensitive records whose values change in transforming $D$ into $D^{*}$. The score $s(D, o)$ is non-positive: the formula takes the smallest number of changed sensitive records (which corresponds to the largest $s(D, o)$) and negates it to obtain the score. $\Delta s = \max_{o \in O} \max_{D, D'} |s(D, o) - s(D', o)|$ is the global sensitivity of the score function $s$. For basic statistical functions such as the median query function, $\Delta s = 1$.
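One way to compute such a score for a median query is sketched below. The sketch assumes a one-dimensional dataset with an odd number of records, assumes that a changed sensitive record may be overwritten with any value, and returns minus infinity when no rewrite of sensitive records can make the candidate the median. The function name `median_score` and the `(value, is_sensitive)` encoding are hypothetical, and the patent's own worked examples (FIG. 2 and FIG. 3) may proceed differently.

```python
import math
from typing import List, Tuple


def median_score(data: List[Tuple[float, bool]], o: float) -> float:
    """Score s(D, o): minus the fewest sensitive values to overwrite so that
    the median of the dataset becomes o (or -inf if that is impossible)."""
    n = len(data)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of records")
    half = (n - 1) // 2
    below = [sensitive for value, sensitive in data if value < o]
    above = [sensitive for value, sensitive in data if value > o]
    # o is the median exactly when at most `half` values lie strictly on
    # each side, so any surplus on one side must be moved (set to o).
    need_below = max(0, len(below) - half)
    need_above = max(0, len(above) - half)
    # Only sensitive records may be changed.
    if need_below > sum(below) or need_above > sum(above):
        return -math.inf
    return float(-(need_below + need_above))


# Hypothetical ages; True marks a record its owner declared sensitive.
data = [(20, True), (30, False), (40, True), (50, False), (60, True)]
print(median_score(data, 40))  # 0.0  (40 is already the median)
print(median_score(data, 50))  # -1.0 (overwrite one sensitive value below 50)
print(median_score(data, 60))  # -2.0
```

Candidates that can only be reached by changing non-sensitive records receive minus infinity in this sketch, so a mechanism built on the score would never output them.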
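Definition six (exponential mechanism EM): Let $O$ denote the set of all possible outputs of a randomized algorithm $M$, i.e. its output space. For a score function $s: D \times O \to \mathbb{R}$, if $M$ produces the output $o \in O$ with probability proportional to $\exp\!\left(\frac{\epsilon\, s(D, o)}{2 \Delta s}\right)$, then $M$ satisfies $\epsilon$-DP. Here $D$ and $D'$, over which the sensitivity $\Delta s$ is taken, are adjacent datasets, and $\mathbb{R}$ denotes the real number domain.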
According to secret specificationsThe present application modifies the scoring function s (D, o) of the original exponential mechanism and expresses it in another way, namely +.>Wherein r represents->Sensitive record associated with one-dimensional attribute, and +.> Representation of the acquisition of D in the dataset D * During the process of (1), the minimum number of sensitive records is changed to f (D * ) Obtaining data set D when =o * The number of sensitive records changed at this time is multiplied by ∈max +. >Values.
Definition seven (exponential mechanism based on secret specification, SSEM): Given a function $f: D \to O$ and the users' secret specification set $\mathcal{S}$, the SSEM outputs $o$ with probability $\dfrac{\exp\!\left(\frac{\epsilon\, s(D, o)}{2 \Delta s}\right)}{\sum_{z \in O} \exp\!\left(\frac{\epsilon\, s(D, z)}{2 \Delta s}\right)}$, where $z, o \in O$ and the output $o$ drawn according to this probability is the median query result. $\exp(\cdot)$ denotes the exponential function.
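Sampling from an exponential mechanism of this kind can be sketched as follows, assuming the scores of a finite set of candidate outputs have already been computed. The sketch uses the standard exponential-mechanism weighting `exp(epsilon * s / (2 * delta_s))`; whether the patent's SSEM uses exactly this constant is an assumption, and the names `ssem_probabilities` and `ssem_sample` are hypothetical.

```python
import math
import random
from typing import Dict


def ssem_probabilities(scores: Dict[float, float], epsilon: float,
                       delta_s: float = 1.0) -> Dict[float, float]:
    """Normalise exp(epsilon * s / (2 * delta_s)) over the candidate outputs."""
    top = max(scores.values())
    if top == -math.inf:
        raise ValueError("no candidate output is reachable")
    # Subtracting the best score keeps the exponentials numerically stable
    # and does not change the normalised probabilities.
    weights = {o: math.exp(epsilon * (s - top) / (2.0 * delta_s))
               for o, s in scores.items()}
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}


def ssem_sample(scores: Dict[float, float], epsilon: float,
                rng: random.Random, delta_s: float = 1.0) -> float:
    """Draw one output according to the probabilities above."""
    probs = ssem_probabilities(scores, epsilon, delta_s)
    outputs = list(probs)
    return rng.choices(outputs, weights=[probs[o] for o in outputs], k=1)[0]


# Hypothetical candidate medians with their scores s(D, o).
scores = {30.0: -1.0, 40.0: 0.0, 50.0: -1.0}
probs = ssem_probabilities(scores, epsilon=2.0)
print(probs)
print(ssem_sample(scores, epsilon=2.0, rng=random.Random(0)))
```

Candidates with a score of minus infinity receive probability zero, and raising epsilon concentrates the probability mass on the highest-scoring candidate.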
Privacy analysis:
Theorem two (Laplace mechanism SSLM based on secret specification): When the query function $f$ is a mean function, the SSLM, i.e. $M(D) = f(D) + \mathrm{Lap}(\Delta f_s / \epsilon)$, satisfies $\epsilon$-SSDP and therefore meets the privacy requirement.
Theorem three: When the query function $f$ is a median function, the SSEM satisfies $\epsilon$-SSDP and therefore meets the privacy requirement.
The application provides a data query protection method of a personalized privacy protection mechanism based on secret specifications. As shown in FIG. 1, in a preferred embodiment the method comprises the following steps:
Step S101, the access device sends a query instruction to the data storage device. The access device is preferably, but not limited to, a mobile terminal, a PC or a notebook computer. The data storage device is preferably, but not limited to, a data server or a cloud server. The access device and the data storage device communicate over an internet connection.
The data storage device performs:
Step S102, receiving and parsing the query instruction to obtain a query function and a query attribute. The query instruction contains both. Because each user's record in the set dataset has multiple attribute dimensions, a query such as the average age or the average annual income over a census dataset only needs to process the data of the age attribute or the annual income attribute.
Step S103, extracting a query attribute dataset from the set dataset, including:
The set dataset is denoted $D_0 = (r_1, \ldots, r_i, \ldots, r_n)$, where $n$ is the number of users and $r_i$ is the record of user $u_i$, with user index $i \in [1, n]$; $r_i = (r_i^{A_1}, \ldots, r_i^{A_k})$, where $k$ is the attribute dimension, $r_i$ is drawn from the record domain, and $r_i^{A_j}$ is user $u_i$'s record for attribute $A_j$, with attribute index $j \in [1, k]$. Let the query attribute be attribute $A_j$; the query attribute dataset is then $D = (r_1^{A_j}, \ldots, r_n^{A_j})$.
Step S104, a user secret specification set is obtained, and the query attribute data set is divided into a sensitive subset and a non-sensitive subset based on the user secret specification set. The user secret specification set comprises binary functions of all attributes of the user, after query attributes are resolved, the secret specification set of the query attributes is extracted from the user secret specification set, and the query attribute data set is divided into a sensitive subset and a non-sensitive subset based on the extracted secret specification set. Specifically, step S104 includes:
Step S1041, the user secret specification set $\mathcal{S}$ comprises the secret specifications of all users of the set dataset. The secret specification of user $u_i$ is defined as a binary function $S_i: r_i \to \{0,1\}^k$. If user $u_i$ defines the record for attribute $A_j$ as a sensitive record, the value of $u_i$'s secret specification on attribute $A_j$ is $S_i(A_j) = 1$; if user $u_i$ defines the record for attribute $A_j$ as a non-sensitive record, the value is $S_i(A_j) = 0$.
Step S1042, with attribute $A_j$ as the query attribute, perform the following for all users: if $S_i(A_j) = 1$, place the value of user $u_i$'s record for attribute $A_j$ into the sensitive subset $D_s$; if $S_i(A_j) = 0$, place the value of user $u_i$'s record for attribute $A_j$ into the non-sensitive subset $D_{ns}$.
The query attribute data set is divided into the sensitive subset and the non-sensitive subset through the step S104, the sensitive subset is protected in the mean query or the median query, the non-sensitive subset is not protected, the distortion of the query data can be reduced while the sensitive data is protected, and the utility of the query data is improved.
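Steps S103 and S104 can be sketched as a single pass over the users. The list-of-dictionaries layout, the attribute names and the function name `partition_query_attribute` are assumptions made for this example and are not taken from the patent.

```python
from typing import Dict, List, Tuple


def partition_query_attribute(
    records: List[Dict[str, float]],
    specs: List[Dict[str, int]],
    attribute: str,
) -> Tuple[List[float], List[float]]:
    """Return (D_s, D_ns) for one query attribute.

    records[i] is user i's record and specs[i] is that user's secret
    specification, mapping attribute names to 1 (sensitive) or 0.
    """
    sensitive: List[float] = []
    non_sensitive: List[float] = []
    for record, spec in zip(records, specs):
        value = record[attribute]      # step S103: keep only the query attribute
        if spec[attribute] == 1:       # step S1042: S_i(A_j) = 1
            sensitive.append(value)
        else:                          # S_i(A_j) = 0
            non_sensitive.append(value)
    return sensitive, non_sensitive


# Three hypothetical census records with per-user specifications.
records = [
    {"age": 30, "income": 50000},
    {"age": 45, "income": 72000},
    {"age": 27, "income": 41000},
]
specs = [
    {"age": 1, "income": 0},
    {"age": 0, "income": 1},
    {"age": 0, "income": 1},
]
d_s, d_ns = partition_query_attribute(records, specs, "age")
print(d_s, d_ns)  # [30] [45, 27]
```

The same dataset yields a different partition for each query attribute, since each user may mark different attributes as sensitive.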
Step S105, when the query function is a mean query function, a mean query result of a query attribute dataset is obtained according to a pre-constructed Laplace mechanism based on a secret specification;
when the query function is a median query function, a median query result of the query attribute dataset is obtained according to a pre-constructed exponential mechanism based on the secret specification.
Step S106, issuing a mean query result and/or a median query result of the query attribute dataset to the access device.
In this embodiment, preferably, in step S105, when the query function is a mean query function, obtaining the mean query result of the query attribute dataset according to the pre-constructed Laplace mechanism based on the secret specification includes:
Step A1, according to the sensitive subset $D_s$ and the non-sensitive subset $D_{ns}$, calculate the mean $f_{mean}(D)$ of the query attribute dataset: $f_{mean}(D) = \dfrac{|D_s|\, f_{mean}(D_s) + |D_{ns}|\, f_{mean}(D_{ns})}{|D_s| + |D_{ns}|}$,
where $|\cdot|$ denotes the cardinality of a dataset, i.e. the number of data items in it; $f_{mean}(\cdot)$ denotes the mean query function, $f_{mean}(D_s)$ is the mean of the sensitive subset $D_s$, and $f_{mean}(D_{ns})$ is the mean of the non-sensitive subset $D_{ns}$.
Step A2, construct an $\mathcal{S}$-adjacent dataset $D'$ of the query attribute dataset $D$, specifically according to Definition three above, and obtain the global sensitivity of the mean query function: $\Delta f_s = \max_{D, D'} \lVert f_{mean}(D) - f_{mean}(D') \rVert_1$,
where $f_{mean}(D')$ is the mean of the $\mathcal{S}$-adjacent dataset $D'$ and $\lVert \cdot \rVert_1$ denotes the $L_1$ norm.
Step A3, add noise satisfying the Laplace distribution, $\mathrm{Lap}(\Delta f_s / \epsilon)$, to the mean $f_{mean}(D)$ of the query attribute dataset to obtain the mean query result $f_{mean}(D) + \mathrm{Lap}(\Delta f_s / \epsilon)$,
where $\mathrm{Lap}(b)$ denotes noise drawn from the Laplace distribution with scale $b$, and $\epsilon$ is the preset degree of privacy protection, $\epsilon > 0$.
It can be seen that using the non-sensitive subset D_ns improves the accuracy of the mean query result and minimizes data distortion, while the Laplace-distributed noise protects the privacy of the mean of the sensitive subset D_s. Personalized privacy protection is thereby achieved, together with convenient and effective data analysis.
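Steps A1 to A3 can be sketched in Python as follows. This is a hedged illustration under stated assumptions: the sensitivity value is supplied by the caller rather than derived here, the names `laplace_noise` and `noisy_mean` are hypothetical, and skipping the noise when D_s is empty is an assumption of the sketch (there is then nothing to protect).

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two
    independent exponential samples with mean `scale`."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mean(d_s, d_ns, sensitivity, epsilon, rng=None):
    """Mean of D = D_s ∪ D_ns plus Laplace noise of scale sensitivity / epsilon.

    `sensitivity` stands for the sensitivity from step A2 and is an input
    here (assumption); no noise is added when D_s is empty.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    n = len(d_s) + len(d_ns)
    if n == 0:
        raise ValueError("the query attribute dataset is empty")
    rng = rng or random.Random()
    # Step A1: the weighted combination of the two subset means equals sum / |D|.
    exact = (sum(d_s) + sum(d_ns)) / n
    if not d_s:
        return exact
    # Step A3: perturb the exact mean with Laplace noise.
    return exact + laplace_noise(sensitivity / epsilon, rng)


result = noisy_mean([1, 3, 5, 6], [2, 4, 7], sensitivity=1.0, epsilon=1.0,
                    rng=random.Random(0))
```

Passing an explicit `random.Random` instance makes a run reproducible, which is convenient when comparing privacy budgets.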
In a simplified application scenario of this embodiment, the specific procedure of mean value query is as follows:
Step 101: input the dataset D, the users' secret specification set S, the privacy budget ε, and the mean query function f_mean.
Step 102: for the mean query, consider the case where each record is associated with only one attribute, i.e., k = 1, so that the sensitivity of a user's record coincides with the sensitivity of the record's attribute. The users' secret specification set S divides the dataset D = (r_1, …, r_n) into two subsets, namely the sensitive data subset D_s and the non-sensitive data subset D_ns. On this basis, the sensitive subset and the non-sensitive subset of the dataset are initialized.
Step 103: based on the users' secret specification set S, the sensitive subset D_s and the non-sensitive subset D_ns are obtained. Without loss of generality, assume D_s = (r_1, r_2, …, r_m) and D_ns = (r_{m+1}, r_{m+2}, …, r_n), with each record associated with a numerical attribute. From D_s and D_ns, the mean query result on dataset D is f_mean(D) = (|D_s| · f_mean(D_s) + |D_ns| · f_mean(D_ns)) / |D|, where |·| denotes the cardinality of a dataset.
Step 104: let the datasets D = D_s ∪ D_ns and D′ = D′_s ∪ D′_ns be S-adjacent, so that |D_s| = |D′_s| and D_ns = D′_ns. First, the sensitivity of the mean query function under S-adjacency is calculated; because no non-sensitive record in dataset D has an S-adjacent record, only changes to sensitive records contribute to it. It is worth emphasizing that this sensitivity differs from the global sensitivity Δf of the mean query, which does not distinguish sensitive from non-sensitive records. Then, noise following the Laplace distribution, calibrated to this sensitivity and the privacy budget ε, is added to the mean query result f(D_s) computed on the sensitive subset D_s.
Step 105: return the noisy mean query result.
In this embodiment, preferably, in step S105, when the query function is a median query function, the median query result of the query attribute dataset is obtained according to a pre-constructed exponential mechanism based on the secret specification, including:
Step B1: sort the recorded values of the query attribute records in the query attribute dataset D in ascending order, so that the sorted query attribute dataset D satisfies r_i ≤ r_{i+1}.
Step B2: let the cardinality of the sorted query attribute dataset D be |D| = 2m + 1, with m being a first intermediate parameter. Let O denote the median output space of the query attribute dataset D. The median query function returns a recorded value in the query dataset D as the median query result; therefore r_i ∈ O, i.e., every recorded value belongs to the median output space O.
An S-adjacent dataset D′ of the query attribute dataset D is constructed, specifically according to Definition 3 above. Since the datasets D = D_s ∪ D_ns and D′ = D′_s ∪ D′_ns are assumed to be S-adjacent, |D_s| = |D′_s| and D_ns = D′_ns, where D′_s denotes the sensitive subset of the S-adjacent dataset D′ and D′_ns denotes the non-sensitive subset of the S-adjacent dataset D′. It must be emphasized that, according to the secret specification S, no non-sensitive record in dataset D has an S-adjacent record, so changes to non-sensitive records are not considered.
Step B3: for any output median o ∈ O, calculate the score function value of the median o from the following quantities.
D* denotes a dataset satisfying f_med(D*) = o that is formed by changing the recorded values of any number of sensitive records in the query attribute dataset D, the non-sensitive recorded values in D* remaining unchanged. The set of sensitive records that differ between D and D* is considered, with r denoting a sensitive record in that set. The score function value of the median o is obtained from the number of sensitive records in this set when that number is smallest, i.e., from the minimum number of sensitive records that must be changed in D to obtain D*. f_med(·) denotes the median query function, ε denotes the preset degree of privacy protection, and ε > 0.
Further preferably, the score function value of the median o is obtained quickly. Reasoning shows that, for any record r_i ∈ D with r_i ∈ O, the score function value falls into one of three cases:
(1) If i < m, the m - i sensitive records to the right of record r_i are changed;
(2) If i = m, no sensitive record is changed;
(3) If i > m, the i - m sensitive records to the left of record r_i are changed.
Step B4: calculate, according to the exponential mechanism, the output probability of every median o in the median output space O.
Step B5: output a median drawn according to the output probabilities of all medians in the median output space O as the median query result.
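Steps B4 and B5 can be illustrated with the textbook form of the exponential mechanism, in which a candidate o is drawn with probability proportional to exp(ε · q(o) / (2Δq)). This is a sketch under assumptions: the factor 2Δq and the parameter `score_sensitivity` follow the standard mechanism and are not confirmed by the source, the scores are supplied by the caller, and the function name is hypothetical.

```python
import math
import random


def exponential_mechanism(candidates, scores, epsilon, score_sensitivity=1.0, rng=None):
    """Return (chosen candidate, output probabilities).

    Each candidate is weighted by exp(epsilon * score / (2 * score_sensitivity));
    a score of -inf gives probability 0 (the candidate can never be output).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if len(candidates) != len(scores):
        raise ValueError("one score is needed per candidate")
    finite = [s for s in scores if s != -math.inf]
    if not finite:
        raise ValueError("at least one candidate needs a finite score")
    top = max(finite)  # subtract the best score for numerical stability
    weights = [
        0.0 if s == -math.inf
        else math.exp(epsilon * (s - top) / (2.0 * score_sensitivity))
        for s in scores
    ]
    total = sum(weights)
    probabilities = [w / total for w in weights]
    rng = rng or random.Random()
    chosen = rng.choices(candidates, weights=probabilities, k=1)[0]
    return chosen, probabilities


chosen, probs = exponential_mechanism([10, 20, 30], [-1, 0, -1], epsilon=1.0,
                                      rng=random.Random(0))
```

Subtracting the largest score before exponentiating leaves the probabilities unchanged and avoids overflow for large ε.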
In a simplified application scenario of this embodiment, the median query specific process is as follows:
Step 101: input the dataset D, the users' secret specification set S, the privacy budget ε, and the median query function f_med.
Step 102: for the median query, consider the case where each record is associated with only one attribute, i.e., k = 1, so that the sensitivity of a user's record coincides with the sensitivity of the record's attribute. The users' secret specification set S divides the dataset D = (r_1, …, r_n) into two subsets, namely the sensitive data subset D_s and the non-sensitive data subset D_ns. On this basis, the sensitive subset and the non-sensitive subset of the dataset are initialized.
Step 103: based on the users' secret specification set S, the sensitive subset D_s and the non-sensitive subset D_ns are obtained. Without loss of generality, assume |D| = 2m + 1 and r_i ≤ r_{i+1}, where |·| denotes the cardinality of a dataset. The median function f_med returns the record ranked m in dataset D, so the true median of dataset D is f_med(D) = r_m.
Step 104: since the datasets D = D_s ∪ D_ns and D′ = D′_s ∪ D′_ns are assumed to be S-adjacent, |D_s| = |D′_s| and D_ns = D′_ns. It must be emphasized that, according to the secret specification S, no non-sensitive record in dataset D has an S-adjacent record, so changes to non-sensitive records are not considered.
For any record r_i ∈ D, the score function value is calculated in one of three cases:
(1) If i < m, the m - i sensitive records to the right of record r_i are changed;
(2) If i = m, no record is changed;
(3) If i > m, the i - m sensitive records to the left of record r_i are changed.
Step 105: return the noisy median query result.
For ease of explanation, two examples of calculating the score function value of the median o in the process of obtaining the median query result are given below.
Example 1: D = (1, 2, 3, 4, 5, 6, 7), D_s = (1, 3, 5, 6), D_ns = (2, 4, 7), ε = 1, O = (1, 2, 3, 4, 5, 6, 7). Fig. 2 shows the score function value and the output probability for each median that may be output.
Example 2: D = (1, 2, 3, 4, 5, 6, 7), D_s = (1, 3, 5), D_ns = (2, 4, 6, 7), ε = 1, O = (1, 2, 3, 4, 5, 6, 7). Fig. 3 shows the score function value and the output probability for each median that may be output.
The tables in Figs. 2 and 3 illustrate specific examples of the score function calculation. The two tables differ in the secret specifications for the query dataset D, which result in different proportions of sensitive records (sensitive attribute records) in D. The results in the tables of Figs. 2 and 3 show that when the proportion of sensitive records is relatively low (for example, less than 50%), some of the records in D cannot be output as the median. This phenomenon is illustrated in Fig. 3. Excluding such records increases the output probability of the other recorded values (the possible medians) and thus enhances data utility.
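One plausible reading of the score calculation can be sketched as follows. It is an interpretation, not the formula of the source (which is not reproduced in this text), and it does not claim to reproduce the values in Figs. 2 and 3. Assumptions: recorded values are distinct and sorted, the score of a candidate is the negated minimum number of sensitive records that must be changed to make it the median, and a candidate is infeasible (score -inf) when too few sensitive records lie on the side that has to shrink. The function name and the 0-based indexing are choices of the sketch.

```python
import math


def median_scores(sorted_values, sensitive_flags):
    """Score every candidate r_i in a sorted dataset of odd size.

    sensitive_flags[i] is 1 when r_i belongs to D_s and 0 when it belongs
    to D_ns. Non-sensitive records are never changed (assumption taken from
    the text); values are assumed distinct.
    """
    n = len(sorted_values)
    if n % 2 == 0 or n != len(sensitive_flags):
        raise ValueError("need an odd-sized dataset with one flag per record")
    mid = n // 2  # 0-based position of the true median
    scores = []
    for i in range(n):
        need = abs(mid - i)  # records that must cross over to the other side
        if need == 0:
            scores.append(0)
            continue
        if i < mid:
            available = sum(sensitive_flags[i + 1:])  # sensitive records to the right
        else:
            available = sum(sensitive_flags[:i])      # sensitive records to the left
        scores.append(-need if available >= need else -math.inf)
    return scores


values = [1, 2, 3, 4, 5, 6, 7]
example_1 = median_scores(values, [1, 0, 1, 0, 1, 1, 0])  # D_s = (1, 3, 5, 6)
example_2 = median_scores(values, [1, 0, 1, 0, 1, 0, 0])  # D_s = (1, 3, 5)
```

Under this reading every candidate in Example 1 is feasible, whereas in Example 2 the value 1 is infeasible because only two sensitive records lie to its right while three would have to be moved, which is in line with the observation that some records cannot be output as the median when sensitive records are in the minority.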
The invention also discloses a method for protecting the query of the census data set, and in a preferred embodiment, as shown in fig. 4, the method comprises the following steps:
In step S201, the access device sends a query instruction to a data storage device storing a population census data set. The access device is preferably but not limited to a mobile terminal or a PC or a notebook. The data storage device is preferably, but not limited to, a data server or cloud server for government or public security systems. The access device and the data storage device communicate via an internet connection.
The data storage device performs:
Step S202: receive and parse the query instruction to obtain a query function and a query attribute, where the query attributes include age and annual income and may also include health status, education level, identification number, etc.
Step S203, extracting a query attribute dataset from the census dataset, comprising:
The set dataset is denoted D_0 = (r_1, …, r_i, …, r_n), where n is the number of users, r_i is the record of user u_i with user index i ∈ [1, n], k is the attribute dimension, each record r_i is drawn from the record domain, and the attribute A_j record of user u_i has attribute index j ∈ [1, k]. Let the query attribute be attribute A_j; the query attribute dataset D then consists of the attribute A_j records of all users.
Step S204, a user secret specification set is obtained, and the query attribute data set is divided into a sensitive subset and a non-sensitive subset based on the user secret specification set. The user secret specification set comprises binary functions of all attributes of the user, after query attributes are resolved, the secret specification set of the query attributes is extracted from the user secret specification set, and the query attribute data set is divided into a sensitive subset and a non-sensitive subset based on the extracted secret specification set. Specifically, step S204 includes:
Step S2041: the user secret specification set S comprises the secret specifications of all users corresponding to the set dataset. The secret specification of user u_i is defined as a binary function S_i: r_i → {0,1}^k. If user u_i defines the attribute A_j record as a sensitive record, the value of user u_i's secret specification on attribute A_j is S_i(A_j) = 1; if user u_i defines the attribute A_j record as a non-sensitive record, the value of user u_i's secret specification on attribute A_j is S_i(A_j) = 0.
Step S2042: let attribute A_j be the query attribute, and perform the following for all users: if the value of user u_i's secret specification on attribute A_j is S_i(A_j) = 1, the recorded value of user u_i's attribute A_j record is placed in the sensitive subset D_s; if the value of user u_i's secret specification on attribute A_j is S_i(A_j) = 0, the recorded value of user u_i's attribute A_j record is placed in the non-sensitive subset D_ns.
Step S204 thus divides the query attribute dataset into a sensitive subset and a non-sensitive subset. In a mean query or a median query, only the sensitive subset is protected and the non-sensitive subset is left unprotected, which reduces the distortion of the query data while still protecting the sensitive data, and thereby improves the utility of the query data.
Step S205, when the query function is a mean query function, a mean query result of the query attribute dataset is obtained according to a pre-constructed Laplace mechanism based on a secret specification;
When the query function is a median query function, a median query result of the query attribute dataset is obtained according to a pre-constructed exponential mechanism based on a secret specification;
step S206, issuing a mean query result and/or a median query result of the query attribute dataset to the access device.
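The flow of steps S202 to S206 on the data storage device can be sketched as a small dispatcher. This is an illustrative outline only: the dictionary-based query instruction, the per-user dictionaries of secret specification values, and the two mechanism callables passed in by the caller are all assumptions of the sketch, not interfaces defined by the source.

```python
def handle_query(instruction, records, secret_specs, mean_mechanism, median_mechanism):
    """Parse a query, split the queried attribute by sensitivity, and dispatch.

    instruction  -- {"function": "mean" | "median", "attribute": name} (hypothetical format)
    records      -- one dict per user, mapping attribute name to recorded value
    secret_specs -- one dict per user, mapping attribute name to 0 or 1
    mean_mechanism / median_mechanism -- callables taking (d_s, d_ns)
    """
    # S202: parse the instruction into a query function and a query attribute.
    function = instruction["function"]
    attribute = instruction["attribute"]
    # S203: extract the query attribute dataset.
    values = [record[attribute] for record in records]
    # S204: divide it into sensitive and non-sensitive subsets.
    bits = [spec[attribute] for spec in secret_specs]
    d_s = [v for v, b in zip(values, bits) if b == 1]
    d_ns = [v for v, b in zip(values, bits) if b == 0]
    # S205: choose the mechanism that matches the query function.
    if function == "mean":
        return mean_mechanism(d_s, d_ns)
    if function == "median":
        return median_mechanism(d_s, d_ns)
    raise ValueError("unsupported query function: %r" % (function,))


# Stub mechanisms that simply echo their inputs, for illustration.
records = [{"age": 30, "income": 10}, {"age": 40, "income": 20}]
specs = [{"age": 1, "income": 0}, {"age": 0, "income": 0}]
answer = handle_query({"function": "mean", "attribute": "age"}, records, specs,
                      mean_mechanism=lambda s, ns: ("mean", s, ns),
                      median_mechanism=lambda s, ns: ("median", s, ns))
```

Passing the mechanisms in as callables keeps the dispatcher independent of how the noise is calibrated; the value returned is what step S206 would issue to the access device.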
In this embodiment, preferably, in step S205, when the query function is a mean query function, obtaining the mean query result of the query attribute dataset according to the pre-constructed Laplace mechanism based on the secret specification includes:
Step C1: according to the sensitive subset D_s and the non-sensitive subset D_ns, calculate the mean f_mean(D) of the query attribute dataset:
f_mean(D) = (|D_s| · f_mean(D_s) + |D_ns| · f_mean(D_ns)) / |D|
where |·| denotes the cardinality of a dataset, f_mean(·) denotes the mean query function, f_mean(D_s) denotes the mean of the sensitive subset D_s, and f_mean(D_ns) denotes the mean of the non-sensitive subset D_ns.
Step C2: construct an S-adjacent dataset D′ of the query attribute dataset D, specifically according to Definition 3 above, and obtain the global sensitivity Δf_S of the mean query function:
Δf_S = max over S-adjacent D, D′ of ‖f_mean(D) - f_mean(D′)‖_1
where f_mean(D′) denotes the mean of the S-adjacent dataset D′ and ‖·‖_1 denotes the L_1 norm.
Step C3: add noise following the Laplace distribution to the mean f_mean(D) of the query attribute dataset to obtain the final mean query result f_mean(D) + Lap(Δf_S / ε),
where Lap(·) denotes the Laplace distribution, Δf_S is the global sensitivity obtained in step C2, ε denotes the preset degree of privacy protection, and ε > 0.
It can be seen that using the non-sensitive subset D_ns improves the accuracy of the mean query result and minimizes data distortion, while the Laplace-distributed noise protects the privacy of the mean of the sensitive subset D_s. Personalized privacy protection is thereby achieved, together with convenient and effective data analysis.
In this embodiment, preferably, in step S205, when the query function is a median query function, the median query result of the query attribute dataset is obtained according to a pre-constructed exponential mechanism based on the secret specification, including:
Step D1: sort the recorded values of the query attribute records in the query attribute dataset D in ascending order, so that the sorted query attribute dataset D satisfies r_i ≤ r_{i+1}.
Step D2: let the cardinality of the sorted query attribute dataset D be |D| = 2m + 1, with m being a first intermediate parameter. Let O denote the median output space of the query attribute dataset D. The median query function returns a recorded value in the query dataset D as the median query result; therefore r_i ∈ O, i.e., every recorded value belongs to the median output space O.
An S-adjacent dataset D′ of the query attribute dataset D is constructed, specifically according to Definition 3 above. Since the datasets D = D_s ∪ D_ns and D′ = D′_s ∪ D′_ns are assumed to be S-adjacent, |D_s| = |D′_s| and D_ns = D′_ns, where D′_s denotes the sensitive subset of the S-adjacent dataset D′ and D′_ns denotes the non-sensitive subset of the S-adjacent dataset D′. It must be emphasized that, according to the secret specification S, no non-sensitive record in dataset D has an S-adjacent record, so changes to non-sensitive records are not considered.
Step D3: for any output median o ∈ O, calculate the score function value of the median o from the following quantities.
D* denotes a dataset satisfying f_med(D*) = o that is formed by changing the recorded values of any number of sensitive records in the query attribute dataset D, the non-sensitive recorded values in D* remaining unchanged. The set of sensitive records that differ between D and D* is considered, with r denoting a sensitive record in that set. The score function value of the median o is obtained from the number of sensitive records in this set when that number is smallest, i.e., from the minimum number of sensitive records that must be changed in D to obtain D*. f_med(·) denotes the median query function, ε denotes the preset degree of privacy protection, and ε > 0.
Further preferably, the score function value of the median o is obtained quickly. Reasoning shows that, for any record r_i ∈ D with r_i ∈ O, the score function value falls into one of three cases:
(1) If i < m, the m - i sensitive records to the right of record r_i are changed;
(2) If i = m, no sensitive record is changed;
(3) If i > m, the i - m sensitive records to the left of record r_i are changed.
Step D4: calculate, according to the exponential mechanism, the output probability of every median o in the median output space O.
Step D5: output a median drawn according to the output probabilities of all medians in the median output space O as the median query result.
A utility experiment analysis was carried out on the census dataset query protection method provided by the application.
Experimental background: experimental verification was performed on the 2012 U.S. Census dataset. Specifically, 1000 and 10000 records were randomly selected from the 2012 U.S. Census dataset, respectively, and the mean query was evaluated on the record attribute Age. Furthermore, 1001 and 10001 records were randomly selected from the 2012 U.S. Census dataset, respectively, and the median query was evaluated on the annual income attribute. The parameter δ denotes the proportion of non-sensitive records in the dataset (default δ = 0.8), and the parameter ε denotes the degree of privacy protection.
The data utility of SSLM and SSEM was evaluated using the root mean square error (RMSE) as the evaluation index. Figs. 5 and 6 show how the RMSE of the proposed SSLM (mean query scenario) varies with the parameters δ and ε, respectively, on 2012 U.S. Census data subsets of different sizes; the data size is 1000 in sub-figure (a) of Figs. 5 and 6 and 10000 in sub-figure (b) of Figs. 5 and 6. Figs. 7 and 8 show how the RMSE of the proposed SSEM (median query scenario) varies with the parameters δ and ε, respectively, on 2012 U.S. Census data subsets of different sizes; the data size is 1001 in sub-figure (a) of Figs. 7 and 8 and 10001 in sub-figure (b) of Figs. 7 and 8.
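The evaluation index can be computed as in the following sketch, which assumes that RMSE is taken over repeated noisy answers to the same query against the exact answer. The number of repetitions used in the experiments is not stated in this text, and the function name is hypothetical.

```python
import math


def rmse(true_value, noisy_results):
    """Root mean square error of repeated noisy answers against the exact answer."""
    if not noisy_results:
        raise ValueError("at least one noisy result is required")
    squared_errors = [(result - true_value) ** 2 for result in noisy_results]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Two noisy answers that miss the exact value 0.0 by +3 and -3.
error = rmse(0.0, [3.0, -3.0])
```

A lower RMSE indicates answers closer to the exact query result, i.e., higher data utility.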
Fig. 5 shows how the proportion δ of non-sensitive attribute values affects the data utility of the proposed SSLM. Figs. 5(a) and 5(b) show that the RMSE of SSLM decreases consistently on datasets of both sizes as the parameter δ increases from 0 to 0.9. In other words, the RMSE of SSLM gradually decreases and SSLM increasingly outperforms the baseline method, the classical Laplace mechanism (LM). In particular, when δ = 0, i.e., all attribute values in the dataset are sensitive, SSLM is equivalent to LM and exhibits the same RMSE as LM. Conversely, when δ = 1, i.e., all attribute values are non-sensitive, the RMSE of SSLM drops to 0. Furthermore, as shown in Figs. 5(a) and 5(b), when δ = 0.8, the utility of SSLM on the 2012 U.S. Census data subsets with data sizes of 1000 and 10000 is improved by about 6-fold and 5-fold, respectively, compared to LM. When δ = 0.9, the utility of SSLM on the same subsets is improved by approximately 14-fold and 10-fold, respectively, compared to LM. The utility improves markedly because SSLM keeps the non-sensitive attribute values unchanged when answering a mean query, which improves the accuracy of the mean.
The parameter ε denotes the privacy budget of the users associated with the sensitive record attribute. A higher ε means a lower level of privacy protection and therefore higher utility. As shown in Figs. 6(a) and 6(b), the RMSE of both LM and SSLM (default δ = 0.8) decreases on datasets of both sizes as ε increases from 0.01 to 0.5. The main point conveyed by Fig. 6 is that SSLM is significantly better than the baseline mechanism LM for every ε. Furthermore, when ε > 0.2, SSLM improves utility over LM by approximately 2-fold on the 2012 U.S. Census data subset with a data size of 1000 and by approximately 3-fold on the subset with a data size of 10000.
Similar to the effect of the parameter δ on the mean query, Fig. 7 shows that the data utility of SSEM significantly exceeds that of the baseline method EM as δ increases from 0 to 0.9. Considering the boundary case, SSEM is equivalent to EM when δ = 0. In addition, Figs. 7(a) and 7(b) show that when δ ∈ (0, 0.5), the RMSE of SSEM is equal to that of EM; however, when δ ∈ (0.5, 0.9), SSEM exhibits a significant improvement in data utility over EM.
In Fig. 8, the effect of the parameter ε on the median query is similar to that shown in Fig. 6 for the mean query. As expected, the RMSE of both SSEM and EM decreases on datasets of both sizes as ε increases. Notably, in Figs. 8(a) and 8(b), SSEM shows a significant utility improvement compared to EM. Specifically, when ε > 0.2, as shown in Fig. 8(a), SSEM achieves about a 2-fold improvement in utility over EM on the 2012 U.S. Census data subset with a data size of 1001.
The data query protection method provided by the application has the following technical effects:
1. we have introduced a new privacy definition SSDP that enables individuals to better control their private information, ensuring that only data marked as sensitive by users are privacy protected.
2. By allowing individuals to independently define secret specifications about their own data, SSDP achieves personalized privacy protection, facilitating efficient data analysis.
3. We provide a specific SSDP mechanism for mean value queries that improves the accuracy of data analysis while minimizing data distortion, especially when a significant portion of the data is non-sensitive, and that better explores the trade-off between privacy and utility.
4. We evaluate the performance of SSLM and SSEM by comparative experiments on real datasets. Experimental results indicate that SSLM improves utility by approximately 14-fold for mean value queries by utilizing non-sensitive data compared to the most advanced DP mechanism. SSEM improves utility by approximately 6-fold for median queries by using non-sensitive data.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
Claims (10)
1. The data query protection method of the personalized privacy protection mechanism based on the secret specification is characterized by comprising the following steps:
the access device sends a query instruction to the data storage device;
the data storage device performs:
receiving and analyzing a query instruction to obtain a query function and a query attribute;
extracting a query attribute dataset from the set dataset;
acquiring a user secret specification set, and dividing a query attribute data set into a sensitive subset and a non-sensitive subset based on the user secret specification set;
when the query function is a mean query function, a mean query result of the query attribute dataset is obtained according to a pre-constructed Laplacian mechanism based on a secret specification;
When the query function is a median query function, a median query result of the query attribute dataset is obtained according to a pre-constructed exponential mechanism based on a secret specification;
and issuing a mean query result and/or a median query result of the query attribute dataset to the access device.
2. The method for protecting data query based on personalized privacy preserving mechanism of claim 1, wherein extracting query attribute data set from the set data set comprises:
the set dataset is denoted D_0 = (r_1, …, r_i, …, r_n), where n is the number of users, r_i is the record of user u_i with user index i ∈ [1, n], k is the attribute dimension, each record r_i is drawn from the record domain, and the attribute A_j record of user u_i has attribute index j ∈ [1, k];
letting the query attribute be attribute A_j, the query attribute dataset D consists of the attribute A_j records of all users.
3. The method for protecting data query based on personalized privacy preserving mechanism of claim 2, wherein the dividing the query attribute dataset into sensitive subset and non-sensitive subset based on the set of user privacy specifications comprises:
the user secret specification set S comprises the secret specifications of all users corresponding to the set dataset, the secret specification of user u_i being defined as a binary function S_i: r_i → {0,1}^k; if user u_i defines the attribute A_j record as a sensitive record, the value of user u_i's secret specification on attribute A_j is S_i(A_j) = 1; if user u_i defines the attribute A_j record as a non-sensitive record, the value of user u_i's secret specification on attribute A_j is S_i(A_j) = 0;
letting attribute A_j be the query attribute, the following is performed for all users: if the value of user u_i's secret specification on attribute A_j is S_i(A_j) = 1, the recorded value of user u_i's attribute A_j record is placed in the sensitive subset D_s; if the value of user u_i's secret specification on attribute A_j is S_i(A_j) = 0, the recorded value of user u_i's attribute A_j record is placed in the non-sensitive subset D_ns.
4. The method for protecting data query by a personalized privacy protecting mechanism based on secret specifications according to claim 3, wherein obtaining the mean query result of the query attribute dataset according to the pre-constructed Laplace mechanism based on the secret specification comprises:
step A1: according to the sensitive subset D_s and the non-sensitive subset D_ns, calculating the mean f_mean(D) of the query attribute dataset:
f_mean(D) = (|D_s| · f_mean(D_s) + |D_ns| · f_mean(D_ns)) / |D|
where |·| denotes the cardinality of a dataset, f_mean(·) denotes the mean query function, f_mean(D_s) denotes the mean of the sensitive subset D_s, and f_mean(D_ns) denotes the mean of the non-sensitive subset D_ns;
step A2: constructing an S-adjacent dataset D′ of the query attribute dataset D, and obtaining the global sensitivity Δf_S of the mean query function:
Δf_S = max over S-adjacent D, D′ of ‖f_mean(D) - f_mean(D′)‖_1
where f_mean(D′) denotes the mean of the S-adjacent dataset D′ and ‖·‖_1 denotes the L_1 norm;
step A3: adding noise following the Laplace distribution to the mean f_mean(D) of the query attribute dataset to obtain the mean query result f_mean(D) + Lap(Δf_S / ε),
where Lap(·) denotes the Laplace distribution, Δf_S is the global sensitivity obtained in step A2, ε denotes the preset degree of privacy protection, and ε > 0.
5. The method for protecting data query by a personalized privacy preserving mechanism based on secret specifications according to claim 3 or 4, wherein the obtaining the median query result of the query attribute dataset according to a pre-built exponential mechanism based on secret specifications comprises:
step B1, sorting the record values of the query attribute records in the query attribute data set D from small to large;
step B2: letting the cardinality of the sorted query attribute dataset D be |D| = 2m + 1, with m being a first intermediate parameter, and letting O denote the median output space of the query attribute dataset D;
step B3: for any output median o ∈ O, calculating the score function value of the median o from the following quantities:
D* denotes a dataset satisfying f_med(D*) = o that is formed by changing the recorded values of any number of sensitive records in the query attribute dataset D; the set of sensitive records that differ between D and D* is considered, with r denoting a sensitive record in that set; the score function value of the median o is obtained from the number of sensitive records in this set when that number is smallest; f_med(·) denotes the median query function, ε denotes the preset degree of privacy protection, and ε > 0;
step B4: calculating, according to the exponential mechanism, the output probability of every median o in the median output space O;
step B5: outputting a median drawn according to the output probabilities of all medians in the median output space O as the median query result.
6. A method for protecting a census data set query, comprising:
the access device sends a query instruction to the data storage device which stores the census data set;
the data storage device performs:
receiving and analyzing a query instruction to obtain a query function and a query attribute, wherein the query attribute comprises age and annual income;
extracting a query attribute dataset from the census dataset;
acquiring a user secret specification set, and dividing a query attribute data set into a sensitive subset and a non-sensitive subset based on the user secret specification set;
when the query function is a mean query function, a mean query result of the query attribute dataset is obtained according to a pre-constructed Laplacian mechanism based on a secret specification;
When the query function is a median query function, a median query result of the query attribute dataset is obtained according to a pre-constructed exponential mechanism based on a secret specification;
and issuing a mean query result and/or a median query result of the query attribute dataset to the access device.
7. The census data set query protection method of claim 6, wherein the extracting a query attribute data set from a census data set comprises:
the census dataset is taken as the set dataset, which is denoted D_0 = (r_1, …, r_i, …, r_n), where n is the number of users, r_i is the record of user u_i with user index i ∈ [1, n], k is the attribute dimension, each record r_i is drawn from the record domain, and the attribute A_j record of user u_i has attribute index j ∈ [1, k];
letting the query attribute be attribute A_j, the query attribute dataset D consists of the attribute A_j records of all users.
8. The census data set query protection method of claim 7, wherein the partitioning of the query attribute data set into sensitive and non-sensitive subsets based on the set of user secret specifications comprises:
the user secret specification set S comprises the secret specifications of all users corresponding to the census dataset, the secret specification of user u_i being defined as a binary function S_i: r_i → {0,1}^k; if user u_i defines the attribute A_j record as a sensitive record, the value of user u_i's secret specification on attribute A_j is S_i(A_j) = 1; if user u_i defines the attribute A_j record as a non-sensitive record, the value of user u_i's secret specification on attribute A_j is S_i(A_j) = 0;
letting attribute A_j be the query attribute, the following is performed for all users: if the value of user u_i's secret specification on attribute A_j is S_i(A_j) = 1, the recorded value of user u_i's attribute A_j record is placed in the sensitive subset D_s; if the value of user u_i's secret specification on attribute A_j is S_i(A_j) = 0, the recorded value of user u_i's attribute A_j record is placed in the non-sensitive subset D_ns.
9. The census data set query protection method of claim 8, wherein obtaining the mean query result for the query attribute data set according to a pre-built secret-specification-based Laplace mechanism comprises:
step C1, calculating the mean $f_{mean}(D)$ of the query attribute dataset according to the sensitive subset $D_s$ and the non-sensitive subset $D_{ns}$:
$f_{mean}(D) = \dfrac{|D_s| \, f_{mean}(D_s) + |D_{ns}| \, f_{mean}(D_{ns})}{|D_s| + |D_{ns}|}$
where $|\cdot|$ represents the cardinality of a data set, $f_{mean}(\cdot)$ represents the mean query function, $f_{mean}(D_s)$ represents the mean of the sensitive subset $D_s$, and $f_{mean}(D_{ns})$ represents the mean of the non-sensitive subset $D_{ns}$;
step C2, constructing a neighboring data set $D'$ of the query attribute dataset $D$ and obtaining the global sensitivity of the mean query function:
$\Delta f_{mean} = \max_{D, D'} \lVert f_{mean}(D) - f_{mean}(D') \rVert_1$
where $f_{mean}(D')$ represents the mean of the neighboring data set $D'$ and $\lVert \cdot \rVert_1$ represents the $L_1$ norm;
step C3, adding noise satisfying the Laplace distribution to the mean $f_{mean}(D)$ of the query attribute dataset to obtain the final mean query result:
$f_{mean}(D) + \mathrm{Lap}\left(\Delta f_{mean} / \varepsilon\right)$
where $\mathrm{Lap}(\cdot)$ represents the Laplace distribution, $\varepsilon$ represents the preset degree of privacy protection, and $\varepsilon > 0$.
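Steps C1 to C3 can be sketched in Python as follows. This is a hedged illustration rather than the claimed method itself: it assumes the standard Laplace mechanism with noise scale equal to the sensitivity divided by ε, and it assumes record values are bounded to a known range `[lo, hi]`, so that changing one record moves the mean by at most `(hi - lo) / |D|`. The function names and the bounds are hypothetical.

```python
import random
from typing import Optional, Sequence


def combined_mean(d_s: Sequence[float], d_ns: Sequence[float]) -> float:
    """Step C1: cardinality-weighted mean of the two subsets."""
    n = len(d_s) + len(d_ns)
    if n == 0:
        raise ValueError("the query attribute dataset is empty")
    mean_s = sum(d_s) / len(d_s) if d_s else 0.0
    mean_ns = sum(d_ns) / len(d_ns) if d_ns else 0.0
    return (len(d_s) * mean_s + len(d_ns) * mean_ns) / n


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    if scale == 0.0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(
    d_s: Sequence[float],
    d_ns: Sequence[float],
    lo: float,
    hi: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Steps C2 and C3 under the bounded-range assumption stated above."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    n = len(d_s) + len(d_ns)
    mean = combined_mean(d_s, d_ns)  # also rejects an empty dataset
    sensitivity = (hi - lo) / n  # assumed bound, not taken from the source
    return mean + laplace_noise(sensitivity / epsilon, rng)
```

A smaller ε gives a larger noise scale and therefore stronger protection at the cost of accuracy; a larger dataset lowers the assumed sensitivity and so the noise.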
10. The census data set query protection method of claim 8 or 9, wherein obtaining the median query result of the query attribute data set according to a pre-built secret-specification-based exponential mechanism comprises:
step D1, sorting the record values of the query attribute records in the query attribute data set $D$ in ascending order;
step D2, setting the cardinality of the sorted query attribute dataset $D$ to $|D| = 2m + 1$, defining a first intermediate parameter, and letting $O$ denote the median output space of the query attribute dataset $D$;
step D3, for any output median $o \in O$, calculating the score function value of the median $o$, wherein: $D^*$ represents a data set satisfying $f_{med}(D^*) = o$ that is formed by changing the record values of any number of sensitive records in the query attribute dataset $D$; the score is taken over the set of sensitive records that differ between $D$ and $D^*$, with $r$ representing a sensitive record in that set; when the number of sensitive records in that set is smallest, the product formed with that number of sensitive records is the score function value of the median $o$; $f_{med}(\cdot)$ represents the median query function; and $\varepsilon$ represents the preset degree of privacy protection, $\varepsilon > 0$;
step D4, calculating the output probabilities of all medians in the median output space $O$, the output probability of each median $o$ being determined from its score function value;
step D5, outputting a median sampled according to the output probabilities of all medians in the median output space $O$ as the median query result.
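The overall shape of steps D1 to D5 can be sketched in Python with a generic exponential mechanism. The score function of step D3 is not recoverable from the text above, so this sketch swaps in a common textbook choice and labels it as such: the score of a candidate is minus its rank distance from the true median, with an assumed sensitivity of 1, and the output space is assumed to be the dataset's own values. The function name `private_median` is hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def private_median(
    values: Sequence[float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> Tuple[float, List[float]]:
    """Sample a median with the exponential mechanism.

    Returns the sampled value and the output probabilities of the sorted
    candidates. The rank-distance score is an assumption for illustration,
    not the secret-specification-based score of the claim.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values:
        raise ValueError("the query attribute dataset is empty")
    if rng is None:
        rng = random.Random()
    ordered = sorted(values)  # step D1: ascending order
    m = (len(ordered) - 1) // 2  # index of the true median when |D| = 2m + 1
    # Assumed score: candidates closer in rank to the true median score higher.
    scores = [-abs(i - m) for i in range(len(ordered))]
    # Exponential mechanism weights with assumed score sensitivity 1.
    weights = [math.exp(epsilon * s / 2.0) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]  # step D4: output probabilities
    sampled = rng.choices(ordered, weights=probs, k=1)[0]  # step D5
    return sampled, probs
```

With this assumed score the true median is always the most likely output, and the probability of other candidates decays exponentially with their rank distance, faster for larger ε.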
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311416556.4A CN117235800B (en) | 2023-10-27 | 2023-10-27 | Data query protection method of personalized privacy protection mechanism based on secret specification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117235800A (en) | 2023-12-15 |
CN117235800B (en) | 2024-05-28 |
Family
ID=89082737
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311416556.4A Active CN117235800B (en) | 2023-10-27 | 2023-10-27 | Data query protection method of personalized privacy protection mechanism based on secret specification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117235800B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118072928A (en) * | 2024-04-18 | 2024-05-24 | 中南大学 | Medical data integration system based on data warehouse |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090055657A1 (en) * | 2005-03-25 | 2009-02-26 | Rieko Asai | Program Converting Device, Secure Processing Device, Computer Program, and Recording Medium |
CN110704491A (en) * | 2019-09-30 | 2020-01-17 | 京东城市(北京)数字科技有限公司 | Data query method and device |
CN111177213A (en) * | 2019-12-16 | 2020-05-19 | 北京淇瑀信息科技有限公司 | Privacy cluster self-service query platform and method and electronic equipment |
CN114328640A (en) * | 2021-02-07 | 2022-04-12 | 湖南科技学院 | Differential privacy protection and data mining method and system based on mobile user dynamic sensitive data |
US20220277097A1 (en) * | 2019-06-12 | 2022-09-01 | Privitar Limited | Method or system for querying a sensitive dataset |
CN116541883A (en) * | 2023-05-10 | 2023-08-04 | 重庆大学 | Trust-based differential privacy protection method, device, equipment and storage medium |
CN116611101A (en) * | 2023-03-03 | 2023-08-18 | 广州大学 | Differential privacy track data protection method based on interactive query |
Non-Patent Citations (3)
Title |
---|
HU, CQ (HU, CHUNQIANG) et al.: "A Federated Recommendation System Based on Local Differential Privacy Clustering", IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE AND COMPUTING, ADVANCED AND TRUSTED COMPUTING, SCALABLE COMPUTING AND COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI), 1 January 2021 (2021-01-01), pages 364 - 369 *
ZHANG Wenjing; LI Hui: "Hierarchical data release mechanism under differential privacy protection" (差分隐私保护下的数据分级发布机制), Chinese Journal of Network and Information Security (网络与信息安全学报), no. 01, 15 December 2015 (2015-12-15), pages 62 - 69 *
HU Chunqiang: "Research on secret sharing theory and related applications" (秘密共享理论及相关应用研究), China Doctoral Dissertations Electronic Journal (中国博士学位论文电子期刊网), 15 February 2014 (2014-02-15), pages 136 - 16 *
Also Published As
Publication number | Publication date |
---|---|
CN117235800B (en) | 2024-05-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||