CN111950028A - Differential privacy protection method and system for track time mode - Google Patents

Differential privacy protection method and system for track time mode

Info

Publication number
CN111950028A
Authority
CN
China
Prior art keywords
timestamp
subset
noise
fine-grained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010858883.5A
Other languages
Chinese (zh)
Other versions
CN111950028B (en)
Inventor
王豪 (Wang Hao)
吴婷婷 (Wu Tingting)
王昭琨 (Wang Zhaokun)
夏英 (Xia Ying)
张旭 (Zhang Xu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202010858883.5A priority Critical patent/CN111950028B/en
Publication of CN111950028A publication Critical patent/CN111950028A/en
Application granted granted Critical
Publication of CN111950028B publication Critical patent/CN111950028B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218: Protecting access to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/02: Services making use of location information
    • H04W 4/029: Location-based management or tracking services


Abstract

The invention relates to a differential privacy protection method and system for the track time mode, and belongs to the field of data mining. First, coarse-grained disturbance is applied to the track time data using k-anonymity, based on the anonymization idea, so that the data of an original single time period is anonymized and hidden within the whole day; then, fine-grained disturbance is applied to the track timestamps using the Laplace mechanism; finally, the noise disturbance error is bounded within a fixed range by a truncated Laplace mechanism, which improves the accuracy of the published data. This solves the problem of individual privacy leakage caused by the periodicity of the track time mode.

Description

Differential privacy protection method and system for track time mode
Technical Field
The invention belongs to the field of data mining, and relates to a differential privacy protection method and a differential privacy protection system of a track time mode.
Background
With the rapid development of GPS and Wi-Fi, a large amount of spatio-temporal data is generated. A track is spatio-temporal data based on time sampling, and includes attributes such as sampling-point position, sampling time and sampling speed. Because spatio-temporal track data is periodic in its time mode, a large amount of personal privacy information can be obtained by clustering it and performing association analysis. For example, by observing the fixed location where a user appears during the morning peak hours from Monday to Friday, it can be inferred that this location is very likely the user's home address. In recent years, privacy leakage events caused by the track time mode have occurred frequently and pose a great threat to the property safety of users, so protecting the privacy of a user's track time mode is of great significance.
At present, privacy protection technologies for the user track time mode fall roughly into three categories: generalization (blurring the user's exact time information), suppression (selectively publishing the user's time data) and dummy data (generating false information by applying some transformation to the user's real time information). The generalization-based k-anonymity protection model is the most widely applied. Its basic idea is to cut off the one-to-one relationship between the quasi-identifier and the privacy attribute, so that each time record cannot be distinguished from at least k-1 other time records, thereby protecting the time mode. However, the model has three problems:
(1) the security of the k-anonymous protection model is related to the background knowledge of an attacker, and the security of a privacy protection algorithm can be ensured only under the condition of knowing the background knowledge of the attacker;
(2) the security of the k-anonymous protection model is related to the sparsity degree of data distribution, and when the data distribution is too sparse, an attacker can easily deduce the real information of the user in a distributed attack mode;
(3) the k-anonymous protection model does not provide an efficient and rigorous method to prove its security, and when the k value changes, it cannot be quantitatively analyzed.
To overcome the drawbacks of k-anonymity, researchers have recently begun to protect the time-mode privacy of tracks using differential privacy protection methods (adding interference noise that follows a certain distribution to the data). Since differential privacy can resist attacks under arbitrary background knowledge and any attack model, and has a solid mathematical foundation, research results on differential privacy for track protection have emerged continuously since Dwork proposed it in 2006.
Although there are many research results on applying differential privacy to track data publishing, most methods are based on the track shape mode or the OD (Origin-Destination) flow mode; there is little research on track time mode privacy protection, and especially little on attacks that infer users' residence and work locations in combination with activity areas. The specific attack against the track time mode is as follows: according to the regularity with which an individual track goes from home to work and back home, the activity areas are first extracted from the data set, and then the user's residence and work locations are inferred by combining the distribution of the track at different times, so that the user's travel patterns are discovered and even the user's identity can be deduced. Therefore, for track data with timestamps, how to protect user privacy in terms of the time mode while satisfying differential privacy remains a problem to be solved urgently.
Based on the background, the invention provides a differential privacy protection method and a differential privacy protection system of a track time mode, which lay a foundation for really solving the problem of individual privacy disclosure caused by the periodicity of the track time mode.
Disclosure of Invention
In view of the above, the present invention provides a differential privacy protection method and system for a track time pattern, which satisfy the protection requirement of differential privacy and protect the time pattern of a user, thereby preventing an attacker from deducing the working and living places of the user by combining the time pattern.
In order to achieve the purpose, the invention provides the following technical scheme:
a differential privacy protection method of a track time mode comprises the following steps:
Step S1, preprocess and cluster the track data. The method comprises the following substeps:
step S1-1, clean and reduce the track data to be protected, retain the user's longitude and latitude data and the corresponding timestamps as a new track data set, and record the timestamp data set as T = {t1, …, tn}. Ti denotes any subset of T; then Ti ⊆ T and Ti ⊆ [tk, tm] with tmin ≤ tk < tm ≤ tmax, where tk ∈ T, tm ∈ T, tmin is the minimum value in T, and tmax is the maximum value in T;
step S1-2, perform density clustering on the retained track data set using the DBSCAN algorithm to obtain a cluster set C = {c1, c2, …, cl} and the corresponding timestamp subsets Tc = {T1, T2, …, Tl}, where l is the number of clusters.
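Step S1 can be sketched as follows: retain the (latitude, longitude) points with their timestamps, cluster the points by density with DBSCAN, and collect the timestamp subset Ti of each cluster. This is only an illustrative sketch; the eps and min_samples values and the helper name are assumptions, not parameters fixed by the patent.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_trajectory(points, timestamps, eps=0.01, min_samples=5):
    """points: (n, 2) array of (lat, lon); timestamps: length-n array
    (e.g. seconds since midnight). Returns {cluster label: timestamp
    subset T_i}; DBSCAN noise points (label -1) belong to no subset."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    subsets = {}
    for lbl in set(labels):
        if lbl == -1:  # noise point, not part of any cluster
            continue
        subsets[lbl] = np.asarray(timestamps)[labels == lbl]
    return subsets
```

Each returned subset plays the role of one Ti ∈ Tc in the later steps.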
Step S2, initialize the parameters, including the parameter k of the anonymization algorithm, the privacy protection strength parameter ε and the acceptable publishing error range [−α, α]. The initialized k value must be checked for reasonableness: for the timestamp subsets Tc = {T1, T2, …, Tl} obtained in step S1-2, k′ is calculated, and whether the k value initially defined by the user is reasonable is judged according to k′. The method comprises the following substeps:
step S2-1, initialize the privacy protection strength parameter ε and the acceptable publishing error range [−α, α];
step S2-2, initializing a parameter k value of an anonymous algorithm;
step S2-3, select one timestamp subset Ti ∈ Tc obtained in step S1-2 and calculate the value k′:
[formula given only as an image in the source document]
step S2-4, judge the reasonableness of the k value initially defined by the user in step S2-2: if k ∈ [1, k′] and k is an integer, k is reasonable, and proceed to step S2-5; otherwise, return to step S2-2;
step S2-5, repeat steps S2-2, S2-3 and S2-4 until the timestamp subset Ti of every cluster in step S1 has a calculated result k′ that has been compared with the initialized k value, obtaining the final reasonable set of k values Kc = {k1, k2, …, kl}.
Step S3, perform coarse-grained disturbance on the timestamps using the k-anonymity algorithm. Assume the timestamp subsets Tc corresponding to the given cluster set C contain n timestamps in total; a k-anonymity implementation is applied to all timestamp subsets Tc so as to achieve coarse-grained disturbance of the timestamps. The method comprises the following substeps:
step S3-1, select a timestamp subset Ti ∈ Tc corresponding to a cluster obtained in step S1-2;
step S3-2, judge the value of ki in the k value set Kc: if ki = 1, no anonymization is performed and the method proceeds directly to step S4; otherwise, each data point ti in the timestamp subset Ti is anonymized as follows:
[anonymization formula given only as an image in the source document]
step S3-3, repeat steps S3-1 and S3-2 until all timestamp subsets Ti have been k-anonymized, obtaining the coarse-grained disturbance result Tc′ = {T1′, T2′, …, Tl′} of the timestamp subsets Tc.
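The patent's exact anonymization formula appears only as an image in the source, so the sketch below uses one common realization of k-anonymity over timestamps: sort each subset, partition it into runs of at least k values, and replace each run by its mean, so every record is indistinguishable from at least k-1 others. The group-mean rule is an illustrative assumption.

```python
import numpy as np

def k_anonymize_timestamps(T_i, k):
    """Coarse-grained disturbance of one timestamp subset T_i (step S3).
    Sorts the values, partitions them into runs of at least k, and
    replaces each run by its mean. The patent's exact formula is given
    only as an image; this group-mean rule is an assumption."""
    t = np.sort(np.asarray(T_i, dtype=float))
    if k == 1:
        return t  # k_i = 1: no anonymization needed (step S3-2)
    out = np.empty_like(t)
    n, start = len(t), 0
    while start < n:
        end = start + k
        if n - end < k:  # fold a short tail into the last group
            end = n
        out[start:end] = t[start:end].mean()
        start = end
    return out
```

After this step each released timestamp value occurs at least k times within its subset, which is the k-anonymity guarantee the text describes.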
Step S4, perform fine-grained disturbance on the timestamps using differential-privacy Laplace noise. Calculate the Laplace noise probability density function from the initialized privacy protection strength parameter ε, generate the corresponding Laplace noise, and apply fine-grained disturbance to the coarse-grained disturbance result Tc′ = {T1′, T2′, …, Tl′} obtained in step S3-3 to obtain the fine-grained disturbance result. The method comprises the following substeps:
step S4-1, calculate the probability density function fLap(z) of the Laplace noise from the initialized privacy protection strength parameter ε and the sensitivity Δf of the timestamps:
fLap(z) = (1/(2b))·exp(−|z|/b),
where the scale parameter is b = Δf/ε;
step S4-2, select any subset Ti′ ∈ Tc′ of the coarse-grained disturbance result Tc′ = {T1′, T2′, …, Tl′} and, according to the Laplace noise probability density function fLap(z), generate a noise data set Zi of corresponding size;
step S4-3, add the generated Laplace noise Zi to the subset Ti′ of the coarse-grained disturbance result to obtain the fine-grained disturbance result Ti″:
Ti″ = Ti′ + Zi;
step S4-4, repeat steps S4-2 and S4-3 until all subsets of the coarse-grained disturbance result Tc′ = {T1′, T2′, …, Tl′} have undergone fine-grained disturbance, obtaining the corresponding fine-grained disturbance result Tc″ = {T1″, T2″, …, Tl″}.
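Step S4 reduces to sampling Laplace(0, Δf/ε) noise and adding it elementwise to each coarse-grained subset; a minimal sketch (with Δf = 1 as in the embodiment; the function name is chosen here for illustration):

```python
import numpy as np

def laplace_perturb(T_prime, epsilon, delta_f=1.0, rng=None):
    """Fine-grained disturbance (step S4): add Laplace(0, b) noise with
    scale b = delta_f / epsilon to every coarse-grained timestamp,
    i.e. T_i'' = T_i' + Z_i."""
    rng = np.random.default_rng() if rng is None else rng
    T_prime = np.asarray(T_prime, dtype=float)
    Z_i = rng.laplace(loc=0.0, scale=delta_f / epsilon, size=T_prime.shape)
    return T_prime + Z_i
```

A larger ε (weaker privacy) shrinks the scale b and therefore the disturbance added to each timestamp.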
Step S5, optimize the disturbance result using the truncated Laplace mechanism. Derive the probability density function of the truncated Laplace noise from the initialized privacy protection strength parameter ε, generate the corresponding truncated Laplace noise, and optimize the fine-grained disturbance result Tc″ = {T1″, T2″, …, Tl″} obtained in step S4-3 to obtain the optimized result Tc‴ = {T1‴, T2‴, …, Tl‴}.
The method comprises the following substeps:
step S5-1, calculate the probability density function fTLap(z) of the truncated Laplace noise from the initialized privacy protection strength parameter ε, the acceptable error range [−α, α] and the sensitivity Δf of the timestamps:
fTLap(z) = B·exp(−|z|/b) for z ∈ [−α, α], and fTLap(z) = 0 otherwise,
where b = Δf/ε and B = 1/(2b·(1 − exp(−α/b))) normalizes the density over [−α, α];
step S5-2, select any subset Ti″ ∈ Tc″ of the fine-grained disturbance result Tc″ = {T1″, T2″, …, Tl″} and, according to the truncated Laplace noise probability density function fTLap(z), generate a noise data set Zi′ of corresponding size;
step S5-3, add the generated truncated Laplace noise Zi′ to the subset Ti″ of the fine-grained disturbance result to obtain the optimized result Ti‴:
Ti‴ = Ti″ + Zi′;
step S5-4, repeat steps S5-2 and S5-3 until all subsets of the fine-grained disturbance result Tc″ = {T1″, T2″, …, Tl″} have been optimized, obtaining the optimized disturbance data set Tc‴ = {T1‴, T2‴, …, Tl‴}.
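Sampling from the truncated density of step S5 can be sketched by rejection: draw ordinary Laplace noise and keep only the draws inside [−α, α], which realizes exactly the renormalized-on-[−α, α] density and bounds every added error by α. A sketch under the embodiment's parameters (the function name is chosen here for illustration):

```python
import numpy as np

def truncated_laplace_noise(n, epsilon, alpha, delta_f=1.0, rng=None):
    """Step S5 noise source: Laplace(0, delta_f/epsilon) truncated to the
    acceptable error range [-alpha, alpha], drawn by rejection sampling,
    so every added error is bounded by alpha."""
    rng = np.random.default_rng() if rng is None else rng
    scale = delta_f / epsilon
    out = np.empty(0)
    while out.size < n:
        z = rng.laplace(0.0, scale, size=2 * n)  # oversample, then filter
        out = np.concatenate([out, z[np.abs(z) <= alpha]])
    return out[:n]
```

With the embodiment's ε = 2, α = 0.5 and Δf = 1, roughly 1 − exp(−1), about 63%, of the raw draws are accepted per round, so the loop terminates quickly.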
Meanwhile, the present invention also provides a differential privacy protection system for the track time mode, as shown in FIG. 3, comprising:
the track data preprocessing and clustering module is used for preprocessing and clustering the track data set and comprises the following sub-modules,
the preprocessing submodule is used for cleaning and stipulating the track data to be protected, and reserving longitude and latitude data of a user and a corresponding timestamp as a new track data set;
the clustering submodule is used for carrying out density clustering on the reserved track data set by using a DBSCAN algorithm to obtain a clustering cluster set and a corresponding timestamp subset;
an initialization parameter module for initializing parameters, comprising the following sub-modules,
the initial parameter setting submodule is used for initializing the parameter k of the anonymization algorithm, the privacy protection strength parameter ε and the acceptable publishing error range [−α, α];
the judgment submodule is used for judging whether the set parameter value k is reasonable or not;
the anonymous processing module is used for realizing coarse-grained disturbance on the time stamp and comprises the following sub-modules,
a judging submodule for judging the value of ki in the k value set Kc: if ki = 1, no anonymization is performed and processing passes directly to the differential privacy processing module; otherwise, each data point ti in the timestamp subset Ti is anonymized as follows:
[anonymization formula given only as an image in the source document]
an iteration submodule for iterating the above submodule to generate the preliminary disturbance result Tc′ = {T1′, T2′, …, Tl′}.
The differential privacy processing module is used for realizing fine-grained disturbance on the timestamp and comprises the following sub-modules,
a Laplace noise generation submodule for generating the Laplace noise: according to the privacy protection strength parameter ε, it calculates the Laplace noise probability density function fLap(z) and generates a noise data set Zi of corresponding size;
a Laplace noise adding submodule for adding the generated Laplace noise Zi to any subset Ti′ of the coarse-grained disturbance result to obtain the fine-grained disturbance result Ti″;
an iteration submodule for iterating the process in the above submodules to generate the fine-grained disturbance result Tc″ = {T1″, T2″, …, Tl″}.
The truncation Laplace optimization module is used for optimizing the fine-grained disturbance result obtained by the difference privacy processing module and comprises the following sub-modules,
a truncated Laplace noise generation submodule for generating the truncated Laplace noise: according to the privacy protection strength parameter ε, it calculates the truncated Laplace noise probability density function fTLap(z) and generates a noise data set Zi′ of corresponding size;
a truncated Laplace noise adding submodule for adding the generated truncated Laplace noise Zi′ to any subset Ti″ of the fine-grained disturbance result to obtain the optimized result Ti‴;
an iteration submodule for iterating the above submodules to generate the optimized disturbance result Tc‴ = {T1‴, T2‴, …, Tl‴}.
In order to protect the privacy of the track time mode, the invention first applies coarse-grained anonymization to the timestamps using k-anonymity, then applies fine-grained disturbance to the timestamps using differential privacy, and finally optimizes the published disturbance result using the truncated Laplace mechanism, thereby improving the usability of the published data. Compared with the prior art, the invention has the following beneficial effects:
(1) compared with the conventional time mode protection method, the method can protect the privacy of the track time mode and also protect accurate time and position data;
(2) the invention utilizes the truncation Laplace mechanism, can limit the error of the issued result within a specific range, greatly improves the data availability, and is simple, effective and easy to realize.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow diagram of an overall method provided by an embodiment of the present invention;
FIG. 2 is a flow chart of specific steps provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a logic structure of a temporal data perturbation generator according to an embodiment of the present invention;
FIG. 4 is a time distribution histogram of an original cluster provided by an embodiment of the present invention;
FIG. 5 is a time distribution histogram of selected clusters provided by an embodiment of the present invention;
FIG. 6 is a time distribution histogram of a selected cluster after k-anonymization according to an embodiment of the present invention;
FIG. 7 is a time distribution histogram of selected clusters after differential privacy processing according to an embodiment of the present invention;
fig. 8 is a time distribution histogram of the selected cluster after the truncated laplacian optimization process according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are provided for the purpose of illustrating the invention only and are not intended to limit it; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings, and descriptions thereof, may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
The following takes the GPS track data set Geolife Trajectories 1.3, collected by Microsoft Research Asia, as an example to illustrate the specific implementation steps of the present invention. The method first performs coarse-grained anonymization of the timestamps using k-anonymity, then applies noise to the timestamps using differential privacy to achieve fine-grained disturbance, and finally optimizes the disturbed timestamp data using the truncated Laplace mechanism, so as to protect the track time mode.
The method provided by the technical solution of the invention can be implemented as an automated workflow using computer software. FIG. 1 and FIG. 2 are, respectively, the overall method flow chart and the detailed step flow chart of the embodiment of the invention. With reference to FIG. 2, and to the logic structure diagram of the time data disturbance generator in FIG. 3, the specific steps of the embodiment of the differential privacy protection method and system for the track time mode comprise:
and step S1, preprocessing and clustering the track data. The method comprises the following substeps:
step S1-1, clean and reduce the track data to be protected, retain the user's longitude and latitude data and the corresponding timestamps as a new track data set, and record the timestamp data set as T = {t1, …, tn}. Ti denotes any subset of T; then Ti ⊆ T and Ti ⊆ [tk, tm] with tmin ≤ tk < tm ≤ tmax, where tk ∈ T, tm ∈ T, tmin is the minimum value in T, and tmax is the maximum value in T;
in the embodiment, data cleaning and reduction are performed on a section of track in the GPS track data set Geolife Trajectories 1.3 collected by Microsoft Research Asia; the user's longitude and latitude data and the corresponding timestamps are retained as a new track data set, and the timestamp data set corresponding to the new track data set is {07:08, …, 08:09};
step S1-2, perform density clustering on the retained track data set using the DBSCAN algorithm to obtain a cluster set C = {c1, c2, …, cl} and the corresponding timestamp subsets Tc = {T1, T2, …, Tl}, where l is the number of clusters.
In the embodiment, the two track data sets are density-clustered using the DBSCAN algorithm to obtain the cluster set C = {c1, c2, …, cl} and the corresponding timestamp subsets Tc = {T1, T2, …, Tl}; the time distribution histogram of cluster C1 is drawn as shown in FIG. 4.
Step S2, initialize the parameters, including the parameter k of the anonymization algorithm, the privacy protection strength parameter ε and the acceptable publishing error range [−α, α]. The initialized k value must be checked for reasonableness: for the timestamp subsets Tc = {T1, T2, …, Tl} obtained in step S1-2, k′ is calculated, and whether the k value initially defined by the user is reasonable is judged according to k′. The method comprises the following substeps:
step S2-1, initialize the privacy protection strength parameter ε and the acceptable publishing error range [−α, α];
in the embodiment, the initialization is ε = 2 and α = 0.5;
step S2-2, initializing a parameter k value of an anonymous algorithm;
in an embodiment, the initialization parameter k is 2;
step S2-3, select one timestamp subset Ti ∈ Tc obtained in step S1-2 and calculate the value k′:
[formula given only as an image in the source document]
in the embodiment, the time data subset T12 corresponding to cluster C1 is selected and the corresponding value k′ is obtained [value given only as an image in the source document];
step S2-4, judge the reasonableness of the k value initially defined by the user in step S2-2: if k ∈ [1, k′] and k is an integer, k is reasonable, and proceed to step S2-5; otherwise, return to step S2-2;
in the embodiment, k = 2 ∈ [1, k′] and k is an integer, so the initial k value is set reasonably, and the method proceeds to step S2-5;
step S2-5, repeat steps S2-2, S2-3 and S2-4 until the timestamp subset Ti of every cluster in step S1 has a calculated result k′ that has been compared with the initialized k value, obtaining the final reasonable set of k values Kc = {k1, k2, …, kl}.
In the embodiment, only one time data subset is selected from each of the two clusters, T12 and T21; through calculation and comparison, the reasonable set of k values Kc = {k12, k21} = {2, 2} is obtained.
Step S3, perform coarse-grained disturbance on the timestamps using the k-anonymity algorithm. Assume the timestamp subsets Tc corresponding to the given cluster set C contain n timestamps in total; a k-anonymity implementation is applied to all timestamp subsets Tc so as to achieve coarse-grained disturbance of the timestamps. The method comprises the following substeps:
step S3-1, select a timestamp subset Ti ∈ Tc corresponding to a cluster obtained in step S1-2;
in the embodiment, the timestamp subset T12 corresponding to cluster C1 is selected;
step S3-2, judge the value of ki in the k value set Kc: if ki = 1, no anonymization is performed and the method proceeds directly to step S4; otherwise, each data point ti in the timestamp subset Ti is anonymized as follows:
[anonymization formula given only as an image in the source document]
in the embodiment, k12 = 2 ≠ 1, so each data point in the timestamp subset T12 is anonymized as follows:
[anonymization formula given only as an image in the source document]
step S3-3, repeat steps S3-1 and S3-2 until all timestamp subsets Ti have been k-anonymized, obtaining the coarse-grained disturbance result Tc′ = {T1′, T2′, …, Tl′} of the timestamp subsets Tc.
In the embodiment, only one timestamp subset is selected from each of the two clusters, T12 and T21; the time distribution histograms of the original clusters C12 and C21 are drawn as shown in FIG. 5; 2-anonymization is performed to obtain the preliminary disturbance result T′ = {T12′, T21′}, and the time distribution histograms of clusters C12 and C21 after 2-anonymization are drawn as shown in FIG. 6.
Step S4, perform fine-grained disturbance on the timestamps using differential-privacy Laplace noise. Calculate the Laplace noise probability density function from the initialized privacy protection strength parameter ε, generate the corresponding Laplace noise, and apply fine-grained disturbance to the coarse-grained disturbance result Tc′ = {T1′, T2′, …, Tl′} obtained in step S3-3 to obtain the fine-grained disturbance result. The method comprises the following substeps:
step S4-1, calculate the probability density function fLap(z) of the Laplace noise from the initialized privacy protection strength parameter ε and the sensitivity Δf of the timestamps:
fLap(z) = (1/(2b))·exp(−|z|/b),
where the scale parameter is b = Δf/ε;
In an embodiment, step S2 has been given 2 and Δ f 1, so
Figure BDA0002647324450000094
Determining a probability density function of Laplace noise
Figure BDA0002647324450000095
step S4-2, select any subset Ti′ ∈ Tc′ of the coarse-grained disturbance result Tc′ = {T1′, T2′, …, Tl′} and, according to the Laplace noise probability density function fLap(z), generate a noise data set Zi of corresponding size;
in the embodiment, the subset T12′ of the coarse-grained disturbance result is selected, and according to the Laplace noise probability density function fLap(z), a noise data set Z12 of corresponding size is generated;
step S4-3, add the generated Laplace noise Zi to the subset Ti′ of the coarse-grained disturbance result to obtain the fine-grained disturbance result Ti″:
Ti″ = Ti′ + Zi;
in the embodiment, the generated Laplace noise Z12 is added to the subset T12′ of the coarse-grained disturbance result to obtain the fine-grained disturbance result T12″;
Step S4-4, repeating steps S4-2 and S4-3 until all subsets of the coarse-grained perturbation result Tc′ = {T1′, T2′, ..., Tl′} have been perturbed at fine granularity, obtaining the corresponding fine-grained perturbation result Tc″ = {T1″, T2″, ..., Tl″}.
In the embodiment, only the two timestamp subsets T12′ and T21′ of the coarse-grained perturbation result set are selected; the corresponding Laplace noises Z12 and Z21 from step S4-2 are added, and the fine-grained perturbation result T″ = {T12″, T21″} is obtained. The time-distribution histograms of clusters C12 and C21 after differential privacy processing are shown in FIG. 7.
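Steps S4-2 through S4-4 amount to adding one independent Laplace draw to every timestamp of every subset. A sketch, reusing the inverse-CDF sampler; the subset values are hypothetical:

```python
import math
import random

def fine_grained_perturb(subset, epsilon, delta_f, rng):
    """Step S4 sketch: add one independent Laplace(delta_f/epsilon) draw
    to every timestamp of a coarse-grained subset Ti'."""
    b = delta_f / epsilon
    def lap():
        u = rng.uniform(-0.5, 0.5)
        return -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return [t + lap() for t in subset]

rng = random.Random(0)
T12_prime = [102.0, 102.0, 234.0, 234.0]   # hypothetical coarse-grained subset
T12_dprime = fine_grained_perturb(T12_prime, epsilon=2.0, delta_f=1.0, rng=rng)
print(T12_dprime)  # four noisy timestamps near the originals
```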
Step S5, optimizing the perturbation result using a truncated Laplace mechanism. The probability density function of the truncated Laplace noise is obtained according to the initialized privacy protection strength parameter ε, the corresponding truncated Laplace noise is generated, and the fine-grained perturbation result Tc″ = {T1″, T2″, ..., Tl″} obtained in step S4 is optimized to obtain the optimized result Tc* = {T1*, T2*, ..., Tl*}.
The method comprises the following substeps:
Step S5-1, calculating the probability density function fTLap(z) of the truncated Laplace noise according to the initialized privacy protection strength parameter ε, the acceptable error range [−α, α] and the sensitivity function Δf of the timestamp:

fTLap(z) = B·e^(−|z|/b) for z ∈ [−α, α], and fTLap(z) = 0 otherwise,

where b = Δf/ε and B = 1/(2b(1 − e^(−α/b))) is the normalizing constant.
In the embodiment, ε = 2, α = 0.5 and Δf = 1 were given in step S2, so b = 1/2 and B = 1/(1 − e^(−1)), and the probability density function of the truncated Laplace noise is determined as

fTLap(z) = e^(−2|z|)/(1 − e^(−1)) for z ∈ [−0.5, 0.5], and 0 otherwise.
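Sampling from the truncated density fTLap(z) can be sketched by rejection: draw ordinary Laplace noise and discard any draw outside [−α, α], which reproduces the renormalized truncated density. The parameter values follow the embodiment (ε = 2, Δf = 1, α = 0.5):

```python
import math
import random

def truncated_laplace(b, alpha, rng):
    """One draw from Lap(0, b) truncated to [-alpha, alpha]: rejection
    sampling, i.e. resample until the draw falls inside the range."""
    while True:
        u = rng.uniform(-0.5, 0.5)
        z = -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        if -alpha <= z <= alpha:
            return z

rng = random.Random(7)
Z = [truncated_laplace(b=0.5, alpha=0.5, rng=rng) for _ in range(1000)]
print(max(abs(z) for z in Z))  # never exceeds alpha = 0.5
```

With α/b = 1 the acceptance probability is 1 − e^(−1) ≈ 0.63, so the loop terminates quickly in expectation.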
Step S5-2, selecting any subset Ti″ ∈ Tc″ of the fine-grained perturbation result Tc″ = {T1″, T2″, ..., Tl″}, and generating a corresponding number of noise data Zi′ according to the probability density function fTLap(z) of the truncated Laplace noise;
In the embodiment, any timestamp subset T12″ of the fine-grained perturbation result set is selected, and a corresponding number of noise data Z12′ is generated according to the probability density function fTLap(z) of the truncated Laplace noise.
Step S5-3, adding the generated truncated Laplace noise Zi′ to any subset Ti″ of the fine-grained perturbation result to obtain the optimized result Ti*:

Ti* = Ti″ + Zi′
In the embodiment, the generated truncated Laplace noise Z12′ is added to the subset T12″ of the fine-grained perturbation result to obtain the optimized result T12* = T12″ + Z12′.
Step S5-4, repeating steps S5-2 and S5-3 until all subsets of the fine-grained perturbation result Tc″ = {T1″, T2″, ..., Tl″} have been optimized, obtaining the optimized perturbation data set Tc* = {T1*, T2*, ..., Tl*}.
In the embodiment, only the two timestamp subsets T12″ and T21″ of the fine-grained perturbation result set are selected; the corresponding truncated Laplace noises Z12′ and Z21′ from step S5-2 are added, and the optimized result T* = {T12*, T21*} is obtained. The time-distribution histograms of clusters C12 and C21 after truncated-Laplace optimization are shown in FIG. 8.
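Putting steps S3 to S5 together for a single timestamp subset gives the following end-to-end sketch. The group-mean anonymization rule and the example values are assumptions; only the Laplace and truncated-Laplace stages follow directly from the formulas above:

```python
import math
import random
from statistics import mean

def protect_subset(ts, k, epsilon, delta_f, alpha, rng):
    """End-to-end sketch of steps S3 to S5 for one timestamp subset:
    k-anonymize (assumed group-mean rule), add Laplace noise (step S4),
    then add truncated Laplace noise as the optimization pass (step S5)."""
    b = delta_f / epsilon

    def lap():
        u = rng.uniform(-0.5, 0.5)
        return -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    def tlap():
        while True:
            z = lap()
            if -alpha <= z <= alpha:
                return z

    # Step S3: coarse-grained perturbation by group-mean k-anonymization
    ts = sorted(ts)
    coarse = []
    for i in range(0, len(ts), k):
        g = ts[i:i + k]
        coarse.extend([mean(g)] * len(g))
    # Steps S4 and S5: fine-grained perturbation, then truncated-Laplace pass
    return [t + lap() + tlap() for t in coarse]

rng = random.Random(3)
out = protect_subset([100.0, 104.0, 230.0, 238.0], k=2,
                     epsilon=2.0, delta_f=1.0, alpha=0.5, rng=rng)
print(out)  # four protected timestamps
```

In the full method this function would be applied once per cluster's timestamp subset, with k taken from the validated set Kc.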
Finally, the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions may be made to the technical solutions without departing from their spirit and scope, and all such modifications should be covered by the claims of the present invention.

Claims (5)

1. A differential privacy protection method for a track time mode, characterized in that it comprises the following steps:
step S1, preprocessing and clustering track data;
the step S1 specifically includes:
step S1-1, cleaning and reducing the track data to be protected, retaining the longitude and latitude data of the user and the corresponding timestamps as a new track data set, and recording the timestamp data set as T = {t1, ..., tn}; Ti is any subset of T, i.e. Ti ⊆ T and Ti ∈ [tk, tm], tmin ≤ tk < tm ≤ tmax, where tk ∈ T, tm ∈ T, tmin is the minimum value in T, and tmax is the maximum value in T;
step S1-2, performing density clustering on the retained track data set using the DBSCAN algorithm to obtain a cluster set C = {c1, c2, ..., cl} and the corresponding timestamp subsets Tc = {T1, T2, ..., Tl}, where l is the number of clusters;
step S2, initializing parameters, including the parameter k of the anonymization algorithm, the privacy protection strength parameter ε, and the acceptable release error range [−α, α]; the initialized k value needs to be checked for reasonableness: k′ is calculated from the timestamp subsets Tc = {T1, T2, ..., Tl} obtained in step S1-2, and whether the user-initialized k value is reasonable is judged according to k′;
step S3, performing coarse-grained perturbation on the timestamps using a k-anonymity algorithm; assuming that each timestamp subset in Tc corresponding to a given cluster C contains n timestamps in total, all timestamp subsets Tc are anonymized by a k-anonymity implementation method, realizing coarse-grained perturbation of the timestamps;
the step S3 specifically includes:
step S3-1, selecting a timestamp subset Ti ∈ Tc corresponding to a cluster obtained in step S1-2;
step S3-2, judging the value ki in the k-value set Kc: if ki = 1, no anonymization is performed and the process proceeds to step S4; otherwise, the following anonymization is performed on each datum ti in the timestamp subset Ti:
Figure FDA0002647324440000011
step S3-3, repeating steps S3-1 and S3-2 until all timestamp subsets Ti have been k-anonymized, at which point the coarse-grained perturbation result Tc′ = {T1′, T2′, ..., Tl′} of the timestamp subsets Tc is obtained;
step S4, performing fine-grained perturbation on the timestamps using differentially private Laplace noise: calculating the Laplace noise probability density function according to the initialized privacy protection strength parameter, generating the corresponding Laplace noise, and performing fine-grained perturbation on the coarse-grained perturbation result Tc′ = {T1′, T2′, ..., Tl′} obtained in step S3-3 to obtain a fine-grained perturbation result;
step S5, optimizing the perturbation result using a truncated Laplace mechanism: obtaining the probability density function of the truncated Laplace noise according to the initialized privacy protection strength parameter, generating the corresponding truncated Laplace noise, and optimizing the fine-grained perturbation result Tc″ = {T1″, T2″, ..., Tl″} obtained in step S4 to obtain the optimized result Tc* = {T1*, T2*, ..., Tl*}.
2. The differential privacy protection method for track time mode according to claim 1, characterized in that: the step S2 specifically includes:
step S2-1, initializing the privacy protection strength parameter ε and the acceptable release error range [−α, α];
step S2-2, initializing the parameter k of the anonymization algorithm;
step S2-3, selecting one of the timestamp subsets Ti ∈ Tc obtained in step S1-2, and calculating the k′ value:
Figure FDA0002647324440000022
step S2-4, judging the reasonableness of the k value initially defined by the user in step S2-2: if k ∈ [1, k′] and k is an integer, k is reasonable and the process proceeds to step S2-5; otherwise, return to step S2-2;
step S2-5, repeating steps S2-2, S2-3 and S2-4 until the timestamp subsets Ti corresponding to all clusters in step S1 have obtained their calculated results k′ and been compared with the initialized k value, obtaining the final reasonable k-value set Kc = {k1, k2, ..., kl}.
3. The differential privacy protection method for track time mode according to claim 1, characterized in that: the step S4 specifically includes:
step S4-1, calculating the probability density function fLap(z) of the Laplace noise according to the initialized privacy protection strength parameter ε and the sensitivity function Δf of the timestamp:

fLap(z) = (1/(2b))·e^(−|z|/b), where b = Δf/ε;
step S4-2, selecting any subset Ti′ ∈ Tc′ of the coarse-grained perturbation result Tc′ = {T1′, T2′, ..., Tl′}, and generating a corresponding number of noise data Zi according to the probability density function fLap(z) of the Laplace noise;
step S4-3, adding the generated Laplace noise Zi to any subset Ti′ of the coarse-grained perturbation result to obtain the fine-grained perturbation result Ti″:

Ti″ = Ti′ + Zi;
step S4-4, repeating steps S4-2 and S4-3 until all subsets of the coarse-grained perturbation result Tc′ = {T1′, T2′, ..., Tl′} have been perturbed at fine granularity, obtaining the corresponding fine-grained perturbation result Tc″ = {T1″, T2″, ..., Tl″}.
4. The differential privacy protection method for track time mode according to claim 1, characterized in that: the step S5 specifically includes:
step S5-1, calculating the probability density function fTLap(z) of the truncated Laplace noise according to the initialized privacy protection strength parameter ε, the acceptable error range [−α, α] and the sensitivity function Δf of the timestamp:

fTLap(z) = B·e^(−|z|/b) for z ∈ [−α, α], and 0 otherwise, where b = Δf/ε and B = 1/(2b(1 − e^(−α/b)));
step S5-2, selecting any subset Ti″ ∈ Tc″ of the fine-grained perturbation result Tc″ = {T1″, T2″, ..., Tl″}, and generating a corresponding number of noise data Zi′ according to the probability density function fTLap(z) of the truncated Laplace noise;
step S5-3, adding the generated truncated Laplace noise Zi′ to any subset Ti″ of the fine-grained perturbation result to obtain the optimized result Ti*:

Ti* = Ti″ + Zi′;
step S5-4, repeating steps S5-2 and S5-3 until all subsets of the fine-grained perturbation result Tc″ = {T1″, T2″, ..., Tl″} have been optimized, obtaining the optimized perturbation data set Tc* = {T1*, T2*, ..., Tl*}.
5. A differential privacy protection system for a track time mode, characterized in that the system comprises the following modules:
the track data preprocessing and clustering module is used for preprocessing and clustering the track data set and comprises the following sub-modules,
the preprocessing submodule is used for cleaning and stipulating the track data to be protected, and reserving longitude and latitude data of a user and a corresponding timestamp as a new track data set;
the clustering submodule is used for carrying out density clustering on the reserved track data set by using a DBSCAN algorithm to obtain a clustering cluster set and a corresponding timestamp subset;
the initialization parameter module is used for initializing parameters and comprises the following sub-modules:
the initial parameter setting submodule is used for initializing a parameter k of an anonymous algorithm, a privacy protection intensity parameter and an acceptable issuing error range [ -alpha, alpha ];
the judgment submodule is used for judging whether the set parameter value k is reasonable or not;
the anonymous processing module is used for realizing coarse-grained disturbance on the timestamp and comprises the following sub-modules:
a judging submodule for judging the value ki in the k-value set Kc: if ki = 1, no anonymization is performed and the process proceeds directly to the differential privacy processing module; otherwise, the following anonymization is performed on each datum ti in the timestamp subset Ti:
Figure FDA0002647324440000041
an iteration submodule for iterating the above submodules to generate the preliminary perturbation result Tc′ = {T1′, T2′, ..., Tl′};
The differential privacy processing module is used for realizing fine-grained perturbation of the timestamps and comprises the following submodules:
a Laplace noise generation submodule for generating Laplace noise: the probability density function fLap(z) of the Laplace noise is calculated according to the privacy protection strength parameter, and a corresponding number of noise data Zi is generated;
a Laplace noise adding submodule for adding the generated Laplace noise Zi to any subset Ti′ of the coarse-grained perturbation result to obtain the fine-grained perturbation result Ti″;
an iteration submodule for iterating the above submodules to generate the fine-grained perturbation result Tc″ = {T1″, T2″, ..., Tl″};
The truncated Laplace optimization module is used for optimizing the fine-grained perturbation result obtained by the differential privacy processing module and comprises the following submodules:
a truncated Laplace noise generation submodule for generating truncated Laplace noise: the probability density function fTLap(z) of the truncated Laplace noise is calculated according to the privacy protection strength parameter, and a corresponding number of noise data Zi′ is generated;
a truncated Laplace noise adding submodule for adding the generated truncated Laplace noise Zi′ to any subset Ti″ of the fine-grained perturbation result to obtain the optimized result Ti*;
an iteration submodule for iterating the above submodules to generate the optimized perturbation result Tc* = {T1*, T2*, ..., Tl*}.
CN202010858883.5A 2020-08-24 2020-08-24 Differential privacy protection method and system for track time mode Active CN111950028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010858883.5A CN111950028B (en) 2020-08-24 2020-08-24 Differential privacy protection method and system for track time mode

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010858883.5A CN111950028B (en) 2020-08-24 2020-08-24 Differential privacy protection method and system for track time mode

Publications (2)

Publication Number Publication Date
CN111950028A true CN111950028A (en) 2020-11-17
CN111950028B CN111950028B (en) 2021-08-31

Family

ID=73359558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010858883.5A Active CN111950028B (en) 2020-08-24 2020-08-24 Differential privacy protection method and system for track time mode

Country Status (1)

Country Link
CN (1) CN111950028B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032413A (en) * 2021-03-10 2021-06-25 北京嘀嘀无限科技发展有限公司 Data sampling method, device, electronic equipment, storage medium and program product
CN113177166A (en) * 2021-04-25 2021-07-27 重庆邮电大学 Personalized position semantic publishing method and system based on differential privacy
CN113438603A (en) * 2021-03-31 2021-09-24 南京邮电大学 Track data publishing method and system based on differential privacy protection

Citations (2)

Publication number Priority date Publication date Assignee Title
US20180322279A1 (en) * 2017-05-02 2018-11-08 Sap Se Providing differentially private data with causality preservation
CN109257385A (en) * 2018-11-16 2019-01-22 重庆邮电大学 A kind of location privacy protection strategy based on difference privacy

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
US20180322279A1 (en) * 2017-05-02 2018-11-08 Sap Se Providing differentially private data with causality preservation
CN109257385A (en) * 2018-11-16 2019-01-22 重庆邮电大学 A kind of location privacy protection strategy based on difference privacy

Non-Patent Citations (1)

Title
LI Xiaoguang et al., "A Survey of Differential Privacy", Journal of Cyber Security *

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN113032413A (en) * 2021-03-10 2021-06-25 北京嘀嘀无限科技发展有限公司 Data sampling method, device, electronic equipment, storage medium and program product
CN113438603A (en) * 2021-03-31 2021-09-24 南京邮电大学 Track data publishing method and system based on differential privacy protection
CN113438603B (en) * 2021-03-31 2024-01-23 南京邮电大学 Track data release method and system based on differential privacy protection
CN113177166A (en) * 2021-04-25 2021-07-27 重庆邮电大学 Personalized position semantic publishing method and system based on differential privacy
CN113177166B (en) * 2021-04-25 2022-10-21 重庆邮电大学 Personalized position semantic publishing method and system based on differential privacy

Also Published As

Publication number Publication date
CN111950028B (en) 2021-08-31

Similar Documents

Publication Publication Date Title
CN111950028B (en) Differential privacy protection method and system for track time mode
Park et al. ST-GRAT: A novel spatio-temporal graph attention networks for accurately forecasting dynamically changing road speed
Bandekar et al. Design and analysis of machine learning algorithms for the reduction of crime rates in India
Ho et al. Differential privacy for location pattern mining
Li et al. A differential privacy-based privacy-preserving data publishing algorithm for transit smart card data
Jiang et al. Intelligent UAV identity authentication and safety supervision based on behavior modeling and prediction
Mazzawi et al. Anomaly detection in large databases using behavioral patterning
Su et al. An efficient density-based local outlier detection approach for scattered data
CN111581662B (en) Track privacy protection method and storage medium
Chen et al. RNN-DP: A new differential privacy scheme base on Recurrent Neural Network for Dynamic trajectory privacy protection
Cai et al. A trajectory released scheme for the internet of vehicles based on differential privacy
Duggimpudi et al. Spatio-temporal outlier detection algorithms based on computing behavioral outlierness factor
Ho et al. Preserving Privacy for Interesting Location Pattern Mining from Trajectory Data.
Yan et al. Differential private spatial decomposition and location publishing based on unbalanced quadtree partition algorithm
Srivastava et al. Weighted intra-transactional rule mining for database intrusion detection
Mitra et al. Toward mining of temporal roles
CN116186757A (en) Method for publishing condition feature selection differential privacy data with enhanced utility
Pu et al. STLP-OD: Spatial and temporal label propagation for traffic outlier detection
Wang et al. A k-nearest neighbor medoid-based outlier detection algorithm
CN114861224A (en) Medical data system based on risk and UCON access control model
Xiao et al. Dynamic graph computing: A method of finding companion vehicles from traffic streaming data
Lin et al. PTA: An efficient system for transaction database anonymization
Beulah et al. Towards Improved Detection of Intrusions with Constraint-Based Clustering (CBC)
Machanavajjhala et al. Analyzing your location data with provable privacy guarantees
Rathod et al. Survey on privacy preserving data mining techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant