CN114462093A - Space-time generalized trajectory data publishing method based on differential privacy - Google Patents

Info

Publication number
CN114462093A
CN114462093A CN202210256578.8A CN202210256578A CN114462093A CN 114462093 A CN114462093 A CN 114462093A CN 202210256578 A CN202210256578 A CN 202210256578A CN 114462093 A CN114462093 A CN 114462093A
Authority
CN
China
Prior art keywords
time
track
generalization
space
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210256578.8A
Other languages
Chinese (zh)
Inventor
皮德常
邱述媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202210256578.8A
Publication of CN114462093A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Remote Sensing (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a space-time generalized trajectory data publishing method based on differential privacy, comprising the following steps: performing time generalization on the trajectory data to generate a set of time position spaces; dividing each position space with a method based on the density peak clustering algorithm to generate a candidate partition set; probabilistically selecting the optimal candidate partition with the exponential mechanism and using it for position generalization; and adding random noise to the generalized trajectory counts through the Laplace mechanism, post-processing the noisy data under a consistency constraint, and publishing the generalized trajectories together with the noisy counts. The advantages of the invention are that, by combining a clustering algorithm with differential privacy, it addresses the low execution efficiency of traditional methods and their difficulty in resisting background-knowledge attacks in trajectory data publishing scenarios, and improves both the privacy and the usability of the trajectory data. The invention is suitable for publishing multi-dimensional data where the background knowledge is complex and privacy protection is required.

Description

Space-time generalized trajectory data publishing method based on differential privacy
Technical Field
The invention relates to a space-time generalized trajectory data publishing method based on differential privacy. It is a privacy protection method that addresses the disclosure of trajectory data privacy in location-based services, and it belongs to the intersection of engineering applications and information science.
Background
In recent years, with the wide application of Internet of Things technology and the popularization of mobile terminals with positioning functions, Location Based Services (LBS) have developed rapidly and brought great convenience to people's lives. At the same time, the corresponding location data is collected into trajectory databases without users being aware of it. Trajectory data contains rich patterns of crowd activity and is closely tied to public life, production, and many aspects of the city, so studying it yields valuable information and is of great practical significance. However, if the data collector does not store user information properly, or the data owner does not publish the data properly, serious privacy disclosure may result. By mining the collected trajectory data in depth, an attacker can infer sensitive user information hidden behind the data, such as home address, hobbies, health condition, and political leanings. Once obtained by criminals, this sensitive information poses a significant threat to the user's private life and personal safety. How to publish collected trajectories while protecting user privacy is therefore a key challenge.
Existing trajectory privacy protection schemes fall broadly into the following categories: suppression (selectively releasing locations according to the sensitivity of the real location), dummy trajectories (adding false trajectories alongside the real ones), generalization (generalizing a real location into a region), and perturbation (adding random noise to the real location at each time instant to generate a perturbed location). Traditional privacy protection schemes depend heavily on assumptions about the attacker's background knowledge; when new attacks appear (such as de-anonymization attacks and composition attacks), these models cannot provide good protection. Differential privacy addresses this problem. Proposed by Dwork et al., differential privacy randomly perturbs the published data so that an attacker cannot determine whether a record (identified, for example, by an ID or a name) is in the original data table, even with certain background knowledge (such as the user's gender or postal code). Its advantages are that it requires no special attack assumptions, does not depend on the attacker's background knowledge, rests on a rigorous mathematical foundation, offers a controllable level of privacy protection, and allows the risk of privacy disclosure to be quantified. In recent years many scholars have studied differential privacy extensively and have proposed privacy protection methods for the trajectory privacy requirements of different scenarios.
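For reference, the guarantee described in the preceding paragraph is usually stated as the standard definition of ε-differential privacy. The symbols below are conventional notation and are not taken from the patent text: M is a randomized publishing mechanism, D and D' are neighbouring data sets that differ in one record, S is any set of possible outputs, and ε is the privacy budget.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
\quad \text{for all neighbouring } D, D' \text{ and all } S \subseteq \mathrm{Range}(M)
```

A smaller ε forces the two output distributions closer together, which is what makes the protection level controllable and quantifiable.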
However, existing trajectory privacy protection work still faces the following difficulties: (1) how LBS servers can build efficient trajectory processing mechanisms to collect and store users' trajectory data; (2) how the trajectory publishing model can effectively resist attacks by adversaries with background knowledge; (3) how to improve the accuracy of published trajectory data and enhance its usability while keeping users' personal privacy from being disclosed. At present, no trajectory privacy protection method solves all of these problems efficiently.
The invention provides an effective and feasible solution: a space-time generalized trajectory data publishing method based on differential privacy that protects user privacy, preserves data availability, executes efficiently, and has good application prospects.
Disclosure of Invention
Purpose of the invention: the invention aims to provide a space-time generalized trajectory data publishing method based on differential privacy. The method makes full use of the spatio-temporal characteristics of moving-object trajectory data and addresses the low data availability and insufficient privacy protection of traditional data publishing methods. Space-time generalization of the trajectory data through clustering reduces the risk of disclosing users' trajectory privacy, and the exponential mechanism and Laplace mechanism of differential privacy guarantee the privacy protection level of the model, which better promotes the development of the location-based service industry.
Technical scheme: to achieve this purpose, the invention provides a space-time generalized trajectory data publishing method based on differential privacy. First, the trajectory data is generalized in time to generate k time position spaces. Second, each time position space is divided spatially by the density peak clustering algorithm, and position generalization is performed probabilistically with the exponential mechanism to hide the user's real position information. Finally, random noise is added to the generalized trajectory counts with the Laplace mechanism, the noisy data is post-processed under a consistency constraint, and the generalized trajectories and noisy counts are published. In this way the trajectory data is privacy-protected and the execution efficiency of the model is improved. The technical scheme comprises the following steps:
Step one: generate time position spaces by time generalization. Cluster the time attributes of the trajectory positions, generalize positions with similar timestamps into the same time period, and generate k time position spaces.
Step two: generate a candidate partition set by spatial clustering. Perform density peak clustering multiple times on each time position space to divide it spatially and generate a candidate partition set.
Step three: select the optimal partition for position generalization. Use the exponential mechanism of differential privacy to select the best-performing candidate partition, divide the position points spatially according to that partition's clustering, and generalize all positions in each group to the group's core position.
Step four: publish the trajectory data. Count the number of real trajectories represented by each generalized trajectory and delete false trajectories. Add random noise to the counts with the Laplace mechanism of differential privacy, post-process the noisy data under a consistency constraint, and publish the generalized trajectories and noisy counts.
Beneficial effects: the invention provides a space-time generalized trajectory data publishing method based on differential privacy for protecting the trajectory privacy of moving objects. It addresses the low data availability and insufficient privacy protection of existing trajectory data publishing models. Space-time generalization of the trajectory data through clustering hides the user's real trajectory information, and the exponential mechanism and Laplace mechanism guarantee the privacy protection level of the model. The method therefore protects trajectory privacy while preserving data availability, improves the execution efficiency of the model, provides a reference for the Location Based Service (LBS) industry, and promotes the development of that industry.
Drawings
FIG. 1 is the general flow chart (system model) of the method of the invention.
FIG. 2 is the data publishing flow chart.
FIG. 3 is an example trajectory set.
Detailed Description
The invention is further explained below with reference to the drawings.
The general flow of the invention is shown in FIG. 1. FIG. 2 shows the process of publishing trajectory data under privacy protection. FIG. 3 is a simple example of data publishing with the method.
The invention uses space-time generalization and differential privacy to protect the privacy of trajectory data. First, the trajectory data is generalized in time to generate k time position spaces. Next, each time position space is divided spatially by the density peak clustering algorithm, and position generalization is performed probabilistically with the exponential mechanism to hide the user's real position information. Finally, random noise is added to the generalized trajectory counts with the Laplace mechanism, the noisy data is post-processed under a consistency constraint, and the generalized trajectories and noisy counts are published, which improves the usability of the published data without weakening trajectory privacy. The specific implementation steps are as follows.
1. Generating time position spaces by time generalization
Cluster the time attributes of the trajectory positions, generalize positions with similar timestamps into the same time period, and generate k time position spaces. The specific steps are as follows:
(1) Set the value of k and the initial centroids, where k is the number of time periods into which the time attributes of all positions in the trajectory set are divided; k should be chosen with reference to the characteristics of the trajectory data set. Randomly select k moments as the initial centroids. For two different moments t_i and t_j, the distance between them is:
Figure BSA0000268798200000041
(2) Assign each moment to the cluster whose centroid is closest to it, and recalculate the centroid moment of each newly formed cluster:
Figure BSA0000268798200000042
(3) Repeat step (2) until the change in the centroids between two successive iterations is smaller than a threshold or the maximum number of steps is reached. Take the centroid moment of each cluster as the timestamp of all trajectory position points in that cluster. The time attribute is thereby divided into k fixed time periods Δt_i (i = 1, 2, ..., k), generating k time position spaces Γ_i.
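The time generalization steps above amount to k-means clustering on one-dimensional timestamps. The following is a minimal sketch under stated assumptions, not the patent's implementation: the distance is taken as the absolute difference |t_i - t_j| (the claims call it a Euclidean distance between moments), the centroid update is assumed to be the cluster mean, and the function name `generalize_times` and its parameters are hypothetical.

```python
import random

def generalize_times(timestamps, k, max_steps=100, tol=1e-6, seed=0):
    """Cluster 1-D timestamps into k periods; return (centroids, labels)."""
    rng = random.Random(seed)
    # Randomly select k distinct moments as the initial centroids.
    centroids = rng.sample(sorted(set(timestamps)), k)

    def assign(cents):
        # Each moment joins the cluster of its nearest centroid.
        return [min(range(k), key=lambda c: abs(t - cents[c]))
                for t in timestamps]

    for _ in range(max_steps):
        labels = assign(centroids)
        new_centroids = []
        for c in range(k):
            members = [t for t, lab in zip(timestamps, labels) if lab == c]
            # Assumed update rule: mean of the cluster (keep old if empty).
            new_centroids.append(sum(members) / len(members) if members
                                 else centroids[c])
        shift = max(abs(a - b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # centroids stopped moving
            break
    return centroids, assign(centroids)

# Hypothetical timestamps (hours of the day) for illustration only.
times = [8.0, 8.1, 8.2, 12.0, 12.1, 18.0, 18.2]
centroids, labels = generalize_times(times, k=3)
# Every position takes the centroid moment of its cluster as its timestamp.
generalized = [centroids[lab] for lab in labels]
```

Because the initial centroids are random, different seeds can yield different time periods; the sketch fixes the seed only to make runs repeatable.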
2. Generating a candidate partition set by spatial clustering
Perform density peak clustering multiple times on each time position space to divide it spatially and generate a candidate partition set. The specific process is as follows.
(1) Calculate the local density of each position point, sort the position points by density, and then, based on the sorted result, calculate the relative distance between each position point and the position points of higher density. For an arbitrary position point p_i ∈ Γ, 1 ≤ i ≤ |D|, its local density ρ_i and relative distance δ_i are:
Figure BSA0000268798200000043
Figure BSA0000268798200000044
where d_ij is the Euclidean distance between position points p_i and p_j, and d_c is the cutoff distance.
(2) Draw a decision graph from the local densities and relative distances, and select the data points with large ρ and δ as cluster centers. Then assign each remaining (non-center) position point to the cluster of its nearest neighbor of higher density.
(3) The density peak clustering algorithm divides all positions in the time position space Γ_i into m_i1 clusters; this partitioning strategy is denoted P_i1. Then, each time, a different one of the |D| trajectories is deleted and the clustering is repeated, yielding |D| further partitions. This generates the candidate partition set τ_i, which contains 1 + |D| partition results in total.
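The clustering and candidate generation just described can be sketched as follows. This is an illustration under assumptions, since the density and distance formulas survive only as figure placeholders: it uses the common cutoff-kernel form of density peak clustering (ρ_i counts the points closer than d_c, δ_i is the distance to the nearest denser point), it selects centers with the thresholds ρ_min and δ_min that the patent lists as inputs to position generalization, and it assumes each trajectory contributes exactly one point to the time position space. The function names are hypothetical.

```python
import math

def density_peaks(points, d_c, rho_min, delta_min):
    """Return (labels, centers) for a list of (x, y) points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: how many other points lie within the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: largest distance
            continue
        j = min(order[:rank], key=lambda q: dist[i][q])
        delta[i], denser[i] = dist[i][j], j
    # Centers: points whose density and relative distance are both large.
    centers = [i for i in order
               if rho[i] >= rho_min and delta[i] >= delta_min]
    if order[0] not in centers:  # the densest point is always a center
        centers.insert(0, order[0])
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Remaining points inherit the label of their nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[denser[i]]
    return labels, centers

def candidate_partitions(points, d_c, rho_min, delta_min):
    """Full clustering plus one clustering per deleted point: 1 + |D|."""
    full = density_peaks(points, d_c, rho_min, delta_min)
    rest = [density_peaks(points[:i] + points[i + 1:], d_c, rho_min, delta_min)
            for i in range(len(points))]
    return [full] + rest

# Two well-separated hypothetical groups of positions.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centers = density_peaks(pts, d_c=1.5, rho_min=2, delta_min=5)
candidates = candidate_partitions(pts, d_c=1.5, rho_min=2, delta_min=5)
```

Thresholding on ρ_min and δ_min stands in for reading the decision graph by eye; other selection rules are possible.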
3. Selecting the optimal partition for position generalization
Use the exponential mechanism of differential privacy to select the best-performing candidate partition, divide the position points spatially according to that partition's clustering, and generalize all positions in each group to the group's core position. The specific steps are as follows.
(1) Define an evaluation function U that calculates an evaluation value for each candidate partition scheme. For a partitioning scheme P, the evaluation function is:
Figure BSA0000268798200000051
where meandist(P) is the average distance of all trajectories divided according to the partitioning strategy P.
(2) Given the utility scores of all candidate partitions, probabilistically select one partition as the best partitioning scheme through the exponential mechanism. For the j-th (1 ≤ j ≤ g) candidate partition P_ij in τ_i, the selection probability is:
Figure BSA0000268798200000052
where ΔU denotes the sensitivity of the evaluation function U; here ΔU = 1.
(3) According to the selected optimal partition scheme P_ij, group the position points in the time position space Γ_i, and replace all positions in each group with that group's cluster center l_id (1 ≤ d ≤ m_ij). The same operation performed for time t_i is then applied to the position spaces of the trajectory data at the other times, completing the position generalization process.
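The probabilistic selection in step (2) can be illustrated with the standard form of the exponential mechanism, in which candidate j is chosen with probability proportional to exp(ε·U_j / (2ΔU)). This is a sketch under assumptions: the patent's own probability formula and evaluation function are in figures that are not reproduced here, so the utility scores are passed in as plain numbers, and the function name and example values are hypothetical.

```python
import math
import random

def exponential_mechanism(utilities, epsilon, sensitivity=1.0, rng=None):
    """Pick an index with probability proportional to
    exp(epsilon * u / (2 * sensitivity)); return (index, probabilities)."""
    rng = rng or random.Random()
    # Subtracting the maximum utility avoids overflow in exp() and leaves
    # the normalized probabilities unchanged.
    u_max = max(utilities)
    weights = [math.exp(epsilon * (u - u_max) / (2.0 * sensitivity))
               for u in utilities]
    total = sum(weights)
    probs = [w / total for w in weights]
    index = rng.choices(range(len(utilities)), weights=probs, k=1)[0]
    return index, probs

# Hypothetical utility scores for three candidate partitions.
index, probs = exponential_mechanism([1.0, 2.0, 3.0], epsilon=1.0,
                                     rng=random.Random(0))
```

A larger privacy budget ε concentrates the probability on the highest-scoring partition, while ε close to zero makes the choice nearly uniform.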
4. Trajectory data publishing
Count the number of real trajectories represented by each generalized trajectory and delete false trajectories. Add random noise to the counts with the Laplace mechanism of differential privacy, post-process the noisy data under a consistency constraint, and publish the generalized trajectories and noisy counts. The specific process is as follows.
(1) Generate generalized trajectories from the core positions and count the number of real trajectories, Real, represented by each generalized trajectory. When Real = 0, the generalized trajectory is an empty trajectory; it is marked as a false trajectory and deleted so that it is not published.
(2) Add Laplace random noise to the counts of the generalized trajectories using the Laplace mechanism of differential privacy. The Laplace noise is defined as follows.
Figure BSA0000268798200000053
Here ΔQ is the sensitivity of the query function Q. In the trajectory data publishing scenario, Q is a histogram query over the trajectories, so ΔQ = 1.
(3) Post-process the noisy data under a consistency constraint. Let
Figure BSA0000268798200000054
denote the noisy trajectory counts; the constrained result is:
Figure BSA0000268798200000055
where
Figure BSA0000268798200000056
Finally, the generalized trajectories and their constrained noisy counts are published:
Figure BSA0000268798200000057
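The noise addition and post-processing can be sketched as follows. The noise follows the standard Laplace mechanism with scale ΔQ/ε. The consistency constraint itself is defined in figures that are not reproduced here, so the `postprocess` function below substitutes a simple assumed constraint (published counts must be non-negative integers) purely for illustration; it should not be read as the patent's constraint. All function names and sample counts are hypothetical.

```python
import random

def add_laplace_noise(counts, epsilon, sensitivity=1.0, seed=None):
    """Return {trajectory: count + Lap(sensitivity / epsilon)}."""
    rng = random.Random(seed)
    rate = epsilon / sensitivity  # reciprocal of the Laplace scale
    # The difference of two i.i.d. exponential draws with this rate is
    # Laplace-distributed with scale sensitivity / epsilon.
    return {traj: c + rng.expovariate(rate) - rng.expovariate(rate)
            for traj, c in counts.items()}

def postprocess(noisy):
    """ASSUMED constraint: clamp to non-negative integers."""
    return {traj: max(0, round(v)) for traj, v in noisy.items()}

# Hypothetical counts of real trajectories per generalized trajectory.
counts = {("A", "C"): 2, ("B", "D"): 1}
published = postprocess(add_laplace_noise(counts, epsilon=1.0, seed=42))
```

Post-processing of this kind operates only on the already-noised values, so it does not consume additional privacy budget.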
Based on the above description, the position generalization procedure of the privacy protection method proposed by the invention is described as follows:
Input: time position space Γ = {p_1, p_2, ..., p_|D|}, minimum local density ρ_min, minimum relative distance δ_min, privacy budget ε_1
Output: generalized position set
Figure BSA0000268798200000061
Figure BSA0000268798200000062
The overall description of the space-time generalized trajectory data publishing method based on differential privacy provided by the invention is as follows:
Input: generalized trajectory set DG = {tr_1, tr_2, ..., tr_n}, privacy budget ε_2
Output: published trajectory set DP = {tr_1, tr_2, ..., tr_s}
Figure BSA0000268798200000071
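The counting and suppression that open the publishing procedure can be sketched as below. This is one plausible reading, stated as an assumption: candidate generalized trajectories are taken to be all combinations of core positions across the time periods, each real trajectory is assumed to be already expressed as a tuple of core-position identifiers, and candidates matched by no real trajectory are treated as false trajectories. The function name and the sample data are hypothetical.

```python
from collections import Counter
from itertools import product

def count_generalized(real_trajs, cores_per_period):
    """Count real trajectories per generalized trajectory and drop
    candidates whose count is zero (false trajectories)."""
    counts = Counter(real_trajs)
    table = {cand: counts.get(cand, 0)
             for cand in product(*cores_per_period)}
    return {cand: c for cand, c in table.items() if c > 0}

# Two time periods with two core positions each (illustrative values).
cores = [["A", "B"], ["C", "D"]]
real = [("A", "C"), ("A", "C"), ("B", "D")]
kept = count_generalized(real, cores)
```

The resulting counts are what the Laplace mechanism perturbs before the trajectories are published.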

Claims (4)

1. A space-time generalized trajectory data publishing method based on differential privacy, characterized by comprising the following steps:
(1) time generalization: clustering the time attributes of the trajectory positions, generalizing positions with similar timestamps into the same time period, and generating k time position spaces;
(2) position generalization: grouping the position points in the position space at each time through density peak clustering to generate a candidate partition set, selecting the best-performing candidate partition using the exponential mechanism of differential privacy, dividing the position points spatially accordingly, and generalizing all positions in each group to the group's core position;
(3) trajectory data publishing: counting the number of real trajectories represented by each generalized trajectory, deleting false trajectories, adding random noise to the counts using the Laplace mechanism of differential privacy, post-processing the noisy data under a consistency constraint, and publishing the generalized trajectories and noisy counts.
2. The space-time generalized trajectory data publishing method based on differential privacy according to claim 1, wherein the time generalization of step (1) is implemented as follows:
(2-1) setting the value of k and the initial centroids for time classification: k is the number of time periods into which the time attributes of all positions in the trajectory set are divided; k moments are randomly selected as initial centroids, and each moment is assigned to the cluster of its closest centroid moment according to the Euclidean distance between moments;
(2-2) recalculating the centroids of the newly formed clusters, and repeating step (2-1) until the change in the centroids between two successive iterations is smaller than a threshold;
(2-3) generating time position spaces: taking the centroid moment of each cluster as the timestamp of all trajectory position points in that cluster, generalizing positions with similar timestamps into the same time period, and generating k time position spaces Γ.
3. The space-time generalized trajectory data publishing method based on differential privacy according to claim 1, wherein the position generalization of step (2) is implemented as follows:
(3-1) density peak clustering of the position space: calculating the local density ρ and the relative distance δ of each position point, selecting the data points with large ρ and δ as cluster centers, and assigning each remaining position point to the cluster of its nearest neighbor of higher density;
for an arbitrary position point p_i ∈ Γ, 1 ≤ i ≤ |D|, its local density ρ_i and relative distance δ_i are:
Figure FSA0000268798190000011
Figure FSA0000268798190000012
where d_ij is the Euclidean distance between position points p_i and p_j, and d_c is the cutoff distance;
(3-2) generating a candidate partition set: the density peak clustering algorithm divides all positions in the time position space Γ_i into m_i1 clusters, and this partitioning strategy is denoted P_i1; then, each time, a different one of the |D| trajectories is deleted and the clustering is repeated, yielding |D| further partitions; this generates the candidate partition set τ_i, which contains 1 + |D| partition results;
(3-3) selecting the candidate partition with the best utility: defining an evaluation function U to calculate an evaluation value for each candidate partition scheme, and using the exponential mechanism to probabilistically select one partition P_ij from τ_i as the best candidate partition;
for the j-th (1 ≤ j ≤ g) candidate partition P_ij in τ_i, the evaluation value is:
Figure FSA0000268798190000021
(3-4) taking the selected P_ij as the partition result for the position space Γ_i at time t_i, and replacing all positions in each cluster with that cluster's center.
4. The space-time generalized trajectory data publishing method based on differential privacy according to claim 1, wherein the trajectory data publishing of step (3) is implemented as follows:
(4-1) deleting false trajectories: generating generalized trajectories from the core positions, counting the number of real trajectories represented by each generalized trajectory, and deleting false trajectories whose count is zero;
(4-2) adding noise: adding Laplace random noise to the counts of the generalized trajectories using the Laplace mechanism of differential privacy;
(4-3) publishing the trajectory data: post-processing the noisy data under a consistency constraint, and publishing the generalized trajectories and the constrained noisy counts.
CN202210256578.8A 2022-03-16 2022-03-16 Space-time generalized trajectory data publishing method based on differential privacy Pending CN114462093A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210256578.8A CN114462093A (en) 2022-03-16 2022-03-16 Space-time generalized trajectory data publishing method based on differential privacy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210256578.8A CN114462093A (en) 2022-03-16 2022-03-16 Space-time generalized trajectory data publishing method based on differential privacy

Publications (1)

Publication Number Publication Date
CN114462093A (en) 2022-05-10

Family

ID=81418081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210256578.8A Pending CN114462093A (en) 2022-03-16 2022-03-16 Space-time generalized trajectory data publishing method based on differential privacy

Country Status (1)

Country Link
CN (1) CN114462093A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118013588A (en) * 2024-03-28 2024-05-10 南京信息工程大学 Differential privacy method of space-time clustering for track privacy protection

Legal Events

Date Code Title Description
PB01 Publication