CN110162997B - Anonymous privacy protection method based on interpolation points - Google Patents

Anonymous privacy protection method based on interpolation points

Info

Publication number
CN110162997B
Authority
CN
China
Prior art keywords
track
distance
tracks
imhdt
anonymous
Legal status
Active
Application number
CN201910340914.5A
Other languages
Chinese (zh)
Other versions
CN110162997A (en)
Inventor
汪小寒
张泽培
何增宇
王涛春
孙丽萍
郑孝遥
罗永龙
Current Assignee
Anhui Normal University
Original Assignee
Anhui Normal University
Application filed by Anhui Normal University
Priority to CN201910340914.5A
Publication of CN110162997A
Application granted
Publication of CN110162997B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/10 Geometric CAD
    • G06F30/18 Network design, e.g. design based on topological or interconnect aspects of utility systems, piping, heating ventilation air conditioning [HVAC] or cabling

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Complex Calculations (AREA)
  • Storage Device Security (AREA)

Abstract

The invention belongs to the technical field of privacy protection and provides an anonymous privacy protection method based on interpolation points, which comprises the following steps: S1, preprocessing an original track data set Ts to form a plurality of track equivalence classes Ecs whose tracks have consistent timestamps; S2, clustering the tracks in each track equivalence class according to the IMHDT distance measure to form a plurality of track anonymization groups in each equivalence class, each containing at least k tracks; S3, perturbing the tracks in each anonymization group so that they finally satisfy interpolation-track (k,δ)-anonymity. The method takes the track timestamps as a constraint and restricts interpolation points to the track segments of the corresponding timestamps, which reduces data distortion during anonymization and increases data utility while meeting the privacy protection requirements of data publishing.

Description

Anonymous privacy protection method based on interpolation points
Technical Field
The invention belongs to the technical field of privacy protection and relates to an anonymous privacy protection method based on interpolation points.
Background
In modern society, track information can be conveniently collected and shared by GPS-enabled mobile phones, PDAs, in-vehicle navigators, smart wearable devices and the like. Users can thus conveniently use location-based services (LBS) such as "find nearby gas stations" or "record my movement track". The collected track information can support business decisions, e.g., opening a supermarket in an area dense with location records, which typically has greater commercial value and thus maximizes the investor's profit; it can also support applications such as urban planning. Track information is valuable because it contains rich spatio-temporal information, but the same information can be collected and analyzed by malicious parties, leaking users' privacy.
Therefore, a published data set must be anonymized to prevent privacy disclosure. At the same time, the data output by a privacy protection system must not excessively change track characteristics such as the length and duration of each user's track, so that data utility is preserved when track information is published. Ensuring that an individual track cannot be identified by an attacker while preserving utility is a central concern of current track privacy protection applications. Many methods exist for protecting privacy in track data publishing, but most of them do not consider the utility of the published data.
Disclosure of Invention
The embodiments of the invention provide an anonymous privacy protection method based on interpolation points, which takes track timestamps as constraints and restricts interpolation points to the track segments of the corresponding timestamps, reducing data distortion during anonymization and increasing data utility while meeting the privacy protection requirements of data publishing.
In order to achieve the above object, the present invention provides an anonymous privacy protection method based on interpolation points, which specifically includes the following steps:
S1, preprocessing an original track data set Ts to form a plurality of track equivalence classes Ecs whose tracks have consistent timestamps;
S2, clustering the tracks in each track equivalence class according to the IMHDT distance measure to form a plurality of track anonymization groups in each equivalence class, each containing at least k tracks;
S3, perturbing the tracks in each anonymization group so that they finally satisfy interpolation-track (k,δ)-anonymity.
Further, step S1 specifically includes the following steps:
S11, defining the track processing slice value Pi;
S12, acquiring the start and end timestamps {t_b, t_e} of the original track Tr;
S13, acquiring the first timestamp t_i that is later than the start time t_b and satisfies t_i mod Pi = 0, and the last timestamp t_j that is earlier than the end time t_e and satisfies t_j mod Pi = 0;
S14, clipping the original track to [t_i, t_j] and placing it into the track equivalence class D{i, j}.
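The slicing in S11-S14 can be sketched as follows. This is a minimal sketch, not the patented implementation: it assumes integer timestamps, represents a track as a time-ordered list of (t, x, y) triples, reads "later than" and "earlier than" as strict inequalities, and suppresses a track whose clipped window would be empty or a single instant (the text does not say how such tracks are handled). The function name `preprocess` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: a track is a time-ordered list of (t, x, y) triples.
Track = List[Tuple[int, float, float]]


def preprocess(tracks: List[Track], pi: int) -> Dict[Tuple[int, int], List[Track]]:
    """Clip each track to [t_i, t_j] and group tracks sharing the same window."""
    classes: Dict[Tuple[int, int], List[Track]] = defaultdict(list)
    for tr in tracks:
        t_b, t_e = tr[0][0], tr[-1][0]          # S12: start and end timestamps
        t_i = (t_b // pi + 1) * pi              # S13: first multiple of pi after t_b
        t_j = ((t_e - 1) // pi) * pi            # S13: last multiple of pi before t_e
        if t_i >= t_j:                          # assumption: window too short, suppress
            continue
        clipped = [pt for pt in tr if t_i <= pt[0] <= t_j]   # S14: clip to [t_i, t_j]
        classes[(t_i, t_j)].append(clipped)                  # S14: class D{i, j}
    return dict(classes)


if __name__ == "__main__":
    data = [
        [(t, float(t), 0.0) for t in range(3, 18)],   # window [5, 15]
        [(t, 0.0, float(t)) for t in range(1, 13)],   # window [5, 10]
        [(t, 1.0, 1.0) for t in range(6, 9)],         # too short: suppressed
    ]
    for window, members in sorted(preprocess(data, 5).items()):
        print(window, len(members))
```

Running several slice values Pi over the same data and comparing how many tracks survive is one way to explore the trade-off between retained tracks and anonymization quality discussed later in the description.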
Further, step S2 specifically includes the following steps:
S21, placing the unclustered tracks of each track equivalence class into an active set and randomly selecting one track from the active set;
S22, calculating the IMHDT distances from the other tracks in the active set to the selected track, and taking the track with the largest IMHDT distance as the central track;
S23, calculating the IMHDT distances from the other tracks in the active set to the central track;
S24, forming an anonymous cluster from the central track and the k-1 tracks nearest to it in IMHDT distance, and adding the cluster to the anonymity set;
S25, taking the farthest of those k-1 nearest tracks; if its IMHDT distance to the central track exceeds the threshold max_radius, the anonymous cluster is suppressed.
Here, the IMHDT distance is the Hausdorff distance based on interpolation points under a time constraint.
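The greedy loop of S21-S25 can be sketched independently of the distance function. This is a minimal sketch under stated assumptions: the distance is passed in as a callable (IMHDT in the method, a toy 1-D distance in the example), the max_radius check is done before the cluster is kept, and any fewer-than-k tracks left at the end are suppressed, which the steps above do not specify. The name `greedy_cluster` is hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def greedy_cluster(tracks: Sequence[T], k: int, max_radius: float,
                   dist: Callable[[T, T], float],
                   rng: random.Random) -> Tuple[List[List[T]], List[T]]:
    """Greedy clustering of one equivalence class; returns (clusters, suppressed)."""
    if len(tracks) < k:                          # class too small: suppress it whole
        return [], list(tracks)
    active = list(range(len(tracks)))            # S21: indices of unclustered tracks
    clusters: List[List[T]] = []
    suppressed: List[T] = []
    while active:
        if len(active) < k:                      # assumption: leftovers are suppressed
            suppressed.extend(tracks[i] for i in active)
            break
        pivot = rng.choice(active)               # S21: random track
        center = max(active, key=lambda i: dist(tracks[i], tracks[pivot]))  # S22
        others = sorted((i for i in active if i != center),
                        key=lambda i: dist(tracks[i], tracks[center]))      # S23
        group = [center] + others[:k - 1]        # S24: center plus k-1 nearest
        far = group[-1]                          # S25: farthest of the k-1 nearest
        if far != center and dist(tracks[far], tracks[center]) > max_radius:
            suppressed.extend(tracks[i] for i in group)
        else:
            clusters.append([tracks[i] for i in group])
        taken = set(group)
        active = [i for i in active if i not in taken]   # remove from the active set
    return clusters, suppressed


if __name__ == "__main__":
    pts = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
    cl, sup = greedy_cluster(pts, 3, 1.0, lambda a, b: abs(a - b), random.Random(0))
    print(sorted(sorted(c) for c in cl), sup)
```

Choosing the track farthest from a random pivot as the center tends to peel off outlying tracks first, so they are grouped (or suppressed) before they can inflate the radius of a dense cluster.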
Further, the IMHDT distance between two tracks Tr1 and Tr2 is calculated as follows:
S221, for each track sampling point $Tr1\_node_{t=t_i}$, calculating the shortest distance to the track segment $\overline{Tr2\_node_{t=t_{i-1}}\,Tr2\_node_{t=t_i}}$;
S222, calculating the shortest distance from the same track sampling point $Tr1\_node_{t=t_i}$ to the track segment $\overline{Tr2\_node_{t=t_i}\,Tr2\_node_{t=t_{i+1}}}$;
S223, taking the smaller of the distances from S221 and S222 as the IMHDT distance of the track sampling point $Tr1\_node_{t=t_i}$;
S224, taking the mean of the IMHDT distances of all sampling points of track Tr1 as the IMHDT distance between tracks Tr1 and Tr2.
Further, the shortest distance between a track sampling point and a track segment is calculated as follows:
determine whether the track segment contains an interpolation point such that the line connecting the sampling point and the interpolation point is perpendicular to the segment;
if such an interpolation point exists, the Euclidean distance between the sampling point and the interpolation point is the shortest distance between the sampling point and the segment;
otherwise, the smaller of the distances between the sampling point and the two endpoints of the segment is the shortest distance.
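Steps S221-S224 and the point-to-segment rule can be sketched together. This is a minimal sketch under stated assumptions: both tracks come from the same equivalence class and are given as lists of (x, y) positions aligned on the same consecutive timestamps, and at the first and last timestamps only the single adjacent segment is searched (the text does not say how the ends are handled). The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from sampling point p to the track segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 > 0:
        u = ((px - ax) * dx + (py - ay) * dy) / seg_len2
        if 0.0 <= u <= 1.0:
            # Interpolation point: the perpendicular foot from p lies on ab.
            return math.hypot(px - (ax + u * dx), py - (ay + u * dy))
    # No interpolation point: fall back to the nearer endpoint.
    return min(math.hypot(px - ax, py - ay), math.hypot(px - bx, py - by))


def imhdt(tr1: List[Point], tr2: List[Point]) -> float:
    """Directed IMHDT distance from tr1 to tr2 (same timestamps, same length)."""
    if not tr1 or len(tr1) != len(tr2):
        raise ValueError("tracks must be non-empty and aligned on the same timestamps")
    last = len(tr2) - 1
    total = 0.0
    for i, p in enumerate(tr1):
        candidates = []
        if i > 0:        # S221: segment between t_{i-1} and t_i
            candidates.append(point_segment_distance(p, tr2[i - 1], tr2[i]))
        if i < last:     # S222: segment between t_i and t_{i+1}
            candidates.append(point_segment_distance(p, tr2[i], tr2[i + 1]))
        if not candidates:   # single-point tracks: plain Euclidean distance
            candidates.append(math.hypot(p[0] - tr2[i][0], p[1] - tr2[i][1]))
        total += min(candidates)     # S223: smaller of the two distances
    return total / len(tr1)          # S224: mean over all sampling points


if __name__ == "__main__":
    anon = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    center = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    print(imhdt(anon, center))  # 1.0: every point is 1 unit above the polyline
```

The distance is directed (from Tr1 to Tr2), matching S224, which averages over the sampling points of Tr1 only.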
The anonymous privacy protection method based on interpolation points provided by the embodiments of the invention has the following beneficial effects:
1. In track preprocessing, several preprocessing slice values are used to normalize the tracks, and the value is chosen by jointly comparing the number of tracks retained under each slice value and the quality of the tracks to be anonymized. This helps reduce the amount of track suppression during preprocessing and benefits the subsequent anonymization.
2. Track uncertainty theory is introduced into the interpolation-point anonymity model. Track data are special in that every sampling point can act as a quasi-identifier, so moving quasi-identifiers directly increases the anonymization cost. Using the inherent uncertainty region of a track as its anonymity region therefore helps reduce that cost.
3. When measuring track distances during clustering, the interpolation-point-based Hausdorff distance is adopted, and it is proved that its value from an anonymous track to the central track is always less than or equal to the Euclidean distance between the same tracks. Clustering with this distance therefore yields clusters with smaller generalization areas than clustering with the Euclidean distance.
4. An anonymity model is proposed in which sampling points are replaced by interpolation points on the track segments adjacent to them. This model reduces track perturbation during anonymization, thereby reducing data distortion and increasing data utility while meeting the privacy protection requirements of data publishing.
Drawings
Fig. 1 is a schematic diagram of the uncertainty region of a track sampling point according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of three original tracks that do not satisfy track (3,δ)-anonymity according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the three original tracks after anonymization according to an embodiment of the present invention;
Fig. 4 is a flowchart of the anonymous privacy protection method based on interpolation points according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of interpolation track similarity measurement without a time constraint according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of interpolation track similarity measurement under a time constraint according to an embodiment of the present invention;
Fig. 7 is a diagram comparing the Euclidean distance between tracks with the interpolation-point-based Hausdorff distance according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the anonymization operation based on interpolation points according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Definitions of related terms:
1) Trajectory
A trajectory is the path of a moving object over a period of time. Trajectory information is collected by sensor devices with positioning systems, which record the coordinates of the moving object during the corresponding period and send them to the trajectory information collector. A trajectory can be represented in two different ways.
In the first, a track Tr is a time-ordered sequence of triples (t_i, x_i, y_i):
Tr={(t1,x1,y1),(t2,x2,y2),...,(tn,xn,yn)}
where x_i, y_i are the coordinates of the track at timestamp t_i, 1 ≤ i ≤ n.
In the second, the trajectory is represented as a series of connected line segments (a polyline):
$$Tr = \{\overline{p_1 p_2}, \overline{p_2 p_3}, \ldots, \overline{p_{len_{Tr}-1}\, p_{len_{Tr}}}\}$$
where p_j is a sampling point of track Tr, len_Tr is the length (number of sampling points) of Tr, and $\overline{p_j p_{j+1}}$ ($1 \le j < len_{Tr}$) is the segment between two consecutive sampling points. This segment approximates the real route between the two track points: as the sampling interval approaches 0, the polyline approaches the real movement route, but the higher the sampling frequency, the higher the cost of storing and analyzing the track.
2) Hausdorff distance
The Hausdorff distance is a measure of the distance between two point sets, widely used in image processing. Given two point sets T_i = {a_1, a_2, ..., a_i, ..., a_m} and T_j = {b_1, b_2, ..., b_j, ..., b_n}, the Hausdorff distance between T_i and T_j is defined as
$$H(T_i, T_j) = \max\{h(T_i, T_j),\ h(T_j, T_i)\}$$
where
$$h(T_i, T_j) = \max_{a \in T_i} \min_{b \in T_j} \lVert a - b \rVert$$
Because the original Hausdorff distance is computed from maxima and minima, it is strongly affected by outliers. To improve robustness to isolated points and noise, a modified Hausdorff distance reduces the influence of outliers by averaging:
$$h_{MHD}(T_i, T_j) = \frac{1}{m} \sum_{a \in T_i} \min_{b \in T_j} \lVert a - b \rVert, \qquad H_{MHD}(T_i, T_j) = \max\{h_{MHD}(T_i, T_j),\ h_{MHD}(T_j, T_i)\}$$
The Hausdorff distance based on interpolation points measures each point against the other track's polyline rather than only its sampling points:
$$h_{IMHD}(T_i, T_j) = \frac{1}{m} \sum_{a \in T_i} \min_{1 \le l < n} dist(a, \overline{b_l b_{l+1}})$$
where $dist(a, \overline{b_l b_{l+1}})$ is the distance from a to the segment, measured to the interpolation point (the foot of the perpendicular from a) when it lies on the segment.
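The effect of averaging can be seen on a toy example. This sketch assumes the common averaged form of the modified Hausdorff distance (the maximum of the two directed means), which may differ in detail from the formula in the original figure; the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _nearest(a: Point, pts: Sequence[Point]) -> float:
    """Distance from point a to the nearest point of pts."""
    return min(math.hypot(a[0] - b[0], a[1] - b[1]) for b in pts)


def hausdorff(A: Sequence[Point], B: Sequence[Point]) -> float:
    """Classical Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    h_ab = max(_nearest(a, B) for a in A)
    h_ba = max(_nearest(b, A) for b in B)
    return max(h_ab, h_ba)


def modified_hausdorff(A: Sequence[Point], B: Sequence[Point]) -> float:
    """Averaged variant: the outer max of each directed term becomes a mean."""
    h_ab = sum(_nearest(a, B) for a in A) / len(A)
    h_ba = sum(_nearest(b, A) for b in B) / len(B)
    return max(h_ab, h_ba)


if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]
    B = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]   # one outlier at x = 10
    print(hausdorff(A, B), modified_hausdorff(A, B))  # 9.0 3.0
```

The single outlier sets the classical distance to 9, while averaging over the three points of B dilutes it to 3.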
3) Uncertainty of track sampling points
Existing positioning technology cannot locate a coordinate point exactly, so the actual position is generally generalized to a circular region. This property is the basis of track (k,δ)-anonymity, which anonymizes tracks by exploiting it. The invention differs in that it does not use the uncertainty region of the original track sampling points directly but replaces it with interpolation points, thereby obtaining a smaller anonymization cost.
Because real positioning technology is inaccurate, let δ denote the uncertainty threshold. The circular region centered on a track sampling point with radius δ is the uncertainty region of that sampling point (as shown in Fig. 1):
$$dist(p_{real}, p) \le \delta$$
where p_real is the true position of the track and p is the sampling point; p_real may lie anywhere in the uncertainty region.
4) Co-localization of tracks
In the final anonymization, the corresponding sampling points of every pair of tracks must lie within each other's uncertainty regions; the tracks are then co-localized. Define a central track Tr and an anonymous track Tr′:
Tr={(t1,x1,y1),(t2,x2,y2),...,(tn,xn,yn)}
Tr′={(t1,x′1,y′1),(t2,x′2,y′2),...,(tn,x′n,y′n)}
Each sampling point of Tr′ lies within the uncertainty region of the corresponding sampling point of the central track. Assuming the track metric uses the Euclidean distance, the condition is
$$\forall i \in \{1, \ldots, n\}:\ \sqrt{(x_i - x'_i)^2 + (y_i - y'_i)^2} \le \delta$$
If it holds, the two tracks are said to satisfy co-localization, denoted Coloc(Tr, Tr′).
5) Track (k,δ)-anonymity group
If every two tracks in a track anonymity group are co-localized with uncertainty threshold δ, and the group contains at least k tracks, the group is a track (k,δ)-anonymity group.
Track (k,δ)-anonymity is based on the uncertainty of track sampling points. As shown in Fig. 2, the corresponding sampling points of the anonymous tracks Tr1 and Tr2 lie within the uncertainty region centered on the first sampling point of the central track with radius δ, so Tr1, Tr2 and the central track satisfy (3,δ)-anonymity at the first sampling point. The second sampling point satisfies only (2,δ)-anonymity, because the corresponding sampling point of the anonymous track Tr2 is not within the uncertainty region. Similarly, the third and fourth sampling points satisfy only (2,δ)-anonymity. To turn the group formed by these three tracks into a track (3,δ)-anonymity group, the offending track points must be moved, and this operation causes data distortion. Fig. 3 shows the (3,δ)-anonymity group formed after the three original tracks are anonymized: the gray sampling points are the moved positions of the track points that originally violated the condition. All tracks are now co-localized, so the group is a (3,δ)-anonymity group.
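Definitions 4) and 5) translate directly into two predicates. This is a minimal sketch, assuming each track is given as a list of (x, y) positions aligned on the same timestamps and using the Euclidean distance as in the co-localization condition; the function names are hypothetical. The example shows that two tracks can each lie within δ of the central track and still violate pairwise co-localization.

```python
import math
from itertools import combinations
from typing import List, Tuple

# Hypothetical representation: positions aligned on the same timestamps.
Track = List[Tuple[float, float]]


def coloc(tr_a: Track, tr_b: Track, delta: float) -> bool:
    """Coloc(Tr, Tr'): at every timestamp the two positions are within delta."""
    return len(tr_a) == len(tr_b) and all(
        math.hypot(xa - xb, ya - yb) <= delta
        for (xa, ya), (xb, yb) in zip(tr_a, tr_b))


def is_k_delta_group(group: List[Track], k: int, delta: float) -> bool:
    """Track (k, delta)-anonymity group: at least k tracks, every pair co-localized."""
    return len(group) >= k and all(
        coloc(a, b, delta) for a, b in combinations(group, 2))


if __name__ == "__main__":
    center = [(0.0, 0.0), (1.0, 0.0)]
    up = [(0.0, 0.3), (1.0, 0.3)]
    down = [(0.0, -0.3), (1.0, -0.3)]
    # up and down are 0.6 apart, so the group fails for delta = 0.5.
    print(is_k_delta_group([center, up, down], 3, 0.5))   # False
    print(is_k_delta_group([center, up, down], 3, 0.7))   # True
```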
The invention addresses the following problem: anonymizing the original track data set reduces the privacy disclosure risk of track owners under the attack assumption, but any operation on the original data set distorts the published data and reduces its utility.
Evaluation metrics
1) Data distortion
Clustering groups data according to certain features so that similarity within a cluster is high and similarity between clusters is low. Similarity is therefore the key to the clustering process, and track clustering is no exception: how to express the similarity between tracks is the core of a track clustering algorithm. The classical track k-anonymity algorithm NWA measures similarity with the Euclidean distance, whereas the invention adopts the interpolation-based Hausdorff distance, which is proved to be less than or equal to the widely used Euclidean distance. Using this distance lets the clustering process form clusters with smaller generalization areas, and the smaller generalization area reduces data distortion during clustering.
Data distortion is defined as follows:
[Equation image not recoverable from the text: data distortion, computed from len(Ecs), ClusterArea(Ec_i) and MaxArea]
where len(Ecs) is the number of clusters after clustering, ClusterArea(Ec_i) is the area of cluster Ec_i, and MaxArea is the total area of the track region.
2) Anonymization cost
Track anonymization transforms the data within each cluster formed by track clustering, i.e., it moves track sampling points so that (k,δ)-anonymity is satisfied. Since each cluster already contains at least k tracks after clustering, this step only needs to ensure that the distance from each sampling point to the central track does not exceed δ.
Anonymization cost is defined as follows:
[Equation image not recoverable from the text: anonymization cost, computed from translationNode and maxTranslation]
where translationNode is the distance a track sampling point is moved, and maxTranslation is the movement distance of all points in the track.
Fig. 4 is a flowchart of an anonymous privacy protection method based on interpolation points according to an embodiment of the present invention, where the method specifically includes the following steps:
S1, preprocessing an original track data set Ts to form a plurality of track equivalence classes Ecs whose tracks have consistent timestamps;
The sampling times of different real-world tracks differ greatly, which affects the measurement of track similarity. The original tracks therefore need to be divided into several equivalence classes by timestamp, so that the tracks in each equivalence class have consistent timestamps. However, few tracks share exactly the same timestamps, so classifying directly by timestamp inevitably produces too many equivalence classes with too few tracks each; any class with fewer than k tracks fails the k-anonymity requirement and must be suppressed, degrading the quality of the anonymized data. The invention adopts the track preprocessing approach of the NWA algorithm, which suppresses some track points so that each equivalence class contains more tracks. Compared with suppressing entire equivalence classes as described above, this retains many more tracks and substantially improves anonymization quality.
Because the choice of preprocessing slice value must balance the number of retained tracks against the quality of the data to be anonymized, preprocessing is carried out separately with different slice values, forming several sets of equivalence classes. The indicators are then analyzed together in the experimental part and a suitable slice value is selected, reducing the amount of suppressed track data while maintaining the quality of the tracks to be anonymized.
Algorithm 1 is the preprocessing of the original tracks. Its inputs are the original track set Ts and the track preprocessing slice value Pi; its output is the set of track equivalence classes produced with that slice value, each of which has consistent track timestamps. For each track in the data set, the start and end timestamps [t_b, t_e] are first recorded; t_i is taken as the timestamp later than t_b with t_i mod Pi = 0, and t_j as the timestamp earlier than the end timestamp t_e with t_j mod Pi = 0. The track is clipped to [t_i, t_j] and placed into the equivalence class with the same (i, j) values.
Finally, all tracks are placed into the equivalence classes of their corresponding (i, j) values, and the tracks in each equivalence class have the same start and end times. Note that the timestamps in the data set of the invention are consecutive, i.e., if the start and end timestamps of two tracks agree, then all their timestamps agree.
[Algorithm 1 (track preprocessing): pseudocode image not recoverable from the text]
S2, clustering the tracks in each track equivalence class according to IMHDT distance measurement, and forming a plurality of track anonymity groups in each equivalence class, wherein the number of the tracks in each anonymity group is not less than k;
In the clustering process, the tracks in the same timestamp equivalence class are measured with a specific similarity function; tracks with high similarity are assigned to the same group to be anonymized, and each anonymization group contains no fewer than k tracks. The core of the process is determining the similarity of two tracks, i.e., the track metric function. The classical metric is the Euclidean distance: the Euclidean distance between the track sampling points at each corresponding timestamp is computed first, and the arithmetic mean over all timestamps is the Euclidean distance between the tracks. The invention proposes a new metric, IMHDT, the Hausdorff distance based on interpolation points under a time constraint, computed as follows:
$$dist(p_a, Tr_b) = \min\{dist(p_a, \overline{p_{b-1} p_b}),\ dist(p_a, \overline{p_b\, p_{b+1}})\}$$
where dist(·,·) between two points denotes their Euclidean distance, p_a and p_b are the sampling points of Tr_a and Tr_b at the same timestamp, and $dist(p_a, \overline{p_{b-1} p_b})$ is the distance from p_a to the polyline segment $\overline{p_{b-1} p_b}$, measured to the interpolation point p′ (the foot of the perpendicular from p_a) when it lies on the segment and to the nearer endpoint otherwise.
$$dist(Tr_a, Tr_b) = \frac{1}{t} \sum_{p_a \in Tr_a} dist(p_a, Tr_b)$$
where dist(Tr_a, Tr_b) is the IMHDT distance between tracks Tr_a and Tr_b, and t is the number of sampling points.
Omitting the time constraint from track similarity comparison can produce misleading results. Fig. 5 shows interpolated track similarity measurement without a time constraint: when interpolation points are searched without regard to time, the computed IMHD distance between the two tracks is very small, yet the two tracks actually run in opposite directions and their sampling points at the same times are far apart, especially at t1 and t4. Measuring tracks with IMHD is therefore unsuitable in this case. Fig. 6 shows the measurement under a time constraint: the search for interpolation points is limited to the track segments adjacent to the same sampling time, and the resulting IMHDT distance is relatively large, which matches reality. The time constraint also greatly narrows the search range for interpolation points, since they no longer need to be searched over the whole track as in IMHD, which improves computational efficiency.
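The effect of the time constraint can be reproduced on two tracks that follow the same straight route in opposite directions, similar in spirit to Fig. 5 and Fig. 6. This is a minimal sketch, assuming (x, y) positions aligned on the same timestamps and at least two sampling points per track; `imhd_unconstrained` stands in for IMHD as described above (nearest segment anywhere on the other track), and all names are hypothetical.

```python
import math


def seg_dist(p, a, b):
    """Distance from p to segment ab: perpendicular foot if on ab, else nearer endpoint."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    l2 = dx * dx + dy * dy
    # Clamping u to [0, 1] picks the foot when it lies on ab, otherwise an endpoint.
    u = 0.0 if l2 == 0 else max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / l2))
    return math.hypot(p[0] - (a[0] + u * dx), p[1] - (a[1] + u * dy))


def imhd_unconstrained(tr1, tr2):
    """Search every segment of tr2, ignoring timestamps."""
    segs = list(zip(tr2, tr2[1:]))
    return sum(min(seg_dist(p, a, b) for a, b in segs) for p in tr1) / len(tr1)


def imhdt(tr1, tr2):
    """Search only the segments of tr2 adjacent to the same timestamp."""
    total = 0.0
    for i, p in enumerate(tr1):
        segs = [(tr2[j], tr2[j + 1]) for j in (i - 1, i) if 0 <= j < len(tr2) - 1]
        total += min(seg_dist(p, a, b) for a, b in segs)
    return total / len(tr1)


if __name__ == "__main__":
    forward = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    backward = forward[::-1]          # same route, opposite direction
    print(imhd_unconstrained(forward, backward))   # 0.0
    print(imhdt(forward, backward))                # 1.0
```

Without the constraint every point lies on the other polyline, so the opposite tracks look identical; with it, the first and last sampling points are each 2 units from their only admissible segment.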
The invention adopts the Hausdorff distance based on interpolation points under a time constraint (IMHDT) as the track metric function and performs track clustering with greedy clustering. Because the IMHDT distance is less than or equal to the Euclidean distance under the same conditions (proved below), the anonymization groups formed by clustering have a smaller generalization radius than with the Euclidean distance, so the clusters have smaller generalization areas and the track data distortion caused by generalization is reduced.
The interpolation-based Hausdorff distance from an anonymous track to the central track is always less than or equal to the Euclidean distance between the same tracks. The proof is as follows.
The distance of a track sampling point is computed differently depending on the case:
1. As shown at t1 in Fig. 7, when the perpendicular from the anonymous track sampling point falls on neither of the central track's segments adjacent to the timestamp, no interpolation point exists. The distance of the sampling point is then its distance to the nearest endpoint of those segments. Because the corresponding central sampling point is one of these endpoints, the value cannot exceed the Euclidean distance, and it equals the Euclidean distance when that central sampling point is the nearest endpoint.
2. As shown at t3 in Fig. 7, when the perpendicular falls on only one of the adjacent segments, one interpolation point is obtained, and the distance of the sampling point is at most the Euclidean distance from the sampling point to that interpolation point. The sampling point, the interpolation point and the corresponding central sampling point form a right triangle; since the hypotenuse is longer than either leg, this distance is less than the Euclidean distance between the sampling points.
3. As shown at t2 in Fig. 7, when the perpendicular falls on both adjacent segments, two interpolation points are obtained, and the distance of the sampling point is the shorter of its Euclidean distances to the two interpolation points. By the same argument, it is less than the Euclidean distance between the sampling points.
The IMHDT distance of a track is the mean of the distances of its sampling points, and in all three cases each of these is less than or equal to the Euclidean distance. Hence the interpolation-based Hausdorff distance from the anonymous track to the central track is always less than or equal to the Euclidean distance between the same tracks. The two values are equal when no sampling point has an interpolation point on the central track's timestamp-adjacent segments and, at every timestamp, the corresponding central sampling point is the nearest segment endpoint.
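A small worked instance of case 2, with coordinates chosen purely for illustration: let the central track pass through $p_{b-1} = (0, 0)$, $p_b = (2, 0)$ and $p_{b+1} = (4, 0)$ at $t_{i-1}, t_i, t_{i+1}$, and let the anonymous track sampling point at $t_i$ be $p_a = (1, 1)$.

```latex
\begin{aligned}
\text{Euclidean: } & dist(p_a, p_b) = \sqrt{(1-2)^2 + (1-0)^2} = \sqrt{2} \approx 1.414 \\
\text{segment } \overline{p_{b-1} p_b}: & \ p' = (1, 0),\quad dist(p_a, p') = 1 \\
\text{segment } \overline{p_b\, p_{b+1}}: & \ \text{foot } (1, 0) \text{ not on segment; nearer endpoint } p_b,\ dist = \sqrt{2} \\
\text{IMHDT at } t_i: & \ \min\{1, \sqrt{2}\} = 1 \le \sqrt{2}
\end{aligned}
```

The triangle formed by $p_a$, $p'$ and $p_b$ has a right angle at $p'$, with hypotenuse $\sqrt{2}$ and leg 1, which is exactly the argument used in case 2.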
The method examines the tracks in each group to be anonymized, selects a central track, and constrains the other tracks of the group to lie within the uncertainty region, thereby achieving co-localization of the tracks and the goal of privacy protection. Moreover, unlike the track (k,δ)-anonymity model, the invention perturbs tracks using interpolation points instead of points in the uncertainty region, resulting in less data distortion, lower anonymization cost and higher data utility.
Algorithm 2 is the clustering algorithm. Its inputs are the track equivalence classes Ecs and the privacy protection degree k; its output is the set of clustered track equivalence classes clusteredEcs. First, the threshold max_radius is set; any cluster whose IMHDT radius exceeds it is suppressed. Next, the set of unclustered tracks is initialized to empty, and clustering is performed on each preprocessed track equivalence class: a clustered track set is initialized, and if the equivalence class contains fewer than k tracks, the class is suppressed. Otherwise, an active set is initialized and filled with all tracks of the equivalence class, representing the unclustered tracks. A central track set is then initialized and a track is randomly selected from the active set. The IMHDT distances between the unclustered tracks and this track are computed, the track with the largest IMHDT distance is chosen as the central track, and the IMHDT distances between the unclustered tracks and the central track are then computed. An anonymous cluster anonymity is initialized and formed from the central track and the k-1 tracks with the smallest IMHDT distances to it. Before the cluster is added to the anonymity set, the farthest of these k-1 tracks is found; if its IMHDT distance to the central track exceeds max_radius, these tracks are suppressed instead. Whether or not the tracks were kept as an anonymous cluster, they are then removed from the active set, and the process repeats until the active set is empty, which completes the clustering of that equivalence class. The track clustering process ends when all equivalence classes have been clustered.
Algorithm 3 computes the IMHDT distance between two tracks. Its inputs are two tracks Tr1 and Tr2, and its output is the IMHDT distance between them. First, for each track sampling point Tr1_node_{t=t_i}, the shortest distance from the point to the first of the two Tr2 track segments adjacent to t_i is computed; then the shortest distance from the same point to the second adjacent Tr2 track segment is computed. The smaller of the two values is taken as the IMHDT distance of that point, and after these values are summed over all sampling points, their average is taken as the IMHDT distance between the tracks.
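A minimal Python sketch of Algorithm 3 follows, assuming both tracks belong to the same equivalence class and are therefore sampled at the same timestamps, stored as index-aligned lists of planar (x, y) points. Taking the two Tr2 segments adjacent to index i as the candidate segments, and the helper names `point_segment_distance` and `imhdt`, are assumptions made for illustration.

```python
import math

Point = tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to segment ab (perpendicular foot or nearer endpoint)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq > 0:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
        if 0.0 <= t <= 1.0:
            return math.dist(p, (a[0] + t * dx, a[1] + t * dy))
    return min(math.dist(p, a), math.dist(p, b))


def imhdt(tr1: list[Point], tr2: list[Point]) -> float:
    """Average, over Tr1's sampling points, of the distance to Tr2's adjacent segments."""
    if not tr1 or len(tr1) != len(tr2):
        raise ValueError("tracks must be non-empty and time-aligned")
    if len(tr1) == 1:
        return math.dist(tr1[0], tr2[0])  # no segments: fall back to the point distance
    total = 0.0
    for i, p in enumerate(tr1):
        candidates = []
        if i > 0:  # Tr2 segment ending at t_i
            candidates.append(point_segment_distance(p, tr2[i - 1], tr2[i]))
        if i + 1 < len(tr2):  # Tr2 segment starting at t_i
            candidates.append(point_segment_distance(p, tr2[i], tr2[i + 1]))
        total += min(candidates)  # per-point IMHDT value
    return total / len(tr1)
```

As written, the measure is directional (from Tr1 to Tr2), matching the description, which averages over the sampling points of Tr1 only.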
Algorithm 4 calculates the shortest distance between a track sampling point and a track segment. Its inputs are a track sampling point and a track segment constructed from two track sampling points; its output is the shortest distance between the sampling point and the segment. First, it determines whether there is an interpolation point on the track segment such that the line connecting the sampling point and the interpolation point is perpendicular to the segment. If so, it returns the Euclidean distance from the sampling point to that interpolation point; if not, it returns the minimum of the distances from the sampling point to the two endpoints of the segment.
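Algorithm 4 amounts to a standard point-to-segment distance. The sketch below assumes planar Euclidean coordinates and uses the hypothetical name `point_segment_distance`; it locates the perpendicular foot through the projection parameter t and falls back to the nearer endpoint when the foot lies outside the segment.

```python
import math

Point = tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from sampling point p to the track segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq > 0:
        # t locates the perpendicular foot of p on the line through a and b.
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
        if 0.0 <= t <= 1.0:
            # The foot lies on the segment: it is the interpolation point whose
            # connecting line with p is perpendicular to the segment.
            return math.dist(p, (a[0] + t * dx, a[1] + t * dy))
    # No such interpolation point (or a degenerate segment): nearer endpoint.
    return min(math.dist(p, a), math.dist(p, b))
```

For example, the point (1, 1) and the segment from (0, 0) to (2, 0) give an interpolation point at (1, 0) and a distance of 1.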
S3, anonymizing each anonymous group so that it satisfies interpolation trajectory (k, δ)-anonymity.
This is accomplished by perturbing the tracks in each anonymous group so that they satisfy the interpolation trajectory (k, δ)-anonymity requirement, as shown in fig. 8. Specifically, the track sampling points that do not meet the requirement are moved so that their distance to the central track is less than or equal to δ.
Performing the anonymization by replacing track sampling points with interpolation points reduces the anonymization cost. The proof is as follows, where Trp denotes the anonymous track and Tr the central track. The translation distances required under the two distance measures are:
$$\mathrm{Translation}(\mathrm{IMHDT}) = \mathrm{Eur}_p(Trp_{\odot i}, Tr_i) - \delta$$
$$\mathrm{Translation}(\mathrm{Eur}_p) = \mathrm{Eur}_p(Trp_i, Tr_i) - \delta$$
As demonstrated earlier, the interpolation-point-based Hausdorff distance from the anonymous track to the central track is always less than or equal to the Euclidean distance between the same tracks:
$$\mathrm{IMHDT}(Trp_i, Tr_i) = \mathrm{Eur}_p(Trp_{\odot i}, Tr_i) \le \mathrm{Eur}_p(Trp_i, Tr_i)$$
Because the uncertainty threshold δ of the track is fixed, subtracting it from both sides gives Translation(IMHDT) ≤ Translation(Eur_p). Anonymizing with interpolation points in place of track sampling points therefore reduces the anonymization cost of the track points.
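The perturbation step can be sketched as follows, under assumptions the text does not fix: planar coordinates, index-aligned anonymous and central tracks, a circular uncertainty region of radius δ (`delta`) around each central point, and candidate interpolation points taken on the anonymous track's two segments adjacent to t_i. The names `closest_point_on_segment` and `anonymize_track` are hypothetical, and the returned cost sums only the translation term Eur(Trp_⊙i, Tr_i) - δ used in the proof.

```python
import math

Point = tuple[float, float]


def closest_point_on_segment(p: Point, a: Point, b: Point) -> Point:
    """Perpendicular foot of p on segment ab if it lies on ab, else the nearer endpoint."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return a
    # Clamping t to [0, 1] selects the nearer endpoint when the foot is off the segment.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq))
    return (a[0] + t * dx, a[1] + t * dy)


def anonymize_track(
    track: list[Point], center: list[Point], delta: float
) -> tuple[list[Point], float]:
    """Pull each violating point of `track` into the delta-disk of the central track."""
    new_track: list[Point] = []
    cost = 0.0
    for i, (p, c) in enumerate(zip(track, center)):
        if math.dist(p, c) <= delta:
            new_track.append(p)  # already within the uncertainty region: untouched
            continue
        # Candidate replacements: interpolation points on the adjacent segments
        # of this track that are closest to the central point c.
        candidates = [p]
        if i > 0:
            candidates.append(closest_point_on_segment(c, track[i - 1], p))
        if i + 1 < len(track):
            candidates.append(closest_point_on_segment(c, p, track[i + 1]))
        q = min(candidates, key=lambda pt: math.dist(pt, c))
        d = math.dist(q, c)
        if d > delta:
            # Translate q toward c onto the boundary of the disk; length d - delta.
            scale = delta / d
            q = (c[0] + (q[0] - c[0]) * scale, c[1] + (q[1] - c[1]) * scale)
            cost += d - delta
        new_track.append(q)
    return new_track, cost
```

For example, with δ = 1.5, a sampling point at distance 2 from its central point whose adjacent segment passes within √2 of that point is replaced by the interpolation point with zero translation, whereas moving the sampling point itself would cost 2 - 1.5 = 0.5.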
The interpolation-point-based anonymous privacy protection method provided by the embodiments of the invention has the following beneficial effects:
1. In track preprocessing, several preprocessing fragment values are used to normalize the tracks, and the fragment value is chosen by jointly comparing, for each candidate value, the number of tracks remaining after normalization and the quality of the tracks to be anonymized. This helps reduce track suppression during preprocessing and benefits the subsequent anonymization.
2. Track uncertainty theory is introduced into the interpolation-point anonymity model. A distinctive feature of track data is that every sampling point can act as a quasi-identifier, so moving quasi-identifiers directly increases the anonymization cost. Using the inherent uncertainty region of a track as its anonymity region therefore helps reduce the anonymization cost.
3. When measuring track distance during clustering, the interpolation-point-based Hausdorff distance is used, and it is proven theoretically that its value from the anonymous track to the central track is always less than or equal to the Euclidean distance between the same tracks. Clustering with this distance therefore yields clusters whose generalization regions are smaller than those obtained with the Euclidean distance.
4. In the anonymization process, an anonymity model is proposed in which sampling points are replaced by interpolation points on their adjacent track segments. This model reduces track perturbation during anonymization, thereby reducing data distortion and increasing data availability while still meeting the privacy protection requirements for published data.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (2)

1. An anonymous privacy protection method based on interpolation points is characterized by comprising the following steps:
S1, preprocessing an original track data set Ts to form a plurality of track equivalence classes Ecs that are consistent in timestamps;
S2, clustering the tracks in each track equivalence class according to the IMHDT distance measure, forming a plurality of track anonymization groups in each track equivalence class, each anonymization group containing no fewer than k tracks;
S3, perturbing the tracks in each anonymization group so that they finally satisfy interpolation trajectory (k, δ)-anonymity;
the step S1 specifically includes the following steps:
S11, defining the track processing fragment value P_i;
S12, acquiring the start and end timestamps {t_b, t_e} of the original track Tr;
S13, acquiring the timestamp t_i that is later than the start time t_b and satisfies t_i mod P_i = 0, and the timestamp t_j that is earlier than the end time t_e and satisfies t_j mod P_i = 0;
S14, cutting the original track to the interval [t_i, t_j] and putting it into the trajectory equivalence class D{i, j};
the step S2 specifically includes the following steps:
S21, placing the unclustered tracks of each track equivalence class into an active set, and randomly selecting one track from the active set;
S22, calculating the IMHDT distances from the other tracks in the active set to the selected track, and taking the track with the largest IMHDT distance as the central track;
S23, calculating the IMHDT distances from the other tracks in the active set to the central track;
S24, forming an anonymous cluster from the k-1 tracks nearest in IMHDT distance together with the central track, and adding the anonymous cluster to the anonymity set;
S25, obtaining, among the k-1 nearest tracks, the track farthest from the central track, and suppressing the anonymous cluster if the IMHDT distance between that track and the central track is greater than the threshold max_radius;
the IMHDT distance is an interpolation-point Hausdorff distance under a time constraint;
the IMHDT distance between two tracks Tr1 and Tr2 is calculated by the following specific steps:
S221, calculating the shortest distance from each track sampling point Tr1_node_{t=t_i} to the first of the two Tr2 track segments adjacent to t_i;
S222, calculating the shortest distance from the track sampling point Tr1_node_{t=t_i} to the second of the two Tr2 track segments adjacent to t_i;
S223, taking the smaller of the distances from step S221 and step S222 as the IMHDT distance of the track sampling point Tr1_node_{t=t_i};
S224, taking the average of the IMHDT distances of all sampling points of track Tr1 as the IMHDT distance between tracks Tr1 and Tr2.
2. The anonymous privacy protection method based on interpolation points as claimed in claim 1, wherein the shortest distance between a track sampling point and a track segment is calculated as follows:
determining whether there is an interpolation point on the track segment such that the line connecting the track sampling point and the interpolation point is perpendicular to the track segment;
if such an interpolation point exists, the Euclidean distance between the track sampling point and the interpolation point is the shortest distance between the track sampling point and the track segment;
if no such interpolation point exists, the minimum of the distances from the track sampling point to the two endpoints of the track segment is the shortest distance between the track sampling point and the track segment.
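Step S1 (S11 to S14) can be sketched as below. The assumptions are all hypothetical: integer timestamps, a single fragment value `p` standing for P_i, "later than" and "earlier than" treated inclusively, tracks too short to contain a fragment dropped, and no resampling of the cut tracks. `build_equivalence_classes` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical input layout: track id -> samples (timestamp, x, y), sorted by timestamp.
Sample = tuple[int, float, float]


def build_equivalence_classes(
    tracks: dict[str, list[Sample]], p: int
) -> dict[tuple[int, int], list[tuple[str, list[Sample]]]]:
    """Group cut tracks by their aligned [t_i, t_j] interval (class D{i, j})."""
    classes = defaultdict(list)
    for track_id, samples in tracks.items():
        if not samples:
            continue
        t_b, t_e = samples[0][0], samples[-1][0]  # S12: start and end timestamps
        t_i = -(-t_b // p) * p  # S13: first multiple of p at or after t_b
        t_j = (t_e // p) * p  # S13: last multiple of p at or before t_e
        if t_i >= t_j:
            continue  # assumption: tracks spanning no fragment are dropped
        cut = [s for s in samples if t_i <= s[0] <= t_j]  # S14: keep [t_i, t_j]
        classes[(t_i, t_j)].append((track_id, cut))  # S14: class D{i, j}
    return dict(classes)
```

Tracks that share the same aligned interval land in the same class, which is what lets the later IMHDT computation compare points at identical timestamps.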
CN201910340914.5A 2019-04-25 2019-04-25 Anonymous privacy protection method based on interpolation points Active CN110162997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910340914.5A CN110162997B (en) 2019-04-25 2019-04-25 Anonymous privacy protection method based on interpolation points

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910340914.5A CN110162997B (en) 2019-04-25 2019-04-25 Anonymous privacy protection method based on interpolation points

Publications (2)

Publication Number Publication Date
CN110162997A CN110162997A (en) 2019-08-23
CN110162997B true CN110162997B (en) 2021-01-01

Family

ID=67640021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910340914.5A Active CN110162997B (en) 2019-04-25 2019-04-25 Anonymous privacy protection method based on interpolation points

Country Status (1)

Country Link
CN (1) CN110162997B (en)

Also Published As

Publication number Publication date
CN110162997A (en) 2019-08-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant