CN111400747B - Measurement method based on track privacy protection - Google Patents


Info

Publication number
CN111400747B
Authority
CN
China
Prior art keywords
track
privacy protection
time
user
tracks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010113193.7A
Other languages
Chinese (zh)
Other versions
CN111400747A (en
Inventor
戴慧珺
桂小林
徐盼
滕晓宇
李德福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202010113193.7A priority Critical patent/CN111400747B/en
Publication of CN111400747A publication Critical patent/CN111400747A/en
Application granted granted Critical
Publication of CN111400747B publication Critical patent/CN111400747B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Algebra (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a measurement method based on track privacy protection, which provides two measurement methods for two common track privacy protection methods, quantifies the track privacy protection degree of a mobile user and clearly reflects the user privacy safety degree and the privacy disclosure degree. Aiming at a privacy protection method based on generalization, providing a distance measurement index facing to the track, judging the intersection of the track, and calculating the intersection degree; and performing distance measurement and similarity calculation on the tracks which finally form the synchronization; aiming at the privacy protection method based on the confusion zone, providing an information entropy index facing the distribution probability, and calculating a probability vector for the track; and calculating a privacy protection function according to the information entropy, and finishing measurement based on track privacy protection. The two measurement methods for protecting the track privacy can effectively evaluate the strength of two common privacy protection methods and analyze the distortion degree and the effectiveness of the data before and after being protected.

Description

Measurement method based on track privacy protection
Technical Field
The invention belongs to the technical field of track privacy safety of users under a position information service, and particularly relates to a measurement method based on track privacy protection.
Background
In recent years, location data, as the real-world extension of Internet of Things tracking and positioning services, has supplied massive amounts of basic data and spurred new IoE (Internet of Everything) applications such as BYOD mobile office, item tracing, virtual social interaction, and AR navigation. Successive positions constitute trajectory data (trajectories here exclude animal activity and natural phenomena). Trajectory data are essentially time-ordered samples and snapshots of a track; they are based on GPS positioning and time stamps and are a data source for participatory crowd sensing in urban computing. Guiding social life and production through location and trajectory big data therefore has strong practical significance. Location-based services bring users a rich personalized experience but also carry a latent risk of privacy disclosure: terminal applications and their sensors, such as cameras, microphones, heart rate meters, and smart bracelets, may collect user information excessively. Location privacy protection originated in everyday social services such as location check-in, location query, and navigation, and focuses on protecting user identity, current location, and query content. A trajectory extends location into the spatio-temporal dimension, and its data correlation and spatio-temporal features make information such as user behavior patterns, interests, and social habits easier to mine, so trajectory privacy needs stronger protection measures.
Track privacy protection addresses two scenarios: query requests in real-time location services and the release of track big data. Several privacy protection methods have been applied successfully at the present stage; the common ones are generalization and confusion. The track generalization method applies spatial k-anonymity to tracks: tracks whose starting points are close and whose routes lie within a certain spatial area are generalized into a single spatial grid cell. Generalization sacrifices precision in exchange for indistinguishability among multiple tracks, and its variants include m-invariance, l-diversity, and p-sensitivity. The confusion zone method divides the movement area into mixed zones and application zones; renaming, blurring, and dynamic replacement inside a mixed zone enhance the effectiveness of pseudonym replacement. In the confusion zone method, each track, that is, each user, has a user ID when using the location service, either set by the user or allocated by the system. The ID is unique, and each identifier refers to exactly one user. When users enter a confusion zone and all conditions are met (for example, other users are in the zone at that time and their number of ID exchanges has not reached the upper limit), they perform an ID replacement operation. Multiple ID replacements can occur over a user's whole movement, so an attacker can hardly link a user's positions before and after a given time; the user's track information is thereby protected from malicious acquisition, realizing track privacy protection.
Research at home and abroad on privacy measurement methods in location services falls mainly into two types: specific measurement methods tied to a particular privacy protection method, and moderate measurement methods. For the former, each privacy protection method has its own risk evaluation system, so the measurement differs from algorithm to algorithm. These privacy measurement methods are limited in that they measure only one aspect, such as the risk of privacy information leakage, the strength of the privacy protection method, cost or data loss, or the reduction in service accuracy, so the user cannot obtain an objective privacy risk index.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a measurement method based on track privacy protection, which quantifies the track privacy protection degree of a mobile user and clearly reflects the user privacy safety degree and the privacy disclosure degree.
The invention adopts the following technical scheme:
a measurement method based on track privacy protection comprises a privacy protection method based on generalization and a privacy protection method based on confusion zone, wherein in the privacy protection method based on generalization, a distance measurement index facing to tracks is provided, the intersection of the tracks is judged, and the intersection degree is calculated; generating a synchronous track for the intersected tracks; performing distance measurement and similarity calculation on the synchronous track; in a privacy protection method based on a confusion zone, providing an information entropy index facing to distribution probability, and calculating a probability vector for a track; calculating information entropy for the probability vector; and calculating a privacy protection function for the information entropy to finish measurement based on track privacy protection, and effectively evaluating the strength of the privacy protection method and analyzing the distortion degree and effectiveness of the data before and after protection.
Specifically, in the privacy protection method based on generalization, the intersection of the tracks is judged, and the calculated intersection degree is specifically:
S101, preprocessing for track similarity measurement, wherein a track T is defined as follows:
$T = \{(t_1, x_1, y_1), (t_2, x_2, y_2), \ldots, (t_n, x_n, y_n)\}$
wherein $n \geq 1$, $t_1, \ldots, t_n$ is the sampling time sequence with $t_i \leq t_j$ for all $1 \leq i < j \leq n$, and $(x_i, y_i)$ are the coordinates of the track point at the i-th moment;
S102, for any two tracks $T_i$ and $T_j$, judging intersection: if
Figure BDA0002390696650000032
Then the two tracks are intersected in time, and the intersection degree p of the two tracks is calculated;
further, the intersection degree p of the two tracks is:
Figure BDA0002390696650000036
specifically, in the privacy protection method based on generalization, the generation of the synchronous track for the intersecting track is specifically:
S201, let two tracks $T_i$ and $T_j$ be p-intersecting; if, within the intersection time interval $[t_m, t_n]$ ($n > m$), tracks $T_i$ and $T_j$ have the same number of location points at the same corresponding times, then $T_i$ and $T_j$ are synchronous tracks; if $T_i$ and $T_j$ only intersect in time but are not synchronized, synchronization points are added so that the two tracks meet the synchronization condition;
S202, $T_i$ is chosen as the track whose start time is earlier than that of $T_j$, i.e. $t^i_1 < t^j_1$; two index pointers a and b are placed at the first point of each track, i.e. a = b = 1;
S203, pointer a advances along $T_i$ until a and b meet, and the meeting time is recorded, at which point a = b = 2; when the two pointers are at the same time they advance together; if either pointer reaches the end of its track during the move, the synchronization operation ends immediately;
S204, as a and b move to their next sampling points, if the time of the point at a is later than that of the point at b ($t^i_a > t^j_b$), a data point with time $t^j_b$ is added to track $T_i$; its position is determined from the sampling points of $T_i$ immediately before and after that time, $(x^i_{a-1}, y^i_{a-1})$ and $(x^i_a, y^i_a)$; after the addition is completed, pointer a points to the newly added data point, and the procedure returns to S203.
Specifically, in the privacy protection method based on generalization, the distance measurement and similarity calculation on the synchronous track specifically includes:
S301, calculating the square root of the sum of all interval variations and dividing it by the time intersection degree of the tracks to obtain the track shape distance $d_{shape}$;
S302, in the overlapping time interval of the synchronous tracks, first obtaining the sum of the squares of the distances between corresponding points at each time, then taking the mean and square root, and dividing by the intersection degree of the tracks to obtain the track position distance $d_{loc}(T_i, T_j)$;
S303, the track distance is obtained as the weighted sum of the track shape distance and the position distance, where $\alpha$ is a weight adjustment coefficient, and the track distance $d(T_i, T_j)$ is as follows:
$d(T_i, T_j) = \alpha\, d_{shape}(T_i, T_j) + (1 - \alpha)\, d_{loc}(T_i, T_j)$
Further, the track shape distance $d_{shape}$ is rewritten as:
Figure BDA0002390696650000045
wherein $t_k$ is the specific time at moment k, $\Delta x_k$ is the change in the x coordinate of the current point from moment (k-1) to moment k, $\Delta t_k$ is the time change from moment (k-1) to moment k, $\Delta y_k$ is the change in the y coordinate of the current point from moment (k-1) to moment k, and p is the intersection degree of the tracks;
the track position distance d is as follows:
Figure BDA0002390696650000051
wherein R is the earth radius, 6371 km, $t_s$ is the specific time at moment s, and $(x^i_s, y^i_s)$ and $(x^j_s, y^j_s)$ are the x and y coordinates of tracks i and j at that moment.
Specifically, in the privacy protection method based on the confusion zone, the probability vector is calculated for the track as follows: let the set of user IDs before entering confusion zone $S_k$ be $I(S_k) = \{i_1, i_2, \ldots, i_n\}$ and the set of user IDs exiting the confusion zone be $O(S_k) = \{o_1, o_2, \ldots, o_n\}$. At any moment each user has only one ID, so the user IDs in the two sets have a one-to-one mapping. The attacker infers the relation between entering and exiting positions and expresses it as a conditional probability; the probability matrix represents the association between the two sets of identifiers.
Further, if the number of users whose IDs are replaced in the confusion zone is n, the entry/exit matrix of the replaced IDs has size n×n, and the probability matrix is:
$$P = \begin{pmatrix} p(u_1 \mid o_1) & p(u_1 \mid o_2) & \cdots & p(u_1 \mid o_n) \\ p(u_2 \mid o_1) & p(u_2 \mid o_2) & \cdots & p(u_2 \mid o_n) \\ \vdots & \vdots & \ddots & \vdots \\ p(u_n \mid o_1) & p(u_n \mid o_2) & \cdots & p(u_n \mid o_n) \end{pmatrix}$$
wherein the element $p(u_j \mid o_i)$ is the probability that the user ID $u_j$ entering the confusion zone and the user ID $o_i$ leaving it belong to the same user; the elements of row j correspond to the same entering user ID $u_j$,
Figure BDA0002390696650000055
and the elements of column k correspond to the same user ID $o_k$ leaving the confusion zone,
Figure BDA0002390696650000056
Specifically, in the privacy protection method based on the confusion zone, in calculating the information entropy for the probability vector, each track $o_i$ has a corresponding probability vector after all ID replacement operations are completed, and the track information entropy is obtained as:
$$H(o_i) = -\sum_{j=1}^{n} p(u_j \mid o_i) \log p(u_j \mid o_i)$$
The maximum value of the information entropy is:
$$H_{max}(o_i) = -\sum_{j=1}^{n} \frac{1}{n} \log \frac{1}{n} = \log n$$
wherein $u_j$ is a user ID entering the confusion zone, $o_i$ is a user ID leaving the confusion zone, $p(u_j \mid o_i)$ is the probability that $u_j$ and $o_i$ are the same person, and n is the number of users in the confusion zone for whom ID replacement occurs.
Specifically, in the privacy protection method based on the confusion zone, the privacy protection function $G(o_i)$ is:
Figure BDA0002390696650000062
wherein $H(o_i)$ is the track information entropy and $H_{max}(o_i)$ is the maximum value of the information entropy.
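The entropy-based index can be sketched as follows. This is a minimal illustration, not the patented formula: the function names are hypothetical, base-2 logarithms are an assumption, and because the formula image for $G(o_i)$ is not reproduced here, the sketch assumes the normalized form $G(o_i) = H(o_i) / H_{max}(o_i)$, which is consistent with the statement that a higher $G$ means stronger protection.

```python
import math
from typing import Sequence


def track_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (base 2, an assumption) of the probability vector of one exit ID o_i."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probability vector must sum to 1")
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(q * math.log2(q) for q in probs if q > 0)


def max_entropy(n: int) -> float:
    """Entropy of the uniform distribution over n candidate users."""
    return math.log2(n)


def privacy_protection(probs: Sequence[float]) -> float:
    """Assumed form G = H / H_max, ranging from 0 (fully linked) to 1 (fully mixed)."""
    n = len(probs)
    if n < 2:
        return 0.0  # a single candidate offers no confusion
    return track_entropy(probs) / max_entropy(n)


uniform = privacy_protection([0.25, 0.25, 0.25, 0.25])  # attacker learns nothing
skewed = privacy_protection([0.8, 0.2])                 # attacker is fairly confident
```

Under this assumed form, a uniform probability vector yields the maximum score, and any prior knowledge that skews the vector toward one candidate lowers it.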
Compared with the prior art, the invention has at least the following beneficial effects:
the invention discloses a measurement method based on track privacy protection, which aims at providing two kinds of measurement indexes for two different track privacy protection methods: in a privacy protection method based on generalization, providing a distance measurement index facing to a track; in the privacy protection method based on the confusion zone, information entropy indexes facing the distribution probability are provided. The two different indexes fully consider the space time sequence data characteristics of the track, and on one hand, the disturbance of the track data based on the position before and after privacy protection and the influence of the disturbance on the privacy protection are measured; on one hand, the distribution and relevance of data are measured from the aspects of attack and risk leakage. Based on the user position and track privacy protection method in the existing position service, the invention expands and analyzes from the service platform perspective, and provides scientific and accurate privacy measurement quantization indexes for two typical track privacy protection methods.
Further, the track definition is normalized, a formal criterion is given for determining whether any two tracks $T_i$ and $T_j$ intersect in time, and the degree of intersection is defined.
Further, the operation of synchronizing the tracks provides reference point coordinates for calculating the track shape and position distance, and the synchronous tracks need to traverse the tracks simultaneously, and sampling points are inserted according to whether the time is consistent or not.
Further, the time axis of the synchronous track is divided into small time intervals, and each small time interval calculates the square sum of the differences of the position variation amounts of the user on the two tracks in the x and y directions of the coordinate axes to reflect the shape distance variation in a short time, and the calculation represents the similarity of the tracks, as shown in fig. 3. And further obtaining the sum of squares of the distances of the corresponding points between the cells, squaring to obtain a mean value, dividing the mean value by the intersection degree of the tracks to obtain the track position distance, and taking the final track distortion as a mixed value of the shape distance and the position distance.
Further, probability vectors are used for describing the association degree of users entering and leaving the confusion zone and a certain track, and probability matrixes are used for representing the association between two groups of identifiers.
Further, the information entropy is set to judge the confusion degree of the system, and the larger the entropy value is, the more stable the system is, and the better the privacy protection effect is;
further, a privacy protection function is defined based on the information gain, and the privacy protection function G (o i ) The higher the value, i.e. the higher the degree of privacy protection, the lower the privacy disclosure rate and the relatively weaker the attacker's ability.
In summary, the invention mainly aims at two types of privacy protection methods of track generalization and confusion in a track big data release scene, and provides two different metrics comprising two different types of indexes of distance metric indexes and information entropy indexes. The two indexes provided by the invention provide universal measurement of privacy protection level, have good adaptability in various scenes, quantify the track privacy protection degree of the mobile user, and clearly reflect the privacy security degree and the privacy risk disclosure degree of the user through scientific evaluation.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a flow chart of a metrology method of the present invention;
FIG. 2 is a timing diagram of a track and of intersecting tracks, wherein (a) is a timing diagram of the original track data, in which the coordinates on the points indicate the longitude and latitude of the track points and the direction in which the track extends is the time direction, and (b) shows $T_1$ and $T_2$ as intersecting tracks, the two tracks overlapping in time according to the definition of intersecting tracks;
FIG. 3 is a process diagram of track synchronization, wherein (a) the arrows represent the two index pointers at the head of each track; (b) the pointers move backward along the two tracks until the second sampling point, and because $T_1$ has no third sampling point, its pointer moves on to the fourth sampling point; (c) according to the time of the fourth sampling point and the positions of the sampling points before and after it on $T_1$, a third sampling point is added to $T_1$, after which pointer a points to the newly added point; (d) after the third point has been inserted and synchronized, synchronization continues with the fourth point; (e) a fourth synchronization point is added to track $T_2$; (f) after the third and fourth synchronization points are completed, the pointers continue to move backward along the tracks;
FIG. 4 is a graph of the shape-distance similarity of two tracks and the x-t plane coordinates of two tracks, wherein (a) is the coordinates of three tracks and (b) is the specific coordinates;
FIG. 5 is an illustration of a process for exchanging IDs for users in a confusion zone;
fig. 6 is a simplified process for calculating information entropy and privacy preserving functions.
Detailed Description
The invention provides a measurement method based on track privacy protection whose main processing target is a GPS track data set. A track is an ordered set of GPS data points; it reflects the position changes of a user and further reveals the movement pattern of the device carrier, so it can be regarded as very sensitive private information. Each track point includes time, latitude, and longitude, and each track set is associated with a user ID. The two track privacy protection metrics can effectively evaluate the strength of the two common privacy protection methods and analyze the degree of distortion and the effectiveness of the data before and after protection.
Referring to fig. 1, the track privacy protection-based measurement method of the present invention comprises the following specific steps:
A. in the privacy protection method based on generalization, a distance measurement index facing to the track is provided:
s1, normalizing the tracks and calculating the intersection degree;
s101, preprocessing track similarity measurement, wherein the track is defined as follows:
T={(t 1 ,x 1 ,y 1 ),(t 2 ,x 2 ,y 2 ),...,(t n ,x n ,y n )}
wherein $n \geq 1$, $t_1, \ldots, t_n$ is the sampling time sequence with $t_i \leq t_j$ for all $1 \leq i < j \leq n$, and $(x_i, y_i)$ are the coordinates of the track point at the i-th moment;
s102, for any two tracks T i And T j Judging the intersection, wherein the intersection of the track intersection expression refers to the intersection in time rather than space; assuming 2 tracks (as shown in figure 2 b),
Figure BDA0002390696650000092
Figure BDA0002390696650000093
FIG. 2 shows the two tracks after normalization, as follows:
$T_1 = \{(1,5,3), (2,6,2), (4,4,6), (5,4,3), (6,2,4), (9,5,7)\}$
$T_2 = \{(2,3,0), (2,3,1), (5,7,7), (7,5,3), (8,4,6), (9,0,2)\}$
For any two tracks $T_i$ and $T_j$, if
Figure BDA0002390696650000094
then the two tracks intersect in time. The parameter I is defined as follows:
Figure BDA0002390696650000095
Figure BDA0002390696650000096
The intersection judgment gives p = 87.5%, i.e. the two tracks are 87.5%-intersecting.
The specific operation is shown in fig. 3. With p defined as the degree of intersection of the two tracks,
Figure BDA0002390696650000097
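The intersection judgment can be sketched in code. This is only an illustration under a stated assumption: since the formula image for p is not reproduced here, the sketch takes p as the length of the overlapping time interval divided by the length of the combined time span, which reproduces the 87.5% figure of the worked example but is not confirmed as the exact patented definition. The function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (t, x, y)


def intersection_degree(ti: List[Point], tj: List[Point]) -> float:
    """Temporal overlap of two time-ordered tracks as a fraction of their combined span.

    Assumption: p = |overlap| / |combined span|; returns 0.0 when the
    tracks do not overlap in time.
    """
    start_i, end_i = ti[0][0], ti[-1][0]
    start_j, end_j = tj[0][0], tj[-1][0]
    overlap = min(end_i, end_j) - max(start_i, start_j)
    span = max(end_i, end_j) - min(start_i, start_j)
    if overlap <= 0 or span <= 0:
        return 0.0  # no temporal intersection
    return overlap / span


# The two normalized tracks from the example above.
T1 = [(1, 5, 3), (2, 6, 2), (4, 4, 6), (5, 4, 3), (6, 2, 4), (9, 5, 7)]
T2 = [(2, 3, 0), (2, 3, 1), (5, 7, 7), (7, 5, 3), (8, 4, 6), (9, 0, 2)]
print(intersection_degree(T1, T2))  # 0.875
```

Here $T_1$ spans times 1 to 9 and $T_2$ spans times 2 to 9, so the overlap is 7 time units out of a combined span of 8.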
S2, generating synchronous tracks: when two tracks $T_i$ and $T_j$ are p%-intersecting (p > 0), converting them into synchronous tracks requires a point-supplementing operation;
the specific steps are as follows:
S201, let two tracks $T_i$ and $T_j$ be p-intersecting; if, within the intersection time interval $[t_m, t_n]$ ($n > m$), tracks $T_i$ and $T_j$ have the same number of location points at the same corresponding times, then $T_i$ and $T_j$ are synchronous tracks (the intersection degree p obtained is 100%). If $T_i$ and $T_j$ only intersect in time but are not synchronized (the number of positions is inconsistent or the corresponding times of the positions differ), synchronization points are added so that the two tracks meet the synchronization condition;
S202, to simplify the operation, $T_i$ is chosen as the track whose start time is earlier than that of $T_j$, i.e. $t^i_1 < t^j_1$; two index pointers a and b are placed at the first point of each track, i.e. a = b = 1;
S203, the pointer with the smaller time, a, advances along $T_i$ until a and b meet; the meeting time is recorded, at which point a = b = 2. When the two pointers are at the same time, they advance together. If either pointer reaches the end of its track during the move, the synchronization operation ends immediately;
S204, as a and b move to their next sampling points, if the time of the point at a is later than that of the point at b ($t^i_a > t^j_b$), a data point with time $t^j_b$ is added to track $T_i$. Its position is determined from the sampling points of $T_i$ immediately before and after that time, $(x^i_{a-1}, y^i_{a-1})$ and $(x^i_a, y^i_a)$; to simplify the calculation, the user is assumed to move in uniform linear motion between the two. As shown in fig. 3, the red sampling point with coordinates (5, 4) is a newly added synchronization point. After the addition is completed, pointer a points to the newly added data point, and the procedure returns to S203;
the opposite case, $t^i_a < t^j_b$, is handled in the same way, with the data point added to track $T_j$.
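The point-supplementing idea can be sketched as below. Note the substitution: instead of walking two pointers as in S202 to S204, this sketch collects every sampling time inside the overlapping interval and linearly interpolates both tracks at those times, which relies on the same uniform-linear-motion assumption. It assumes strictly increasing timestamps; the function names and the sample track `track_b` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (t, x, y)


def interpolate(track: List[Point], t: float) -> Point:
    """Position on `track` at time t, assuming uniform linear motion between samples."""
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return (t, x0, y0)
            w = (t - t0) / (t1 - t0)
            return (t, x0 + w * (x1 - x0), y0 + w * (y1 - y0))
    raise ValueError("t lies outside the track's time span")


def synchronize(ti: List[Point], tj: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Return both tracks resampled at the union of their sampling times in the overlap."""
    start = max(ti[0][0], tj[0][0])
    end = min(ti[-1][0], tj[-1][0])
    times = sorted({p[0] for p in ti + tj if start <= p[0] <= end})
    return [interpolate(ti, t) for t in times], [interpolate(tj, t) for t in times]


track_a = [(1, 5, 3), (2, 6, 2), (4, 4, 6)]   # has no sample at t = 3
track_b = [(2, 3, 0), (3, 3, 1), (4, 7, 7)]   # hypothetical track sampled at t = 3
sync_a, sync_b = synchronize(track_a, track_b)
print(sync_a)  # [(2, 6.0, 2.0), (3, 5.0, 4.0), (4, 4.0, 6.0)]
```

The point added to `track_a` at t = 3 lands at (5, 4), halfway between (6, 2) and (4, 6), matching the newly added synchronization point described for fig. 3.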
S3, track distance calculation
Referring to fig. 4, fig. 4 shows the shape distances of the tracks, three tracks are shown, and the tracks are assumed to be all in the x-t plane (the y-axis coordinates refer to the values of the right table), if only the space distances in the conventional sense are considered, the distance between the track 2 and the track 3 is smaller than the distance between the track 1 and the track 2. But from a shape point of view track 1 is significantly closer to track 2 than track 3. Therefore, in calculating the track distance, measuring similarity, the track form factor needs to be taken into consideration.
S301, the time axis of the synchronous tracks is composed of time intervals of different lengths, and the sum of squares of the differences between the position changes of the two users in the x and y directions within each time interval reflects the shape distance change over a short time. The square root of the sum over all intervals is calculated and divided by the time intersection degree of the tracks, giving the track shape distance $d_{shape}$ as follows:
Figure BDA0002390696650000111
wherein $t_k$ is the specific time at moment k, $\Delta x_k$ is the change in the x coordinate of the current point from moment (k-1) to moment k, $\Delta t_k$ is the time change from moment (k-1) to moment k, $\Delta y_k$ is the change in the y coordinate of the current point from moment (k-1) to moment k, and p is the intersection degree of the tracks;
Using the simulation data in fig. 4, the following is obtained:
$d_{shape}(T_1, T_2) = 0$
Figure BDA0002390696650000114
After the tracks are synchronized, the sampling points on the different tracks are consistent, namely
Figure BDA0002390696650000115
The track shape distance $d_{shape}$ is then rewritten as:
Figure BDA0002390696650000116
wherein $t_k$ is the specific time at moment k, $\Delta x_k$ is the change in the x coordinate of the current point from moment (k-1) to moment k, $\Delta t_k$ is the time change from moment (k-1) to moment k, $\Delta y_k$ is the change in the y coordinate of the current point from moment (k-1) to moment k, and p is the intersection degree of the tracks;
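A minimal sketch of the shape distance, following only the verbal description in S301: for each interval of two already synchronized tracks, take the difference between their displacements in x and in y, sum the squares, take the square root, and divide by the intersection degree p. The formula image is not reproduced here, so any normalization by $\Delta t_k$ that the patented formula may contain is deliberately left out; treat this as an assumption-laden illustration with a hypothetical function name.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (t, x, y)


def shape_distance(ti: List[Point], tj: List[Point], p: float) -> float:
    """Shape distance of two synchronized tracks (same times, same point count)."""
    if len(ti) != len(tj) or len(ti) < 2:
        raise ValueError("tracks must be synchronized and contain at least two points")
    if p <= 0:
        raise ValueError("intersection degree p must be positive")
    total = 0.0
    for k in range(1, len(ti)):
        # Displacement of each track over the interval (k-1, k).
        dxi, dyi = ti[k][1] - ti[k - 1][1], ti[k][2] - ti[k - 1][2]
        dxj, dyj = tj[k][1] - tj[k - 1][1], tj[k][2] - tj[k - 1][2]
        total += (dxi - dxj) ** 2 + (dyi - dyj) ** 2
    return math.sqrt(total) / p


straight = [(0, 0, 0), (1, 1, 0), (2, 2, 0)]
shifted = [(0, 0, 3), (1, 1, 3), (2, 2, 3)]   # same shape, moved 3 units in y
zigzag = [(0, 0, 0), (1, 1, 1), (2, 2, 0)]    # same start and end, different shape

print(shape_distance(straight, shifted, 1.0))  # 0.0
```

Two tracks that are translated copies of each other have zero shape distance regardless of how far apart they are, which is the property the fig. 4 discussion relies on.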
s302, in the overlapping time interval of the synchronous track, firstly obtaining the sum of the squares of the distances of corresponding points of each time, then taking the mean value of the squares, dividing the mean value by the intersection degree of the track, and calculating the track position distance d as follows:
Figure BDA0002390696650000119
wherein R is the earth radius, 6371 km, and d is expressed in the same unit as R; $t_s$ is the specific time at moment s, and $(x^i_s, y^i_s)$ and $(x^j_s, y^j_s)$ are the coordinates of the two tracks at that moment. The track position distance is calculated this way because the coordinates of track points are longitudes and latitudes: the geographic distance between two points cannot be obtained as a planar Euclidean distance computed directly from longitude and latitude, so the spherical distance is calculated instead.
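The spherical-distance idea can be illustrated as follows. This sketch swaps in the standard haversine formula for the great-circle distance, since the patent's own formula image is not reproduced here. It also assumes that x is latitude and y is longitude in degrees, and it reads S302 as a root mean square of the per-time distances divided by p. All names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (t, latitude_deg, longitude_deg), an assumed layout

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def position_distance(ti: List[Point], tj: List[Point], p: float) -> float:
    """Root mean square of per-time spherical distances, divided by p (assumed reading)."""
    if len(ti) != len(tj) or not ti:
        raise ValueError("tracks must be synchronized and non-empty")
    if p <= 0:
        raise ValueError("intersection degree p must be positive")
    squares = [haversine_km(a[1], a[2], b[1], b[2]) ** 2 for a, b in zip(ti, tj)]
    return math.sqrt(sum(squares) / len(squares)) / p


one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)  # roughly 111.19 km per degree of latitude
```

Using the same 6371 km radius keeps the result in kilometers, in line with the remark that d and R share a unit.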
From the above, the position distances $d_{loc}(T_1, T_2)$ and $d_{loc}(T_2, T_3)$ of tracks 1, 2 and 3 are:
$d_{loc}(T_1, T_2) = 0.5$
Figure BDA0002390696650000123
S303, the track distance is obtained as the weighted sum of the track shape distance and the position distance, where $\alpha$ is a weight adjustment coefficient, and the track distance $d(T_i, T_j)$ is as follows:
$d(T_i, T_j) = \alpha\, d_{shape}(T_i, T_j) + (1 - \alpha)\, d_{loc}(T_i, T_j)$
The track distance $d(T_i, T_j)$ reflects the spatio-temporal distance between tracks and therefore also their similarity.
Typically $\alpha = 0.5$, from which:
$d(T_1, T_2) = \alpha\, d_{shape}(T_1, T_2) + (1 - \alpha)\, d_{loc}(T_1, T_2) = 0.5 \times 0 + 0.5 \times 0.5 = 0.25$
Figure BDA0002390696650000124
indicating that the distance between the tracks 1, 2 is smaller than the distance between 2, 3, consistent with the observed results,
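The weighted sum in S303 can be checked with a few lines of code. This is a minimal sketch: the function name `track_distance` is hypothetical, and restricting α to the interval [0, 1] is an assumption that the text does not state explicitly.

```python
def track_distance(d_shape, d_loc, alpha=0.5):
    """Weighted sum d = alpha * d_shape + (1 - alpha) * d_loc."""
    if not 0.0 <= alpha <= 1.0:
        # Assumption: alpha is a convex weight between the two distances.
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * d_shape + (1.0 - alpha) * d_loc


# Values from the worked example for tracks 1 and 2: d_shape = 0, d_loc = 0.5.
print(track_distance(d_shape=0.0, d_loc=0.5, alpha=0.5))  # 0.25
```

With α = 1 only the shape distance counts, and with α = 0 only the position distance counts.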
B. In the confusion-zone-based privacy protection method, an information entropy index oriented to the distribution probability is provided:
S4. Calculate the privacy protection probability matrix and probability vectors of the confusion zone.
Users carry identifiers of their own identity both before entering and after exiting the confusion zone. Let the set of user IDs before entering confusion zone $S_k$ be $I(S_k)=\{i_1,i_2,\ldots,i_n\}$ and the set of user IDs exiting the confusion zone be $O(S_k)=\{o_1,o_2,\ldots,o_n\}$. At any moment each user has exactly one ID, so the user IDs in the two sets are in one-to-one correspondence. The attacker infers the relationship between entry and exit positions and expresses it as a conditional probability; the probability matrix represents the association between the two sets of identifiers.
Assuming that the number of users whose IDs are replaced in the confusion zone is n, the entry/exit matrix of the replaced IDs has size n×n, and the probability matrix is:
Figure BDA0002390696650000131
where the element $p(u_j|o_i)$ is the probability that user ID $u_j$ entering the confusion zone and user ID $o_i$ leaving the confusion zone belong to the same user; the elements of row j all correspond to the same user ID $u_j$ entering the confusion zone,
Figure BDA0002390696650000132
and the elements of column k all correspond to the same user ID $o_k$ leaving the confusion zone,
Figure BDA0002390696650000133
FIG. 5 illustrates the exchange of user IDs: the users with IDs $i_1$ and $i_2$ enter confusion zone $S_1$; after leaving, their IDs are $o_1$ and $o_2$, and they enter confusion zones $S_2$ and $S_3$, respectively. An attacker combines real time or place information with the model. For example, if $S_2$ is a dessert shop and $S_3$ is a hospital, and the attacker knows from the background of user $i_1$ that he does not like desserts, then the probability that the new pseudonym of $i_1$ after exiting $S_1$ is $o_2$ is very high (e.g., $p(u_1|o_2)=0.8$).
When the attacker has no background knowledge, the ID corresponding to a user leaving the confusion zone is unknown, and the probability matrix is
Figure BDA0002390696650000134
Before entering confusion zone $S_1$, if the correspondence between user $i_1$ and identifier $u_1$ cannot be determined, it can be represented by a probability vector:
Figure BDA0002390696650000135
In the initial state, its value is
Figure BDA0002390696650000136
After leaving the confusion zone, the probability vector of $o_1$ is
Figure BDA0002390696650000137
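The two situations described in S4 can be sketched as follows. The matrices themselves are contained in the figures and are not reproduced here, so this example rests on assumptions: the no-background-knowledge matrix is taken to be uniform (every entry 1/n), and in the FIG. 5 example the entries other than $p(u_1|o_2)=0.8$ are filled in so that each exit ID's probabilities sum to 1. The helper names `uniform_matrix` and `exit_vector` are hypothetical.

```python
def uniform_matrix(n):
    """Assumed n x n matrix with every entry 1/n (attacker has no background knowledge)."""
    return [[1.0 / n for _ in range(n)] for _ in range(n)]


def exit_vector(matrix, i):
    """Probability vector of exit ID o_(i+1): column i, i.e. p(u_j | o_i) for all j."""
    return [row[i] for row in matrix]


# Rows index entering IDs u_j, columns index exit IDs o_i (two users, as in FIG. 5).
informed = [
    [0.2, 0.8],  # u_1: p(u_1|o_1), p(u_1|o_2); 0.8 is taken from the text
    [0.8, 0.2],  # u_2: complementary values, assumed
]

print(exit_vector(uniform_matrix(2), 0))  # [0.5, 0.5]
print(exit_vector(informed, 1))           # [0.8, 0.2]
```

The uniform vector gives the attacker no preference between candidates, whereas the informed vector concentrates probability on one entering user.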
S5. Calculate the track information entropy.
S501. After all ID replacement operations are completed, each track $o_i$ has a corresponding probability vector, as shown in FIG. 6, where the entropy of $o_1$ is:
Figure BDA0002390696650000141
S502. If no user ID replacement is performed, one component of the probability vector is 1 and all others are 0, so the computed entropy is 0, which indicates the worst privacy protection. If the user ID replacement is designed well enough that the association between entering and leaving users cannot be determined, all probability components in the correspondence are equal:
Figure BDA0002390696650000142
When the association between a track leaving the confusion zone and the real track entering it cannot be guessed, the information entropy reaches its maximum value:
Figure BDA0002390696650000143
where $u_j$ is a user ID entering the confusion zone, $o_i$ is a user ID leaving the confusion zone, $p(u_j|o_i)$ is the probability that $u_j$ and $o_i$ are the same person, and n is the number of users in the confusion zone whose IDs are replaced.
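The two limiting cases in S502 can be reproduced with the standard Shannon entropy. The entropy expressions are contained in the figures, so the base-2 logarithm used here is an assumption, as are the function names `track_entropy` and `max_entropy`.

```python
import math


def track_entropy(prob_vector):
    """Shannon entropy in bits of one exit ID's probability vector.

    Components equal to 0 are skipped, following the convention 0 * log(0) = 0.
    """
    return sum(-p * math.log2(p) for p in prob_vector if p > 0)


def max_entropy(n):
    """Entropy of the uniform vector (1/n, ..., 1/n), which equals log2(n)."""
    return math.log2(n)


no_replacement = [1.0, 0.0, 0.0, 0.0]   # the attacker is certain
fully_mixed = [0.25, 0.25, 0.25, 0.25]  # all candidates equally likely

print(track_entropy(no_replacement))              # 0.0
print(track_entropy(fully_mixed), max_entropy(4))  # 2.0 2.0
```

Any probability vector between these two extremes yields an entropy strictly between 0 and the maximum.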
S6. Calculate the privacy protection function.
The degree of privacy protection is measured by the privacy protection function $G(o_i)$: the higher its value, the higher the degree of privacy protection, the lower the privacy leakage rate, and the weaker the attacker's ability. The privacy protection function $G(o_i)$ is calculated as:
Figure BDA0002390696650000144
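The expression for $G(o_i)$ is contained in the figure and is not reproduced here. The sketch below assumes the normalized form $G(o_i)=H(o_i)/H_{max}(o_i)$, which matches the statement that a higher value means stronger protection and uses only the two quantities named in the claims; the actual formula may differ. The base-2 logarithm and the function name `privacy_protection` are likewise assumptions.

```python
import math


def privacy_protection(prob_vector):
    """Assumed form G = H / H_max, with H the Shannon entropy and H_max = log2(n)."""
    n = len(prob_vector)
    if n < 2:
        return 0.0  # with a single candidate there is nothing to hide
    h = sum(-p * math.log2(p) for p in prob_vector if p > 0)
    return h / math.log2(n)


print(privacy_protection([0.5, 0.5]))  # 1.0: the attacker cannot tell the users apart
print(privacy_protection([1.0, 0.0]))  # 0.0: the attacker is certain
print(round(privacy_protection([0.8, 0.2]), 3))  # partial knowledge, between 0 and 1
```

Under this assumption G always lies in [0, 1], which makes values comparable across confusion zones with different numbers of users.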
The invention mainly aims at track generalization and confusion privacy protection methods in a track big data release scene, and provides different universal measures to scientifically evaluate the privacy leakage risk degree of a user.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The experiment is based on taxi GPS trajectory data from San Francisco and on the Microsoft Research GeoLife GPS Trajectories dataset. The development environment is IntelliJ IDEA 15 for OS X, and the trajectory and measurement data of multiple drivers and about 1000 users were calculated under the two privacy protection methods. The experimental results show that the measurement values objectively quantify the degree of deformation and the degree of association of the trajectory data.
In summary, the invention mainly aims at track generalization and confusion privacy protection methods in a track big data release scene, provides different universal measures to scientifically evaluate the privacy leakage risk degree of a user, and has good adaptability in various scenes. The two measurement indexes are based on different methods, so that the track privacy protection degree of the mobile user is quantified, and the privacy security degree and the privacy disclosure degree of the user are clearly reflected.
The above is only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited by this, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (7)

1. A measurement method based on track privacy protection, characterized by comprising a generalization-based privacy protection method and a confusion-zone-based privacy protection method, wherein in the generalization-based privacy protection method, a track-oriented distance metric is provided: the intersection of tracks is judged and the degree of intersection is calculated; synchronized tracks are generated for the intersecting tracks; and distance measurement and similarity calculation are performed on the synchronized tracks; in the confusion-zone-based privacy protection method, an information entropy index oriented to the distribution probability is provided: a probability vector is calculated for each track; the information entropy is calculated from the probability vector; and the privacy protection function is calculated from the information entropy, completing the measurement based on track privacy protection, so that the strength of the privacy protection method can be effectively evaluated and the degree of distortion and the utility of the data before and after protection can be analyzed;
in the generalization-based privacy protection method, generating synchronized tracks for the intersecting tracks is specifically as follows:
S201. Given two intersecting tracks $T_i$ and $T_j$ with degree of intersection p, if within the intersection time interval
Figure FDA0004051615360000011
n > m, tracks $T_i$ and $T_j$ have the same number of location points at the same corresponding times, then $T_i$ and $T_j$ are synchronized tracks; if $T_i$ and $T_j$ only intersect in time but are not synchronized, synchronization points are added so that the two tracks satisfy the synchronization condition;
S202. Select $T_i$ as the track whose start time is earlier than that of $T_j$, i.e.
Figure FDA0004051615360000012
Two index pointers a and b are placed at the first point of each track, i.e., a = b = 1;
S203. Pointer a is advanced along $T_i$ until a and b meet, and the meeting time is recorded, at which point a = b = 2; while the times are the same, the two pointers move forward together, and the synchronization ends as soon as either pointer reaches the end of its track;
S204. During the movement, when a and b reach the next sampling points, if
Figure FDA0004051615360000013
a data point with time
Figure FDA0004051615360000014
is added to track $T_i$; its position information is determined by
Figure FDA0004051615360000015
and
Figure FDA0004051615360000016
After the addition is completed, pointer a points to the newly added data point, and the procedure returns to S203;
in the generalization-based privacy protection method, the distance measurement and similarity calculation on the synchronized tracks are specifically as follows:
S301. Compute the square root of the sum of all interval variations and divide it by the time intersection degree of the tracks to obtain the track shape distance $d_{shape}$. The track shape distance $d_{shape}$ (the deformation) is as follows:
Figure FDA0004051615360000017
where $t_k$ is the specific time at moment k,
Figure FDA0004051615360000018
is the change in the x-coordinate of the current point from moment (k-1) to moment k, $\Delta t_k$ is the change in time from moment (k-1) to moment k,
Figure FDA0004051615360000021
is the change in the y-coordinate of the current point from moment (k-1) to moment k, and p is the degree of intersection of the tracks;
the track position distance $d_{loc}$ is as follows:
Figure FDA0004051615360000022
where R is the Earth's radius, 6371 km, $t_s$ is the specific time at moment s, and
Figure FDA0004051615360000023
and
Figure FDA0004051615360000024
are the x and y coordinates of the two tracks i and j, respectively, at the current moment;
S302. Within the overlapping time interval of the synchronized tracks, first compute the sum of the squared distances between corresponding points at each time, then take the mean of the squares and divide it by the degree of intersection of the tracks to obtain the track position distance $d_{loc}(T_i,T_j)$;
S303. The track distance is obtained as the weighted sum of the track shape distance and the track position distance, where α is a weight adjustment coefficient, and the track distance $d(T_i,T_j)$ is:
$d(T_i,T_j)=\alpha\,d_{shape}(T_i,T_j)+(1-\alpha)\,d_{loc}(T_i,T_j)$.
2. The measurement method based on track privacy protection according to claim 1, wherein in the generalization-based privacy protection method, judging the intersection of tracks and calculating the degree of intersection is specifically as follows:
S101. Preprocess for track similarity measurement, where a track T is defined as:
$T=\{(t_1,x_1,y_1),(t_2,x_2,y_2),\ldots,(t_n,x_n,y_n)\}$
where n ≥ 1, t represents the sampling time series, and $t_i \le t_j$
Figure FDA0004051615360000025
$x_i$, $y_i$ represent the coordinates of the track point at the i-th moment;
S102. For any two tracks $T_i$ and $T_j$, judge whether they intersect: if
Figure FDA0004051615360000026
the two tracks intersect in time, and the degree of intersection p of the two tracks is calculated.
3. The track privacy protection-based measurement method according to claim 2, wherein the intersection degree p of two tracks is:
Figure FDA0004051615360000038
4. The measurement method based on track privacy protection according to claim 1, wherein in the confusion-zone-based privacy protection method, calculating a probability vector for a track is specifically: let the set of user IDs before entering confusion zone $S_k$ be $I(S_k)=\{i_1,i_2,\ldots,i_n\}$ and the set of user IDs exiting the confusion zone be $O(S_k)=\{o_1,o_2,\ldots,o_n\}$; at any moment each user has exactly one ID, so the user IDs in the two sets are in one-to-one correspondence; the attacker infers the relationship between entry and exit positions and expresses it as a conditional probability; the probability matrix represents the association between the two sets of identifiers.
5. The measurement method based on track privacy protection according to claim 4, wherein if the number of users whose IDs are replaced in the confusion zone is n, the entry/exit matrix of the replaced IDs has size n×n, and the probability matrix is:
Figure FDA0004051615360000032
where the element $p(u_j|o_i)$ is the probability that user ID $u_j$ entering the confusion zone and user ID $o_i$ leaving the confusion zone belong to the same user; the elements of row j all correspond to the same user ID $u_j$ entering the confusion zone,
Figure FDA0004051615360000033
Figure FDA0004051615360000034
and the elements of column k all correspond to the same user ID $o_k$ leaving the confusion zone,
Figure FDA0004051615360000035
6. The measurement method based on track privacy protection according to claim 1, wherein in the confusion-zone-based privacy protection method, calculating the information entropy from the probability vector is specifically: after all ID replacement operations are completed, each track $o_i$ has a corresponding probability vector, and the resulting track information entropy is:
Figure FDA0004051615360000036
the maximum value of the information entropy is:
Figure FDA0004051615360000037
where $u_j$ is a user ID entering the confusion zone, $o_i$ is a user ID leaving the confusion zone, $p(u_j|o_i)$ is the probability that $u_j$ and $o_i$ are the same person, and n is the number of users in the confusion zone whose IDs are replaced.
7. The measurement method based on track privacy protection according to claim 1, wherein in the confusion-zone-based privacy protection method, the privacy protection function $G(o_i)$ is:
Figure FDA0004051615360000041
where $H(o_i)$ is the track information entropy and $H_{max}(o_i)$ is the maximum value of the information entropy.
CN202010113193.7A 2020-02-24 2020-02-24 Measurement method based on track privacy protection Active CN111400747B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010113193.7A CN111400747B (en) 2020-02-24 2020-02-24 Measurement method based on track privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010113193.7A CN111400747B (en) 2020-02-24 2020-02-24 Measurement method based on track privacy protection

Publications (2)

Publication Number Publication Date
CN111400747A CN111400747A (en) 2020-07-10
CN111400747B true CN111400747B (en) 2023-04-28

Family

ID=71428520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010113193.7A Active CN111400747B (en) 2020-02-24 2020-02-24 Measurement method based on track privacy protection

Country Status (1)

Country Link
CN (1) CN111400747B (en)

Also Published As

Publication number Publication date
CN111400747A (en) 2020-07-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant