CN111400747B - Measurement method based on track privacy protection - Google Patents
- Publication number: CN111400747B
- Application number: CN202010113193.7A
- Authority: CN (China)
- Prior art keywords: track, privacy protection, time, user, tracks
- Legal status: Active (Google notes that the legal status is an assumption, not a legal conclusion, and that it has not performed a legal analysis)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Bioethics (AREA)
- Mathematical Optimization (AREA)
- Software Systems (AREA)
- Mathematical Analysis (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Quality & Reliability (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Algebra (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a measurement method based on track privacy protection. It provides two metrics for two common track privacy protection methods, quantifies the degree of track privacy protection of a mobile user, and clearly reflects the user's levels of privacy security and privacy disclosure. For generalization-based privacy protection, a track-oriented distance metric is provided: the intersection of tracks is determined, the intersection degree is calculated, and distance measurement and similarity calculation are performed on the resulting synchronized tracks. For confusion-zone-based privacy protection, an information entropy metric oriented to the probability distribution is provided: a probability vector is calculated for each track, and a privacy protection function is calculated from the information entropy, completing the measurement. The two metrics can effectively evaluate the strength of the two common privacy protection methods and analyze the distortion and effectiveness of the data before and after protection.
Description
Technical Field
The invention belongs to the technical field of user track privacy security in location-based services, and particularly relates to a measurement method based on track privacy protection.
Background
In recent years, location data, as the real-world extension of Internet of Things tracking and positioning services, has provided massive amounts of basic data and has spurred new IoE (Internet of Everything) applications such as BYOD mobile office, tracing of everything, virtual social interaction and AR navigation. Successive positions constitute trajectory data (here, trajectories exclude animal activity and natural phenomena), which are essentially time-ordered samples and snapshots of movement; being based on GPS positioning and time stamps, they are a data source for participatory crowd sensing in urban computing. Using big location and trajectory data to guide social life and production practice therefore has strong practical significance. Location-based services give users a rich, personalized experience but also carry a latent risk of privacy leakage: terminal applications and their sensors, such as cameras, microphones, heart rate monitors and smart bracelets, may collect excessive user information. Location privacy protection originated in everyday social services such as location check-in, location queries and navigation, and focuses on protecting the user's identity, current location, and the spatio-temporal extension of locations through query-content trajectories. Because trajectory data are correlated and carry spatio-temporal features, information such as a user's behavioral patterns, interests, hobbies and social habits is easier to mine, so track privacy requires stronger protection measures.
Track privacy protection is applied in two scenarios: query requests in real-time location services and the publication of big trajectory data. Many privacy protection methods have already been applied successfully, and common ones include generalization and confusion. Track generalization applies spatial k-anonymity to tracks: several tracks whose starting points are close and whose travel routes lie within a certain spatial area are generalized into a spatial grid cell. Generalization sacrifices precision in exchange for indistinguishability among multiple tracks; its variants include m-invariance, l-diversity and p-sensitivity. The confusion-zone (mix-zone) method divides the movement area into mix zones and application zones; renaming, blurring and dynamic replacement inside the mix zone strengthen pseudonym replacement. In this method each track, i.e. each user, has a user ID when using the location service; the ID is either chosen by the user or allocated by the system, and each ID refers to exactly one user. When a user enters the confusion zone and all conditions are met (for example, other users are in the zone at that moment and the number of ID exchanges has not reached its upper limit), the users perform an ID replacement. Over a user's whole movement, multiple ID replacements may occur, so an attacker can hardly link the user's positions before and after a given time. The user's track information is thus protected from malicious acquisition, which achieves track privacy protection.
Research at home and abroad on privacy metrics for location services falls mainly into two categories: metrics tied to a specific privacy protection method, and more general metrics. In method-specific metrics, each privacy protection method has its own risk evaluation system, so the measurement differs from algorithm to algorithm. Such metrics are limited in that they measure only one aspect, such as privacy leakage risk, strength of the protection method, cost or data loss, or reduction in service accuracy, so users cannot obtain an objective privacy risk index.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a measurement method based on track privacy protection, which quantifies the degree of track privacy protection of a mobile user and clearly reflects the user's levels of privacy security and privacy disclosure.
The invention adopts the following technical scheme:
A measurement method based on track privacy protection covers a generalization-based privacy protection method and a confusion-zone-based privacy protection method. For the generalization-based method, a track-oriented distance metric is provided: the intersection of tracks is determined and the intersection degree is calculated; synchronized tracks are generated for intersecting tracks; and distance measurement and similarity calculation are performed on the synchronized tracks. For the confusion-zone-based method, an information entropy metric oriented to the probability distribution is provided: a probability vector is calculated for each track, the information entropy of the probability vector is calculated, and a privacy protection function is calculated from the information entropy. This completes the measurement based on track privacy protection, which can effectively evaluate the strength of the privacy protection method and analyze the distortion and effectiveness of the data before and after protection.
Specifically, in the generalization-based privacy protection method, determining the intersection of the tracks and calculating the intersection degree comprises:
S101. Preprocessing for track similarity measurement. A track T is defined as:

$$T=\{(t_1,x_1,y_1),(t_2,x_2,y_2),\dots,(t_n,x_n,y_n)\}$$

where n ≥ 1, the times $t_1,\dots,t_n$ form the sampling time sequence, and $(x_i, y_i)$ are the coordinates of the track point at the i-th time.
S102. For any two tracks $T_i$ and $T_j$, determine whether they intersect: if their time spans overlap, the two tracks intersect in time, and their intersection degree p is calculated.
Further, the intersection degree p of the two tracks is:
Specifically, in the generalization-based privacy protection method, generating synchronized tracks for intersecting tracks comprises:
S201. Let tracks $T_i$ and $T_j$ be p-intersecting. If, within the intersection time interval, $T_i$ and $T_j$ have the same number of location points with identical corresponding times, then $T_i$ and $T_j$ are synchronized tracks. If $T_i$ and $T_j$ intersect only in time but are not synchronized, synchronization points are added so that the two tracks satisfy the synchronization condition.
S202. Choose $T_i$ as the track whose start time is earlier than that of $T_j$. Two index pointers a and b are placed at the first point of each track, i.e. a = b = 1.
S203. Pointer a advances along $T_i$ until a and b meet, and the meeting time is recorded (at this point a = b = 2). Whenever the two pointers are at the same time, they move forward together. As soon as either pointer reaches the end of its track, the synchronization ends.
S204. While moving, a and b advance to their next sampling points. If the sampling time under b is earlier than that under a, a data point with b's time is added to track $T_i$, with its position determined by the points of $T_i$ before and after that time. After the point is added, pointer a points to the newly added data point, and the procedure returns to S203.
Specifically, in the generalization-based privacy protection method, the distance measurement and similarity calculation on the synchronized tracks comprise:
S301. Calculate the square root of the sum of the variations over all intervals and divide it by the time intersection degree of the tracks to obtain the track shape distance $d_{shape}$.
S302. Within the overlapping time interval of the synchronized tracks, first obtain the squared distance between corresponding points at each time, then take the mean of these squares and divide it by the intersection degree of the tracks to obtain the track position distance $d_{loc}(T_i,T_j)$.
S303. The track distance is the weighted sum of the track shape distance and the track position distance, where α is a weight adjustment coefficient:

$$d(T_i,T_j)=\alpha\, d_{shape}(T_i,T_j)+(1-\alpha)\, d_{loc}(T_i,T_j)$$
Further, the track shape distance $d_{shape}$ can be written in the following variant form:
where $t_k$ is the time of the k-th point, $\Delta x_k$ is the change of the current point's x coordinate from time k−1 to time k, $\Delta t_k$ is the time change from k−1 to k, $\Delta y_k$ is the change of the current point's y coordinate from k−1 to k, and p is the intersection degree of the tracks.
The track position distance d is:
where R = 6371 km is the Earth's radius, $t_s$ is the time of the s-th point, and $(x_s^i, y_s^i)$ and $(x_s^j, y_s^j)$ are the x and y coordinates of tracks $T_i$ and $T_j$ at that time.
Specifically, in the confusion-zone-based privacy protection method, a probability vector is calculated for each track. Let the set of user IDs before entering the confusion zone $S_k$ be $I(S_k)=\{i_1,i_2,\dots,i_n\}$ and the set of user IDs on exiting it be $O(S_k)=\{o_1,o_2,\dots,o_n\}$. At any moment each user has exactly one ID, so the user IDs in the two sets are in one-to-one correspondence. The attacker infers the correspondence between entering and exiting IDs and expresses it as conditional probabilities; a probability matrix represents the association between the two sets of identifiers.
Further, if n users have their IDs replaced in the confusion zone, the entry/exit matrix of the replaced IDs has size n × n.
The element $p(u_j|o_i)$ is the probability that user ID $u_j$ entering the confusion zone and user ID $o_i$ leaving it belong to the same user; these probabilities form the probability vectors. The elements of row j all refer to the same entering user ID $u_j$, and the elements of column k all refer to the same exiting user ID $o_k$.
Specifically, in the confusion-zone-based privacy protection method, after all ID replacement operations are completed, each track $o_i$ has a corresponding probability vector, and its track information entropy is:

$$H(o_i)=-\sum_{j=1}^{n} p(u_j|o_i)\log p(u_j|o_i)$$

The maximum value of the information entropy is:

$$H_{max}(o_i)=\log n$$

where $u_j$ is a user ID entering the confusion zone, $o_i$ is a user ID leaving it, $p(u_j|o_i)$ is the probability that $u_j$ and $o_i$ are the same person, and n is the number of users whose IDs are exchanged in the confusion zone.
Specifically, in the confusion-zone-based privacy protection method, the privacy protection function $G(o_i)$ is:
where $H(o_i)$ is the track information entropy and $H_{max}(o_i)$ is the maximum value of the information entropy.
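The entropy and the privacy protection function can be sketched in Python as follows. The formula for $G(o_i)$ is not given in the text, so the sketch assumes $G(o_i)=H(o_i)/H_{max}(o_i)$: it is 0 when the attacker can identify the user with certainty and 1 when all n candidates are equally likely, which matches the behaviour described for the confusion zone (higher G, stronger protection). Base-2 logarithms are also an assumption; the ratio G does not depend on the base. Function names are hypothetical.

```python
import math
from typing import Sequence


def entropy(vec: Sequence[float]) -> float:
    """Track information entropy H(o_i) = -sum_j p(u_j|o_i) * log2 p(u_j|o_i)."""
    # Entropy is non-negative; abs() also turns a -0.0 result into 0.0.
    return abs(sum(p * math.log2(p) for p in vec if p > 0))


def max_entropy(n: int) -> float:
    """H_max(o_i) = log2 n, reached when all n components equal 1/n."""
    return math.log2(n)


def privacy_protection(vec: Sequence[float]) -> float:
    """Assumed G(o_i) = H(o_i) / H_max(o_i), a value in [0, 1]."""
    n = len(vec)
    if n < 2:
        return 0.0  # a single candidate ID gives no confusion at all
    return entropy(vec) / max_entropy(n)


print(privacy_protection([1.0, 0.0]))  # no ID exchange: 0.0
print(privacy_protection([0.5, 0.5]))  # no background knowledge: 1.0
print(round(privacy_protection([0.8, 0.2]), 4))  # p(u_1|o_2) = 0.8 (Fig. 5 example): 0.7219
```

Normalizing by $H_{max}$ makes values comparable across confusion zones with different numbers of users, which a raw entropy value would not be.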
Compared with the prior art, the invention has at least the following beneficial effects:
The invention discloses a measurement method based on track privacy protection that provides two kinds of metrics for two different track privacy protection methods: a track-oriented distance metric for the generalization-based method, and an information entropy metric oriented to the probability distribution for the confusion-zone-based method. Both metrics take full account of the spatio-temporal sequential nature of track data. On the one hand, they measure the perturbation of location-based track data before and after privacy protection and its effect on protection; on the other hand, they measure the distribution and correlation of the data from the perspective of attacks and leakage risk. Building on existing user location and track privacy protection methods in location services, the invention extends the analysis from the perspective of the service platform and provides scientific and accurate quantitative privacy metrics for two typical track privacy protection methods.
Further, the track definition is normalized, and for any two tracks $T_i$ and $T_j$ a formal criterion is given for deciding whether they intersect in time, together with a definition of the intersection degree.
Further, synchronizing the tracks provides reference point coordinates for calculating the track shape distance and position distance; synchronization traverses both tracks simultaneously and inserts sampling points depending on whether their times coincide.
Further, the time axis of the synchronized tracks is divided into small time intervals. For each interval, the sum of squared differences between the user's position changes on the two tracks in the x and y directions is calculated; this reflects the change in shape distance over a short time and represents the similarity of the tracks, as shown in Fig. 3. The squared distances between corresponding points are then obtained and averaged, and the mean is divided by the intersection degree of the tracks to obtain the track position distance. The final track distortion is a weighted mix of the shape distance and the position distance.
Further, probability vectors describe the degree of association between users entering and leaving the confusion zone and a given track, and a probability matrix represents the association between the two sets of identifiers.
Further, information entropy is used to judge the degree of confusion of the system: the larger the entropy, the higher the confusion and the better the privacy protection effect.
Further, a privacy protection function is defined based on information gain: the higher the value of $G(o_i)$, the higher the degree of privacy protection, the lower the privacy disclosure rate, and the weaker the attacker's capability.
In summary, the invention mainly targets two types of privacy protection methods, track generalization and confusion, in the scenario of publishing big trajectory data, and provides two metrics of different types: a distance metric and an information entropy metric. These metrics provide a universal measurement of the privacy protection level, adapt well to various scenarios, quantify the degree of track privacy protection of mobile users, and, through scientific evaluation, clearly reflect the user's privacy security level and privacy disclosure risk.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a flow chart of the measurement method of the present invention;
FIG. 2 is a timing diagram of tracks and intersecting tracks, in which (a) shows the raw track data, where the coordinates on each point indicate the longitude and latitude of the track point and the track extends in the direction of time, and (b) shows intersecting tracks $T_1$ and $T_2$, which overlap in time according to the definition of intersecting tracks;
FIG. 3 is a diagram of the track synchronization process, in which (a) the arrows mark the two index pointers at the head of each track; (b) the pointers move forward on the two tracks to the second sampling point, and because $T_1$ has no third sampling point, the pointer moves to the fourth sampling point; (c) a third sampling point is added to $T_1$ from the time of the fourth sampling point and the positions of the $T_1$ sampling points before and after it, after which pointer a points to the newly added point; (d) after the third point is inserted and synchronized, synchronization continues with the fourth point; (e) a fourth synchronization point is added to track $T_2$; and (f) after the third and fourth synchronization points are completed, the pointers continue to move forward along the tracks;
FIG. 4 shows the shape-distance similarity of tracks in the x-t plane, in which (a) shows the coordinates of three tracks and (b) gives their specific coordinates;
FIG. 5 illustrates the process of users exchanging IDs in a confusion zone;
FIG. 6 illustrates a simplified process for calculating the information entropy and the privacy protection function.
Detailed Description
The measurement method based on track privacy protection provided by the invention mainly processes GPS track data sets. A track is an ordered set of GPS data points that reflects the user's position changes and further reveals the movement patterns of the device carrier, so it can be regarded as highly sensitive private information. A track includes time, latitude and longitude, and each track set is associated with a user ID. The two track privacy protection metrics can effectively evaluate the strength of the two common privacy protection methods and analyze the distortion and effectiveness of the data before and after protection.
Referring to fig. 1, the track privacy protection-based measurement method of the present invention comprises the following specific steps:
A. For the generalization-based privacy protection method, a track-oriented distance metric is provided:
S1. Normalize the tracks and calculate the intersection degree.
S101. Preprocessing for track similarity measurement. A track is defined as:

$$T=\{(t_1,x_1,y_1),(t_2,x_2,y_2),\dots,(t_n,x_n,y_n)\}$$

where n ≥ 1, the times form the sampling time sequence with $t_i \le t_j$ for i < j, and $(x_i, y_i)$ are the coordinates of the track point at time i.
S102. For any two tracks $T_i$ and $T_j$, determine whether they intersect; here, intersection means intersection in time rather than in space. Consider the two tracks shown in Fig. 2(b); the normalized tracks are:

$$T_1=\{(1,5,3),(2,6,2),(4,4,6),(5,4,3),(6,2,4),(9,5,7)\}$$
$$T_2=\{(2,3,0),(2,3,1),(5,7,7),(7,5,3),(8,4,6),(9,0,2)\}$$
For any two tracks $T_i$ and $T_j$, if their time spans overlap, the two tracks intersect in time. A parameter I is defined as follows:
Applying the intersection test to this example gives an intersection degree of p = 87.5%, i.e. the two tracks are 87.5%-intersecting.
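The intersection test of S102 can be sketched in Python as follows. Because the formulas for I and p are not given in the text, the sketch assumes that two tracks intersect when their time spans overlap and that p is the length of the overlapping span divided by the length of the combined span. This assumption reproduces the 87.5% quoted for $T_1$ and $T_2$ (overlap [2, 9], combined span [1, 9], 7/8), but it is not necessarily the patent's exact formula. Function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (t, x, y), as in the track definition of S101


def time_span(track: Sequence[Point]) -> Tuple[float, float]:
    """Return (first sampling time, last sampling time) of a track."""
    times = [t for t, _, _ in track]
    return min(times), max(times)


def intersects_in_time(ti: Sequence[Point], tj: Sequence[Point]) -> bool:
    """Tracks intersect when their time spans overlap (intersection in time, not space)."""
    (si, ei), (sj, ej) = time_span(ti), time_span(tj)
    return max(si, sj) <= min(ei, ej)


def intersection_degree(ti: Sequence[Point], tj: Sequence[Point]) -> float:
    """Assumed p: overlap length / combined span length (0 when the tracks do not intersect)."""
    if not intersects_in_time(ti, tj):
        return 0.0
    (si, ei), (sj, ej) = time_span(ti), time_span(tj)
    overlap = min(ei, ej) - max(si, sj)
    combined = max(ei, ej) - min(si, sj)
    return 1.0 if combined == 0 else overlap / combined


T1 = [(1, 5, 3), (2, 6, 2), (4, 4, 6), (5, 4, 3), (6, 2, 4), (9, 5, 7)]
T2 = [(2, 3, 0), (2, 3, 1), (5, 7, 7), (7, 5, 3), (8, 4, 6), (9, 0, 2)]
print(intersection_degree(T1, T2))  # 0.875
```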
The specific operation is shown in Fig. 3. Here p denotes the degree of intersection of the two tracks, given by:
S2. Generate synchronized tracks. If two tracks $T_i$ and $T_j$ are p%-intersecting (p > 0), a point-supplementing operation is needed to turn them into synchronized tracks. The specific steps are:
S201. Let tracks $T_i$ and $T_j$ be p-intersecting. If, within the intersection time interval, $T_i$ and $T_j$ have the same number of location points with identical corresponding times, they are synchronized tracks (and the intersection degree p is 100%). If $T_i$ and $T_j$ intersect only in time but their positions are not synchronized (the numbers of positions differ or their corresponding times differ), synchronization points are added so that the two tracks satisfy the synchronization condition.
S202. To simplify the operation, choose $T_i$ so that its start time is earlier than that of $T_j$. Two index pointers a and b are placed at the first point of each track, i.e. a = b = 1.
S203. Let a be the pointer at the earlier time. Pointer a moves forward along $T_i$ until a and b meet, and the meeting time is recorded (at this point a = b = 2). Whenever the two pointers are at the same time, they move forward together. As soon as either pointer reaches the end of its track, the synchronization ends.
S204. While moving, a and b advance to their next sampling points. If the sampling time under b is earlier than that under a, a data point with b's time is added to track $T_i$, with its position determined by the points of $T_i$ before and after that time; to simplify the calculation, the user is assumed to move in uniform linear motion between them. In Fig. 3, the red sampling point with coordinates (5, 4) is a newly added synchronization point. After the point is added, pointer a points to it and the procedure returns to S203, and so on.
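A compact way to reach the same result as S201 to S204 is to merge the sampling times of both tracks inside their common time span and interpolate each track at every merged time. The Python sketch below takes that approach instead of walking two pointers, and assumes, as S204 does, uniform linear motion between samples and tracks sorted by time. One further assumption: the example uses (3, 3, 1) as the second point of $T_2$, because that is the point implied by the synchronization point (5, 4) added to $T_1$ in Fig. 3; the listing of $T_2$ above gives (2, 3, 1). Function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (t, x, y), sorted by t


def position_at(track: Sequence[Point], t: float) -> Point:
    """Point of `track` at time t, interpolated assuming uniform linear motion."""
    times = [p[0] for p in track]
    k = bisect_left(times, t)
    if k < len(track) and times[k] == t:
        return tuple(track[k])  # an existing sampling point, kept as-is
    if k == 0 or k == len(track):
        raise ValueError("t lies outside the track's time span")
    (t0, x0, y0), (t1, x1, y1) = track[k - 1], track[k]
    w = (t - t0) / (t1 - t0)
    return (t, x0 + w * (x1 - x0), y0 + w * (y1 - y0))


def synchronize(ti: Sequence[Point], tj: Sequence[Point]) -> Tuple[List[Point], List[Point]]:
    """Restrict both tracks to their common time span and give them identical timestamps."""
    start = max(ti[0][0], tj[0][0])
    end = min(ti[-1][0], tj[-1][0])
    if start > end:
        raise ValueError("tracks do not intersect in time")
    # Every sampling time of either track inside the overlap becomes a synchronization time.
    times = sorted({p[0] for p in list(ti) + list(tj) if start <= p[0] <= end})
    return [position_at(ti, t) for t in times], [position_at(tj, t) for t in times]


T1 = [(1, 5, 3), (2, 6, 2), (4, 4, 6), (5, 4, 3), (6, 2, 4), (9, 5, 7)]
T2 = [(2, 3, 0), (3, 3, 1), (5, 7, 7), (7, 5, 3), (8, 4, 6), (9, 0, 2)]  # assumed t=3
s1, s2 = synchronize(T1, T2)
print(s1[1])  # (3, 5.0, 4.0): the synchronization point (5, 4) added to T1
```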
S3. Track distance calculation
Fig. 4 shows the shape distance of tracks for three tracks, all assumed to lie in the x-t plane (the y-axis coordinates are the values in the table on the right). If only spatial distance in the conventional sense were considered, the distance between track 2 and track 3 would be smaller than that between track 1 and track 2. In terms of shape, however, track 1 is clearly closer to track 2 than track 3 is. Therefore, when calculating track distance to measure similarity, the shape of the tracks must be taken into account.
S301. The time axis of the synchronized tracks consists of time intervals of different lengths. Within each interval, the sum of the squared differences between the two users' position changes in the x and y directions reflects the change in shape distance over a short time. The square root of the sum over all intervals, divided by the time intersection degree of the tracks, gives the track shape distance $d_{shape}$:
where $t_k$ is the time of the k-th point, $\Delta x_k$ is the change of the current point's x coordinate from time k−1 to time k, $\Delta t_k$ is the time change from k−1 to k, $\Delta y_k$ is the change of the current point's y coordinate from k−1 to k, and p is the intersection degree of the tracks.
Using the simulation data in Fig. 4 gives:

$$d_{shape}(T_1,T_2)=0$$
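The verbal description in S301 can be sketched in Python as below. Since the formula itself is not given in the text, the sketch assumes $d_{shape}=\frac{1}{p}\sqrt{\sum_k[(\Delta x_k^i-\Delta x_k^j)^2+(\Delta y_k^i-\Delta y_k^j)^2]}$ over consecutive synchronized points and does not implement the $\Delta t_k$-based variant form; treat it as an illustration, not the patent's exact definition. It does reproduce the property used for Fig. 4: two tracks of the same shape, one a translation of the other, have shape distance 0. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (t, x, y) on synchronized tracks


def shape_distance(si: Sequence[Point], sj: Sequence[Point], p: float) -> float:
    """Assumed d_shape: root of summed squared displacement differences, divided by p."""
    if len(si) != len(sj) or any(a[0] != b[0] for a, b in zip(si, sj)):
        raise ValueError("tracks must be synchronized (same timestamps)")
    if not 0 < p <= 1:
        raise ValueError("intersection degree p must be in (0, 1]")
    total = 0.0
    for k in range(1, len(si)):
        # Per-interval displacement of each track from point k-1 to point k.
        dxi, dyi = si[k][1] - si[k - 1][1], si[k][2] - si[k - 1][2]
        dxj, dyj = sj[k][1] - sj[k - 1][1], sj[k][2] - sj[k - 1][2]
        total += (dxi - dxj) ** 2 + (dyi - dyj) ** 2
    return math.sqrt(total) / p


a = [(0, 0.0, 0.0), (1, 1.0, 2.0), (2, 3.0, 2.0)]
b = [(t, x + 5.0, y - 1.0) for t, x, y in a]  # same shape, shifted in space
print(shape_distance(a, b, p=1.0))  # 0.0
```

Dividing by p penalizes pairs of tracks that overlap only briefly in time, so a small overlap cannot make two tracks look artificially similar.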
The track shape distance $d_{shape}$ can also be written in the following variant form, using the same symbols as above:
S302. Within the overlapping time interval of the synchronized tracks, first obtain the squared distance between corresponding points at each time, then take the mean of these squares and divide it by the intersection degree of the tracks to obtain the track position distance d:
where R = 6371 km is the Earth's radius and d is expressed in the same unit as R; $t_s$ is the time of the s-th point, and $(x_s^i, y_s^i)$ and $(x_s^j, y_s^j)$ are the coordinates of the two tracks at that time. The position distance is calculated this way because track-point coordinates are longitude and latitude: the geographic distance between two points cannot be obtained as a planar Euclidean distance over longitude and latitude, so the spherical distance is used instead.
From the above, the position distances $d_{loc}(T_1,T_2)$ and $d_{loc}(T_2,T_3)$ of tracks 1, 2 and 3 are calculated; for tracks 1 and 2:

$$d_{loc}(T_1,T_2)=0.5$$
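The position distance of S302 can be sketched in Python as follows. The text states only that a spherical distance with R = 6371 km is used, so the sketch makes three assumptions: x is longitude and y is latitude in degrees, the great-circle distance between corresponding points is computed with the haversine formula, and the mean of the squared distances is turned back into a distance with a square root (root mean square) before dividing by p. The result is in kilometres, the unit of R. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (t, lon_deg, lat_deg) on synchronized tracks
EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance between two lon/lat points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def position_distance(si: Sequence[Point], sj: Sequence[Point], p: float) -> float:
    """Assumed d_loc: root-mean-square of point-to-point distances, divided by p."""
    if len(si) != len(sj) or not si:
        raise ValueError("tracks must be synchronized and non-empty")
    squares = [haversine_km(a[1], a[2], b[1], b[2]) ** 2 for a, b in zip(si, sj)]
    return math.sqrt(sum(squares) / len(squares)) / p


ti = [(0, 116.0, 39.0), (1, 116.0, 40.0)]
tj = [(0, 116.0, 40.0), (1, 116.0, 41.0)]  # one degree of latitude further north
print(round(position_distance(ti, tj, p=1.0), 2))  # 111.19
```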
S303. The track distance is the weighted sum of the track shape distance and the track position distance, where α is a weight adjustment coefficient:

$$d(T_i,T_j)=\alpha\, d_{shape}(T_i,T_j)+(1-\alpha)\, d_{loc}(T_i,T_j)$$

The track distance $d(T_i,T_j)$ reflects the spatio-temporal distance between tracks and also their similarity.
Typically α = 0.5, which gives:

$$d(T_1,T_2)=0.5\times 0+0.5\times 0.5=0.25$$

This indicates that the distance between tracks 1 and 2 is smaller than that between tracks 2 and 3, consistent with the observed results.
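S303 is a plain weighted combination, so the Python sketch is short; it reproduces the worked value $d(T_1,T_2)=0.25$ from $d_{shape}=0$ and $d_{loc}=0.5$ with α = 0.5. Restricting α to [0, 1] is an assumption that keeps both weights non-negative, and the function name is hypothetical.

```python
def track_distance(d_shape: float, d_loc: float, alpha: float = 0.5) -> float:
    """Weighted track distance of S303: alpha * d_shape + (1 - alpha) * d_loc."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * d_shape + (1 - alpha) * d_loc


print(track_distance(d_shape=0.0, d_loc=0.5))  # 0.25
```

Raising α toward 1 makes the metric emphasize shape similarity; lowering it toward 0 emphasizes absolute position.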
B. in a privacy protection method based on a confusion zone, providing information entropy indexes facing to distribution probability:
s4, calculating a privacy protection probability matrix and a probability vector of the confusion zone;
the user has identifiers with their own identity before entering and exiting the confusion zone S k The previous user ID set is I (S k )={i 1 ,i 2 ,...,i n ) The user ID set exiting the confusion zone is O (S k )={o 1 ,o 2 ,...,o n ) The method comprises the steps of carrying out a first treatment on the surface of the At any moment, each user has only one ID, so that the user IDs belonging to two sets have a one-to-one mapping relationship; the attacker reasoning the relation between the in-out position and the out-in position and expressing the relation by using conditional probability; the probability matrix represents the association between the two sets of identifiers.
Assume that the number of users whose IDs are substituted in the confusion zone is $n$; the entry/exit matrix of substituted IDs is then of size $n\times n$, and the probability matrix is:
where the element $p(u_j\mid o_i)$ denotes the probability that user ID $u_j$ entering the confusion zone and user ID $o_i$ leaving the confusion zone belong to the same user. The elements of the $j$-th row all refer to the same entering user ID $u_j$, and the elements of the $k$-th column all refer to the same leaving user ID $o_k$.
FIG. 5 illustrates the exchange of user IDs. Users with IDs $i_1$ and $i_2$ enter confusion zone $S_1$; after leaving it they become $o_1$ and $o_2$ respectively, and then enter confusion zones $S_2$ and $S_3$ respectively. An attacker combines real time or place information with a model: for example, $S_2$ is a dessert shop and $S_3$ is a hospital, and the attacker knows from background information about user $i_1$ that $i_1$ does not like dessert. The probability that the new pseudonym of $i_1$ after exiting $S_1$ is $o_2$ is then very high (e.g. $p(u_1\mid o_2)=0.8$).
When the attacker has no background knowledge, the ID corresponding to each user leaving the confusion zone is unknown, and the probability matrix is formed accordingly. Before entering confusion zone $S_1$, if the correspondence between user $i_1$ and identifier $u_1$ cannot be determined, it can be represented by a probability vector with an initial-state value; after leaving the confusion zone, $o_1$ likewise has a corresponding probability vector value.
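A hedged sketch of the probability matrix: row $j$ corresponds to entering ID $u_j$ and column $i$ to leaving ID $o_i$, so the probability vector of $o_i$ is column $i$. The uniform matrix for an attacker without background knowledge, and the completed FIG. 5 matrix (only $p(u_1\mid o_2)=0.8$ is given; the other three entries are hypothetical, chosen so that each column sums to 1), are illustrative assumptions, as are all function names.

```python
import math
from typing import List

Matrix = List[List[float]]  # P[j][i] = p(u_j | o_i)


def uniform_matrix(n: int) -> Matrix:
    """Assumed no-background-knowledge case: every p(u_j | o_i) = 1/n."""
    return [[1.0 / n] * n for _ in range(n)]


def probability_vector(P: Matrix, i: int) -> List[float]:
    """Column i: (p(u_1|o_i), ..., p(u_n|o_i)) for the leaving ID o_i."""
    return [row[i] for row in P]


def is_valid(P: Matrix) -> bool:
    """Square n x n matrix whose columns each sum to 1 (o_i maps to exactly one u_j)."""
    n = len(P)
    if any(len(row) != n for row in P):
        return False
    return all(math.isclose(sum(probability_vector(P, i)), 1.0) for i in range(n))


# FIG. 5 example: rows u_1, u_2; columns o_1, o_2.
# Only p(u_1|o_2) = 0.8 comes from the text; the other entries are hypothetical.
fig5 = [[0.2, 0.8],
        [0.8, 0.2]]
```

Storing the matrix column-per-leaving-ID makes each $o_i$'s vector directly available for the entropy step that follows.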
S5. Calculate the track information entropy
S501. After all ID replacement operations are completed, each track $o_i$ has a corresponding probability vector, as shown in FIG. 6; the entropy of $o_1$ is:
S502. If no user ID replacement is performed, one component of the probability vector is 1 and all others are 0, so the computed entropy is 0, indicating the worst privacy protection. If the user ID replacement is effective enough that the association between entering and leaving users cannot be determined, all probability components of the correspondence are equal to $1/n$. When the track leaving the confusion zone cannot be linked to the real track that entered it, the information entropy reaches its maximum value:
$$H_{max}(o_i)=-\sum_{j=1}^{n}\frac{1}{n}\log\frac{1}{n}=\log n$$
where $u_j$ is the user ID entering the confusion zone, $o_i$ is the user ID leaving the confusion zone, $p$ denotes probability, $p(u_j\mid o_i)$ is the probability that users $u_j$ and $o_i$ are the same person, and $n$ is the number of users in the confusion zone whose IDs are permuted.
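The entropy in S501 and S502 can be sketched as follows, assuming the Shannon form $H(o_i)=-\sum_j p(u_j\mid o_i)\log p(u_j\mid o_i)$ with base-2 logarithms by default; the base is an assumption (the text does not fix it), and the function names are hypothetical. Zero components are skipped, following the convention $0\log 0 = 0$.

```python
import math
from typing import Sequence


def track_entropy(probs: Sequence[float], base: float = 2.0) -> float:
    """H(o_i) = -sum_j p(u_j|o_i) * log p(u_j|o_i); zero entries contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def max_entropy(n: int, base: float = 2.0) -> float:
    """Entropy when all n components equal 1/n, i.e. log n (S502)."""
    return math.log(n, base)


# No ID replacement: one certain candidate, entropy 0 (worst protection).
no_swap = track_entropy([1.0, 0.0])
# Indistinguishable candidates: entropy equals the maximum log n.
uniform = track_entropy([0.25] * 4)
```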
S6. Calculate the privacy protection function
The degree of privacy protection is measured by the privacy protection function $G(o_i)$: the higher the value of $G(o_i)$, the higher the degree of privacy protection, the lower the privacy leakage rate, and the weaker the attacker's ability. $G(o_i)$ is calculated as:
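The text does not give the formula for $G(o_i)$ here; claim 7 only states that it is built from $H(o_i)$ and $H_{max}(o_i)$. The sketch below assumes the normalized form $G(o_i)=H(o_i)/H_{max}(o_i)$, which matches the stated behavior (0 when one candidate has probability 1, 1 when all $n$ candidates are equally likely) but is a hypothesis, not the patent's definition. Applied to the FIG. 5 vector of $o_2$ with the hypothetical completion (0.8, 0.2), it gives roughly 0.72. Function names are hypothetical.

```python
import math
from typing import Sequence


def track_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; terms with p = 0 are skipped (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def privacy_protection(probs: Sequence[float]) -> float:
    """ASSUMED form G(o_i) = H(o_i) / H_max(o_i), with H_max = log2(n).

    The ratio does not depend on the logarithm base.
    """
    n = len(probs)
    if n < 2:
        return 0.0  # a single candidate: no uncertainty, and H_max = 0
    return track_entropy(probs) / math.log2(n)


print(round(privacy_protection([0.8, 0.2]), 4))  # 0.7219
```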
The invention mainly addresses track generalization and confusion-based privacy protection methods in track big-data publishing scenarios, and provides different general-purpose metrics to scientifically evaluate the degree of a user's privacy leakage risk.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The experiments use GPS trajectory data of San Francisco taxis and the Microsoft Research GeoLife GPS Trajectories dataset, with IntelliJ IDEA 15 for OS X as the development environment. Tracks and metric values of multiple drivers and about 1000 users were computed under the two privacy protection methods. The experimental results show that the metric values objectively quantify the degree of deformation and the degree of association of the track data.
In summary, the invention mainly addresses track generalization and confusion-based privacy protection methods in track big-data publishing scenarios, provides different general-purpose metrics to scientifically evaluate the degree of a user's privacy leakage risk, and adapts well to various scenarios. The two metrics, based on different methods, quantify the degree of track privacy protection of mobile users and clearly reflect the user's degree of privacy security and privacy disclosure.
The above only illustrates the technical idea of the present invention and does not limit its protection scope; any modification made on the basis of the technical solution according to the technical idea of the present invention falls within the protection scope of the claims of the present invention.
Claims (7)
1. A measurement method based on track privacy protection, characterized by comprising a generalization-based privacy protection method and a confusion-zone-based privacy protection method, wherein in the generalization-based privacy protection method, a track-oriented distance measurement index is provided: the intersection of tracks is judged and the degree of intersection is calculated; synchronous tracks are generated for intersecting tracks; and distance measurement and similarity calculation are performed on the synchronous tracks; in the confusion-zone-based privacy protection method, an information entropy index oriented to the probability distribution is provided: a probability vector is calculated for each track; information entropy is calculated from the probability vector; and the privacy protection function is calculated from the information entropy, thereby completing the measurement based on track privacy protection, which can effectively evaluate the strength of a privacy protection method and analyze the distortion and utility of the data before and after protection;
in the generalization-based privacy protection method, generating synchronous tracks for intersecting tracks specifically comprises:
S201. Let two tracks $T_i$ and $T_j$ intersect with intersection degree $p$. If, within the intersection time interval (where $n > m$), tracks $T_i$ and $T_j$ have the same number of location points at the same corresponding times, $T_i$ and $T_j$ are synchronous tracks; if $T_i$ and $T_j$ intersect in time but are not synchronized, synchronization points are added so that the two tracks satisfy the synchronization condition;
S202. Select $T_i$ as the track whose start time is earlier than that of $T_j$; two index pointers $a$ and $b$ are placed at the first point of each track, i.e. $a=b=1$;
S203. Advance $a$ along $T_i$ until $a$ and $b$ meet, and record the meeting time, at which point $a=b=2$; when the times are equal, $a$ and $b$ move forward simultaneously, and the synchronization operation ends as soon as either pointer reaches the end of its track;
S204. During the movement, when $a$ and $b$ reach the next sampling point, if the condition in the corresponding formula holds, a data point with the corresponding time is added to track $T_i$, its position information being determined by the two corresponding expressions; after the addition is completed, pointer $a$ points to the newly added data point, and the process returns to S203;
in the generalization-based privacy protection method, distance measurement and similarity calculation on the synchronous tracks specifically comprise:
S301. Calculate the square root of the sum of all interval variations, divide it by the time intersection degree of the tracks, and obtain the track shape distance $d_{shape}$, expressed as:
where $t_k$ is a specific time instant $k$; the x-coordinate variation term is the change of the current point's x-coordinate from time $(k-1)$ to time $k$; $\Delta t_k$ is the time change from $(k-1)$ to $k$; the y-coordinate variation term is the change of the current point's y-coordinate from $(k-1)$ to $k$; and $p$ is the intersection degree of the tracks;
the track position distance $d$ is:
where $R = 6371$ km is the Earth's radius, $t_s$ is a specific time instant $s$, and the coordinate terms are the x and y coordinates of tracks $i$ and $j$, respectively, at the current moment;
S302. Within the overlapping time interval of the synchronous tracks, first obtain the sum of the squared distances between corresponding points at each time, then take the mean of these squares and divide it by the intersection degree of the tracks, giving the track position distance $d_{loc}(T_i,T_j)$;
S303. The track distance is the weighted sum of the track shape distance and the track position distance, where $\alpha$ is a weight adjustment coefficient. The track distance $d(T_i,T_j)$ is:
$$d(T_i,T_j)=\alpha\, d_{shape}(T_i,T_j)+(1-\alpha)\, d_{loc}(T_i,T_j).$$
2. The measurement method based on track privacy protection according to claim 1, wherein, in the generalization-based privacy protection method, judging the intersection of the tracks and calculating the degree of intersection specifically comprise:
S101. Preprocessing for track similarity measurement, where a track $T$ is defined as:
$$T=\{(t_1,x_1,y_1),(t_2,x_2,y_2),\dots,(t_n,x_n,y_n)\}$$
where $n\ge 1$, $t_1,\dots,t_n$ is the sampling time series with $t_i\le t_j$ for $i<j$, and $x_i,y_i$ are the coordinates of the track point at the $i$-th time;
4. The measurement method based on track privacy protection according to claim 1, wherein, in the confusion-zone-based privacy protection method, calculating the probability vector for a track comprises: letting the set of user IDs before entering confusion zone $S_k$ be $I(S_k)=\{i_1,i_2,\dots,i_n\}$ and the set of user IDs exiting the confusion zone be $O(S_k)=\{o_1,o_2,\dots,o_n\}$; at any moment each user has only one ID, so the user IDs of the two sets are in one-to-one correspondence; the attacker infers the correspondence between entering and exiting identities and expresses it as conditional probabilities; and the probability matrix represents the association between the two sets of identifiers.
5. The measurement method based on track privacy protection according to claim 4, wherein, if the number of users whose IDs are substituted in the confusion zone is $n$, the entry/exit matrix of substituted IDs has size $n\times n$, and the probability matrix is:
where the element $p(u_j\mid o_i)$ denotes the probability that user ID $u_j$ entering the confusion zone and user ID $o_i$ leaving the confusion zone belong to the same user; the elements of the $j$-th row all refer to the same entering user ID $u_j$, and the elements of the $k$-th column all refer to the same leaving user ID $o_k$.
6. The measurement method based on track privacy protection according to claim 1, wherein, in the confusion-zone-based privacy protection method, calculating information entropy from the probability vectors comprises: after all ID substitution operations are completed, each track $o_i$ has a corresponding probability vector, and the track information entropy is:
$$H(o_i)=-\sum_{j=1}^{n}p(u_j\mid o_i)\log p(u_j\mid o_i)$$
the maximum value of the information entropy is:
$$H_{max}(o_i)=-\sum_{j=1}^{n}\frac{1}{n}\log\frac{1}{n}=\log n$$
where $u_j$ is the user ID entering the confusion zone, $o_i$ is the user ID leaving the confusion zone, $p$ denotes probability, $p(u_j\mid o_i)$ is the probability that users $u_j$ and $o_i$ are the same person, and $n$ is the number of users in the confusion zone whose IDs are permuted.
7. The measurement method based on track privacy protection according to claim 1, wherein, in the confusion-zone-based privacy protection method, the privacy protection function $G(o_i)$ is:
where $H(o_i)$ is the track information entropy and $H_{max}(o_i)$ is the maximum value of the information entropy.
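Claims 1 and 2 describe the track model, the time intersection, synchronization (S201 to S204) and the position distance (S302). The sketch below strings these together under stated assumptions: x and y are longitude and latitude in degrees; instead of the two-pointer walk, both tracks are resampled at the union of their sampling times within the overlap, with linearly interpolated positions (the patent's interpolation expressions are not shown here); the spherical distance uses the haversine formula; and $d_{loc}$ follows a literal reading of S302 (mean of squared point distances divided by $p$), with $p$ supplied by the caller because its definition is not given in this section. All names are hypothetical.

```python
import math
from bisect import bisect_left
from typing import List, NamedTuple, Optional, Tuple

R_KM = 6371.0  # Earth radius R from the text


class TrackPoint(NamedTuple):
    t: float  # sampling time
    x: float  # longitude in degrees (assumption)
    y: float  # latitude in degrees (assumption)


Track = List[TrackPoint]


def is_well_formed(track: Track) -> bool:
    """S101: n >= 1 and non-decreasing sampling times."""
    return len(track) >= 1 and all(a.t <= b.t for a, b in zip(track, track[1:]))


def time_intersection(ti: Track, tj: Track) -> Optional[Tuple[float, float]]:
    """Common time interval, or None; a positive-length overlap is assumed."""
    start, end = max(ti[0].t, tj[0].t), min(ti[-1].t, tj[-1].t)
    return (start, end) if start < end else None


def _interpolate(track: Track, t: float) -> TrackPoint:
    """Position at time t inside the track's span, by linear interpolation (assumed)."""
    times = [p.t for p in track]
    k = bisect_left(times, t)
    if k < len(track) and times[k] == t:
        return track[k]
    a, b = track[k - 1], track[k]  # times[k-1] < t < times[k]
    w = (t - a.t) / (b.t - a.t)
    return TrackPoint(t, a.x + w * (b.x - a.x), a.y + w * (b.y - a.y))


def synchronize(ti: Track, tj: Track) -> Tuple[Track, Track]:
    """Resample both tracks at the union of their sample times inside the overlap."""
    span = time_intersection(ti, tj)
    if span is None:
        return [], []
    start, end = span
    times = sorted({p.t for p in ti + tj if start <= p.t <= end})
    return [_interpolate(ti, t) for t in times], [_interpolate(tj, t) for t in times]


def _haversine_km(a: TrackPoint, b: TrackPoint) -> float:
    """Great-circle distance (haversine, an assumed stand-in for the spherical formula)."""
    phi1, phi2 = math.radians(a.y), math.radians(b.y)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(b.x - a.x) / 2) ** 2)
    return 2 * R_KM * math.asin(math.sqrt(min(1.0, h)))


def position_distance(si: Track, sj: Track, p: float) -> float:
    """Literal reading of S302: mean squared point distance divided by p."""
    if not si or len(si) != len(sj):
        raise ValueError("expected two non-empty synchronized tracks")
    if p <= 0:
        raise ValueError("intersection degree p must be positive")
    squares = [_haversine_km(a, b) ** 2 for a, b in zip(si, sj)]
    return sum(squares) / len(squares) / p
```

Resampling both tracks on one shared time grid guarantees the equal point counts and matching times that the synchronization condition in S201 requires, which is all S302 needs.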
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010113193.7A (CN111400747B) | 2020-02-24 | 2020-02-24 | Measurement method based on track privacy protection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010113193.7A (CN111400747B) | 2020-02-24 | 2020-02-24 | Measurement method based on track privacy protection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111400747A | 2020-07-10 |
CN111400747B | 2023-04-28 |
Family ID: 71428520
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010113193.7A (CN111400747B, Active) | Measurement method based on track privacy protection | 2020-02-24 | 2020-02-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111400747B |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112069532B * | 2020-07-22 | 2023-09-26 | Anhui University of Technology (安徽工业大学) | Track privacy protection method and device based on differential privacy |
CN112613068B * | 2020-12-15 | 2024-03-08 | National Supercomputing Center in Shenzhen (Shenzhen Cloud Computing Center) (国家超级计算深圳中心(深圳云计算中心)) | Multiple data confusion privacy protection method and system and storage medium |
CN112883423B * | 2021-02-25 | 2023-02-17 | Jilin Normal University (吉林师范大学) | Similarity-based k-anonymous privacy protection method for release track |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8248294B2 * | 2010-04-13 | 2012-08-21 | The Boeing Company | Method for protecting location privacy of air traffic communications |
CN101895866B * | 2010-04-16 | 2012-11-21 | Central China Normal University (华中师范大学) | Method for measuring track privacy in location-based service |
CN109379718A * | 2018-12-10 | 2019-02-22 | Nanjing University of Science and Technology (南京理工大学) | Complete anonymous method for secret protection based on continuous-query location-based service |
- 2020-02-24: CN202010113193.7A (CN, CN111400747B), Active
Also Published As
Publication number | Publication date |
---|---|
CN111400747A | 2020-07-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111400747B | | Measurement method based on track privacy protection |
Mohamed et al. | | Accurate real-time map matching for challenging environments |
Li et al. | | T-DesP: Destination prediction based on big trajectory data |
Zhang et al. | | TICRec: A probabilistic framework to utilize temporal influence correlations for time-aware location recommendations |
Zhang et al. | | On reliable task assignment for spatial crowdsourcing |
Xu et al. | | A survey for mobility big data analytics for geolocation prediction |
Do et al. | | The places of our lives: Visiting patterns and automatic labeling from longitudinal smartphone data |
Huang et al. | | Robust localization algorithm based on the RSSI ranging scope |
CN110414732B | | Travel future trajectory prediction method and device, storage medium and electronic equipment |
EP3471374B1 | | Method and device for identifying type of geographic location at where user is located |
EP3028105B1 | | Inferring a current location based on a user location history |
CN106102163B | | WLAN fingerprint positioning method based on RSS linear correlation and secondary weighted centroid algorithm |
CN108668249B | | Indoor positioning method and device for mobile terminal |
KR20150035745A | | System, method and computer program for dynamic generation of a radio map |
CN105554704A | | Fake-locus-based location privacy protection method for use in recommendation system |
Mohamed et al. | | Accurate and efficient map matching for challenging environments |
Xu et al. | | Self-adapting multi-fingerprints joint indoor positioning algorithm in WLAN based on database of AP ID |
CN104661306A | | Passive positioning method and system for mobile terminal |
CN104507097A | | Semi-supervised training method based on WiFi (wireless fidelity) position fingerprints |
CN112954594A | | Wireless sensor network node positioning algorithm based on artificial bee colony |
Jiang et al. | | Predicting human mobility based on location data modeled by Markov chains |
Lin et al. | | Noise filtering, trajectory compression and trajectory segmentation on GPS data |
CN109977324A | | A kind of point of interest method for digging and system |
Huang et al. | | STPR: a personalized next point-of-interest recommendation model with spatio-temporal effects based on purpose ranking |
Wang et al. | | High-accuracy localization for indoor group users based on extended Kalman filter |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |