CN112632614A - Preference perception track anonymization method and system - Google Patents
- Publication number
- CN112632614A CN112632614A CN202011599257.5A CN202011599257A CN112632614A CN 112632614 A CN112632614 A CN 112632614A CN 202011599257 A CN202011599257 A CN 202011599257A CN 112632614 A CN112632614 A CN 112632614A
- Authority
- CN
- China
- Prior art keywords
- user
- track
- sequence
- privacy
- location
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2111—Location-sensitive, e.g. geographical location, GPS
Abstract
The invention provides a preference-aware track anonymization method and system, used to provide customizable and quantifiable privacy-protection strength in location-data privacy protection. The method comprises the following steps: first, movement track data of a user are acquired, and the original points contained in the movement track data are generalized with a location-space anonymization method to obtain the user's location sequence; second, the user's familiarity with each semantic type and the popularity of each location within a semantic type are acquired; then a familiarity threshold and a popularity threshold are set to obtain a privacy classification for each location, and different location anonymization methods are applied to obtain the user's anonymized track sequence; finally, the user's track privacy degree is obtained by calculating the information entropy of the anonymized track sequence. The invention provides customizable privacy protection for the user: by analyzing the user's interest and preference characteristics, sensitive information is hidden in a personalized manner, the usability of the data is improved, and the privacy-protection strength is made quantifiable.
Description
Technical Field
The invention relates to the technical field of network communication, in particular to a preference perception track anonymization method and a preference perception track anonymization system.
Background
Location-based social networks (LBSNs, such as Foursquare, Facebook Places, Twitter, Roadside, and the like) combine online social networks with physical locations through users' check-in information, enabling location-based service resources to be shared and spread in the virtual world. In recent years, LBSNs have developed at an unprecedented pace owing to the wide adoption of smart mobile devices embedded with large numbers of sensors. However, privacy disclosure is an important issue that must now be considered. When a user publishes real location data to an LBSN server, an untrusted third party may steal the user's location data and engage in illegal activities. From the user's perspective, it may seem sufficient to publish only incomplete GPS track data to protect privacy. Nevertheless, by exploiting the spatiotemporal relations between geographic locations, an attacker can still apply data-analysis techniques to infer a victim's sensitive personal information, such as a home address, workplace, or living habits. Worse, an attacker may mine the victim's movement-behavior patterns from GPS track data and predict the location the victim will visit at the next moment, seriously threatening the user's personal safety. Consequently, once users perceive such a privacy threat in a location-based social network, they stop using the services it provides, and the credibility of the service declines.
Track privacy protection is a newer form of privacy protection in LBSs. Unlike location privacy protection, track protection aims to keep a user's sensitive location information, which can reveal the user's personalized interests or preferences, from being leaked. Traditional track privacy protection mainly comprises three techniques: dummy data, spatial cloaking, and suppression. Dummy-data-based track privacy protection adds false location data to the original GPS track data so that an attacker cannot recover the user's real location information from the uploaded track data; spatial-cloaking-based track privacy protection generalizes sensitive location data in the original GPS track data to reduce the probability that an attacker obtains the real location information; and suppression-based track privacy protection forbids the release of certain sensitive location data in the GPS track data, thereby protecting the user's personal privacy.
Therefore, the prior art has at least the following disadvantages:
first, in practice, location anonymity alone cannot effectively protect track privacy, because an attacker can still infer the user's sensitive information with techniques such as association attacks and data analysis; second, existing track anonymization methods do not consider the user's preferences and background knowledge, which causes loss of useful data and prevents the user from enjoying a personalized service experience; third, they cannot adaptively apply different track privacy-protection methods to different degrees of privacy risk, which reduces service accuracy.
Disclosure of Invention
Aiming at the defects in the background art, the invention provides a preference-aware track anonymization method and system, solving the technical problem that existing location-data privacy protection cannot customize different privacy-protection methods to the desired privacy-protection strength, which leads to low service accuracy.
The technical scheme of the invention is realized as follows:
a preference-aware track anonymization method comprises the following steps:
s1, obtaining semantic information and movement track data of the position accessed by the user;
s2, sequentially carrying out stay region generalization and position region generalization on original points contained in the movement track data of the user by using a position space anonymization method to obtain a position sequence of the user;
s3, acquiring familiarity of the user with semantic types and popularity of each position in the semantic types by analyzing the position sequence of the user;
s4, setting a user familiarity threshold and a location popularity threshold; obtaining a privacy classification for each of the user's locations by comparing the user's familiarity with the semantic types, and the popularity of each location within the semantic types, against these two thresholds; and obtaining the user's anonymized track sequence by applying a different location anonymization method according to each location's privacy classification;
and S5, acquiring the track privacy degree of the user by calculating the information entropy of the anonymous track sequence of the user.
The method for sequentially performing stay-region generalization and location-region generalization on the original points contained in the user's movement track data by using the location-space anonymization method comprises the following steps:
s21, the trusted third party extracts stay points from the user's movement track data; the stay points reflect the user's movement behavior. For the i-th stay-point anonymous region, p_x(lon) denotes the longitude coordinate of original point p_x, p_x(lat) denotes the latitude coordinate of original point p_x, S_i(lon) denotes the longitude coordinate of stay point S_i, S_i(lat) denotes the latitude coordinate of stay point S_i, m denotes the starting point of the user's movement track in the stay-point anonymous region, n denotes the end point of the user's movement track in the stay-point anonymous region, x denotes the index of an original point in the movement track, and i denotes the index of a stay point in the movement track;
s22, reconstructing a generalized stay-point sequence Tra_S by connecting the extracted stay points in the order of the original points in the user's movement track data: Tra_S = S_1 → S_2 → … → S_n, where S_n denotes the user's n-th stay point;
s23, the trusted third party extracts locations from the generalized stay-point sequence; the locations reflect the user's personalized behavior and preferences. For the j-th location anonymous region, L_j(lon) denotes the longitude coordinate of location L_j, L_j(lat) denotes the latitude coordinate of location L_j, and the region contains a set of stay points, j = 1, 2, …, n;
s24, reconstructing a generalized location sequence Tra_L by connecting the extracted locations in the order of the stay points in the stay-point sequence: Tra_L = L_1 → L_2 → … → L_n, where L_n denotes the user's n-th location.
The method for acquiring the user's familiarity with semantic types and the popularity of each location within a semantic type by analyzing the user's location sequence comprises the following steps:
s31, calculating the geographic similarity between two of the user's locations with a Gaussian formula, where Sim_geo(L_i', L_j') denotes the geographic similarity between locations L_i' and L_j', D(L_i', L_j') denotes the Euclidean distance between them, and i' = 1, 2, …, n, j' = 1, 2, …, n;
s32, letting His(u_k) = {L_1, L_2, …, L_n} denote the user's location sequence; after user u_k has visited location L_i', calculating the probability that user u_k visits location L_j', where P_geo(L_j' | L_i', u_k) denotes the probability that user u_k visits location L_j' immediately after visiting location L_i', a denotes a weight value with 0 ≤ a ≤ 1, and L_k denotes a historical location visited by user u_k;
s33, constructing a location transition probability matrix from the formula in step S32, whose entries give the probability that the user transfers from location L_i' to location L_j';
s34, calculating, from the probability that the user transfers from location L_i' to location L_j', the user's familiarity with each semantic type and the popularity of each location within a semantic type, where the popularity and familiarity of round n are computed from those of round n-1, C denotes the semantic type of a location, and u denotes the user.
The method for obtaining the anonymous track sequence of the user comprises the following steps:
let λ represent the user familiarity threshold, τ represent the location popularity threshold;
when the user's familiarity with the semantic type is less than λ and the popularity of each location within the semantic type is greater than or equal to τ, the privacy classification of the user's location is the unfamiliar-and-popular class, and the trusted third party does not need to apply privacy protection to the location;
when the user's familiarity with the semantic type is less than λ and the popularity of each location within the semantic type is less than τ, the privacy classification of the user's location is the unfamiliar-and-unpopular class, and the trusted third party protects the anonymous space of the user's sensitive location with a dummy-data method;
when the user's familiarity with the semantic type is greater than or equal to λ and the popularity of each location within the semantic type is greater than or equal to τ, the privacy classification of the user's location is the familiar-and-popular class, and the trusted third party protects the anonymous space of the user's sensitive location with a spatial-cloaking method;
when the user's familiarity with the semantic type is greater than or equal to λ and the popularity of each location within the semantic type is less than τ, the privacy classification of the user's location is the familiar-and-unpopular class, and the trusted third party applies a suppression technique that forbids publishing the location's anonymous space to the location-based social-network server, thereby protecting the user's personal privacy;
according to the four privacy classifications, a trusted third party selects different position anonymization methods in a self-adaptive mode, and finally an anonymization track sequence of the user is generated.
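The four-way classification above can be sketched as a small decision function; this is an illustrative reading, not the patent's implementation, and the class and action names are chosen here for readability:

```python
def classify_and_protect(familiarity, popularity, lam, tau):
    """Map a location's (familiarity, popularity) against the thresholds
    (lam, tau) to the four privacy classes and the protection action the
    trusted third party applies in each case (names are illustrative)."""
    if familiarity < lam and popularity >= tau:
        return "unfamiliar-popular", "publish as-is"
    if familiarity < lam and popularity < tau:
        return "unfamiliar-unpopular", "dummy data"
    if familiarity >= lam and popularity >= tau:
        return "familiar-popular", "spatial cloaking"
    return "familiar-unpopular", "suppress"
```

For example, a location the user knows well but few others visit (high familiarity, low popularity) is the most identifying and is suppressed outright.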
The method for acquiring the user's track privacy degree by calculating the information entropy of the user's anonymized track sequence comprises the following steps:
calculating the information entropy H_(t,t+1) of the user's anonymized track sequence over the time interval (t, t+1): H_(t,t+1) = -Σ_{i''} p_{i''} log p_{i''},
where p_{i''} is the probability that the user visits candidate location i'' at time t+1, p_0 is the probability that the user remains at the time-t location at time t+1, and i'' = 1, 2, …, k';
calculating the maximum information entropy MaxH_(t,t+1) of the anonymized track sequence over the interval (t, t+1), which is attained when the user visits every candidate location at time t+1 with equal probability;
taking the ratio of the information entropy H_(t,t+1) to the maximum information entropy MaxH_(t,t+1) as the user's track privacy degree H_%: H_% = H_(t,t+1) / MaxH_(t,t+1).
Thus, the greater the value of the track privacy degree H_%, the greater the strength of the track privacy protection.
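The entropy-ratio privacy degree can be sketched as follows; this is a minimal illustration assuming the standard Shannon entropy with the uniform distribution giving the maximum log(number of outcomes):

```python
import math

def track_privacy_degree(probs):
    """Information entropy H = -sum(p * log p) over the candidate outcomes at
    time t+1 (staying put plus each candidate location), divided by the
    maximum entropy log(len(probs)) attained by the uniform distribution."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    max_h = math.log(len(probs))
    return h / max_h if max_h > 0 else 0.0
```

A uniform distribution over candidates yields degree 1.0 (an attacker learns nothing); a sharply peaked distribution yields a value near 0 (the next location is nearly certain).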
A track anonymization system adopted by a preference perception track anonymization method comprises a position space generating module, a semantic description module, a behavior pattern extracting module, a privacy risk rating module and a track anonymization module; the privacy risk rating module is connected with the track anonymity module; the position space generation module is connected with the semantic description module, the semantic description module is connected with the behavior pattern extraction module, and the behavior pattern extraction module is connected with the track anonymization module; the track anonymization module converts the original track sequence into an anonymity track sequence, and the privacy risk rating module adjusts the original track sequence according to the anonymity track sequence.
The position space generation module is used for clustering original points in the historical data set into positions so as to construct a position space set of the user;
the semantic description module is used for converting the geographical position information of the user into semantic position information;
the behavior pattern extraction module is used for mining the moving behavior habits and the motion patterns of the user;
the privacy risk rating module is used for dividing different privacy risk ratings according to behavior preference and familiarity of the user;
and the track anonymization module is used for adaptively adopting the corresponding location anonymization method according to the different privacy risk grades, so as to construct the anonymized track sequence.
Compared with the prior art, the invention has the following beneficial effects. The method constructs a sensitive-location attack model; generalizes the original points into location regions with the location-space anonymization method; describes the location regions semantically; calculates the user's familiarity with each semantic type and the popularity of each location within it; divides the locations into different privacy risk ratings; and adaptively applies a different location anonymization method according to the degree of privacy risk of each location in the user's track. The method provides customizable privacy protection for the user: by analyzing the user's interest and preference characteristics, it hides sensitive information in a personalized manner and improves the usability of the data. At the same time, the four privacy risk ratings derived from user familiarity and location popularity make the privacy-protection strength quantifiable, offering a useful approach for personalized track anonymization in future location-data publication.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the present invention of a sensitive location attack;
FIG. 3 is a schematic diagram of the trace generalization process of the present invention;
fig. 4 is a schematic diagram of the system architecture of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Embodiment 1, as shown in fig. 1, a preference-aware trajectory anonymization method includes the following specific steps:
s1, obtaining semantic information and movement track data of the position accessed by the user;
It is assumed that the attacker can obtain some a priori knowledge of the victim, including the semantic information of the locations the victim has visited and the victim's movement track sequence arranged in time. From the semantic description of each location, the victim's semantic track sequence can be represented as: C1 → C2 → … → Cn. From this semantic track sequence, the attacker can analyze the victim's individual interests or preferences with a frequent-subsequence mining algorithm. For example, if the attacker finds the frequent subsequence "school → stadium → restaurant" in victim A's semantic track sequence, then once victim A has visited the sequence "school → stadium", the attacker can infer with high probability that the location victim A visits next is "restaurant".
Fig. 2 is a schematic diagram of a sensitive location attack. As can be seen from fig. 2, user a, user B, and user C each have a respective movement pattern. Wherein, the moving mode of the user A is the same as that of the user B, namely: school → gymnasium → restaurant. Suppose that for user a, "restaurant" is his sensitive location information and does not want others to know it. However, if the attacker knows the movement pattern of the user B and knows that the user a and the user B have similarities, when the user a visits "school → gymnasium", the attacker will reason out with a high probability that the user a will visit "restaurant", thereby revealing the personal privacy of the user a.
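The attack sketched above can be illustrated with a toy frequent contiguous-subsequence miner; this is not the patent's algorithm, only a minimal demonstration of how observed semantic prefixes let an attacker guess the next semantic type:

```python
from collections import Counter

def frequent_subsequences(semantic_traces, length=3, min_support=2):
    """Count contiguous semantic subsequences of a given length across
    several semantic track sequences; keep those seen at least min_support times."""
    counts = Counter()
    for trace in semantic_traces:
        for i in range(len(trace) - length + 1):
            counts[tuple(trace[i:i + length])] += 1
    return {seq: c for seq, c in counts.items() if c >= min_support}

def predict_next(observed_prefix, patterns):
    """Predict the most likely next semantic type given an observed prefix
    of length (pattern length - 1); returns None if no pattern matches."""
    votes = Counter()
    for seq, count in patterns.items():
        if seq[:-1] == tuple(observed_prefix):
            votes[seq[-1]] += count
    return votes.most_common(1)[0][0] if votes else None
```

With the running example, once "school → stadium → restaurant" is frequent, `predict_next(["school", "stadium"], patterns)` recovers "restaurant".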
S2, sequentially performing stay-region generalization and location-region generalization on the original points contained in the user's movement track data by using the location-space anonymization method, to obtain the user's location sequence: the original points are generalized into location regions with the location-space anonymization method, and the location regions are then described semantically.
Definition 1: each location anonymous space consists of a 2-tuple <k_0, l_0>, where k_0 denotes the number of locations contained in the anonymous space and l_0 denotes the privacy-protection strength.
Within the location anonymous space, the trusted third party realizes privacy protection of different strengths by adjusting the size of the anonymous space. Moreover, the trusted third party can use the anonymous space to measure the data utility of its privacy protection, namely the amount of information lost during the track anonymization phase.
The user's original points are hidden through two generalization processes: stay-region generalization and location-region generalization. Fig. 3 is a schematic diagram of the track generalization process: as can be seen from Fig. 3, the trusted third party reconstructs the original points into stay points, then reconstructs the stay points into locations, and finally connects the locations in time order to generate the generalized track sequence.
S21, for stay-region generalization, the trusted third party extracts stay points from the user's movement track data; the stay points reflect the user's movement behavior. In a stay-point anonymous region, i.e. a set of original points, p_x(lon) denotes the longitude coordinate of original point p_x, p_x(lat) denotes the latitude coordinate of original point p_x, S_i(lon) denotes the longitude coordinate of stay point S_i, S_i(lat) denotes the latitude coordinate of stay point S_i, m denotes the starting point of the user's movement track in the stay-point anonymous region, n denotes the end point of the user's movement track in the stay-point anonymous region, x denotes the index of an original point in the movement track, and i denotes the index of a stay point in the movement track.
S22, a generalized stay-point sequence Tra_S is reconstructed by connecting the extracted stay points in the order of the original points in the user's movement track data: Tra_S = S_1 → S_2 → … → S_n, where S_n denotes the user's n-th stay point.
S23, for location-region generalization, the trusted third party extracts locations from the generalized stay-point sequence; the locations reflect the user's personalized behavior and preferences. For the j-th location anonymous region, L_j(lon) denotes the longitude coordinate of location L_j, L_j(lat) denotes the latitude coordinate of location L_j, and the region contains a set of stay points, j = 1, 2, …, n.
S24, a generalized location sequence Tra_L is reconstructed by connecting the extracted locations in the order of the stay points in the stay-point sequence: Tra_L = L_1 → L_2 → … → L_n, where L_n denotes the user's n-th location.
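The stay-point extraction of S21 can be sketched as follows. This is an illustrative interpretation rather than the patent's exact procedure (whose formulas appear only as images): it assumes consecutive original points are grouped by a distance threshold and that a stay point's coordinates are the centroid of its group:

```python
import math

def haversine_m(p, q):
    """Approximate great-circle distance in metres between (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))

def extract_stay_points(track, radius_m=200, min_points=3):
    """Greedily group consecutive original points lying within radius_m of the
    first point of the current group; each group of at least min_points becomes
    a stay point whose (lon, lat) is the centroid of its original points."""
    stays, group = [], []
    def close_group():
        if len(group) >= min_points:
            stays.append((sum(q[0] for q in group) / len(group),
                          sum(q[1] for q in group) / len(group)))
    for p in track:
        if not group or haversine_m(group[0], p) <= radius_m:
            group.append(p)
        else:
            close_group()
            group = [p]
    close_group()
    return stays
```

Connecting the resulting stay points in track order yields the generalized sequence Tra_S of S22; the same grouping idea, applied to stay points, yields the locations of S23-S24.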
According to the two generalization processes, even if an attacker acquires the position sequence of the user, the attacker cannot deduce the original point information of the user. However, through the position sequence, the attacker can use the background knowledge information to mine the frequent movement patterns of the user, so as to deduce the daily activity law or behavior preference of the victim. Therefore, in the track privacy protection, the leakage problem of the frequent movement pattern of the user needs to be considered.
S3, acquiring familiarity of the user with semantic types and popularity of each position in the semantic types by analyzing the position sequence of the user; the specific method comprises the following steps:
s31, the distance between two locations in geographic space can reflect the user's browsing behavior. In general, the greater the distance between locations L_i' and L_j', the smaller the probability that the user visits L_j' immediately after visiting L_i'; thus the correlation between two locations decreases as their distance increases. The geographic similarity between two of the user's locations is calculated with a Gaussian formula,
where Sim_geo(L_i', L_j') denotes the geographic similarity between locations L_i' and L_j', D(L_i', L_j') denotes the Euclidean distance between them, and i' = 1, 2, …, n, j' = 1, 2, …, n.
The user's movement patterns and preferences can be mined by analyzing the user's historical location sequence. Therefore, given user u_k's historical location sequence, the location u_k will visit at the next moment can be predicted once u_k has visited the current location.
S32, let His(u_k) = {L_1, L_2, …, L_n} denote the user's location sequence; after user u_k has visited location L_i', the probability that user u_k visits location L_j' is calculated,
where P_geo(L_j' | L_i', u_k) denotes the probability that user u_k visits location L_j' immediately after visiting location L_i', a denotes a weight value with 0 ≤ a ≤ 1, and L_k denotes a historical location visited by user u_k.
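The Gaussian similarity of S31 and the transition probabilities of S32-S33 can be sketched together. Both formulas appear only as images in the source, so the forms below are assumptions: the similarity uses the common Gaussian kernel exp(-D²/(2σ²)), and the transition probability mixes the user's historical visit frequency with geographic similarity via the weight a:

```python
import math

def sim_geo(li, lj, sigma=1.0):
    """Gaussian geographic similarity, assumed form exp(-D^2 / (2 * sigma^2))
    with D the Euclidean distance between the two locations."""
    d = math.dist(li, lj)
    return math.exp(-(d * d) / (2 * sigma * sigma))

def transition_matrix(history, locations, a=0.5, sigma=1.0):
    """Row-stochastic matrix P[i][j]: each row mixes the user's historical
    visit frequency of the target location (weight a) with the geographic
    similarity between current and target location (weight 1 - a), then
    normalizes so each row sums to 1 -- one plausible instantiation of the
    weighted combination the text describes."""
    n_hist = max(len(history), 1)
    freq = [sum(1 for l in history if l == loc) / n_hist for loc in locations]
    matrix = []
    for li in locations:
        row = [a * freq[j] + (1 - a) * sim_geo(li, lj, sigma)
               for j, lj in enumerate(locations)]
        total = sum(row)
        matrix.append([v / total for v in row])
    return matrix
```

Locations that are both nearby and frequently visited receive the highest transition probability, matching the intuition in S31-S32.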
In the semantic space, the constructed location anonymous spaces must be described semantically in order to mine the user's interest or preference information, so the semantic information of each location anonymous space is labeled. First, the weight of each semantic type i within a stay-point anonymous region can be calculated from: N, the total number of points of interest in the stay-point anonymous region; N_i, the number of points of interest of type i in the region; the number of stay-point anonymous regions; and the number of stay-point anonymous regions in which a point of interest of type i exists.
Thus, the feature vector of each stay-point anonymous region can be represented as f_S = <w_1, w_2, …, w_n>. For a location anonymous region composed of stay-point regions, using the number of non-zero weights the region has for each semantic type, the weight of each semantic type i within the location anonymous region can be calculated; after these weights are normalized, the feature vector of each location anonymous region can be represented as f_L = <W_1, W_2, …, W_k>.
The semantic description method of the invention selects the W_i with the largest weight in the feature vector f_L of a location anonymous region as its semantic information. Corresponding to a location sequence in the location anonymity space, the semantic sequence can therefore be expressed as: Tra_C = C_1 → C_2 → … → C_n.
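The weighting described above resembles TF-IDF over points of interest; since the patent's formula is rendered only as an image, the sketch below is an assumed reading: term frequency is the share of the region's points of interest of each type, and inverse document frequency is computed over the regions containing that type. The highest-weight type becomes the region's semantic label:

```python
import math

def semantic_label(region_pois, all_regions):
    """Label a region with its highest-weight semantic type using a
    TF-IDF-style weight (an assumed reading of the patent's weighting).
    region_pois: list of POI type names in this region;
    all_regions: list of such lists, one per region."""
    types = {t for region in all_regions for t in region}
    n_regions = len(all_regions)

    def weight(t):
        tf = region_pois.count(t) / max(len(region_pois), 1)
        df = sum(1 for region in all_regions if t in region)
        return tf * math.log(n_regions / df) if df else 0.0

    return max(types, key=weight)
```

Labeling each location anonymous region this way and connecting the labels in track order produces the semantic sequence Tra_C.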
S33, constructing a position transition probability matrix according to the formula in the step S32Wherein the content of the first and second substances, indicating user slave position Li'Transferred to the position Lj'The probability of (d); meanwhile, according to the semantic information calculated by the formula (7), each position anonymous area has certain semantic description.
S34, the hub node values represent the familiarity of users, so the familiarity of a user with a semantic type can be calculated as the sum of the values of the authority nodes it links to. From the probability that the user transfers from location L_i' to location L_j', the user's familiarity with each semantic type is calculated.
The authority node values represent the popularity of locations, so the popularity of a location within a semantic type can be calculated as the sum of the values of the hub nodes linking to it. From the probability that the user transfers from location L_i' to location L_j', the popularity of each location within its semantic type is calculated.
where the first quantity denotes the popularity computed in round n-1, the second denotes the familiarity computed in round n-1, C denotes the semantic type of a location, u denotes the user, and n denotes the number of iterations.
Through this process, the user's preference model over the location anonymous space is constructed.
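Steps S33–S34 describe a HITS-style mutual reinforcement between user familiarity (hub scores) and location popularity (authority scores). A minimal sketch of that iteration over a user–location graph — the edge weights stand in for the transition-derived probabilities, and the L2 normalization is an assumption, since the patent's normalization choice is not legible here:

```python
import math

def preference_model(visits, n_iter=20):
    """HITS-style iteration: a user's familiarity is the weighted sum of the
    popularities of visited locations; a location's popularity is the weighted
    sum of its visitors' familiarities. visits maps user -> {location: weight}."""
    users = list(visits)
    locations = sorted({l for edges in visits.values() for l in edges})
    fam = {u: 1.0 for u in users}      # hub scores (user familiarity)
    pop = {l: 1.0 for l in locations}  # authority scores (location popularity)
    for _ in range(n_iter):
        pop = {l: sum(visits[u].get(l, 0.0) * fam[u] for u in users)
               for l in locations}
        fam = {u: sum(w * pop[l] for l, w in visits[u].items()) for u in users}
        # L2-normalize both score vectors so the iteration stays bounded
        fn = math.sqrt(sum(v * v for v in fam.values())) or 1.0
        pn = math.sqrt(sum(v * v for v in pop.values())) or 1.0
        fam = {u: v / fn for u, v in fam.items()}
        pop = {l: v / pn for l, v in pop.items()}
    return fam, pop
```

A user who visits more (and more popular) locations ends with a higher familiarity score, and a location visited by more (and more familiar) users ends with a higher popularity score, matching the mutual definitions in S34.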
S4, setting a user familiarity threshold and a location popularity threshold; obtaining the privacy classification of the user's location by comparing the user's familiarity with the semantic type and the popularity of each location in the semantic type against the two thresholds; and obtaining the user's anonymous track sequence by adopting a different location anonymization method for each privacy classification;
the method for obtaining the anonymous track sequence of the user comprises the following steps:
let λ represent the user familiarity threshold, τ represent the location popularity threshold;
(1) Not Familiar but Popular (NFP)
When the user's familiarity with the semantic type is less than λ and the popularity of each location in the semantic type is greater than or equal to τ, the privacy classification of the user's location is the not-familiar-but-popular class. Here the user is not an expert in the semantic type to which the location anonymous space belongs, so even if an attacker obtains the location anonymous space, the user's preference information cannot be inferred. Moreover, because the location popularity is high, the location anonymous region is visited by many users. Therefore, the trusted third party does not need to protect the user's location privacy.
(2) Not Familiar and Not Popular (NFNP)
When the user's familiarity with the semantic type is less than λ and the popularity of each location in the semantic type is less than τ, the privacy classification of the user's location is the not-familiar-and-not-popular class. The user is not an expert in the semantic type to which the location anonymous space belongs, but because the location popularity is low, an attacker could use background knowledge to infer the identity of a user accessing the location anonymous space. Therefore, the trusted third party needs to protect the user's sensitive location anonymous space with a dummy-data method.
(3) Familiar and Popular (FP)
When the user's familiarity with the semantic type is greater than or equal to λ and the popularity of each location in the semantic type is greater than or equal to τ, the privacy classification of the user's location is the familiar-and-popular class. The user is an expert in the semantic type to which the location anonymous space belongs, and an attacker could infer the user's preference information from the location anonymous spaces the user visits. Therefore, the trusted third party needs to protect the user's sensitive location anonymous space with a spatial cloaking method.
(4) Familiar but Not Popular (FNP)
When the user's familiarity with the semantic type is greater than or equal to λ and the popularity of each location in the semantic type is less than τ, the privacy classification of the user's location is the familiar-but-not-popular class. Because the user's familiarity with the location anonymous space is high while the location popularity within its semantic type is low, an attacker could infer the user's preference information from the visited location anonymous spaces and could also identify the user. Therefore, the trusted third party needs to apply a suppression technique that prohibits publishing the user's location anonymous space to the location-based social network server, so as to protect the user's personal privacy.
According to these four privacy classifications, the trusted third party adaptively selects different location anonymization methods and finally generates the user's anonymous track sequence.
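The threshold dispatch over the four classes can be sketched directly; the returned method names are illustrative labels for the protections the text describes (no protection, dummy data, spatial cloaking, suppression):

```python
def classify_location(familiarity, popularity, lam, tau):
    """Map (familiarity, popularity) against thresholds (lam, tau) to one of
    the four privacy classes and the anonymization method applied for it."""
    if familiarity < lam and popularity >= tau:
        return "NFP", "publish"           # no protection needed
    if familiarity < lam and popularity < tau:
        return "NFNP", "dummy_data"       # inject fake locations
    if familiarity >= lam and popularity >= tau:
        return "FP", "spatial_cloaking"   # generalize the published region
    return "FNP", "suppress"              # do not publish at all
```

The trusted third party would run this per location in the track and apply the returned method before release.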
And S5, acquiring the track privacy degree of the user by calculating the information entropy of the anonymous track sequence of the user.
By the definition of information entropy, for a probability distribution p_1, p_2, ..., p_n, the information entropy can be calculated as:
H = -Σ p_i log_2 p_i (13)
Assume that the location visited by the user at time t+1 is a sensitive location, and that the proposed privacy protection algorithm selects k-1 candidate locations for this sensitive location at time t+1. We define the probabilities that the user visits one of the k locations at time t+1 as p_1, p_2, ..., p_k, and the probability that the user remains at the current location at time t+1 as p_0. The information entropy H_(t,t+1) of the user's anonymous track sequence within the (t, t+1) time interval is then calculated:
where p_i is the probability that the user visits location i at time t+1, p_0 is the probability that the user remains at the current location at time t+1, and i = 1, 2, ..., k.
According to the properties of entropy, when the probabilities of the user visiting all candidate locations at time t+1 are equal, the maximum information entropy MaxH_(t,t+1) of the user's anonymous track sequence within the (t, t+1) interval is calculated:
The ratio of the information entropy H_(t,t+1) to the maximum information entropy MaxH_(t,t+1) is taken as the user's track privacy degree H%:
Thus, the greater the value of the track privacy degree H%, the stronger the track privacy protection.
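Formulae (13)–(15) reduce to: compute the entropy of the visit distribution over the k candidates plus the stay-put option, then divide by the entropy of the uniform distribution. A short sketch of that calculation:

```python
import math

def track_privacy_degree(probs):
    """probs: [p_0, p_1, ..., p_k] summing to 1 -- the stay-put probability
    and the visit probabilities of the k candidate locations at time t+1.
    Returns H_(t,t+1) / MaxH_(t,t+1), a value in [0, 1]."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)  # formula (13)
    max_h = math.log2(len(probs))  # entropy is maximal for the uniform case
    return h / max_h
```

A uniform distribution over the candidates yields a privacy degree of 1 (an attacker gains no information), while a degenerate distribution concentrated on one location yields 0.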
The preference-aware track anonymization method can adaptively customize a personalized privacy protection method for each user according to the user's location preferences, thereby preventing leakage of the user's personal privacy while improving the usability of the track data.
The position space generation module is used for clustering original points in the historical data set into positions so as to construct a position space set of the user.
And the semantic description module is used for converting the geographical position information of the user into semantic position information.
And the behavior pattern extraction module is used for mining the mobile behavior habits and the motion patterns of the user.
And the privacy risk rating module is used for dividing different privacy risk ratings according to the behavior preference and familiarity of the user.
And the track anonymization module is used for adaptively adopting a corresponding location anonymization method according to the different privacy risk grades, so as to construct the anonymous track sequence.
In the preference-aware track anonymization system provided in embodiment 2, the division into the above functional modules is used only for illustration when performing track privacy protection; in practical applications, the functions may be distributed among different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the track anonymization system provided in embodiment 2 and the track anonymization method belong to the same concept, and the specific implementation process is described in embodiment 1.
In summary, in the embodiment of the present invention, a sensitive-location attack model is constructed; a location space anonymization method generalizes origin points into location regions; the location regions are given semantic descriptions; the user's familiarity with each semantic type and the popularity of each location within its semantic type are calculated; different privacy risk ratings are divided; and different location anonymization methods are adaptively adopted according to the privacy risk degree of each location in the user's trajectory. The scheme provides customizable privacy protection for the user: by analyzing the user's interests and preferences, sensitive information is hidden in a personalized manner, which improves data usability. Meanwhile, four privacy risk grades are divided according to user familiarity and location popularity, which quantifies the privacy protection strength and offers a useful approach for personalized track anonymization in future location data publication.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (7)
1. A preference-aware track anonymization method is characterized by comprising the following steps:
s1, obtaining semantic information and movement track data of the position accessed by the user;
s2, sequentially carrying out stay region generalization and position region generalization on original points contained in the movement track data of the user by using a position space anonymization method to obtain a position sequence of the user;
s3, acquiring familiarity of the user with semantic types and popularity of each position in the semantic types by analyzing the position sequence of the user;
s4, setting a user familiarity threshold and a location popularity threshold; obtaining the privacy classification of the user's location by comparing the user's familiarity with the semantic type and the popularity of each location in the semantic type against the two thresholds; and obtaining the user's anonymous track sequence by adopting a different location anonymization method for each privacy classification;
and S5, acquiring the track privacy degree of the user by calculating the information entropy of the anonymous track sequence of the user.
2. The preference-aware track anonymization method according to claim 1, wherein the method for successively performing dwell region generalization and location region generalization on the original point included in the movement track data of the user by using the location space anonymization method comprises:
s21, the trusted third party extracts the stop points from the moving track data of the user, the stop points reflect the moving behavior of the user, and one stop point can be expressed as:
wherein the expression on the left denotes the i-th dwell-point anonymous region, p_x(lon) denotes the longitude coordinate of origin point p_x, p_x(lat) denotes the latitude coordinate of origin point p_x, S_i(lon) denotes the longitude coordinate of dwell point S_i, S_i(lat) denotes the latitude coordinate of dwell point S_i, m denotes the starting point of the user's movement track within the dwell-point anonymous region, n denotes the end point of the user's movement track within the dwell-point anonymous region, x denotes the serial number of an original point in the movement track, and i denotes the serial number of a dwell point in the movement track;
s22, reconstructing a generalized dwell-point sequence Tra_S by connecting the extracted dwell points according to the order of the original points in the user's movement track data: Tra_S = S_1 → S_2 → ... → S_n, wherein S_n denotes the user's n-th dwell point;
s23, the trusted third party extracts positions from the generalized dwell point sequence, and the positions reflect the personalized behaviors and preferences of the user, wherein one position can be expressed as:
wherein the expression on the left denotes the j-th location anonymous region, L_j(lon) denotes the longitude coordinate of location L_j, L_j(lat) denotes the latitude coordinate of location L_j, and the remaining set denotes the dwell points contained in the location anonymous region, j = 1, 2, ..., n;
s24, connecting the extracted locations according to the order of the dwell points in the dwell-point sequence to reconstruct a generalized location sequence Tra_L: Tra_L = L_1 → L_2 → ... → L_n, wherein L_n denotes the user's n-th location.
3. The preference-aware track anonymization method of claim 2, wherein the method for obtaining the familiarity of the user with the semantic type and the popularity of each location in the semantic type by analyzing the location sequence of the user is as follows:
s31, calculating the geographic similarity between the two positions of the user by using a Gaussian formula:
wherein Sim_geo(L_i', L_j') denotes the geographic similarity between locations L_i' and L_j', D(L_i', L_j') denotes the Euclidean distance between them, i' = 1, 2, ..., n, and j' = 1, 2, ..., n;
s32, letting His(u_k) = {L_1, L_2, ..., L_n} denote the user's location sequence; after user u_k has visited location L_i', calculating the probability that user u_k visits location L_j':
wherein P_geo(L_j' | L_i', u_k) denotes the probability that user u_k visits location L_j' after having visited location L_i', a denotes a weight value with 0 ≤ a ≤ 1, and L_k denotes a historical location visited by user u_k;
s33, constructing a location transition probability matrix according to the formula in step S32, wherein each entry denotes the probability that the user transfers from location L_i' to location L_j';
s34, calculating, from the probability that the user transfers from location L_i' to location L_j', the user's familiarity with each semantic type and the popularity of each location in the semantic type.
4. The preference-aware track anonymization method of claim 1, wherein the anonymous track sequence of the user is obtained by:
let λ represent the user familiarity threshold, τ represent the location popularity threshold;
when the user's familiarity with the semantic type is less than λ and the popularity of each location in the semantic type is greater than or equal to τ, the privacy classification of the user's location is the not-familiar-but-popular class, and the trusted third party does not need to protect the user's location privacy;
when the user's familiarity with the semantic type is less than λ and the popularity of each location in the semantic type is less than τ, the privacy classification of the user's location is the not-familiar-and-not-popular class, and the trusted third party needs to protect the user's sensitive location anonymous space with a dummy-data method;
when the user's familiarity with the semantic type is greater than or equal to λ and the popularity of each location in the semantic type is greater than or equal to τ, the privacy classification of the user's location is the familiar-and-popular class, and the trusted third party needs to protect the user's sensitive location anonymous space with a spatial cloaking method;
when the user's familiarity with the semantic type is greater than or equal to λ and the popularity of each location in the semantic type is less than τ, the privacy classification of the user's location is the familiar-but-not-popular class, and the trusted third party needs to apply a suppression technique prohibiting publication of the user's location anonymous space to the location-based social network server, so as to protect the user's personal privacy;
according to these four privacy classifications, the trusted third party adaptively selects different location anonymization methods and finally generates the user's anonymous track sequence.
5. The preference-aware track anonymization method according to claim 1, wherein the method for obtaining the track privacy degree of the user by calculating the information entropy of the anonymous track sequence of the user comprises:
calculating the information entropy H_(t,t+1) of the user's anonymous track sequence within the (t, t+1) interval:
wherein p_i is the probability that the user visits location i at time t+1, p_0 is the probability that the user remains at the current location at time t+1, and i = 1, 2, ..., k;
when the probabilities of the user visiting all candidate locations at time t+1 are equal, calculating the maximum information entropy MaxH_(t,t+1) of the user's anonymous track sequence within the (t, t+1) interval:
taking the ratio of the information entropy H_(t,t+1) to the maximum information entropy MaxH_(t,t+1) as the user's track privacy degree H%:
thus, the greater the value of the track privacy degree H%, the stronger the track privacy protection.
6. The track anonymization system used in the preference-aware track anonymization method according to any one of claims 1 to 5, comprising a location space generation module, a semantic description module, a behavior pattern extraction module, a privacy risk rating module and a track anonymization module; the privacy risk rating module is connected with the track anonymization module; the location space generation module is connected with the semantic description module, the semantic description module is connected with the behavior pattern extraction module, and the behavior pattern extraction module is connected with the track anonymization module; the track anonymization module converts the original track sequence into an anonymous track sequence, and the privacy risk rating module adjusts the original track sequence according to the anonymous track sequence.
7. The preference-aware track anonymity system of claim 6, wherein the location space generating module is configured to cluster raw points in the historical dataset into locations, thereby constructing a set of location spaces for the user;
the semantic description module is used for converting the geographical position information of the user into semantic position information;
the behavior pattern extraction module is used for mining the moving behavior habits and the motion patterns of the user;
the privacy risk rating module is used for dividing different privacy risk ratings according to behavior preference and familiarity of the user;
and the track anonymization module is used for adaptively adopting a corresponding location anonymization method according to the different privacy risk grades, so as to construct the anonymous track sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011599257.5A CN112632614A (en) | 2020-12-30 | 2020-12-30 | Preference perception track anonymization method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112632614A true CN112632614A (en) | 2021-04-09 |
Family
ID=75287614
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011599257.5A Pending CN112632614A (en) | 2020-12-30 | 2020-12-30 | Preference perception track anonymization method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112632614A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112035880A (en) * | 2020-09-10 | 2020-12-04 | 辽宁工业大学 | Track privacy protection service recommendation method based on preference perception |
CN113946867A (en) * | 2021-10-21 | 2022-01-18 | 福建工程学院 | Position privacy protection method based on space influence |
CN114760146A (en) * | 2022-05-05 | 2022-07-15 | 郑州轻工业大学 | Customizable location privacy protection method and system based on user portrait |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016145595A1 (en) * | 2015-03-16 | 2016-09-22 | Nokia Technologies Oy | Method and apparatus for discovering social ties based on cloaked trajectories |
CN112035880A (en) * | 2020-09-10 | 2020-12-04 | 辽宁工业大学 | Track privacy protection service recommendation method based on preference perception |
CN112118531A (en) * | 2020-09-12 | 2020-12-22 | 上海大学 | Privacy protection method of crowd sensing application based on position |
Non-Patent Citations (1)
Title |
---|
LIANG ZHU ET AL.: "PTPP: Preference-Aware Trajectory Privacy-Preserving over Location-Based Social Networks", 《HTTPS://WWW.ACADEMIA.EDU》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112035880A (en) * | 2020-09-10 | 2020-12-04 | 辽宁工业大学 | Track privacy protection service recommendation method based on preference perception |
CN112035880B (en) * | 2020-09-10 | 2024-02-09 | 辽宁工业大学 | Track privacy protection service recommendation method based on preference perception |
CN113946867A (en) * | 2021-10-21 | 2022-01-18 | 福建工程学院 | Position privacy protection method based on space influence |
CN113946867B (en) * | 2021-10-21 | 2024-05-31 | 福建工程学院 | Position privacy protection method based on space influence |
CN114760146A (en) * | 2022-05-05 | 2022-07-15 | 郑州轻工业大学 | Customizable location privacy protection method and system based on user portrait |
CN114760146B (en) * | 2022-05-05 | 2024-03-29 | 郑州轻工业大学 | Customizable position privacy protection method and system based on user portrait |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jiang et al. | A utility-aware general framework with quantifiable privacy preservation for destination prediction in LBSs | |
Primault et al. | The long road to computational location privacy: A survey | |
Shaham et al. | Privacy preservation in location-based services: A novel metric and attack model | |
Albouq et al. | A double obfuscation approach for protecting the privacy of IoT location based applications | |
Sun et al. | Location privacy preservation for mobile users in location-based services | |
CN112632614A (en) | Preference perception track anonymization method and system | |
Primault et al. | Time distortion anonymization for the publication of mobility data with high utility | |
Jin et al. | A survey and experimental study on privacy-preserving trajectory data publishing | |
Beg et al. | A privacy-preserving protocol for continuous and dynamic data collection in IoT enabled mobile app recommendation system (MARS) | |
Yin et al. | GANs Based Density Distribution Privacy‐Preservation on Mobility Data | |
Wu et al. | A novel dummy-based mechanism to protect privacy on trajectories | |
CN113254999A (en) | User community mining method and system based on differential privacy | |
CN110602631A (en) | Processing method and processing device for location data for resisting conjecture attack in LBS | |
CN105578412A (en) | Position anonymization method based on position service and system | |
Khazbak et al. | Deanonymizing mobility traces with co-location information | |
CN110502919B (en) | Track data de-anonymization method based on deep learning | |
Zhang et al. | RPAR: location privacy preserving via repartitioning anonymous region in mobile social network | |
Li et al. | Quantifying location privacy risks under heterogeneous correlations | |
Wang et al. | Privacy preserving for continuous query in location based services | |
Zhao et al. | A Privacy‐Preserving Trajectory Publication Method Based on Secure Start‐Points and End‐Points | |
Qiu et al. | Behavioral-semantic privacy protection for continual social mobility in mobile-internet services | |
Xing et al. | An optimized algorithm for protecting privacy based on coordinates mean value for cognitive radio networks | |
Gupta et al. | Mobility-Aware prefetching and replacement scheme for location-based services: MOPAR | |
Zhao et al. | EPLA: efficient personal location anonymity | |
Domingues et al. | Social Mix-zones: Anonymizing Personal Information on Contact Tracing Data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210409 |